2023 AskQC office hour member Q&A

Review all AskQC office hours member questions from 2023.

November 2023: MARC Fields for Manuscripts and Archival Collections

November 7, 2023

What about the carrier type for a scroll?
So, for a scroll, the appropriate carrier term is going to be roll. Basically, a roll can be a sheet of paper or something else that is wound up, so that’s appropriate for a scroll.
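As an illustrative example (the coding is an editorial sketch, not from the presentation), a carrier type field for a scroll might look like:
338 roll $b na $2 rdacarrier
The $b code comes from the RDA carrier type code list; verify it against the current list.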
Is an issue of a newspaper (many sheets folded together, but not bound together or attached to each other) a volume or is it sheets?
This is really not my area at all, but the CONSER Cataloging Manual, Module 33, is about Newspapers. Section 33.10 says that the carrier type for newspapers will most often be “volume.” As the CONSER people are serials experts, I would follow their instructions about the 33X fields.
Field 524 -- I’m focusing on archival collections that are digitized. We include a unique citation in each item record. Could I put a general note in this field that informs the patron to find the unique citation for each item?
It sounds like the citation you're describing is actually in an item record and not in the bibliographic record, so yes, it makes sense to have a 500 note that would say "see item information for citation." That would be fine.
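For example, such a note might be worded along these lines (wording illustrative):
500 See item-level records for citation information.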
Regarding the 545 example: shouldn’t there be $b after the first sentence?
That is cataloger’s judgment, about whether to use the $b or not. It’s fine to just use the $a, it’s also fine to include the $b.
What are some reasons why a library would choose “a” or “blank” in the Ctrl field?
I think setting it to "a" can be useful for an institution for giving some guidelines about providing access to the materials in a reading room. For example, if it's set to "a" for archival control, you might have a guideline in your reading room that only X amount of material can be viewed at one time, or there might be some guidelines about people having to wear gloves. Those are just some educated guesses.
Any thoughts on preference for (or else paired use of) MARC 340 or 655 for terms like “Coptic bindings” or “parchment,” etc.?
Field 655, that’s for form and genre terms, so some of the information that you record in a 340 could be appropriate for form and genre, but some of it is really beyond that—like production method. So I would say if it doesn’t clearly say to use it as a form or genre, you would put it in a 340. It’s not wrong to record it in both places if it’s appropriate in both. But keep in mind, Getty Art and Architecture Thesaurus, we use it a lot for form and genre terms, but there are terms in there that really aren’t form or genre. They don’t claim to be, but they could be useful for physical description of material.
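As a rough illustration (coding is ours, and it assumes the binding term is established in AAT), the examples from the question might be recorded as:
340 parchment
655 _7 Coptic bindings. $2 aat
Whether a given term goes in 340, 655, or both remains cataloger's judgment, as noted above.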
For the carrier type of “card” the definition is “a small sheet of opaque material,” but I have also read that the term should not be used for printed materials. Can the term be used for printed materials such as postcards or business cards, or am I misunderstanding something?
You are absolutely correct about the definition of “card” in RDA, this is an RDA carrier type term. I have never heard or read anything about it not being appropriate to use for printed materials, though. That makes no sense to me, as somebody who has used RDA for a long time and been pretty involved with its development. I think it’s perfectly appropriate to use for printed materials such as postcards or business cards, or for other types of printed materials such as playing cards, flashcards, all of these are often printed materials. There’s nothing in RDA that says that, so I would say use it as RDA says.
This was not part of the fields you discussed, but we have just started working with born-digital items and there seems to be little documentation on how to add digital extent when there is also analog material. Any thoughts?
This probably very much depends on the kind of born-digital media, such as streaming or websites. I know OLAC has some good instructions for streaming media and I know there are some instructions in BFAS about cataloging online resources, but that’s really all I can say off the top of my head, since this is so far outside of my scope of expertise.
My institution has never cataloged archival collections. I am in the process of putting together an institutional template (and manual). Is there somewhere/someone I can ask to review for errors and fine-tuning before we launch?
How exciting! Congratulations, and I think you will have a great time. It will challenge you and you will learn things you didn’t know you could know. You can always email us at askqc@oclc.org to ask. There are a lot of things you will have to make the call for what’s the best practice for your institution, but if there are questions about MARC fields or OCLC templates, you can email us and we can provide some information for you.

I also want to call out this comment from the chat, that another resource to find people to review templates or records is the Bibliographic Standards Committee Catalogers Directory. This is part of RBMS, which is Rare Books and Manuscripts section, so the people on that committee are very familiar with cataloging manuscripts and will hopefully be able to help you there as well.
When all components or selected components of a manuscript or collection of manuscript material have been published, I often see a 581 field containing the citation of the publication and have been using the 581 field this way. (Example: a repository has a small collection of a few George Washington letters. The letters have been published in the University of Virginia Press edition of the Washington papers, as well as its predecessor Writings of George Washington, published by LC in the 1930s-1940s. The repository cites the volume, page, etc. of the published copies of their letters in the 581 field.) The BFAS definition for the 581 field seems to justify this, although if one really got precise about the wording of the field definition, it might be murky. It seems very useful, though, to tell catalog users about a published version of a letter or other manuscript that you hold. So, what are your thoughts about using the 581 field?
I think what you're describing is a perfectly appropriate use of the 581 field. So, 581 is the "Publications About Described Materials" note. If, for example, you have a manuscript that somebody has written a book about and in the bibliographic record for the manuscript, you want to provide a citation to that book so users will know "oh, I can read this book about this fantastic manuscript," yes, you can use a field 581. The definition that we have in BFAS is "a note for the citation of, or information about, a publication based on the analysis, study, or use of the materials. Use field 581 also for citations to published sources, such as collection or exhibition catalogs, that contain photocopies or reproductions of items," so that seems pretty clear to me. And I think that it is rather broad, because there are a variety of possible reasons to record a 581; as BFAS says, it could be that you've done an exhibition about an item and there's a published book, or it could be that somebody else has written a book, or there's some other kind of publication that is about the material that's described in the bib record.
What topics will you be covering in 2024?
That’s a really good question! We don’t know yet. We are in the process of planning this month and hope to release information in December about topics for at least the first half of 2024, but we don’t have anything pinned down yet. This is a great time to plug that survey that you fill out at the end of these presentations, because we do get a lot of the topics from the suggestions there.
How mandatory is the 340?
Not at all! This is not a mandatory field, but I wanted to provide information about it because it can be really useful for some items when you want to provide controlled vocabulary. Also, someone pointed out that the 300 field is repeatable; those of us who mostly catalog monographs don't think about this, but when you've got collections of materials, a repeated 300 can be quite useful in describing the material.
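For example (quantities invented for illustration), a mixed collection might carry more than one 300 field:
300 4.5 linear ft. (9 boxes)
300 250 photographs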
Would it be possible to communicate with the people determining how fields are labeled and show up in the worldcat.org display to get some of these 5XX notes displaying perhaps more clearly?
If you are using the Community Center, you can put suggestions there as to what changes you would like to see, or you could send suggestions to OCLC Support and those will get communicated to our Discovery colleagues.
If a local catalog’s Discovery layer is not set to display all 3XX and 5XX fields, do they still display in WorldCat?
If the local Discovery layer is supplied by a provider other than OCLC, the display configuration for 3XX and 5XX fields will be independent of those in any OCLC product or service. If it is provided by OCLC, for example WorldCat Discovery, the library's display configurations and the maximum set of fields that can be displayed in a given OCLC product are independent of each other. A library's WorldCat Discovery field display configuration settings do not influence fields displayed in WorldCat.org. All fields, including 3XX and 5XX, that the WorldCat.org user interface is coded to display will display on all records in which they occur. Some 3XX and 5XX fields are not yet displayed in WorldCat Discovery, and some are not yet displayed in WorldCat.org. There is ongoing work being done in this area. The total set of fields displayed in WorldCat Discovery and those displayed in WorldCat.org may differ.

For more WorldCat Discovery information, you can check the WorldCat Discovery documentation.

November 16, 2023

Any recommendations of documentation for use for Chinese manuscripts, Sanskrit manuscripts?
That’s not something that we’re able to help with on this panel, but a user did provide this link in the chat: https://www.eastasianlib.org/ctp/web...guidelines.pdf. There may be community members with CJK, Sanskrit experience that can provide more recommendations.
Can a manuscript be a computer file?
No, not really. You can have a computer file that would be a version of a manuscript, but I don’t think the computer file itself could be considered to be a manuscript. An archival collection that is a mix of things could certainly have computer files as part of it, though.
Is there a way to search for BLvl "c" (collection) records in Connexion to find examples of collection records?
Not directly. It is possible to enter “dt=mix” in Connexion to retrieve WorldCat records with a Mixed materials record type since archival collections are often cataloged using that format. (Be forewarned there are over 8,000,000 WorldCat records with a Mixed materials record type so additional limits will be needed to obtain a useful sample.)

There is a document on the OCLC website called Searching WorldCat Indexes where you can see how to search specific field values.
Am I remembering right that an archival control code keeps records out of the DDR algorithm?
No, there are a lot of things that keep records out of DDR, but that isn't one of them. You may be thinking of archival resources, which might be kept out because of the dates in the 008 or the cataloging standards used. So, for example, DDR is programmed to ignore anything cataloged with DACS.

So just in June 2023, we did a presentation on Cataloging Rare Materials Defensively, and we included a slide that provides a list of the cataloging standard codes that are excluded from DDR.
Is there an advantage to using the 340 vs. the 655?
I wouldn’t say there’s really an advantage. Field 655 is genre/form terms, so it could be appropriate to use either field, or it could be appropriate to only use, say, the 340 field. For example, when I was talking about production method, that’s not really a genre form, so that’s appropriate for 340. There are so many subfields in the 340, there are probably things that would be appropriate for 655 that wouldn’t be for 340, say like “paranormal television shows,” a genre that isn’t a physical description. There is some overlap, I acknowledge, and it’s going to depend on what you’re cataloging.
If you are only using RDA for the access points, and another standard for the basic description, do you put both codes in 040 $e?
No, typically people don't. If people were doing that, if they were using, say, DACS, for their main description, they would just record $e dacs. But you can have more than one code in subfield $e, so if you are using some RDA elements along with DACS in your bibliographic description, it would be appropriate to use two subfield $e codes. The BFAS 040 page does include some information about multiple codes.
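A hedged sketch of what that might look like (the institution symbol XXX is a placeholder, and subfield order may vary with local practice):
040 XXX $b eng $e dacs $e rda $c XXX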
Could 2 books that were published separately but bound together by the library be cataloged together as an archival collection?
Technically, yes, but I don't think that's a good idea. I think that's not really how people are going to think of an archival collection. A standard like DACS is not going to be good for describing that situation, which is really a bound-with. That's really not what's intended when we talk about an archival collection. DCRM and RDA have some cataloging guidelines for bound-withs. The sorts of things libraries bind together, such as a year's worth of serial publications, are not likely to be thought of as "archival collections" by users.
I have used field 530 for links to a digitized version of materials from archival collections, rather than the 856 field. Is this an appropriate use of 530?
So, yes, the 530 field is the "additional physical form available" note. This could be used when you're cataloging the original manuscript and want to say it's also available on the internet in a digitized format. The field does have a subfield $u to record a URI, so yes, you can do that.
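For example (wording and URL are placeholders, not from the session):
530 Also available in digital form on the repository's website. $u https://example.org/collection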

October 2023: Meeting librarians where they are today: Incorporating linked data into existing library workflows

October 10, 2023

I miss WorldCat Identities – it was useful to me in my work (WorldCat Identities project site)
WorldCat Identities was a research project that was concluded earlier in 2023. The data gathered on the project supports our efforts to build an entity ecosystem, and our focus is now on the WorldCat Entities ecosystem, which is now the source for persistent Person identifiers: https://id.oclc.org/WorldCat/entity
Can you explain "productionizing linked data at scale" please; that is a lot of jargon!
So first, what has been done on OCLC's side is experimental. So, we had services in place where you could view linked data, and, again, it was sort of experimental, so it may not have had the capacity for the huge number of queries required to do this at a massive scale. I'd say productionizing it means putting true infrastructure and services in place to make it possible for this to become part of a workflow. So in some places, folks were adding identifiers to their MARC records, and that might have been one library's workflow. In this case, we're trying to bring all libraries along for the ride of the linked data transformation, and so we have basically scaled efforts, large efforts, to add these URIs and to incorporate linked data into our existing OCLC enrichment and deduplication processes, things like that, so that they're really transitioning to mainstream, active workflows.

So it's true, we're sort of transitioning from a period in time, from the 2010 to 2016 timeframe, when we were really just experimenting with how we could use linked data to improve end user discovery, or what the value add is for future-facing authority work or entity management, sort of taking those experiments and research and development exercises and actually turning them into production-supported data, production-supported services, things like that. And then what that manifests itself in is doing it at scale and also ensuring that these identifiers, specifically these authority identifiers, are embedded in library workflows to help pull together the disparate workflows that you see across libraries, where there's the cataloging department, there's special collections or digital materials, there's e-resources, there's researcher information management, and all of them had sort of independently been playing around with linked data over the past decade or so, but within their own workflows. So we want to try to provide a set of linked data to help us create a uniform set of identifiers that can be used across all library workflows and help pull together data and minimize the duplication of both the data and the workflows needed to work and interact with that data.
I also find the absence of an ecosystem that enables an end user to find and use these links is a major barrier to adoption, especially at smaller shops. Any comment on that?
I think that's one of the things that we really want to work on: providing those tools and services for all libraries to be able to not only find these identifiers, but also create identifiers when they need them. We talked about authority entities, and one thing we sort of gravitate to is authorities. Authorities exist in an authority service, like the Library of Congress Name Authority File or whatever it happens to be. But where we've actually heard from folks is that the biggest need for identifiers is actually from smaller institutions, or special collections, where the people that they have materials about or by aren't published authors. They're not necessarily even eligible for entry in a name authority file, or the library's not a NACO member, so we want to make sure we can provide tools and data for all institutions to be able to accurately identify all of these sorts of entities that build the description of their bibliographic resources or their data in general, in an equitable way, in a shared ecosystem.

The absence of that shared ecosystem is really challenging, and in this way, working with our tools, like Record Manager, WorldCat, WorldCat entities, we're building the integration into things. So with your contributions, any size institution, if you're still working in MARC, but want to enhance your records with WorldCat Entities, you're part of the linked data graph. That will benefit all the institutions working with this WorldCat data. We really want to make it easy and provide those tools and services to support workflows, in a way that, as a smaller institution, you don't have to build your own graph or your own transformation. Our goal is to help be that hub and provider to bring this ecosystem together.
You said that OCLC plans to support Bibframe. Do you plan to also support the RDA ontology (what some call RDA/RDF)?
We’re working with the community to evaluate all of the non-MARC standards that we need to support. Just because of our very close partnership with Library of Congress and our work in supporting the LD4 grants, we are just starting with BIBFRAME. But that’s not to say that we’re going to be only supporting BIBFRAME as our non-MARC bibliographic standard.
What does publishing Dewey as linked data mean? Will this be publicly available since Dewey is kept behind a paywall?
We're working with our Dewey editorial team, specifically Alex Kyrios, on that right now. As we get closer to publishing all this, there will be much more information available. But I can say that with the Dewey linked data, as with any linked data, we will be following web-based best standards and practices for publishing data, which means having identifiers that are resolvable, with linked data supported under them.
Will there be any support for Indigenous languages in OCLC Meridian?
So as we've been working on this, we know and have heard from librarians the limitations of different language codes and lists. What I'm looking forward to is working with the community to learn how we can best support these different languages. Right now, the languages available for entering the labels are part of what's called the BCP 47 standard, which is a web standard; and to characterize the language of a work, we're using the MARC language list, but we recognize that those have gaps in their language representation. So I think next steps will be learning how we can support, in a linked data way, these languages that may not be represented in either of those, as well as the ISO standard.
Will OCLC Meridian be offered as a part of our current subscription with OCLC Connexion and other services?
As we’re working on plans to roll out Meridian, we’re evaluating how it fits in with the other cataloging tools, so this is sort of an authority entity management service, compared to a bibliographic service. So we will have more information as we get ready to launch Meridian, early in 2024.
Can you speak about what drove your choices of terminology, such as "organization" rather than "corporate body"?
One of the guiding principles behind the development of the WorldCat ontology has been to craft terms that are readily accessible to the broadest set of users, including those outside of the librarian community. So while a term like "corporate body" is well understood within the library community, "organization" is a more broadly understood term to the general public. The choices for these terms are informed by some of the work that was done with the earlier linked data pilot projects, such as the CONTENTdm linked data project and Project Passage.
What is the advantage of creating yet another URI for The Ohio State University, when it is already in the LCNAF, and in VIAF, and Wikidata?
There’s a couple of distinctions. For example, VIAF is just an aggregation, it’s a static data set that you can’t edit. You can ask for changes to be made based on PII concerns and things like that, but it is not a living, breathing data set in the way that WorldCat Entities will be. And for something like Library of Congress Name Authority File, you do need to be a NACO member to get in there, and there’s also requirements needed to submit a new authority heading. So we’re certainly not trying to duplicate anything, but there are use cases outside of traditional authority management that we also want to account for, based on that very robust model that Anne spoke about and that Michael is managing. Ultimately what we want to provide for WorldCat Entities is a set of linked data for authorities that is suited for library workflows, that libraries can create and manage and make connections to and from based on the use cases they have, not just in cataloging but in all the various departments that reside within the library.

In the WorldCat Entities data, we do include the identifiers from different systems. It will include both linked data identifiers, such as Wikidata and Library of Congress, as well as those that might not be published in linked data. So it can serve as that hub, and if you ask WorldCat Entities to return you the entity at the end of an LC ID, we’ll be able to access that.
Can you explain why we need an OCLC-specific tool (Meridian) to do what looks to be the same thing as Wikidata? Why not just create these authoritative entities in Wikidata, which would support the idea of shared cataloging?
The way I like to break it down is with a somewhat simplistic metaphor: there are sort of two ends of the spectrum. Wikidata is like a wide open field, where you can say whatever you want about anything, but conversely anyone else can say anything as well. On the opposite end of the spectrum, you have sort of the traditional library authority system, the LC authority file or whatever you prefer to use, and that is like a French garden: it's very rigid, there are clearly defined paths, very distinct walls, so you're very instructed as to how to interact and create a work within this French garden. What we're striving for with Meridian and WorldCat Entities is somewhere in the middle, somewhat like an English garden, so there are paths that you're meant to walk down, that sort of metaphor. It is a model, you can't just say anything or create something to say anything you want, but it's a little bit more open, a little bit more free-flowing, a little more akin to the wide open field that is Wikidata. What we want to ensure is that the data is being curated for library use cases, so being able to say specific things that are critical to library workflows, whether it comes to security and digital metadata or researcher information management data or electronic resource metadata; there are connections that need to be made even between those authority entities that are critical for workflows. And if those data can be changed or edited, for better or worse, by anyone, that does pose a risk to workflow reuse of the data. That being said, if the use case is just "I want to create a knowledge card and throw it in my discovery system," and throw up an image of the author, then I think we've seen that Wikidata does work well for that. But when it comes to productionizing linked data and getting it really embedded into library workflows, what we've heard from development partners for Meridian is that a somewhat more closed-down environment would provide a more reliable set of data to work with.
Parallel to my question about Meridian—LC has already created a BIBFRAME editor—why don't we all use the same one?
There are a few BIBFRAME editors: there's the Library of Congress Marva editor, and there's also one developed as part of the LD4 grants called Sinopia. Ultimately what we're trying to do is build an editor that can leverage and use the underlying WorldCat data to help provide efficiencies to catalogers using it. So one example would be, when someone's cataloging something, creating new BIBFRAME, a work and an instance and an item description, in whatever order you prefer, one can just start cataloging an instance, and then based on the corpus of WorldCat data and the works that we know exist within it, recommend to a cataloger "is this the work that you're looking for? Because if so, you don't need to create the whole description, you can just use the one that's here and link to its identifier." We're also working closely with development partners to understand the preferred workflows to have in the process as it applies to BIBFRAME, the same way we've been doing for cataloging partners in Connexion and Record Manager.
Will there be data similar to MARC 670 to use to ensure where data came from to back up the choice of name and to verify for the cataloger that they have chosen the right entity to link to?
Within WorldCat Entities there will be the notion of reference for where a claim comes from. There will also be a change history, so what was added, by what institution. So there certainly will be the ability to track down within the data where claims are coming from. Also, because WorldCat Entities does serve as a hub and links out to these other authority files around the world, there will be the ability to, if you know you want a certain Library of Congress Name Authority heading, and what you have is a WorldCat Entity URI, or you’re going to go off and get a WorldCat Entity URI, you or the system can go off into that name authority file, pull back the appropriate heading you want, and provide that as the controlled string.
How will OCLC handle the fact that BIBFRAME is much less granular than MARC?
We've been working with BIBFRAME for quite a while, and I know that the Library of Congress in their transition to BIBFRAME has been struggling with the same issue, and there's a variety of workflows they've come up with; they've created a test bed ontology that they call Library of Congress BIBFRAME that accounts for more of the MARC-ish features within BIBFRAME. What we are striving for is to take advantage of what is good about BIBFRAME, which is building connections between things and having identifiers for things, and then integrating that with what MARC is really good at, which is inventory control, procurement, things like that.
Will catalogers working in BIBFRAME be able to enhance a record that already exists in MARC? And vice-versa?
Yes, that is what Anne was talking about in the last slide. The use case here is, let's just say you're in a consortium where one of your libraries has jumped over to BIBFRAME but the rest are still working wholly in MARC. A consortial member in a MARC cataloging library could create a record in MARC and send a message to their friend at the library that is working just in BIBFRAME; that librarian could log into the BIBFRAME editor, pull over the record that was just created in MARC, edit it in BIBFRAME, submit it, and send a message back to their colleague at the MARC cataloging institution, who could log into, say, Record Manager and see the change added. So the idea is that you can seamlessly work back and forth in MARC and BIBFRAME based on the data system you're working with, the workflow needs that you have, the data expectation, or the training and expertise that your cataloging staff has.
A previous question brought up the lossiness of BIBFRAME-MARC conversion. If OCLC's editor is supposed to be seamless between the two, does that mean working with a simplified MARC format? Would that result in further duplication of MARC records in WorldCat?
No, again, we're going to be ensuring that deduplication works the same way with BIBFRAME data, on import and ingest, as it does with our MARC data. One of the biggest advantages of being able to curate WorldCat data in BIBFRAME will be those enhanced relationships that one will be able to create, such as adding identifiers across WorldCat data. Because identifiers are so inherently important in linked data, we're hoping and expecting that as librarians start creating and curating BIBFRAME natively, all of those linkages will flow into the WorldCat MARC view of the data and help improve its quality.
Can you speak about how adding BIBFRAME data could impact/create duplicates? (e.g. If a MARC record for a resource exists in WorldCat, and a BIBFRAME record for the same resource is added)
With our BIBFRAME ingest processes, they'll be going through the same pipelines that MARC data does, ultimately. So the same deduplication efforts that are in WorldCat today, that we've talked about at IFLA and at these other presentations in the past, will be running on BIBFRAME data as well. So we are running tests to ensure that what comes out the other end when we ingest BIBFRAME is of high enough quality to be a viable candidate for deduplication checking within WorldCat.
If you can't use Record Manager, but only Connexion, what updates are being added to Connexion (such as the look-up feature that you mentioned)? It seems like all the new functions and features are for Record Manager and not Connexion.
If you have a cataloging subscription, you have access to Record Manager. The information at this link will explain how to go about getting a login for Record Manager: Create a Record Manager account. But you do have access to this interface as well. Questions around the development should go to Chelsea Dalgord. Connexion does have a different code base so it does require more work, but we do understand that Connexion is an important product just like Record Manager and we will be working to integrate parity across them.
Is OCLC planning to have training sessions for BIBFRAME and Meridian?
Yes, as we get closer to launch for Meridian and closer to the launch for the BIBFRAME editor, as we do with all our current products, we’ll have training available. There’s a lot of community groups and others that do linked data training, but I think part of OCLC’s role in the community is thought leadership and part of thought leadership is training and access to information for new and emerging standards, like BIBFRAME.
Why is OCLC going to use only $1 (real world object) rather than $0 and $1 according to their respective definitions?
For the entities Anne mentioned, the conceptual ones, those are sort of subfield $0 identifiers now across WorldCat and will likely stay that way. But the other ones actually are real world entities, so the person, the event, eventually the organization, the places: the linked data is describing the real world thing, whatever it happens to be, so those will by definition be applicable for a subfield $1 identifier.
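A rough illustration of the distinction (both identifiers below are placeholders, not real ones):
650 _0 Oceanography. $0 http://id.loc.gov/authorities/subjects/sh00000000
100 1_ Morrison, Toni. $1 https://id.oclc.org/worldcat/entity/E0000000000
The conceptual entity in the 650 carries a subfield $0, while the real-world person in the 100 carries a subfield $1.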
For these tools: can you make it so that we can "derive" new entities from existing Wikidata entities? Because that would save a LOT of time— creating these things is more time consuming than encoding in MARC, so that would be a big help.
Thank you for that feedback; we definitely want to support workflows that save folks time, so I’ll take that suggestion to our team to discuss!

October 19, 2023

You mentioned that OCLC is currently adding WorldCat Entity URIs to bib records. Can you say a bit more about what you are doing? Is that an automated process – and if so, will that process run continually? Given that OCLC is adding the look-up feature in RM, is it hoped that cataloguers will start adding them?
Initially, it will be a one-time load, but we're working on getting an automated process running, probably within the first quarter or so of '24. The same way that authority control runs across WorldCat on a daily basis, we will be doing the same thing with adding the WorldCat Entity identifiers. So if a new record gets added by an author who exists in Entities, we'll be able to add their identifier to that new record in a reasonable amount of time.
How will Meridian interact with the Name Authority File (in Connexion)? Will it populate the NAF or cross reference it, or is it totally separate?
It will complement it in that WorldCat Entities, the data, connects out to existing authority sources around the world, to things like Library of Congress or Canadiana or VIAF, and also non-library authority sources like Wikidata. So that will be the connection between the two, and the distinction between an authority file, like the Library of Congress NAF, and WorldCat Entities is that they are very complementary, but the more robust data model that we have underlying WorldCat Entities will just allow for people to describe important characteristics or relationships of a person, place, or event that fall outside the bounds, or confines, if you will, of the MARC authority format.
Will Meridian be populated with existing data sources like the LCNAF, VIAF, etc.?
Yes, it will be. That's how we have begun to build the existing entity data, and we will be building processes to continually update the WorldCat Entity data based on changes in authority files, like a death date being added to a person or an additional external source being added to a person, for example. So there is a very tight relationship between existing authority sources and WorldCat Entities. Again, it's just that the use cases for them can be slightly different. Another important distinction is, with Meridian, we're really trying to provide a place where people can create entities, or authorities if you want to call them that, for people that aren't necessarily eligible for inclusion in a traditional authority file. So, think about all of the unpublished people who have contributed to special collections at an archive or diaries at a local historical society, those sorts of unpublished people. Or people, photographers, say, who have collections in digital repositories. Those individuals are typically not represented in authority files, and in those situations, they don't have identity links or authority links you can add to them to make the data sort of stickier, if you will. So, in terms of comparing how something like LCNAF compares to Meridian, that is one of the distinctions between the two data sets.
How about those that only upload records and have no access to direct input into WorldCat? Will it accept those $1 in bib records? My company provides/uploads records to WorldCat. Do we need to add those $1 when bib records are uploaded for ingest?
So that would be ideal, and you'll certainly be able to do that, but WorldCat also has an authority control process where we try to make authority control links wherever possible, based on the encoding of the subfield $a within, let's say, a 100 field, so we will try our best to match them, but this does kind of go back to that question earlier about quality. We want to make sure, to the highest degree possible, that we're not making incorrect links. For example, if you have a 100 field in the MARC, so a primary contributor, and it's just John Smith: there's a lot of John Smiths in the world, and without having any context, we don't want to just be, you know, algorithmically applying a WorldCat Entity identifier to that heading without knowing to a pretty high degree of certainty that it is the right ID we're assigning to that person in the MARC record.
Is Meridian exclusively used by OCLC/libraries, or is it a platform that is being used by other disciplines/industries?
Upon initial release, it will be for libraries, but that's not to say that we don't want to explore partnerships with other disciplines or industries. And the question we have been asked is, "where does Meridian fit within the broader ecosystem, where you have something like Wikidata?" What I want to clarify is that, while both kind of complement each other, we really want Meridian, that is, the data within WorldCat Entities, to be tuned to the workflows of libraries in general, but not explicitly just cataloging. So, there's a lot of disciplines and departments at a library, and we want to make sure that the data in WorldCat Entities can work across all of those workflows, but not necessarily initially include all of the additional information that isn't relevant, or isn't useful, for librarians. That's not to say that things in Wikidata aren't useful, but I think when we start looking at what the strengths of Wikidata are compared to an authority file, it's pretty clear: an authority file has, sort of, authoritative access points, entry points for bibliographic data. And the biggest pro there is that those authority headings, or links, if you will, are included in a lot of bibliographic data, whereas Wikidata, not so much. But what Wikidata is really good at is building, like, a knowledge card to give an end user more contextual background on the author they're searching for, or the subject heading that they're interested in; but just the very nature of Wikidata being sort of outside of, or agnostic to, any specific domain makes it just a little bit more challenging to use. So, yes, initially Meridian will be focused on librarians, focused on the library industry, but it needs to, at the very least, make connections to outside disciplines and industries, and I think there are use cases where you'd actually want to include those individuals or those groups in the creation and curation of WorldCat Entity data.
What kind of quality control will there be for these new services?
So one aspect of quality would be deduplication. To talk about that in a little bit more detail, we're planning on having sort of two points of quality checking there. There will be one check at the point of creation. So, if someone is trying to create a new person, and that person's name is Jeff Mixter, the system will prompt the user, saying "there's already a Jeff Mixter, or a Jeffery Mixter, or whatever in the system. Are you sure you want to create another one?" And then there will be other cues: when, let's say, entering a date or a place of birth, the system can start prompting the user that, hey, there's another person in the system that has a lot of characteristics similar to the one you're about to create. There will also be, as there is with WorldCat bibliographic data, sort of an offline, continually running process to look for duplicates. And that process will involve not just direct characteristics of the entity (birth date, death date, birthplace, whatever) but also leverage relationships that that person, let's say, has to other things. If you have multiple, let's say, works, all about the same topics, all published around the same time, and four of the five are by Jeff Mixter, who happens to be Entity 1, and the fifth is by Jeff Mixter, who happens to be Entity 1000, there's a very good chance that those Jeff Mixters are probably the same person. So there will be this sort of offline process running that will have the same quality control mechanics built into it that the existing quality control processes have across WorldCat today.
Can the URIs in WorldCat bibs be exported along with the records?
Yes, they will be. So, if you are using Collection Manager, the query collections within Collection Manager, when you export those records, all those identifiers will be included in subfield $1s. Also, if you're using the Metadata API or even Z39.50, those identifiers are actually embedded in the MARC record in a subfield $1. Basically, any way you get records out of WorldCat, those identifiers will be included in them.
Can you speak a little bit more about historical societies and special collections that rely a lot on local subject headings and non-authority names? How will that function in this environment?
What we kind of envision is that for organizations like historical societies and special collections, Meridian and WorldCat Entities will provide a system and a data set where those librarians can create new Entities for these people, and we obviously understand that that could require, you know, some background work. But actually, in a previous research project I was working on, we were working with a historical society in Minnesota, and they had a very nicely curated set, albeit in a PDF, of all of the local personal names that they used across their various collections, and in working with them, it was very clear that the intellectual work to compile really very nice descriptions of these local headings had already been done. They just literally didn't have a place where they could put that and then get an identifier assigned for all those people. So that's really one of the primary use cases for Meridian: it is a place where any librarian can go and create a permanent identifier for someone who otherwise wouldn't be eligible, if you will, for inclusion in a larger authority file environment.
How are vendors who sell and provide services to libraries going to be able to use Meridian and contribute to curating authority data?
I think that would be a question, probably, for our business development colleagues. As I mentioned earlier, initially we want to focus on libraries and librarians as the primary users of Meridian. But obviously, there are super important data and records that are contributed by vendors, and those vendors and publishers obviously know a lot about the folks who created the content that they have. So, I will say that it is important, and we understand that there are a lot of parties who can contribute to the curation of Entities data, but at this point in time, I would just defer to our business development colleagues for more information around that.
Finding the actual Work ID in WorldCat Entities is currently extremely difficult. Expressions and Works and probably Manifestations are all mixed together in WC Entities. Will it become easier to find the actual right ID for an RDA/LRM work?
Actually, over the past four or five months or so, we've done a lot of data quality improvement for WorldCat Entities that will be launched here in a month or so. A lot of that work has included the deduplication of Works within WorldCat Entities, so I'm hoping that that will help. We also added some additional faceting to the search experience, so not only can you look just for Works, but you can look for text versus moving image versus still image, et cetera. But I do want to point out that Works in WorldCat Entities are not at the LRM "Work" level; they're actually at the WEMI "Expression" level, so they follow the BIBFRAME "Work" paradigm. So Works in WorldCat Entities are going to split on language and, broadly speaking, on carrier type. So the English text of Harry Potter will be different than the Spanish translation of Harry Potter in text, and, you know, more specifically the English audiobook of Harry Potter will be different than the English text of Harry Potter. And we're also now working on sort of building connections between those Works.

So, ideally, what you'll be able to do is find a representative Work. The idea here is, the WorldCat Ontology notion of a Work is more aligned with the Expression, so what we lose there is the aggregating mechanism that the BIBFRAME hub or LRM Work could provide. What we are proposing instead is the notion of a "representative work," which is not a class in and of itself; it's just a WorldCat Work like any other except that it represents the first instantiation of a Work, the first publication from which all the derivatives, such as translations or other formats, then derive. So the very first publication of Mark Twain's Huck Finn could be a representative work from which numerous translations, film adaptations, and audiobook readings derive, and all of these would be Works unto themselves that have transformative relations among each other, but all point back, either directly or indirectly, to that representative Work that is the very first instantiation.
Will different Spanish translations by different translators (different expressions) get their own IDs?
Yes, that is correct.
I am thinking about our current workflows. Will there be automated functionality that will add new WorldCat Entities based on MARC authority records newly created in WorldCat? Or is it hoped that cataloguers will create a new WorldCat Entity when creating a new MARC authority record?
So we want to actually enable both functionalities. Basically, if you were in, let's say, Record Manager and you're cataloging a brand new book and that author doesn't exist in WorldCat Entities, within Record Manager we will have point-of-need new Entity creation, so you could create a Person right then and there so you can add their identifier to the record you're trying to catalog and continue on with your workflow. But there will also obviously be the ability to just go to Meridian and create a Person, like, in a separate window or tab or whatever. As for an automated workflow, yes, we do want to be able to basically pre-populate or continually add new Entities to WorldCat Entities over time, so it's not just a totally manual process, and we're evaluating different approaches there. One approach I think would be very interesting to pursue is sort of evidence-based: basically looking across WorldCat for, let's say, personal name headings that are controlled to some sort of existing authority file but do not have corresponding WorldCat Entities, then evaluating those for automatic creation of an Entity. There's a variety of ways in which we could approach this. I like the sort of evidence-based creation approach, but there are certainly others. In short, yes, we want to make sure that a cataloger can create a new Entity at point of need, or just as part of their authority work process, but also that OCLC is making sure that the Entities are being kept up to date with new information, as well as adding new Entities on a regular basis, so it's not all the burden falling on the users of Meridian and WorldCat Entities to create and curate those things.
There is a lot of effort involved in curating high quality authority data. Does OCLC have plans to provide incentives to encourage high quality work from members?
We've certainly talked about opportunities and options there. You know, there is sort of the crowdsource notion of, you know, a user getting like a gold chevron for having done X amount of curatorial work. That's maybe a little less tangible of an incentive. But we fully understand that providing high quality data curation is a full-time job. Librarians and catalogers are knowledge workers, and part of their job is creating knowledge for the world, to share with the world so it can be reused by people around the world. And yes, we need to evaluate that, so if you have any ideas, I'd love to chat with you more about that.
This may sound clueless, but just to be clear. We're creating a MARC record in WorldShare and we need to use Meridian to create a non-authority person. How do we link out from the 100 field to create that?
The workflow will be: you're in Record Manager, like, specifically creating a MARC record for a book, and you're about to enter a 100 field for the primary contributor and you realize that that person doesn't exist in any authority file. In Record Manager, you have the ability to look up a number of authority files, and we'll be building functionality to look up that person, in this case, in Entities. The idea will be that if you can't find that person anywhere, there will be sort of a widget that will pop up that will allow you, from within Record Manager, to create, in this case, that person, hit save, and that will create the person in WorldCat Entities and drop the identifier that's been created for that person into the subfield $1 of that 100 field that you were just working in.
So Meridian is incorporated into Record Manager?
It will be. So, once we launch Meridian in early 2024, Meridian will be underpinned by APIs, and what we will be building in Record Manager is an API connection. So, Meridian users within Record Manager will be able to create those Entities at point of need. Meridian itself will be a separate web application, but because it's an API-first application, we'll be able to plug those APIs into a variety of different products and services.
Are there plans to create APIs for Connexion?
Yes, and we have discussed those. Overall, questions around the development should go to Chelsea Dalgord.

September 2023: Get Informed about Genre/Form Terms

September 12, 2023

Do you base capitalization of genre/form terms in the bib record on the way terms appear in their source lists?
Yes.
With regard to AAT, I can't find guidelines from GRI about capitalization of its terms, and its own examples include initial caps. Do you have any information on that?
I would suggest that when you are using the Art and Architecture Terms, in either Record Manager or Connexion, that you use the capitalization as they're found in the vocabulary. Don't change it. Remember I mentioned that Record Manager allows controlling of the Art and Architecture Terms, so when you control the heading, it's going to control it the way the capitalization is found. It's usually all lower case in the Art and Architecture vocabulary. I'm sure there's some proper noun exceptions, but it's mostly lower case letters.
Would you use ISBD guidelines for capitalization?
I would not, for two very important reasons. The first important reason is what we've already briefly touched on, about how it's important to use the capitalization as it's found in the source vocabulary. The second reason is that ISBD guidelines don't apply to access fields. They apply to descriptive fields, so ISBD doesn't say anything about how to capitalize genre/form terms or subject headings or even authorized access points for people. ISBD is talking about capitalization of things like titles as they are found in a manifestation, and so on.
Will that feature come to Connexion? That is, the ability to control various vocabularies.
Not for the foreseeable future.
Can you please speak to demographic group terms and how they are used?
Demographic group terms can be recorded in either bibliographic records or authority records. They can indicate either information about the creator, such as the author, if the author is from a certain demographic group; or they can indicate the audience that a resource is intended for. For example, a manual for nurses where the demographic is nurses. Often demographic group terms are important for juvenile literature, because libraries often really care about whether a resource is intended for children or not. The Library of Congress has a demographic group terms vocabulary and a manual for that (https://www.loc.gov/aba/publications/FreeLCDGT/freelcdgt.html). There are other thesauri you can use, and if you go on the MARC pages, there are source codes for demographic group terms. It will tell you that you can also use demographic group terms from the subject source code list.
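For illustration (terms chosen as examples; check LCDGT for the current forms), the nurses' manual mentioned above could carry an audience term, and a children's resource likewise:
385 Nurses $2 lcdgt
385 Children $2 lcdgt
Field 386 works the same way for characteristics of creators and contributors.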
So can LCSH be applied as genre terms in 655 _0 if a genre term can't be found? Or would you still put it as 650 _0?
Yes, you can use an LCSH term in field 655, second indicator 0, but the problem with that is it's not controllable. I think you would have a hard time finding a term that's in LCSH that isn't in another genre vocabulary that you could use. As far as recording it in 655 or 650, the Subject Headings Manual has some instructions about using 650 when what you're recording is actually a genre, so it really depends on the situation. I would encourage you to follow the Subject Headings Manual when you're using Library of Congress Subject Headings, but I would not just record a term in 650 because I could control it when I knew it was actually a genre term. I would try to find it in an applicable vocabulary. LCGFT terms were originally derived from LCSH, so most of them are there, and we've got a slew of great freely available vocabularies. And just because they aren't controllable in Record Manager or Connexion doesn't mean they're not great vocabularies.
Could you please update the subject enhancement project, i.e. adding equivalent subjects in WorldCat records?
We completed the initial retrospective on July 4, 2022, and we're about ready to turn on the process again going forward, to begin catching up on everything that's accumulated from last year to the present. So you should be seeing more headings being added in the near future.
One thing I often find confusing is that 650 _0 Christmas $v Fiction (which could be assigned to a story about Christmas) controls to 650 _0 Christmas stories (which as an LCSH should be assigned to books about Christmas stories). What heading should you assign to a Christmas story?
The reason this happens is, when I go to Library of Congress Subject Headings, I see that Christmas--Fiction is a UF (Use For) reference under Christmas stories. That means you aren't supposed to use "Christmas $v Fiction." So I would consult the Subject Headings Manual. SHM H 1790, Literature: Fiction, covers when a subject heading is assigned to bring out both the form and topic of a work (https://www.loc.gov/aba/publications...eSHM/H1790.pdf); Section 3 states that for collections of fiction by multiple authors, a phrase heading that includes a form and a topical aspect may be assigned. "Christmas stories" is this type of heading, so it may be assigned to collections of fiction. But as it states in Section 4, it should not be applied to an individual work of fiction.

Remember, LCSH has been around for so long, decades before there were LC genre form terms, so there’s still transition going on. In “Special Provisions for Increased Subject Access to Fiction,” a genre/form term is assigned from LCGFT to an individual work of fiction, and following this approach, you could assign “655 _7 Christmas fiction. $2 lcgft.” So Dickens’ Christmas Carol would have this 655 genre term, and not a 650 with “Christmas stories”; however, a collection of Christmas stories could get both.
Is $v Biography $v Juvenile literature acceptable in one 650 field?
This is a case where it's important to consult the Subject Headings Manual to make sure there's no instruction telling you not to do this. I have done this before, so I can tell you that it's okay to have those two together.
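For example (heading chosen purely for illustration), the two subdivisions can follow one another in a single field:
650 _0 Presidents $z United States $v Biography $v Juvenile literature.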
Recently, there was a Facebook post in 'Troublesome Catalogers and Magical Metadata Fairies' about Connexion brief records having 1xx NOT AVAILABLE and 245 untitled in the bib. They may be vendors' batch records. Will those be part of duplicate merging in the future?
This is an indexing issue and it’s related to deleted records. Those would just need to be reindexed by us. You can report them to bibchange@oclc.org and we can quickly reindex them so they disappear.
"Young adult fiction" is one that is noticeably absent in lcgft. Why?
This is one you’d have to ask Library of Congress. But the genre form terms started in 2007, so of course they’re not as old as LCSH. There are mechanisms to propose a new genre form term. LCGFT does tend to use “Teen” rather than “Young Adult,” though I did not see “Teen fiction” in there either.

September 21, 2023

Are subfield $0 (authority record control number) and subfield $1 (Real World Object URI) to be applied to AAT terms in addition to subfield $2?
The $0 and $1 that you saw in the slide were automatically generated by OCLC. You could include them by hand if you wanted to, but the subfields are optional. When including $0 and $1, URIs are in general much preferred now over bare identifiers. For the controllable fields and thesauri, however, it's better to use the parenthetical qualifier in front of the identifier rather than the URI, because it's easier for the system to control the heading using the ID in $0, especially for FAST terms, which right now control only by the $0 and not by the text.
Is OCLC identifying invalid terms from LCGFT and removing them or changing the coding?
So there's not a huge, comprehensive project to do that, but we have identified some terms that are not valid and corrected those. "Electronic books" was one; that's not an LCGFT term, and there are others similar to that. We don't have an easy way to identify everything that is not in LCGFT that somebody has added anyway. Please do report those to bibchange@oclc.org if you come across them.
I have noticed that contents and summary notes in many records do not match the resource described by the record. What quality control is in place to ensure that these notes are not added to the incorrect record? I plan on notifying OCLC about the erroneous ones I find but would really like to not have to review all the 505 and 520 fields in records I import to update records. Example: Seeds of woody plants in the United States / with 520 = A biography of the Sioux leader, including the Battle of Little Big Horn.
There was a data sync project back in 2018 where records were not matched well; the matching parameters have been changed since then. But if you see examples of this, again, report them to bibchange@oclc.org so we can investigate and correct them. This was a one-time issue and the errors should not recur.

August 2023: Rapid Harm Reduction with Locally Preferred Subjects in WorldCat Discovery

August 8, 2023

Is the re-mapping Google Sheet available only to subscribers to WorldCat Discovery?
Anyone can download it: https://docs.google.com/spreadsheets...Mqw/edit#gid=0
After the subject re-mapping is done, will both the original headings and alternative headings be indexed and searchable?
This gets exactly at some of the changes that we're making that I was talking about. Currently, it's the original headings that are indexed and searchable. In the future, depending on what your library does, users will be able to discover content using the alternative heading.
So, libraries can search under "old" or "local" headings, but only "local" headings will display in the bib record? Will this apply only for bibs held by the library, or all WorldCat records retrieved by the search?
Yes, libraries can search under the old headings with the local headings displayed in the bib record. This applies only in that library's discovery interface, but to all records in that discovery interface; it doesn't have to be an item held by the library, it could be any WorldCat record. The local subjects display on all those records, but again, only for that library's discovery interface, so it happens for all the records but only for a user searching that library.
Are there plans to add an option to limit the use of subject remapping wildcards to topical headings (650) only? For example, if I map Blind to Person who is blind, it changes Blind River (Ont.) to Person who is blind River (Ont.), which is not a change I want to make. The use of wildcards often changes geographic, personal name, and corporate headings that should not be changed.
That would be covered under some of the work that we're doing now to enable libraries to specify the authority source for the mappings that they want to do. And it's for exactly those reasons, where geographic names might include language that we would otherwise want to remap, that we want to limit a replacement to just topical headings.
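To illustrate the idea, here is a minimal, hypothetical sketch (not the WorldCat Discovery implementation; the tags, terms, and function shown are invented for illustration) of applying a wildcard-style replacement only to 650 topical headings, so that geographic names in 651 fields are left alone:

# Hypothetical sketch: restrict a wildcard replacement to 650 topical headings.
headings = [
    ("650", "Blind -- Education"),
    ("651", "Blind River (Ont.)"),
]

def remap(tag, heading, old="Blind", new="Person who is blind"):
    # Only touch topical subject fields; leave geographic (651), personal
    # (600), and corporate (610) name headings unchanged.
    if tag == "650" and heading.startswith(old):
        return new + heading[len(old):]
    return heading

for tag, heading in headings:
    print(tag, remap(tag, heading))
# 650 Person who is blind -- Education
# 651 Blind River (Ont.)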
I am now using Connexion 3.1 and want to know how to perform tasks I used to use the macros feature for, e.g., adding all 33X fields at once, creating authority records from an OCLC record, and adding non-Latin script information to records using a downloaded package.
On the Connexion download page, you'll see instructions for how to download the macro book and how to convert those macros to work with Connexion 3.1. Another suggestion is to contact OCLC Customer Support, and they can step you through that.
Does OCLC Research plan to review any of the local suggestions and submit to LC for possible changes? It would be good to automate SH proposals (also for MeSH?)
My OCLC research colleagues are super interested in the work that libraries are doing, so we’ll bring that suggestion to them.
Will diacritics and alphabets other than Latin be usable in the uploaded WorldCat Discovery document from my library? Have there been any problems with this?
We haven't had any problems reported with this. Other alphabets and diacritics should be supported; as an example, a number of our member libraries' users are using the interface in both English and French.

August 17, 2023

Would the wildcard subject replacements change the entire string, or just the matched portion within the string?
For the wildcard subject replacements, it just changes the portion within the string that you've identified that you want to replace. In the example I've been using, only "Homeless people" gets replaced, and the rest of the string remains the way it was, so you're not getting rid of the other useful metadata.
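As a rough, hypothetical illustration of that behavior (not the actual product code; the heading and replacement term are just examples), only the matched portion of the heading string is swapped and any subdivisions are kept:

# Hypothetical sketch: replace only the matched portion of a subject string.
heading = "Homeless people -- Services for -- United States"
old, new = "Homeless people", "People experiencing homelessness"

if heading.startswith(old):
    heading = new + heading[len(old):]

print(heading)
# People experiencing homelessness -- Services for -- United States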
What happens if someone searches "people experiencing homelessness"? So, if catalogue searches are on original terms does that mean users can’t use these replacement/less harmful terms to search the catalogue?
The current state is that this does not behave the same way as if you were to run a search on the original subject. So you bring up results, but they would not be the same results. The work we're doing to support local search expansions will ensure that you can then do that: users will be able to discover these records not only with their original search methods but also with new search methods. So, if your library wants to do something like create a list of inclusive subject headings to search for, then a user could use that, and it would bring up all the relevant results.
Where can we find the Local subject re-mapping template? Can libraries who use a different discovery layer access it?
Absolutely, it is an open Google Sheet. My only request would be, obviously, don't go in and change the work that libraries have done. You're welcome to peruse what other libraries are doing. We're early in the life of this feature, so libraries are just starting to take it up. It is under a Creative Commons license, which means that we don't own it. It is for you to use and reuse for your methods and for your purposes as you're doing this important work.
In the future version, would the remediated terms for other libraries also return the same values? (Example: a Canadian user searches for Indigenous peoples, but their institution is using First Nations.)
Yes, terms that have been replaced also return those values, so the user would continue to be able to search on the original subjects as well, and we would now also be supporting searching on the library's local subjects. But other libraries' subjects do not impact your library's subjects: if I were to search on another library's local subject, and they implemented a search expansion, that wouldn't affect the way that my library's discovery layer behaves.
So as LC changes the official subject heading, we would need to do maintenance on our list (i.e., if LC changes the term "Homeless persons," we would need to go change any of the work we did for that subject heading)?
With the way that you define replacements: if you've done one of those one-to-one replacements that I was talking about earlier, where it replaces a specific subject heading, and that subject heading changes in the file – that is, LC changes the accepted subject heading and that change is then made in the cataloging record – the match would no longer be valid. The hope would be that the change is to a more culturally appropriate subject heading, but in that case there would be some updates that need to be made to your list if you still wanted to replace that subject.
So, the original underlying heading would still be searchable after the future planned work you mentioned?
Yes. The original underlying heading is still searchable. An important part of this feature is that we don't want to change the way that users get to the content they need, and we don't want to make it harder on them. We're just trying to make it a more inclusive experience, and to give the library some of the power too, so that it can define those inclusive headings. Users then just have some new methods to discover content.
If you know, how many libraries have started using this feature? And what types and sizes of libraries are using this feature? Are there any stories available about how libraries are using this?
I think it's a couple dozen, but of course we want more libraries to use this than currently do. Most of them are WorldCat Discovery customers, and small and medium academic libraries make up most of those using this feature. The feature is only available to Discovery users. If you're looking for stories about how libraries are using this, we don't currently have anything published outside of the WorldCat Discovery Community Center:

https://www.oclc.org/community/disco...ct2022.en.html

July 2023: Debiasing Dewey: Righting the past by rewriting the classification

July 11, 2023

What has been your most interesting or favorite topic to work on, in terms of revisions in Dewey?
(Kathryn) For me, it goes back to drag, just because it was a new topic and really trying to find where that best fit required a lot of thinking and cross-referencing and looking up all the wonderful research. Also knowing that I was creating something that people needed was really what kept me going.

(Alex) I like relatively straightforward areas like geographic areas, like when a country redoes its borders internally or something, and there are just neat, clear differences between this one and that one, and we can just arrange accordingly. It's good to have a challenge too, like figuring out the place for crossdressing, especially for those interdisciplinary works, to get it out of sexual practices. Thinking about the aboutness: "what is this? It's just clothes. It's how people wear clothes that expresses culture, customs, things like that." And we realized there was space to put it in a way that is not judgmental, that's just saying what is going on here, and to classify it accordingly.
LCSH headings – Can you address when to use Dating (social customs) versus Interpersonal relations? Do you only use the latter if the genre is queer fiction, or specifically related to LGBTQIA+?
We do have both of those LCSH mapped in Dewey, so we have given some thought to how to treat them. I see "Dating" as a sort of narrower term, though in terms of Dewey it's not necessarily down in the same hierarchy. We've mapped "Interpersonal relations" to 302, which is social interaction, social psychology, things like that. A term like "Interpersonal relations" can also cover bad stuff, you know, conflict, how people get along, versus dating, which is a specific iteration of that. For queer fiction, in fiction we would treat them differently. We have things in Table 3, the building table for fiction and related topics, which can express things like queer themes. If you had that, you would probably want to emphasize it rather than something broader like "Interpersonal relations" or "Dating."
Is the best place to find Dewey changes on the blog? I find the big list in the WebDewey confusing.
The Dewey blog is a good place, although I will be the first to admit I neglect to update it when I should. Some of that will pick up after this DDC meeting that’s about to happen, when changes from that meeting get approved. On the WebDewey list, which gives much more granular changes, you can customize that to only look at certain ranges, if you are subject-specific. You can set different date ranges or changes since you last logged in. If you ever find it’s too much of a firehose, you can tweak that to be more relevant to your needs.
I still use my 4-volume DDC 23 set of books - will there ever be a printed 24th edition?
No, at this point, we don't have any plans for a 24th edition, period. And we don't plan to have any discrete printed editions the way we used to. We know some people prefer print; even I find some things easier to parse in print. That's why in 2018 we started the annual print on demand. We don't expect anyone to buy it every single year, just that if you want something that's more up to date in print (the four-volume 23 is 12 years old at this point), that might be something you want to look into. It's very similar in utility to those old printed editions. There's a little less front matter, but you get the schedules, the tables, and the manual. Compared to WebDewey, you don't get some of those built numbers, because we still have to fit it into four volumes and some of them are very large. But that is what I would recommend if you're still a fan of Dewey in print.
Can you give a little clarification on the Dewey numbers for COVID-19: incidence vs. medicine vs. social services?
This is a good example of how lots of diseases and conditions get treated in general. We tend to give the social services number as the interdisciplinary number; if something is looking at the topic beyond just the medical standpoint, we don't just say "oh, it's a disease, you have to be over in medicine." If you're talking about social impacts or how to roll out vaccines efficiently, that's why we use the 360s for the interdisciplinary works. The COVID-19 medicine number is more like comprehensive works within medicine; it's really looking at how the disease works, how it affects organisms, and so on. The incidence one, I believe, is also in the medical area but with a focus on epidemiology, mapping out how COVID spreads and things like that. If you're in doubt, I would tend to say use the social services number, unless the work is clearly medicine as the discipline.
In order to get people more involved in contributing to discussions about DDC numbers, maybe there could be a mechanism (e.g., a public comments window in WebDewey) as part of the number-building routine?
I try to think of different ways to solicit feedback and meet people where they are. We do have a discussion forum at oc.lc/deweycontributors that people can comment on and discuss things among themselves. WebDewey has a comments feature where you can set comments to be individual, institutional, or public, for annotations, local practice, and topics that seem to come up a lot. Public comments can be seen by anyone, but the downside is that it's a bit like a wiki, which means anyone can remove them. For now, the discussion forum is what I would recommend. I would be interested to hear if people would be interested in a listserv or something like that. Very much open to suggestions for ways to comment.
How would you address variations of Dating that fall outside of the binary spectrum, like polyamorous couples and other customs related to dating within and outside of LGBTQIA+ communities?
This is a topic a little bit like drag, in that it's not necessarily LGBTQ+ but does have a lot of intersections with those communities. My guess would be that those topics, like polyamory, would probably be in the boat that drag or crossdressing is in now. It's not an area we've really looked at so far, but it is something I would like to address. This is a good place for me to give a shout-out to Homosaurus, which is a very good LGBTQ vocabulary that is very actively maintained with a very good volunteer editorial board. It is a thesaurus, a vocabulary, so there are some hierarchical relationships, and it is something we can certainly take a cue from.
With the editorial process for Dewey, what kind of time frame is there from when you start putting together an exhibit, finish an exhibit, and then have it go to the editorial policy meeting?
The basic schedule that the Editorial Policy Committee usually follows is an annual meeting around this time, the northern hemisphere summer, with one or more asynchronous meetings that take place over their listserv. We had that in place pre-COVID, so it was a convenient thing to adapt to. We haven't had the annual meeting since COVID, but I'm hoping to get back to it next year. It kind of depends on when something comes up and how sweeping the changes would be. The asynchronous meetings are good for things that are more than just tweaking a note here or there but that we don't really expect to need a whole lot of discussion. Sometimes it's just a matter of timing. Again, it's kind of like "how long does it take you to catalog one item?" The first step is usually a triage of how big the changes are, and then planning from there. Do you wait for a specific timeline? Do you break it up and get something now and look toward the bigger stuff? It will depend a lot on the area and how broad the changes are.
Kathryn, you've been Dewey editor in residence now for six months or so. Would you recommend this experience to someone else? What about it has appealed to you and what will you take away from it?
I came into the job knowing that I wanted to make a difference – in librarianship as a whole – and in talking with these young people who have been in and out of my house during their high school years, and some during their college years, and just seeing this kind of relief come across them when I talk to them about what we're doing, you can really see the impact of what we're doing. That is what I'll take with me more than anything else. On the technical side of things, I came into the job knowing the basics of Dewey but not really knowing how the process works. And the amount of knowledge from our volunteers, from Alex, from everyone else, is just outstanding. That's my big technical takeaway from everything. I'm still certainly not an expert, nor would I consider myself one even if I had been doing this for years. It's a great appreciation I have acquired for the system. I would absolutely recommend it. I think this program is a great teaching tool. I think it also is a good opportunity for new ideas, new ways of seeing things. I'm going to miss my time doing all the research; overall, it's been great.

July 20, 2023

What has been your most interesting or favorite topic to work on in terms of revisions in Dewey?
(Alex) Climate change is a big one. Before I joined the editorial team full time, back in library school, I had classified a lot of works for a local environmentally focused nonprofit. I was seeing things all over the place, things like "global warming" instead of "climate change," and I had the idea of "I really want to do something about this." It was tempting to try to pull a lot of things together and have somewhat more disruptive changes. In my research, I found out that Chinese classification does this: they have a sort of dedicated "environmental studies" area, but ours is more interdisciplinary. Figuring out how to update things without being too disruptive has been a good challenge. For me, it's something I've wanted to do for a long time.

(Kathryn) I feel my most impactful work is the gender identity disorders heading being nixed and redone as gender dysphoria, which is more accurate. When talking to the community about my work, specifically about the gender dysphoria change, you can see the shoulders relax and open, you can see "oh, okay," and that's what keeps me going. That has been the most interesting, not just in the research but in the results.
What has the group considered or done about mis-/disinformation efforts that target marginalized groups? For example, books about transition that discuss the false concept of "rapid onset gender dysphoria" or that conflate critical race theory with a form of indoctrination?
I know this is something the cataloging community has wrangled with in the last few years: how do we classify some of these works that are essentially works of misinformation or disinformation? The default assumption with Dewey is that you use the perspective of the author. So, for example, if you have a work about agriculture that's claiming "it's all aliens," like "fertilizer is secretly given to us by aliens," the standard answer would be "well, it's agriculture, that's what they say it's about." Or controversial literature and conspiracy theories: usually we say if you have a conspiracy theory on a specific subject, put it with the subject; but I certainly wouldn't blame anyone, if you're thinking about where it's going to go on the shelves, for seeing a good case for doing otherwise. When Kathryn started work on gender dysphoria and the idea of these medicalized approaches to LGBTQ topics, one of the numbers we looked at is a medicine number for "homosexuality treated as a medical disorder." It has its own note in the manual that goes to great lengths to basically say "only use this number if that's really what the author is presenting it as; prefer a different number if you can." I felt like that was an artifact of the past, but we did find that in some cases it's not really a bad thing if some of those works get quarantined, as it were, in another number. If any of you are SAC members (ALA Subject Analysis Committee), you might be familiar with the discussion going on about Holocaust denial literature and efforts that have been proposed by the World Jewish Congress and other groups to get those works out of the regular Holocaust numbers. I wish there was a better answer, but there are going to be competing priorities. We try our best to give instructions, but it's going to be a case where local needs are taken into account.
Is there a place to find all the number relocations/discontinuations? I've been working on a Dewey guide (for local practices that may vary from official DDC guidance) to instruct our copy catalogers which topics may be classified in numbers that differ from what is in the current DDC. For example, we have many video game books still under 793.932, which is now discontinued; the current number is 794.85.
You may remember the old tables of discontinuation that had to be printed in all the new print editions. But we don’t do print editions anymore, except for print on demand, so where do you find tables of such changes? If you use the update feature in WebDewey, by default it just shows you all changes from a specific time period, but you can search there for relocations, discontinuations, and continuations. You can also do those searches in conjunction with other parameters, so look at changes from a certain period of time, or limited to a certain number range. For specific numbers, we’re now very good about giving those history notes. We used to only give those history notes when the change happened and when we were preparing the next print edition because there wasn’t space. Fortunately space is not a problem with WebDewey, so if you look at numbers like those video game ones, almost anything that’s been the subject of a kind of recent project, you’re going to find those.
Kathryn, you said you’ve been the Dewey editor in residence since February, so what could you say about your experience and what advice might you have for anybody else contemplating such an internship?
So my internship is coming to an end next week. What I came into this with was just a broad understanding of Dewey. I didn't know how to build complicated numbers and the like, since the majority of my education in library school was about LCSH. Coming here and learning about the process of how things change and how it moves forward – the amount of knowledge from Alex and our other volunteers is just vast and deep. What I have learned I will certainly be able to take with me. I also find myself talking to people that I don't know about Dewey, and they're like "oh, I didn't know it did that," and it does! Really just realizing and understanding what the DDC can do has been invaluable. And for that I am very grateful for my time.
Is there a particular way individual librarians should contact DDC to request/suggest changes?
If you can reach us, there's really no wrong way, unless you, say, write a blog post and don't tell us and we never hear about it. I just recommend dewey@oclc.org, since from there we can discuss whatever you have to suggest and decide how to proceed.

June 2023: Cataloging Rare Materials Defensively

June 6, 2023

Is there a presentation on Archival Materials at the collection level? Or resources you would recommend? No one at my institution makes bibliographic records for archival collections, so I need resources and a mentor/proof reader!
Kate will be giving a VAOH presentation later this year in November on this topic, but it will not cover how to catalog these materials. You are welcome to send cataloging questions to askqc@oclc.org.

Attendee comment: I found this presentation on cataloging Archival Materials by Elliot Williams at the 2020 OLAC virtual conference to be very helpful when cataloging archival collections.
When searching for copy for early imprints, I find an overwhelming number of records so brief I have no idea if they match my item in hand. If I want to end up with an OCLC bibliographic record with full information for our manifestation and link our digitized version of resource, do I err on the side of creating a new record?
It depends on the record. Check the encoding level (refer to Bibliographic Formats and Standards for more information and descriptions on the encoding levels). Feel free to upgrade those records with lower encoding levels, such as encoding level 7 or 3. Use your best judgment. If the bibliographic record is so brief that you really can't tell if it is different or not, go ahead and make it your own record. If there is information that contradicts the item you have, possibly as a result of a different manifestation, then it is better to create a new record.
Regarding the brief record issue, I wonder if the ILS of the cataloging agency might have a fuller record than what's in WorldCat?
Yes, it could. It doesn't hurt to do that kind of research, if you want to, but it might not be possible for everyone. This is not something that we (OCLC) have a handle on here; it's up to individual libraries and what they are contributing or not contributing.
I have come across many WorldCat bibliographic records of printed materials where specific institutions have noted copy-specific information or that the item belongs to an archival collection. Should such local information be retained in the bibliographic record, should it be stripped out, or should such information exist in a separate record?
When institutions have correctly noted copy-specific information, then it should be retained in the WorldCat record. If you are downloading a copy of the record in your local catalog, you should make whatever adjustment you feel is appropriate, and you can remove this information and add your own copy specific information. Copy-specific information in a WorldCat bibliographic record is not an indication that you need to create a separate record.
What if you use OCLC WorldShare and cannot remove the copy-specific information?
The short answer is, yes. We display a number of 5xx note fields in WorldCat Discovery, all of which you can find documented here. In terms of $5 displays, we recently implemented display in WorldCat Discovery of a new field, the MARC 585 Exhibition Note. With this, we have also implemented display of the $5 to indicate which library the local copy note applies to. We also display the MARC 561 Ownership and Custodial History Note, and that display includes $5 information when it is available.
Is OCLC investigating using ChatGPT in its work, such as the duplicate detection algorithm, or other similar projects?
No, we are not investigating using ChatGPT with the duplicate detection algorithm; however, we are using some machine learning. In April, Nathan Putnam and Laura Ramsey did a VAOH session on the topic, which you can go back and revisit. We are in the process of applying some machine learning to duplicate detection and will be starting to do some merges based on the machine learning that we've been developing over the last 6 months or so.
Does DDR (Duplicate Detection and Resolution) take notes (5xx fields) into account? If the material is after 1829 and the record doesn't have a code like dcrmr in field 040 $e, will a note in 500 be enough for DDR to not merge similar records?
To generalize, yes, DDR does take some 5xx notes into consideration. It is fairly limited in its examination of 5xx fields, but it does look for certain patterns of text and for certain fields in the 500 area to try to compare information. It is difficult to parse through because many of those notes are free text.
Some of our old bibliographic records for rare scores now need to be revised because a recent thematic catalog updates publication information. Is it ok for us to revise these records even when other holdings are attached?
Yes.

June 15, 2023

Is there a way to be alerted if one of our records has been merged?
Yes, there is. If you sign up for WorldCat updates in Collection Manager, and records are merged, you will get a new copy of the retained record with the merged record OCN showing in the 019 field.
Can you show the email address to contact when records need to be unmerged?
bibchange@oclc.org
Does DDR (Duplicate Detection and Resolution) look at note fields when merging records?
Yes, DDR does look at note fields, depending on the situation and the bibliographic format of the particular records involved. It looks for particular patterns of text in the 5xx fields, and if it finds certain patterns, it will try to compare between the records. Because many of the 5xx fields tend to be free text, it's hard for an automated system to parse through all of the information in all of the fields.
What was the URL for the Building a National Finding Aid Network project?
https://oc.lc/nafan-research
Is there a way to tell in a bibliographic record when a record was merged; and how far back are you able to unmerge?
If a record has been merged, you will find a 019 field in the retained record containing the number(s) of the other bibliographic record(s) that were merged into it. So that is how you can tell that a record has been merged. Our Journal History tool allows us to view the history and undo both manual and automated merge transactions going back to roughly 2012.
If a record has been merged and later updated, if you need to unmerge, do those later updates remain in one of the unmerged records?
Not automatically, but we do try to pay attention when records have been updated in certain ways to manually recreate those updates, or many of those updates, in the resulting record or records.

May 2023: What's in a name?: Descriptive access points overview

May 9, 2023

For pre-publication PCC records, what edits are allowed?
For encoding level 8 (pre-publication, also known as CIP (Cataloging in Publication)) records, you may edit anything in the record (even if it is a PCC record) except the encoding level itself and the 263 field. Only PCC libraries have the ability to edit the encoding level to blank. This is documented in section 5.2.2, Editing capabilities for non-PCC records, in Bibliographic Formats and Standards (BFAS).
Why would a 740 be used for a portion of the title, that is, a title in the 245 subfield b rather than in a 246?
Field 740 used to be used more widely, but this no longer conforms to current practice. See slide 40 of this presentation for an example, where a 245 has no collective title for a manifestation that is a publication of two works by Jane Austen. In this case, the 245 is used to record the title proper, "Pride and Prejudice," with "Emma" recorded in subfield b. Additionally, "Emma" is also recorded in a 740.

Attendee comment: Field 246 is used only for alternative titles that refer to entire resources, whereas field 740 is used for alternative titles that refer to only part of a resource, such as analytic titles from within an anthology.
Is there a plan to make the 856 links in Worldshare Record Manager clickable? Right now, I have to copy and paste to look at a url.
Attendee comment: There is a "Display Web Page" option if you right-click the 856 field and go to the field list.
The relator subfields in a 700, for example, $e and $4 – are those the same? And if so, is the $4 being phased out? The $e seems to be used more prominently.
They contain similar information in similar form ($e carries a relator term, $4 carries a relator code), but neither is being phased out. It is up to you to use either (or both). Relator codes come from a MARC code list.
In cases where there does not appear to be a corresponding authority file for a name, but we want to be able to include it in a 100, 110, etc., is it ok to include? I see records where that's been done, and have done that a few times myself to be able to record things in a way that feels accurate.
If there is no authority record, then formulate the name according to the cataloging instructions you are using (such as RDA) and insert it into the bibliographic record. If you are not at a NACO institution and would like an authority record created, send the request to authfile@oclc.org and it will be created for you.
We recently did a re-indexing of our database, and the fact that author-title 710 and 711 headings could have a $n subfield in both the author portion and the title portion of the heading was problematic. Fortunately, only a small number of records were affected. Is this an issue that should be addressed by the ILS, or is it worth addressing in MARC?
This issue should be addressed by the ILS.
I thought the 740 could also be used for titles within the work, even when there is a collective title. I have seen this used often for anthologies or sound recordings that have a title for the compiled works (collective title) but we still want access to the titles for the individual works. Is this not how 740 should be used?
What is typically recorded in the 505 are the titles of contents as reflected in the manifestation, which may not be the preferred titles of the works. It is acceptable to use a 740 to record one of the titles in the 505. This is often not done in today's cataloging, but it is not incorrect to do so.
There's an archivist that I do music cataloging for. She wants song titles in the 505 also in 740 fields. You said earlier that this is an older practice. Is it still ok?
Yes, this is ok.
Many records for titles that are part of a series have the series title in 245 $a and the specific title in $b or $p. Is this a best practice now?
Assuming that the monograph can stand alone, it is preferable to use a 490 and 8xx for the series and to have the monograph part title in the 245, without the series title or numbering.
How do we suggest enhancements to OCLC products?
Use the OCLC Community Center and select Enhancements or send a note to support@oclc.org.
When constructing a new series authority, is there a place to see the best way to do that? Is it here?
If you are a NACO library, you would need to go through series training in order to create series authority records. If you are not a NACO library, you should construct the heading according to the instructions in RDA.
Link to NACO training documentation
If the only available record has a series title in 245 $a, is a new record justified with the monograph title in 245 $a?
You should update the existing record and move the series title to a 490.
Relator codes/terms ($e/$4) in the middle of name-title entries (i.e. between the $a and the $t) cause problems in our ILS; are these allowed/valid in Connexion ?
Currently, controlling places the $e or $4 between the name and the title portion of a name-title heading. If you have a name-title heading, it is recommended to leave out the $e and $4.
When you update the BFM, will the documentation here be updated as well?
This link redirects to the Librarian's Toolbox, which includes the current version of BFAS. If there is something specific that you think should be updated, please reach out to askqc@oclc.org.

May 18, 2023

Are the rejected forms of a subject authority record used in WC Discovery? For example, if a user searches with a rejected term, will the search tool return search results containing the accepted term without the user being aware of it?
According to WorldCat Discovery > Expand Search With, WorldCat Discovery allows you to Enable searching across authority files to find relevant search terms, including related, variant, or deprecated search terms.

When Expand Search with Related Terms is applied to a search query and the search term matches on the 1xx and 4xx fields of an authority record, additional terms are added from any authority source that your library has configured, regardless of the language determined for the authority source, the user interface, or the user's query.
Are there plans to allow a cataloguer to append a relationship designator to an already controlled field? Right now we have to uncontrol, append, and re-control.
There are currently no plans for this. The controlling algorithms are developed to match on entire text strings, and the subfield coding helps to ensure accuracy in skip-and-append or skip-and-remove scenarios. Note that you can add or change $e and $4 on a controlled heading without un-controlling in Record Manager, but not in Connexion.
Will Record Manager control 3XX fields that aren't controllable in Connexion?
At this time, this is still under investigation.
Are there plans to add the ability to use macros and text strings in Record Manager?
No plans at this time. Many tools are built into Record Manager, but macros will not be since it is a web application.
Is there a plan to make OCLC Connexion Client a legacy database, too?
Connexion Client is an application; the database is WorldCat. Both are planned to be continued indefinitely at this point.
(Qualifying the question above) The same as the Connexion Browser is what I meant.
Connexion browser will be going away, Connexion Client will continue to be supported.
Any plans to control terms in the 382 (Medium of performance)?
This has been considered, but no plans are in place at this time.

April 2023: Data and algorithms and bibs, oh my!

April 11, 2023

You indicate machine processing. Are these workflows using AI? If not, are you looking into what AI could do to improve these types of processes in the future?
We are using machine learning, not AI. We have done experiments with AI to try to answer "is this title the same as that title?" Because those comparisons can involve semantic meaning, that gets more into AI, not just machine learning. Though we're experimenting with AI, this first model is just machine learning.
Interested to learn more about the process and software used (scikit-learn?): is there more information about this?
We're not using scikit-learn; we're working on Databricks. We have a copy of the database in Snowflake; this is all in the cloud for us. The first model is a gradient-boosted model, so it takes a bunch of different models and then pulls the results together to get an answer. Working with Nathan's team, we're trying to get the precision very high (around 95%), so our current model has fairly low recall, but the precision is high. When it says something is a duplicate, it's a duplicate: it's very certain about it. It misses quite a few because of that need for certainty, so we're working on our second generation of models now, and it's looking at totally different types of data. That would probably use scikit-learn, but we're finding that it does not scale well, so we're trying to find new ways to get the same results in something that scales better. We mostly work with Spark and whatever you can get done in Spark; scikit-learn generally requires pandas, and pandas just wasn't scaling for us.
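For readers who want to picture what this looks like, here is a minimal, hypothetical sketch of a gradient-boosted duplicate classifier in Spark ML, in the spirit of the Databricks/Spark setup described above; the feature names, toy data, and settings are invented for illustration and are not OCLC's actual features, data, or model:

# Hypothetical sketch of a pairwise duplicate classifier using Spark ML.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import GBTClassifier

spark = SparkSession.builder.appName("dedup-gbt-sketch").getOrCreate()

# Each row is a candidate pair of records reduced to per-field similarity
# scores; label 1.0 = known duplicate, 0.0 = known non-duplicate.
feature_cols = ["title_sim", "author_sim", "publisher_sim", "extent_diff"]
train = spark.createDataFrame(
    [
        (0.99, 1.0, 0.9, 0.0, 1.0),
        (0.97, 1.0, 0.3, 0.0, 1.0),
        (0.62, 0.2, 0.8, 12.0, 0.0),
        (0.35, 0.0, 0.1, 4.0, 0.0),
    ],
    feature_cols + ["label"],
)

assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
model = GBTClassifier(labelCol="label", featuresCol="features", maxIter=20).fit(
    assembler.transform(train)
)

# Score a new candidate pair; a high-precision setup would act only on pairs
# the model is very confident about and leave the rest for human review.
candidates = spark.createDataFrame([(0.98, 1.0, 0.85, 0.0)], feature_cols)
model.transform(assembler.transform(candidates)) \
    .select("probability", "prediction").show(truncate=False)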
How good has the machine learning been at detecting duplicates in provider-specific records to merge them with provider-neutral records?
We talked about three different groups of duplicates. The third group uses the data that we're getting out of the labeling tool, and that data will answer this question, so actually we don't really know yet. Unless the provider-specific records fall into the "these are exact duplicates" group, I doubt they would fall into the vendor category. We're just at the very beginning of starting to look at some of the data out of the labeling tool, so we don't really have enough information yet to answer that.

We are still learning how to do this, so we have a lot of guardrails in place, and we’re looking at the exact questions that you’re asking.
Have you identified factors that lead to errors in de-duplication?
Crap data doesn't help. Obviously, the more complete a record is and the more accurate the cataloging is, the easier it is to tell we've got two duplicates. We found so many surprising things when we sent the reviews to Laura's team. That's where some of the virtually identical records Laura talked about came from; those came out early in our testing. Like, oh, these two records are identical down to the punctuation and everything; they just have different OCLC numbers. There are several million of those we've managed to identify very early on. Lots of data review, lots of analysis, lots of putting it into piles trying to figure out "Why are these alike? Why are these different?"

This process is helping us update our rules-based algorithm. In those cases everything was exactly the same, so we were able to investigate why that happened and what caused two identical records to show up in the system, since that shouldn't happen. So there is an added benefit of being able to improve our rules-based processes as well.
Don’t we want to keep the provider-specific and provider-neutral records separate and not merge them?
We do. In our normal algorithms, we do merge provider-neutral records. Also, in this project, we are only identifying potential duplicates. It's not merging them; it's handing them off to the normal DQE team, and they go through the normal merging processes and rules once we've identified the records. All of the retention hierarchy, all of that is still in place. This doesn't do any merging; it is all about labeling the two records as to whether they are duplicates or not so we can give that information to the machine learning process.
Is there any work being done to push back to keep duplicates out in the first place?
We know our model is only reaching a portion of the duplicates in WorldCat. I think, when the solution is broader and more complete, we will be able to push it up front and prevent them from coming in, but right now, we could do a lot of work using this up front and it would only prevent a portion of the duplicates. So it's better to be cleaning up after the fact until we have a more complete solution. Our goal is that we need to clean up the existing records, but we do want to prevent them from ever getting in, and we do try to prevent as much as we can. As we mature in this sort of thinking, it will enhance the entire life cycle of WorldCat records, trying to prevent as much as possible before we have to do the cleanup.
Is it at all possible to get an example of what constitutes a duplicate record, especially across numerous cataloguing language sources?
If it has a different language of cataloging, it’s not a duplicate, unless one of them is wrong. The screen that Laura had with the Labeling Tool was a good example of what looks like a duplicate. These are potential duplicates identified by the labeling tool:

     247238126 / 883845072

     299743058 / 300035362
Will there be a Part 2 to update after the run of learning?
This is an ongoing project and this is something that we’ll continue to iterate. We’ll continue to do this, not just a Part 2, but 3 and 4, moving from cleanup to pre-loading, all of that stuff. We expect this to be continuous.
What exactly is the Labeling Project supposed to do?
We are collecting examples of duplicate records. It's really easy to find examples of duplicate records, from DDR or from Laura's team and Member Merge partners, but it was really hard to find examples of two records that are potentially similar but aren't duplicates. Finding the "no's" was really hard for us, and we needed those to train the model to become accurate. So the labeling tool is presenting pairs of records to users for them to say "yes, it's a duplicate" or "no, it's not a duplicate," so that we have good labeled data to train our model.
Where do I find information on participating in the labeling tool?
So one of the links that we’ll supply in addition to the labeling tool is an FAQ that explains basically everything you’re responsible to do or that you’re being asked to do. Again, you’re not actually merging, there’s nothing that can be messed up from this, so it’s really just saying “is this pair a duplicate or not?” and you can choose why, you can choose the specific fields.
Will we be trained beforehand if we join the project?
See above, but if this is a question about joining the Member Merge, once you send us a request to askqc@oclc.org, we will put you in the queue and we will pair you up with a person to work on the format that you’re interested in. And once you finish your training, you will become independent to be able to merge on your own. Yes, it is a longer process than the labeling tool.
I don't have a NACO background. How do I learn more about this?
You can learn more about the NACO program at https://www.loc.gov/aba/pcc/naco. If you are interested in joining that, that is a separate process run by the Program for Cooperative Cataloging (PCC) and the Library of Congress.
I requested some record deletions a few weeks ago and am still waiting. Seems hit-and-miss lately. Any advice? (Patience, more email nudges to bibchange, other?)
We'll look to see if that's in someone's workflow right now. Just FYI, we have a new tool that we've been using to do deletions more quickly, and it was just exposed to us in the past few weeks, so we have been working out some bugs with it. If someone does have your request, that might be why it hasn't been completed yet. But now those bugs have been worked out and we are able to delete records in much bigger quantities. We will check if that's in someone's workflow, and if not, we'll ask that you send it again.
I have seen some German records (cataloging language is German) with an LCCN labeled as DLC records. Is that right?
No, it’s not, actually, and we’ve been having conversations about updating our code. This is being displayed in Connexion, and the code that was established for that display that you see, for DLC, NLC, NLM, for example, was established quite some time ago and needs to be updated. So we have been in discussions with the data connection team about making that correct, and you might see some other changes as well. We’re taking a look at everything that we display with a certain label like that. So that will hopefully be fixed in the near future.
How is the project to consolidate encoding levels going?
We’re still working on converting encoding level “I” records. I don’t have the numbers off the top of my head for what we have left, but we are looking at some alternative ways to make changes to the records to speed up our conversions, because once we finish encoding level “I,” we have encoding level “M,” so we are exploring some faster, quicker, more efficient ways to make changes to those records, and we don’t really have an estimate on completion of that. Unfortunately.
For those who had the training, do we know when we will be able do merges directly in WMS and not in Connexion?
I don’t know of a time frame for that, when it will be available, but it is in testing right now. On the Record Manager public roadmap, which is in the Record Manager Community Center, that should let you know which quarter we expect that to be released in. We know it’s coming and sooner rather than later.

April 20, 2023

When you say you allow for some differences in publisher info—do you mean subsidiary publishers?
When we manually merge records that are vendor records, we allow for a lot of differences. Meaning, we see that the points we want to match do match, but when it comes to matching publishers, we may not have that match, and that's usually okay. We may not have an exact title match, and that may be okay. We may not have an exact extent match, and that may be okay, just due to the fact that they're vendor records and we know that that information is not necessarily reliable.
In the format graph for the labelling project, does the "books" category cover both print and electronic versions? I saw audiobooks, I think, but I didn't see a distinction between the above; maybe I missed it...
In our first version of the model, we are looking at books, and electronic books do fall into that. My understanding is that "type of item" in our WorldCat tables is broken down by standard record types one and two. The chart you saw with the green and black bars is a breakdown by standard record type 1. That includes book; print book and electronic book are different standard record type 2 values within the standard record type 1 umbrella of book. Audiobooks, I believe, are a different standard record type 1, so they would not be included in our first model, and they might have different information and different kinds of challenges in terms of parsing a record. For example, if I'm looking at a print book or even an ebook, I'm not having to parse out who was the author vs. the narrator who is reading the book, and I'm not having to parse out, say, how long this recording is and then try to compare that to a page number in another record. So it does simplify things to focus on print books and ebooks.
Does the project have any special provisions for special collections/rare book cataloging? For example, are there differences in handling for materials published before 1830 vs. after 1830?
If it’s from before 1830, we don’t touch it. That is just more complex. There may be a time when we circle back to that, but at this stage, we’re not even touching it if it’s before 1830 just because the cataloging is so much more subtle there.

We are new to this as well so we are approaching it cautiously so we don’t create a bunch of merges that we wouldn’t do normally. We know that each format has its unique set of circumstances and we want to review it for those circumstances. We’re not going to just dive into it but approach it cautiously and be sure we’re making the right decisions.
Why 1830? Is that the cut date for rare book special collections?
Rare books can come from any era, but books published before 1830 are more likely to have been issued in variant states, etc., that are of interest to scholars and special collections librarians, so they generally require more detailed description of pagination, signatures, etc.

We did previously cut it off at 1801, and we bumped it up based on feedback from the rare books cataloging community, so it is now anything prior to 1830.
Are brief bib records (many vendor-created) also included in these efforts, or will they be in the future? Many journals/serials fall into this category, and some have an ISSN in $y when it is actually the $a, as one example.
I would say brief records are not specifically excluded; they will be considered. But unfortunately, since they may not have many match points – not many fields that give strong evidence to the model that they should merge – there's a risk that duplicates would be overlooked by the model that we're creating if the records are very sparse or potentially badly coded. We are also thinking of having a separate process specifically for grabbing those vendor records; being more permissive for vendor records may make more sense and allow us to clean up records that the general-purpose model might be too cautious to touch.

We also haven’t started tackling serials yet. We’re starting with books. The hope is to branch out, do some experimentation, figure out what’s going to be the next productive direction. Looking at other standard record types is on the table. The biggest potential holdup in terms of getting those addressed is the minimal labeled data for non-books.
Can you audit the rules Machine Learning created and used to make a particular decision or is it more like a black box?
We can't get a printout of "here are the exact rules." We can get some insights: for example, the algorithm at the core of our current model is called gradient boosting, and it is possible, when you train that algorithm, to get a ranking of which features the model is using to help make its decisions. We can compare that to the features the human reviewers think of as more important – for example, from the labeling tool: title, author, publisher – and ask whether that is also true for our algorithm. We also do a lot of analysis when our model gets some of our labeled data wrong: can we look for patterns and identify what kinds of examples seem to trip it up, and why is that happening? Auditing comes down to what the accuracy rate of the model's predictions is, versus being able to fully map what it's doing internally.
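To make the idea of auditing by feature importance concrete, here is a small, hypothetical sketch (using scikit-learn purely for brevity of illustration, not OCLC's Databricks/Spark pipeline; feature names and toy data are invented) that ranks the features a gradient-boosted model relies on and lists the labeled pairs it misclassifies:

# Hypothetical sketch: inspect which features a gradient-boosted model leans
# on and which labeled examples it gets wrong.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

feature_names = ["title_sim", "author_sim", "publisher_sim", "date_diff"]
X = np.array([
    [0.98, 1.0, 0.9, 0.0],
    [0.95, 1.0, 0.2, 0.0],
    [0.60, 0.3, 0.8, 5.0],
    [0.40, 0.1, 0.1, 2.0],
])
y = np.array([1, 1, 0, 0])  # 1 = duplicate pair, 0 = not a duplicate

model = GradientBoostingClassifier().fit(X, y)

# Rank features by the importance the trained ensemble assigns to them, then
# compare that ranking with what human reviewers say matters most.
for name, weight in sorted(zip(feature_names, model.feature_importances_),
                           key=lambda pair: -pair[1]):
    print(f"{name}: {weight:.3f}")

# "Auditing" in practice: look at labeled pairs the model misclassifies.
errors = X[model.predict(X) != y]
print(f"{len(errors)} misclassified training pairs")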
Does the model include a suggestion of which record should be preserved? Is this something that would fit into this process?
The data science team does not touch that. That is for the data quality experts. It’s a separate process from this. What will happen is we’ll take the duplicates that are being identified from the model and we’ll feed them to our regular resolution process that DDR does today, so the same rules will apply as far as what record will be retained, what transfers, and all of that stuff that goes into the everyday processing. That will apply to those records.
How does (or does) this approach deal with provider-neutral records?
It’s not really handling provider-neutral records differently, it’s still going to compare the description of the two records and apply the same process of determining if they are duplicates or not.
Does the model work well in detecting duplicate non-Latin scripts records?
That's something that we have put some thought into. For some of our model features, when we write code that looks at two records and asks "do they have the same title?", if both records have non-Latin script, we'll attempt to compare that script data whenever possible. We try to use script data when it's available, but we have not yet done detailed statistics on the model's accuracy rates for records with or without 880s. We didn't see any difference in error rates between records in different languages, but this is an area for more research.
I recently came across two separate instances of the same work that were reprinted by the same publisher in the same year. The records for these instances are almost exactly the same, with only different 500 reprint notes. Can the machine pick up the difference?
So note fields are tricky. That is definitely an area where we could grow and learn more. The way that we are currently dealing with the general note field is that we tend to zero in on whether there is any numerical or quantitative data. So if the notes have different years, that would flag as a difference. But 500 fields that are different are so common that, if every difference blocked a match, we would be very limited in how many duplicates we could detect. In general, we are able to flag many differences that are only in 500 fields, but not every single one.
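As a rough illustration of the "zero in on numerical data" idea (a simplification, not OCLC's actual note-comparison code; the notes shown are invented):

# Hypothetical sketch: compare only the numeric tokens in two 500 notes.
import re

def numbers_in(note):
    # Return the set of numeric tokens found in a note.
    return set(re.findall(r"\d+", note))

note_a = "Originally published: London : Macmillan, 1952."
note_b = "Originally published: London : Macmillan, 1967."

if numbers_in(note_a) != numbers_in(note_b):
    print("Notes differ in numeric content; flag as a possible difference.")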
Is there any way to deal with wrong item formats assigned to audio books - they are often cataloged as ebooks—but they are actually sound recordings. I have found this issue both in OCLC and WorldShare Manager.
That’s one way we can leverage the data the team is helping us identify. Rather than trying to figure out if those kinds of records are duplicates, we would take that data and actually make corrections to those records. It’s kind of another piece of this. Rather than trying to work with incorrect information, it would be better to correct it.

As we're able to fix incorrect data, that can allow for more correct merges, but we wouldn't use deduplication as a way to fix the data. And if you do run into records with the wrong formats, please send them to bibchange@oclc.org and we can look at those for you; we appreciate you letting us know about them.
Is there a limit on the number of records processed by the model each time?
I would say the number of records processed or looked at is not a limiting factor for us right now. The model runs, from start to finish, as a batch process on all our candidate pairs of potential duplicates. The MinHash process generates many millions of pairs of potential duplicates, the model checks them all and says "yes, that's a duplicate" or "no, that's not," and the actual model prediction runs in about an hour on our Databricks environment.
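As a simplified, illustrative Python sketch of the MinHash idea (not the production pipeline; the titles, hash count, and threshold are made-up assumptions), records whose titles share many tokens tend to get matching minimum hash values and so surface as candidate pairs:

    import hashlib
    from itertools import combinations

    def minhash_signature(text, num_hashes=32):
        # One "minimum hash" per seeded hash function over the title's tokens.
        tokens = text.lower().split()
        return [min(int(hashlib.md5(f"{seed}:{t}".encode()).hexdigest(), 16)
                    for t in tokens)
                for seed in range(num_hashes)]

    titles = {
        1: "Introduction to cataloging and classification",
        2: "Introduction to cataloguing and classification",
        3: "A field guide to North American birds",
    }
    sigs = {rec_id: minhash_signature(t) for rec_id, t in titles.items()}

    # Keep pairs whose signatures agree on enough positions; these become the
    # candidate pairs that the trained model then classifies.
    for a, b in combinations(sigs, 2):
        agreement = sum(x == y for x, y in zip(sigs[a], sigs[b])) / len(sigs[a])
        if agreement > 0.5:
            print("candidate duplicate pair:", a, b)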

March 2023: Languages, Non-Latin Scripts and Mysterious MARC 880 fields

March 14, 2023

If the language of cataloging is English, is it legitimate to include a summary in French in field 520? Another library responded that they have libraries that request Spanish summaries for Spanish materials that are catalogued in English.
Local needs like these are important; meeting the needs of your patrons is important. Including a summary in a language that your patrons would want to use is fine to do. One way to comply with the English language of cataloging is to put a label such as "Spanish summary:" or "French summary:" in the 520 field ahead of the summary. That's not required, and we know that there are summaries in other languages of cataloging, but it is a nice way to indicate that you are complying with the language of cataloging.
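For example, an illustrative way such a label might look in the record (placeholder wording, not a prescribed form):

    520     $a Spanish summary: [text of the summary in Spanish]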
Is the field 242 only for English translation?
The required subfield $y is for the language code of the translated title; see field 242, Translation of Title by Cataloging Agency (R).
Is it possible to use the ISO language codes in subfields other than in field 041? In field 242, could you use an ISO language code, assuming ISO 639-3 or 639-2 instead of the MARC codes?
The 041 field can use those other codes because you can identify them in $2. In the other areas where language codes exist, there is no way to identify them differently, so we still use the MARC language codes there, such as in field 242 $y. Those ISO codes may also be used in field 377.
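For example, an illustrative coding (not taken from a particular record), assuming the registered source code iso639-3 for ISO 639-3 codes:

    041 17 $a kor $h ita $2 iso639-3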
There is a question about the punctuation in field 546: your example showed the language, then $b Latin. BFAS shows a semicolon after the language and includes "alphabet" together with the script, e.g., Mongolian; ǂb Cyrillic alphabet. Could you clarify? What is the current OCLC standard for structuring the field with subfield $b?
From LC-PCC PS 7.13.2.3, Recording Scripts:

Use the English-language names of scripts found at: http://www.unicode.org/iso15924/iso15924-en.html
 
"Generally do not include the parenthetical information found in the list when recording the script name. If a resource is in a language that is commonly written in more than one script, name both the language and the script."

Examples are contained at the link above.
Sometimes we try to generate paired fields in Connexion and Record Manager by putting both non-Latin and Romanized titles in as 100 fields, but they don't automatically pair and just become 2 unlinked fields. Is there something that should be done to ensure it always works? Or should it be reported as an error?
You may manually pair these two fields. There are commands in Connexion and Record Manager that allow you to manually pair the fields, and when you do that, brackets show between the two fields to indicate that they are paired. You can also un-pair fields using the same set of commands.

Connexion Client: Edit non-Latin records

Record Manager: Link non-Latin fields
Are there any plans to have 880s for subject fields? Specifically talking about the 650 fields. Is it an option in Record Manager or Connexion client?
Yes, 880 fields can be paired with almost any field in MARC, any field that has textual and non-coded content. It is possible to do it with any subject heading, 650 or otherwise.
My library uses OCLC Discovery, but field 546 language notes do not display. This is unhelpful for multi-lingual items. I have considered repeating the 546 note in a 500 note, but didn't want to duplicate information in the record. Do you have any other suggestions?
If you are using a local interface, you can edit it for local use.

If you're using WorldCat Discovery, the best suggestion is for you to submit this in the Community Center as an enhancement request. This will allow the community to weigh in that it would be useful, and it will also get the attention of our Discovery product colleagues here at OCLC.
Take, for example, a title translated from Italian to English, and then the English translation is translated into Korean. In this case you would have field 041 with kor $k eng $h ita, but if no information about the original Italian title can be found, do we add the English translation in a 500 and not add a 240?
Check the authority file. You would try to find the 240 as established, that is, the name-title authority record for the title. If there is none, then you may create an authority record or an authorized form of the title from the English translation, even if you can't find the title in the original language.

Follow the instructions in RDA 6.2.2.6 - https://original.rdatoolkit.org/lcps...lcps6-149.html

Titles in the Original Language Not Found or Not Applicable.

Apply this instruction when a preferred title in the original language cannot be found either in manifestations embodying the work or in reference sources. This may occur when manifestations embodying the work do not contain titles (e.g., some manuscripts, sculptures, choreographic works)

or

manifestations embodying the work are not available (e.g., no manifestations of the work are known to exist)

or

reference sources do not contain a title for the work in the original language. For such works without titles, choose (in this order of preference):
  a) a title found in a reference source in a language preferred by the agency creating the data
  b) a title devised by the agency creating the data
If the language of cataloging is English, how about using field 250? It could be used as an edition in English. There was a comment that the 250 should be transcribed from the item.
The 250 field should be transcribed from the item, or from the resource that's being cataloged, unless it is a cataloger-supplied edition statement, in which case you would use the language of cataloging.
The 788 field was mentioned but how predominant is the use of the 765 original language entry?
There are approximately 964,000 765 fields in WorldCat, which is a very tiny percentage of all the records in WorldCat.
There's feedback about one of our slides: the 546 slide had an example with "Translated from ...". It's understood that this is not best practice, because the 546 is defined as covering the language content of the described material; it should not name a language that isn't present in the work catalogued.
Slide 28 has been updated.

March 23, 2023

Would it be okay to use the 546 for "Translated from the French," since it is citing the original language? Also, the documentation seems to say that we have text in English from the French translation.

Why wouldn't 546 be used for the note "Translated from the French" since that is a note about language?
Specific 5XX notes should only contain information that fits the field definition and the scope of the field. When you go outside of that scope, you should either use the 500, because it is a general note that can be used for many things, or use two notes if there are specific note fields for each piece of metadata.

For example, you could say in a 500 note "In French and Italian; for ages 7 and up," or you could have a 546 "In French and Italian" and then a 521 "For ages 7 and up." The preference is to break it up into two specific notes when you can. However, there is no specific 5XX note that would be appropriate for recording information about the language of the original, so in that case you would use the 500 note.
Comments from chat:

Australian records frequently contain field 242.

CONSER standard record practice is to transcribe the parallel title in 246 instead of the 245 field.
There are variations on standard practices.
What is the purpose of recording the other titles in 246 when they are already listed in field 245?
This is for discovery purposes. It allows you to browse on the title in addition to being able to search for it.
Should we record a parallel title if it appears on the source but the document itself is not in that language? For instance, the title appears in French and English on the title page, but the document is only in French. This is frequent with standards such as ISO standards.
Yes, if it appears on the chief source of information from which you are transcribing the information that you are putting in your bibliographic record, then you would record the parallel titles, even if the resource is only in one of the languages.

It might be a good idea to include a note that explains the situation, that although there is a parallel title, the document itself is in French only, for example, just to explain that to users in the body of the record.
Do you include the non-Latin script in field 245? We include it in all of our international books.
Yes, when you are transcribing, you would put it in what would look like the 245 field, but behind the scenes, it would really be an 880 field.
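As a simplified illustration (placeholder wording, not a real record), the pairing is carried by subfield $6 in each field:

    245 10 $6 880-01 $a Romanized form of the title.
    880 10 $6 245-01 $a [Title in the original non-Latin script].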
There is a question about the VP: search.
VP: is not an abbreviation for anything, so it's definitely not an intuitive search to use. It is used to search for "Character Sets Present." It's documented in Searching WorldCat Indexes: Librarian Toolbox - Searching WorldCat Indexes - Character Sets Present.
Is field 775 commonly used for different editions?
It's hard to say how commonly it is used; there are about 2.4 million records that have a 775 out of 534 million total WorldCat records. It's not clear how often there would be different editions that would need a 775.

Also, 775, Other Edition Entry, can be used to provide linking for a serial simultaneously issued in different languages. It can also be used in a reprint situation. When there is a change in the physical form, such as print and microfilm, then 776 is used.
When batch uploading records with non-Latin script titles, should we upload with two 245 fields, one for the transliterated title and one for the title in the non-Latin script? It sounds like they would then be linked manually once uploaded in Connexion.

When you are cataloging in your local system, and then uploading the record, then you probably want to use 880 rather than 245 for the non-Latin script field. It does depend on the system and how your system is set up, or whatever interface or program you are using for the cataloging and how those would be tagged.
For clarification the question above was related to MarcEdit for batch uploading.
We are not sure how MarcEdit handles non-Latin script, we know it does but we don’t have the specifics.
What does kw: mean in searching?
The index kw: is for keyword searching.
A follow-up on the question about the parallel title when the document is only in one language: isn't it misleading to our users to transcribe both parallel titles in the 245 if the document is in only one language? For discoverability purposes, shouldn't we record the parallel title only in the 246, and not in 245 subfield $b?
If you make a local decision to do it that way, that's fine. If you are transcribing your record and putting it into WorldCat, it would be a very good idea to transcribe exactly what you see on the title page and include all of it. However, if you decide that it would be confusing to your local patrons and want to edit that out or not include that in your local copy of the record, that is fine.

There is an example posted in chat: Chinese publications often have an English parallel title and nothing else in English. It is very helpful to record that for users, and RDA has no restriction that parallel titles have to be in a language of the resource.

To clarify, in RDA there is no restriction on a parallel title being taken from the chief source of information or the same source of information as the title proper. It can come from anywhere in the resource.
There is an issue that's been encountered with multi-volume records for graphic novels being turned into single volume records after they're downloaded.
You are welcome to send those to bibchange@oclc.org and we can look into the issue.
A question regarding the 245 second indicator. On this page, https://www.loc.gov/marc/bibliographic/bd245.html, there is a statement: Diacritical marks or special characters at the beginning of a title field that does not begin with an initial article are not counted as nonfiling characters.

What is the rationale behind this?
Diacritics and special characters normalize to either a blank or nothing, and at the start of the field a blank gets normalized to nothing, so they don't impact searching. If there's an article there, it's not as clear: punctuation isn't as likely to get normalized when an article is involved. If you've got an article, the space after it is counted as one of the characters in your second indicator count, so "the" counts as 4 instead of 3 because of the space; and if there were punctuation before the article, it would also be counted as a character, because it's not being normalized at the start of the field.
For clarification, an example that was put in chat: 245 00 $a [Diary]. If it was still in brackets, but it said [The diary], then the second indicator would be a five, correct, because you count the bracket and the space?
Yes, that is correct.
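As a rough sketch of that counting logic (a simplified Python illustration with a hypothetical, abbreviated list of articles; this is not OCLC's normalization code):

    # Count 245 second-indicator nonfiling characters for simple cases:
    # if the title starts with an initial article (optionally preceded by
    # punctuation such as an opening bracket), count everything up to the
    # first filing character; otherwise the count is 0.
    ARTICLES = ("the ", "an ", "a ", "le ", "la ", "el ")   # simplified subset

    def nonfiling_count(title):
        i = 0
        while i < len(title) and not title[i].isalnum():    # leading punctuation
            i += 1
        rest = title[i:].lower()
        for article in ARTICLES:
            if rest.startswith(article):
                return i + len(article)    # punctuation + article + space
        return 0

    print(nonfiling_count("[Diary]"))       # 0 - no initial article
    print(nonfiling_count("[The diary]"))   # 5 - bracket + "The" + space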
Is Leader position 09 heavily relied on or mostly ignored?
It indicates whether the record is UTF-8 or MARC-8 (I see you just put that in chat). Our system does rely on it behind the scenes for how the record is exported for use by other systems, but it is not critical for how a record is displayed in WorldCat.

Within WorldCat it's not critical. For a record that's external, either being imported or exported, it is important.

February 2023: Bibliographic Formats and Standards (BFAS), the early chapters

February 14, 2023

How does one become eligible to edit records in WorldCat?
If your library has an OCLC Cataloging subscription then you just need a Full level authorization for Connexion, and/or the appropriate Cataloging Role for Record Manager.
If you are interested in cataloguing looseleaf publications, would section 3.3.2 Integrating Resources be the best place to look?
That is the best location. We spent a good amount of time revising this chapter to include any RDA specifications as well. We do have some examples in that section that you can use to help guide you. I encourage you to use these guidelines along with the cataloging guidelines for integrating resources.
Just a comment. I really appreciate the extra steps of matching the previous AskQC office Hours with the various chapters. Very helpful!

I've noticed that I encounter many 100 and 700 fields that are controllable and should be controlled but aren't (there is an authority file, and the field is in the correct format). I control them as I encounter them -- is there a reason the fields may not be controlled? It doesn't seem to matter who the inputting agency is or how old the record is.
We are currently making some changes to the controlling service, so there might be some delay, and that may account for some of what you describe happening right now.
Most unqualified personal names won't be controlled by automated processes, correct?
Yes, that is correct, and it wouldn't have to be a qualified name.
Linked data and WorldCat (BFAS)?
OCLC Research: Linked Data Overview
OCLC and linked data
Speaking of parallel records (for different languages), no longer use 936 for parallel records, right? 
https://www.oclc.org/bibformats/en/9xx/936.html
Field 788 is the Parallel Description in Another Language of Cataloging and that's the field that should be used.
How long does it take for changes made to an authority record, such as the established form of the name, to be reflected/updated in WorldCat bibliographic records?
Sometimes it's fairly quick. It depends how many WorldCat records have that name: are they unqualified? Are they already controlled? It just depends. Sometimes there can be thousands of records with the name, and that might take a little bit of time for the system to work through. It also depends on whether records are in the process of being edited and are locked and can't be updated at that time. Usually, it doesn't take terribly long. If you see a heading not getting flipped in records that you think should be controlled, feel free to write to bibchange@oclc.org and we'll gladly look into that for you.
I'm seeing records cataloged in English with personal name subject headings linked via 880 fields to non-Romanized forms of their names (like Plato and Tolstoy). Should this be occurring?
I thought you were allowed to link 600s to non-Romanized forms as long as you used second indicator 4 for the non-Romanized form
When you have, for example, an English 600 which is linked to, say, a Chinese vernacular 600, those two are not compatible.
880 fields and 600 fields https://www.oclc.org/bibformats/en/controlsubfields.html#subfield6?

The last example on the 880 field of the BFAS page applies:

[Image: askqc-feb2023-1.png]

The PCC Guidelines for Creating Bibliographic Records in Multiple Character Sets allow, as an option, supplying non-Latin data in parallel fields for AAPs established in non-standard romanization or in a conventional Latin-script form. This is done more often for some languages than for others.
The information in these BFAS chapters is so rich and helpful. Which section would you recommend that catalogers particularly pay attention to? Do you have any favorite advice?
Perhaps Chapter 4, When to input a new record, is a good one to look at. It has all the information on whether you should input a new record or not, and the guidance there is to assist catalogers in not entering duplicate records into WorldCat, so that's a really good resource. It also has some specific sections that deal with things like edition statements and serial records, so it has some really great information in it. Chapter 5 is a good chapter as well; it has a lot of really good information about what you can do in WorldCat to enhance and edit records, and we did quite a revision of Chapter 5 recently. The end section also includes information on reporting errors: when you would submit an error report with proof and when you can just send an error report, that type of information. So, Chapters 4 and 5 are really good.

And Chapter 3, because whenever you're doing something outside the norm or away from what you normally do, Chapter 3 is a great place to look for those special items, like "In" analytics, offprints, and local recordings. If you have something like that come across your desk, I always like to recommend Chapter 3 to figure it out.
I'm considering our use of punctuation in our original cataloging for our agency-produced items. What would you recommend? No punctuation? These items may likely be digitized sometime in the future. I'm reviewing Ch. 2.8, but still not sure what may be best in this situation. https://www.oclc.org/bibformats/en/onlinecataloging.html#punctuation
The choice of whether to use minimal or regular punctuation is really entirely up to your institution, and you may want to document whatever decision you've made and why you've made it, just for historical purposes and for the people who come after you, to explain why you did what you did. You may also want to take a look at some of the discussions that happened around punctuation; I believe there are links to the PCC documents related to punctuation. The PCC documents will give you a much fuller idea of the pros and cons of each choice, but it's really entirely up to you.

PCC Guidelines for Minimally Punctuated MARC Bibliographic Records an Overview
There is a library that uses English as the language of cataloging; nonetheless, descriptive elements and subject headings are in Spanish. What is the best way to handle this? I see these records constantly.
It is perfectly okay to have subject headings in Spanish on a record cataloged in English. It is descriptive elements in a language of cataloging other than the language in $b of the 040 that are questionable. See Ch 2.6, Language of Cataloging, and the policy for parallel records.
If you're cataloguing a British Standard (BSI), where would you enter the BSI number to ensure that it is returned in the Standard Numbers index?
It might be best to put those in field 024 with subfield $2 bsi, based on the standard identifier source codes: https://www.loc.gov/standards/sourcelist/standard-identifier.html. Then it will be returned via the sn index.
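For example, an illustrative coding with a placeholder number (not taken from a particular record):

    024 7  $a [BSI standard number] $2 bsi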
How long does it take for new authority records to get into VIAF?
LC loads to VIAF weekly.
Is there any difference between requesting changes using the "report error" function in Connexion and sending emails?
No, there's no difference. Ch 5.5, Requesting Changes to Records, lists the different options that you have, and they are truly options, so whatever works better for your workflow is what we'll take, and we're happy to accommodate you that way. It all goes into the same inbox and the same workflow, and anything that needs to go into another workflow is routed from there.
How far are you behind in doing corrections and merging records?
It really depends. Right now, for normal change requests, it will probably take about a week or so for you to hear back or for the change to be made, or something along those lines. The duplicates workflow depends on the amount of backlog we have, the format, and staff availability to merge. We have great partners in the Member Merge program who assist us in this regard, and we continue training more all the time. But it really does depend on the requests that we receive; it depends on quite a few different things.
I clicked on the link that was posted to Control Subfields and saw the $u. Is it necessary, when adding a URI, that it be in a format ending in .htm or .html, in order to work?
What is more important is that the URI has the protocol at the start, the http colon forward slash forward slash (http://), rather than just the www. Meaning, for example, https://www.oclc.org rather than www.oclc.org.
Validation should catch it and complain about the subfield not having the correct format if it doesn't have a protocol.
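As a rough sketch of the kind of check involved (a minimal Python illustration, not the actual validation routine):

    # Flag $u values that lack a protocol; whether the URI ends in .htm or
    # .html does not matter.
    def has_protocol(uri):
        return uri.lower().startswith(("http://", "https://", "ftp://"))

    print(has_protocol("https://www.oclc.org"))   # True - acceptable
    print(has_protocol("www.oclc.org"))           # False - would fail validation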
Could you offer some advice about how to proceed when two separate identities for persons are being conflated into one VIAF identity?
Yes, you can report those to us at bibchange@oclc.org.
We have this situation: at the beginning, we had a print book bib record; we weeded all print copies and added an 856 41 $71. Is it okay to keep the format as print book even if we do not have any print copies?
Yes, it sounds like you did the correct thing. You indicated that there was an online version and that it was a different version from what the print record describes, and that is what we would encourage you to do. Whether you use that record or a record for the online version is really up to your local policy. But it's perfectly okay to create a separate record for the online version, and we would encourage that.
For significant duplications (machine loaded by several institutions), what is the best method of reporting the repeated instances from these institutions? I've been reporting them individually, but there are way too many to handle in this fashion.
You're welcome to report them individually, but if it is a situation, as you're experiencing, where a particular OCLC symbol seems to be involved with these duplicates, you are welcome to report that. If you could send a couple of sample records along with that, and maybe a description of how they're all connected together and whether it's one or two OCLC symbols involved, that would be helpful for us to do more research and pull them together. But yes, you're welcome to send those as well.

February 23, 2023

Do any of the chapters cover cataloging streaming video? I know there are other sources that do but want to know if BFAS addresses this as well.
Yes, but it would be better to go to other sources, particularly Best practices for streaming media, which has much more detailed instructions and recommendations about dealing with streaming media.

The updated OLAC BP guide, which draws heavily on the OCLC BFAS guidelines for recommendations, should be published within the next couple of months. The current OLAC guidelines: https://cornerstone.lib.mnsu.edu/olac-publications/17/
Ch 3.3.1 Electronic Resources
In order to code a record as "full" encoding level, does it need a call number present? It's not clear to me in Chapter 2.4 Full, Minimal, and Abbreviated-Level Cataloging.
I don't believe so. There are plenty of resources that different institutions choose either not to classify at all, or to classify with their own local classifications or accession numbers. The inclusion or exclusion of a classification number isn't necessarily a disqualification for full level; it really depends, so don't treat that as a requirement.
There are records with local information, e.g., a local collection title, coded as 4XX, 5XX, or 8XX fields with subfield $5. Are they supposed to use 590 and other local fields?
Yes, ideally that information should be in a local data record or local holdings record. If you come across records like this in WorldCat, by all means feel free to report them to bibchange@oclc.org so that we can investigate further and see if there are additional records that need to be taken care of, or if the institution needs a friendly reminder on where to put local data.

Definitely check out the 3.4.1 section in BFAS that goes into details on the different types of fields that you can use if you're inputting these local fields or locally specific information in records, it's pretty helpful.
Could you give a general idea of when I, as a cataloger, would use BFAS?
It really depends on what you're cataloging. For instance, if you have a special collection or, say, online resources that you're cataloging it might be good to review 3.1.1 Online Resources and see if there's anything you need to keep in mind. Or, if you have questions about something like a specific topic, you might be able to find that topic in the beginning chapters as well as in the specific field. We've done a lot of work over the years to include examples of what the guidelines say. We also have done a lot of work to match up current cataloging and practice and policy with those examples and guidelines in BFAS. So, it really does depend on what you're cataloging at your institution.

Responses from attendees:
  • In my experience, I use it almost every time I catalog. Yes, all day, every day.
  • I use BFAS all the time--double-checking punctuation, seeing if a new field, subfield, etc. has been added. I think this was a great topic to begin 2023's VAOH. From Connexion, I regularly "right-click, MARC Field Help, etc.," but there is so much in these early chapters that I have embarrassingly overlooked. Thank you! {Record Manager also gives MARC Field help that sends directly to BFAS.}
  • I use it frequently too, especially for indicators and how to use subfields.
  • I use BFAS if I'm unsure about indicators or subfields, especially with formats I haven't cataloged in a while. And I've used the "when to create a new record" troubleshooting guide too.
  • BFAS is great! And the updates to it are really appreciated! I use it a lot for reference. Thank you!
One common area, it seems to me, is whether to use an existing record or create a new record.
I really appreciate the updates that have been made to BFAS recently. Are there other areas slated for updates or revisions?
If anyone has anything that they would like to offer up as enhancements, please send a message to askqc@oclc.org. We're always happy to take recommendations for improvements.

Upcoming revisions to BFAS:
How often does your offline controlling run? I am seeing more records in Connexion without controlled headings.
The offline controlling is currently paused and we'll be resuming it soon, but we don't have a timeframe for when that will happen.
 
Stand-alone controlling software details
How do you announce changes to any part of BFAS? Is it to the AUTOCAT type of lists or somewhere else?
Whenever we update a page, we include the date of the latest update for that page, so on each page you'll be able to see the last time it was updated. There is also a BFAS Revision History. Because we make so many changes so often, we don't generally go into a lot of detail there; it doesn't go into page-by-page detail. The revision history page does cover larger revisions: when we revamp one of the early chapters, for instance, that's accounted for, or when we make changes as a result of a MARC update. For instance, we had a new MARC update installed just a few weeks ago, and we've completed the changes related to it. But we don't go into that much detail about individual pages, because updates just happen so often.
I have a vague memory that for serial records there was a 10-holdings limit determining which serial records could be edited by anyone versus which required full-level credentials (non-authenticated records with fewer than 10 holdings were basically open to all participants to edit). I'm not seeing anything in BFAS about this now.
That was something we did before the Expert Community, many years ago. Today, editing capabilities are not based on holdings but on the level of cataloging authorization combined with the encoding level, etc., of the WorldCat record. See BFAS Ch 5.2, Member Capabilities, under 5.2.2, Editing capabilities for non-PCC records:

Continuing resource maintenance:
“With full-level cataloging authorization or higher, you can close-out, link, edit, and/or correct full-level non-CONSER continuing resource records. You cannot enhance or maintain continuing resource records authenticated by CONSER.”
I'm starting to see records, mainly UK MARC, that repeat the 300 subfield $a within the same 300 field. Is this a known issue?
I believe that was reported to us today, and whoever is covering AskQC questions today will be investigating it. If we see a pattern, we will definitely reach out to the database analyst who handles the particular project so we can get word back to the institution. And then, of course, we'll try to fix the records.
I've recently started seeing the fixed field Srce coded "c" in German-language serial records. This causes a problem when trying to identify DLC/PCC records in search results.
I think it's the way Srce "c" is defined, and it's definitely something that we need to look into; I believe this has come up once in the past. In the meantime, if you're looking just for PCC records, you could limit your searches to English as the language of cataloging, and that should eliminate anything catalogued in German or other non-English languages of cataloging.