Welcome to Cataloger in the Loop!
Explorations in the world of AI by a metadata librarian
Welcome to my new Substack! I’m Casey Mullin, a cataloger and academic librarian with a penchant for all things music. Today is the first day of my six-month sabbatical, during which I will be researching the application of generative AI, machine learning, and related technologies to library cataloging/metadata workflows. The core of my learning journey is the construction of a new prototype cataloging app, initially tailored to music formats (my personal wheelhouse), but eventually (I hope) generalizable to all manner of things that libraries collect and describe.
This space will feature a mix of different types of posts: project updates of a more technical nature; deep dives into various aspects of generative AI, including the swiftly evolving state of the art in AI-enabled software development; and philosophical musings about the emerging role of AI in society at large. All of this will be from the vantage point of a librarian with nearly two decades of experience in the field who has, during that time, amassed just enough technical skill to be dangerous, and who is always eager to learn more! It is my great privilege to take this time to focus on growing those skills and, hopefully, bringing my long-held vision to reality: a more ergonomic, accessible, and (dare I say) enjoyable working environment for all library metadata workers.
I’ll go into more detail about my learning journey in a future post, including how AI itself has helped shepherd me along the way to understanding what lies under the hood of AI. But first, allow me to introduce you to “ErgoCat”!
What is ErgoCat, and why am I building it?
In short, my research boils down to one question: if I could build a cataloging interface from scratch using the full suite of present-day technologies, what would it look like? ErgoCat, short for “Ergonomic cataloging” (or “I cat[alog] therefore [I am]” if you are Latin-inclined and a fan of double entendres like me!), will attempt to answer that question. More on its design specs below. But first, some background…
The relationship between libraries and technology has always been fraught. At best, we strive to keep up with the technological environments our users live in, and to meet them where they are with an evolving array of information resources and services around those resources. At worst, we can be dismissed as relics of the past that are not integral to the communities we serve. How many times have you heard “normies” dismiss libraries merely as warehouses of dusty old books, or as glorified homeless shelters/daycare centers? To the contrary, libraries were some of the earliest places to provide free internet access back when access to a computer was far from universal, let alone access to a sufficiently fast internet connection. This is not to say that library workers have on the whole embraced emerging technologies uncritically. Quite the opposite; AI skepticism is just as much a part of the current environment as is the AI optimism that drives my project. (More on this in future posts!)
I could go on, but this blog is not about advocacy for libraries at large. My little corner of librarianship offers a slightly different lens.
Libraries rightfully devote most of their precious (and inadequate) resources to providing the best for their users. Where I believe we have fallen down on the job is in how we provide for our own employees. The microcosm of library cataloging departments is an apt case study for this. In the days of card catalogs, catalogers created painstaking descriptions either by hand (using a practiced form of penmanship called “library hand”) or by typewriter, a brittle tool that by design has almost no margin for error. Later, rudimentary computers allowed for word processing and printing, as well as dedicated terminal access to databases for shared cataloging (if the library had a budget for that). Once PCs became affordable enough to furnish one to each employee in an office environment, the technological reach of each employee grew to match. Since the 1990s/early 2000s, many catalogers have had access to a purpose-built desktop application for creating and managing catalog records. These records may still, for a time, have been printed out on cards to file in drawers, but eventually they fed into an online version of the catalog, or “OPAC.”
I was not in the profession back then, so I can’t say how satisfied the average library cataloger was with their ergonomic setup. That is, if they even thought in these terms. I do know that when I came onto the scene in the mid-late 2000s as a graduate student and then novice professional cataloger, I didn’t question the state of the art of the metadata work environment.1 Mostly, I was grateful to get to play in the sandbox with the big kids and have a job that paid my bills. Using a keyboard and mouse for 99% of my time seated in front of a computer was just the way of the world, the default modalities.
Initially, this was fine for me, even though I already knew of colleagues who had difficulty using these basic tools and needed some form of accommodation. As the years wore on, my own worsening physical condition (I’ll spare you the sordid story for now) has made it difficult to type for hours at a time. Still, the burden of having to kludge together a bespoke adaptive environment by, for example, installing speech-to-text software was a non-starter for me, and I resigned myself to suffering in silence for years (though I did find a vertical left-handed mouse that I’ve come to adore). Through all of this, the standard-issue tools for library catalogers have not changed ergonomically in any meaningful way in decades. They still consist of user interfaces requiring typing, clicking, navigating menus, etc. Keyboard and mouse, keyboard and mouse, rinse, repeat. Contrast this with our mobile devices, which for years now have natively supported touchscreen and voice-activated features. Why does my field lag so far behind in UX? It is time to do something about it.
These frustrations of mine simmered in the background for years with no real outlet (or bandwidth to pursue them). Then… a new disruptive technology burst onto the scene in 2022: ChatGPT, powered by colossal large language models trained on large swaths of the internet. I’ll admit, I was among the most skeptical at the beginning. I played with the interface a bit and tried to stump it with nuanced questions. (Quite easy to do in those early days.) Then I put it off to the side, not seeing any real potential for my work. Meanwhile, many of my colleagues took up more in-depth research into using AI chatbots to generate catalog records, with very mixed results. I respected those colleagues for jumping into the fray and trying to make sense of this technology that we are told will someday come for all of our jobs. I found myself dissatisfied with the methodology, though. How do we come up with silver-bullet prompts that result in viable metadata? How do we prevent hallucinations? Will this really save any time if we humans have to check every piece of work generated by AI? The discourse around these questions has been contentious, with some dismissing the entire technology outright. I’ll have more to say about these things in a future post, but suffice it to say, I was disinclined to jump into the debate on these terms.
The first lightbulb moment came in the fall of 2024, when I attended a webinar about automation using AI as part of an integrated software stack. I realized that perhaps my field wasn’t asking enough of the right questions. It’s not about how well a slick, single, general-purpose chatbot can mimic the work of a human cataloger; it’s about how the human can deploy generative AI to automate the aspects of cataloging work that a machine can do faster and just as well as (if not better than) a human can. It’s not about replacing a human’s effort; it’s about redeploying the human mind and freeing the cataloger’s physical body from the drudgery of data entry. Not displaced catalogers, but catalogers in the loop.™
Around this same time, I participated in my library’s annual staff development day, which included a sequence of “hot topic” lightning talks about AI and how it is hitting our shores in various guises. I was asked to speak about the cataloging/metadata angle and try to summarize the state of the art, nebulous as it was at that time. Later in the day, we engaged in a reflective goal-setting exercise for the coming year, and that’s when my clarifying moment came. I needed to jump into the fray, lest I get left behind in the conversation, and (more importantly) for fear of inadequate tools being foisted on me and my community without us putting our stake in the ground about what we want these “tools” to actually do for us. I heard the call to action and believed that my unique perspective and background could add value to the conversation. Weeks later, I submitted my sabbatical application, and here I am today!
So how will ErgoCat work? I’m so glad you asked…
ErgoCat’s design components
[Disclaimer: all design specs, working names, and under-the-hood details are subject to change!]
ErgoCat, as currently envisioned, will be a desktop/tablet app that supports an end-to-end workflow for the cataloging of music scores (sheet music) in a manner that is streamlined and requires only a modest amount of “traditional” human input (i.e., keying and mousing). Functionally, under the hood are multiple components (agents), each programmed to excel at a specific part of the cataloging task. The components have their own working names, for ease of communication and documentation:
UploadHandler: this is the “front door” of the app, built in Microsoft Power Platform (inspired by the work of Hannes Lowagie). The user captures (via an integrated camera) one or more images of a score, tags the source type (title page, first page of music, cover, etc.), and uploads them to the system.
A future version of ErgoCat could also be configured to work as a browser extension, to support seamless handling of scores already published online.
LayoutDetector: a computer vision model (like “OCR” but on steroids) reads each page, detects blocks of text, draws bounding boxes around them, and passes the marked-up images to the next agent.
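To make this a bit more concrete, here is a rough Python sketch of the kind of pass I have in mind, using plain Tesseract OCR via the pytesseract library. The real agent would use a more sophisticated, purpose-built layout model; the function name and confidence cutoff below are my own placeholders.

```python
# A rough sketch of a LayoutDetector-style pass using plain Tesseract OCR
# (pytesseract). A dedicated layout model would replace this in practice;
# the confidence cutoff is an illustrative choice.
from PIL import Image
import pytesseract
from pytesseract import Output

def detect_text_blocks(image_path: str) -> list[dict]:
    """OCR a page image and return recognized text with bounding boxes."""
    image = Image.open(image_path)
    data = pytesseract.image_to_data(image, output_type=Output.DICT)
    blocks = []
    for i, text in enumerate(data["text"]):
        # Skip empty cells and low-confidence noise
        if text.strip() and float(data["conf"][i]) > 60:
            blocks.append({
                "text": text,
                "box": (data["left"][i], data["top"][i],
                        data["width"][i], data["height"][i]),
            })
    return blocks
```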
ChunkClassifier: takes the OCRed text blocks from the LayoutDetector and assigns each to a broad bibliographic element (title, publisher statement, edition statement, etc.).
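For illustration, a classifier like this could be prototyped in a few lines with Hugging Face’s zero-shot classification pipeline. The model and labels below are my placeholders; the production agent would more likely be a small fine-tuned model, in keeping with the edge-computing requirement discussed later.

```python
# A hedged sketch of ChunkClassifier using a zero-shot classification
# pipeline from the transformers library. Labels are illustrative only.
from transformers import pipeline

LABELS = ["title", "statement of responsibility", "edition statement",
          "publisher statement", "series statement"]

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

def classify_chunk(text: str) -> tuple[str, float]:
    """Assign an OCRed text block to a broad bibliographic element."""
    result = classifier(text, candidate_labels=LABELS)
    return result["labels"][0], result["scores"][0]
```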
SubelementParser: where necessary, subdivides larger bibliographic “chunks” into their subelements (e.g., title, subtitle), using heuristics related to layout and typography. The output of both this and the ChunkClassifier is passed to the next agent.
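As a toy example of the kind of heuristic I mean, here is a split-on-colon rule for separating a title proper from a subtitle; real rules would also weigh layout, typography, and ISBD punctuation.

```python
# A toy SubelementParser heuristic: split a title chunk into title proper
# and subtitle on the first colon. Typographic cues would refine this.
import re

def parse_title_chunk(chunk: str) -> dict:
    """Split a title chunk into title proper and (optional) subtitle."""
    parts = re.split(r"\s*:\s*", chunk, maxsplit=1)
    parsed = {"title_proper": parts[0].strip()}
    if len(parts) > 1:
        parsed["subtitle"] = parts[1].strip()
    return parsed

# parse_title_chunk("The Rite of Spring : pictures from pagan Russia")
# -> {"title_proper": "The Rite of Spring",
#     "subtitle": "pictures from pagan Russia"}
```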
AuthorityAgent: identifies named entities and queries them against authority files such as LCNAF, LCGFT, and LCMPT. Returns matching authority data, including authorized access points (the text strings that catalogers use to precisely identify entities), a confidence score, and contextual information for the human to review.
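In keeping with the edge-computing requirement below, matching could run against a locally downloaded authority file. Here is a deliberately simplified sketch; the file name, record shape, and fuzzy-matching approach are all stand-ins, and real LCNAF data would need proper parsing.

```python
# A simplified sketch of AuthorityAgent matching against a local file.
# The file name and structure are hypothetical; production matching
# would be considerably more robust than plain fuzzy string comparison.
import difflib
import json

def load_authorities(path: str = "lcnaf_names.json") -> dict[str, str]:
    """Map authorized access points to their authority record URIs."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def match_entity(name: str, authorities: dict[str, str]) -> dict | None:
    """Return the best-matching access point with a rough confidence score."""
    candidates = difflib.get_close_matches(name, authorities.keys(),
                                           n=1, cutoff=0.6)
    if not candidates:
        return None
    best = candidates[0]
    score = difflib.SequenceMatcher(None, name, best).ratio()
    return {"access_point": best, "uri": authorities[best],
            "confidence": round(score, 2)}
```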
MARCBuilder: maps transcribed bibliographic data and matched authority strings to the MARC bibliographic format, according to heuristics based on the data and (in the case of transcribed elements) the source of that data. It will also include provenance statements as necessary.
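For a taste of what this output step could look like, here is a skeletal sketch using the pymarc library (assuming pymarc 5.x’s Subfield API); the tags, indicators, and mapping are illustrative, not a full specification.

```python
# A skeletal sketch of MARCBuilder using pymarc (5.x Subfield API).
# Field choices and indicators are illustrative only.
from pymarc import Record, Field, Subfield

def build_marc(title: str, subtitle: str | None, access_point: str) -> Record:
    """Assemble a minimal MARC bibliographic record."""
    record = Record()
    title_subfields = [Subfield(code="a", value=title)]
    if subtitle:
        title_subfields.append(Subfield(code="b", value=subtitle))
    record.add_field(
        Field(tag="100", indicators=["1", " "],
              subfields=[Subfield(code="a", value=access_point)]),
        Field(tag="245", indicators=["1", "0"],
              subfields=title_subfields),
    )
    return record
```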
ReviewUI: in a Power Apps interface, the cataloger reviews agent output and accepts/rejects it, either one agent at a time or by hand-editing the complete generated MARC data. When satisfied, the user exports the MARC data to an appropriate environment (e.g., OCLC Connexion) for finishing the catalog record and adding it to a production database.
FeedbackLogger: keeps a log of all user corrections, which are then fed back to the agents for ongoing machine learning purposes.
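At its simplest, this could be an append-only log; here is a bare-bones sketch. The schema is my own invention, and the real design may capture far more context per correction.

```python
# A bare-bones sketch of FeedbackLogger: append each human correction
# to a JSON Lines file for later retraining. Schema is hypothetical.
import json
from datetime import datetime, timezone

def log_correction(agent: str, original: str, corrected: str,
                   path: str = "feedback_log.jsonl") -> None:
    """Append one cataloger correction as a JSON Lines entry."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "original": original,
        "corrected": corrected,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```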
Prototype as protest, or, The form factor is the X factor
OK, those are the on-the-ground components. But lest we get lost in the weeds from the outset, let’s return to the activist aspect of this project. I am attempting to build the interface that I wish I could have had all these years, and I would be privileged to offer something to other catalogers who dream of something better than the status quo. In that sense, ErgoCat is less about proving a technical point than about making a cultural one: catalogers deserve tools as humane as the knowledge environment we steward. To that end, here are the higher-level functional requirements of the app:
Human-in-the-loop: Agents assist but never replace cataloger judgment. The human user makes all final decisions and is responsible for applying all applicable cataloging standards and best practices to the final product.
Speed (but not for its own sake): each end-to-end cataloging transaction (image upload → MARC data export) should take no more than 30 seconds, and ideally less than 15 seconds. In other words, the tool should be faster than a typical cataloger can work using traditional tools, even when review and error correction are taken into account. Someday, perhaps interfaces like this could enable someone to catalog as fast as they can think. Put another way, the time it takes to catalog something should be based only on the interpretive work required, and never on the size of the data entry burden.
Ergonomics and accessibility first: speed aside, this tool is only useful insofar as it improves the embodied experience of cataloging for the average cataloger. Going beyond the Power Apps default modalities, which do support touchscreen input where hardware is compatible, the app should also support other adaptive modalities like voice activation, wearable cameras, and even gesture-based computing (someday!). In other words, catalogers with adaptive needs will be served most, but the productivity gains and ease of use should benefit everyone.
Live on the “edge”: wherever possible, agents that use AI components such as transformers should be able to run on a local machine (“edge computing”). This means those transformers should be more compact and focused in purpose. Authority matching should be done against downloaded database files (which can be periodically re-downloaded via the app). Web calls (via API, etc.) should be reserved for cases where edge computing is not feasible on the average cataloger’s middling hardware, such as when larger, more sophisticated AI models need to be prompted.
Minimize costs: related to the above, the app should be free (or as close to free as possible) to download and use. Exploiting edge computing will also help mitigate the effects of excessive querying of overpowered general-purpose AI models like GPT-5; those costs are both financial and environmental.
How will this be accomplished? In short, by fine-tuning each agent’s AI substrate to excel only at its assigned task, and by managing its context window for quickest performance and optimal results.
What will make this challenging? The prototype I am building will work in the Power Platform environment which is currently tied to my institution’s site license. Exporting the codebase to another environment for sharing purposes will be a challenge for a future day.
Zero tolerance for hallucination: to guard against the most pernicious of the occupational hazards of using AI tools, the components of ErgoCat will be designed to perform functions that resist hallucination. Classification and reformatting/markup functions are based on processing raw input, while authority querying is based on reproducing retrieved metadata verbatim. Typical GenAI behavior in the vein of general-purpose chatbots is kept to a minimum.
Music as testbed: the prototype will be fine-tuned to notated music resources (pun intended), but it should be adaptable for other resource formats down the line. To invoke an oft-cited claim made by my musically inclined library colleagues, if a tool can be designed to work for music, it can probably work for anything!
Believe it or not, I have kept this inaugural post as brief as I could! I look forward to wading into the weeds with you over the coming weeks and months. I’ll do my best to post dispatches at least once a week or so.
Stay tuned for more!
Does my work intrigue you? Are you curious about how you might riff off of my research to build your own tools? If you’re a cataloger, metadata librarian, or just a curious technologist, I’d love to hear your experiences with the tools you use daily—what works, what doesn’t, and what you wish existed.
Please subscribe and join the party!
Warmest regards,
Casey and Ralph (the ErgoCat mascot; IYKYK)
AI usage note: in the interest of transparency, I pledge to disclose where and how I supplement my blog posts with AI input. For this post, the first draft is all mine, with light editing suggestions and some hyperlinks provided by ChatGPT. The image of my app’s mascot was created by Microsoft Copilot. For the blog as a whole, ChatGPT also helped me brainstorm names and create its logo.
As a brief aside that speaks to my positionality, I’ll self-identify as a member of the “Xennial” microgeneration, who came fairly easily to technology as a teen and young adult, but who definitely remembers a world without ubiquitous computer and internet access!
