The Collation

Tagging manuscripts: how much is too much?

In tagging, or encoding, manuscript transcriptions in XML (Extensible Markup Language) for Early Modern Manuscripts Online (EMMO), two important questions arise: how much should we tag, and when should we do it?

With thousands of pages from a variety of genres, the “how much” question is a big one. For example, should tags be used to provide information about ink color, shifts in hand, size or ornamentation of letters, illustrations, marginalia, flourishes, indentations, spacing, symbols, quotations, layout, structure, lines, paper material, historical/literary connections, etymology, smudges, etc., etc.? The images of manuscript pages below give some idea of the challenges involved:

Summary of accounts of the offices of the tents and revels from 1550 to 1555. (L.b.315)

Comments

As someone who has spent a fair amount of time using Dromio, I think that on the whole it has found a great point of equilibrium in transcription, giving the transcriber a versatile but relatively simple set of tools. I think your idea that the digital images available on Luna minimize the need for an overly complex set of tags is spot on. However, I do wonder about tools to help replicate complexities of formatting – things written in double column, charts, etc., which are hard to reproduce coherently in a simple line-by-line transcription. The photos of L.b.315 and V.b.26 that illustrate your post are excellent examples of these kinds of pages (V.b.26 has those tough symbols as well!).
I’ve come across similar pages while looking for things to transcribe in Dromio, and have always just skipped them because I’ve never figured out how to create a coherent transcription in Dromio out of an intricately formatted page.
I know these are questions about formatting more than about tagging per se, but are there plans to include richer formatting options in Dromio at some point in the future?

Dylan Ruediger — June 11, 2015

An excellent question! Although our focus is on the text, we are considering ways to transcribe and encode these complex manuscript pages while giving some meaningful indication of the formatting, layout, etc. Since the words and the structure of the information on a page are often related, one idea under discussion is to devote part of the next Advanced Paleography Workshop (in December 2015) to formatting alongside transcription. Employing spatial coordinates may help, but we will be interested in suggestions from the participants as intricate pages are examined. Other special events may be organized in the future to tag certain difficult pages. Additional tags could also be added to Dromio, of course, as we did in the last Advanced Paleography Workshop. Further tagging could also be done as part of a separate project on a particular manuscript or set of manuscripts.

Paul Dingman — June 11, 2015
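
One way to picture the spatial-coordinate idea raised in this exchange: TEI P5 lets an encoder describe regions of a page image as `<zone>` elements carrying `ulx`, `uly`, `lrx`, and `lry` attributes (upper-left and lower-right corners). The sketch below is only an illustration and not EMMO's or Dromio's actual encoding. The page dimensions, zone ids, and coordinates are all invented, and the top-to-bottom, left-to-right sort is a naive assumption about reading order that would fail on many real layouts.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical two-column page with a heading; all values are made up.
SAMPLE_SURFACE = """
<surface xmlns="http://www.tei-c.org/ns/1.0" ulx="0" uly="0" lrx="2000" lry="3000">
  <zone xml:id="right_col" ulx="1050" uly="200" lrx="1900" lry="2800"/>
  <zone xml:id="heading" ulx="100" uly="50" lrx="1900" lry="150"/>
  <zone xml:id="left_col" ulx="100" uly="200" lrx="950" lry="2800"/>
</surface>
""".strip()


def zones_in_reading_order(surface):
    """Sort zones by top edge, then left edge (a naive heuristic)."""
    zones = surface.findall(f"{TEI}zone")
    return sorted(zones, key=lambda z: (int(z.get("uly")), int(z.get("ulx"))))


surface = ET.fromstring(SAMPLE_SURFACE)
order = [zone.get(XML_ID) for zone in zones_in_reading_order(surface)]
print(order)
```

Recording where each block sits on the page keeps the line-by-line transcription simple while still letting a later display or analysis reconstruct columns and tables from the coordinates.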

This is an excellent explanation of the EMMO tag set. What it lacks is a few illustrative examples of tags, for those of us who haven’t yet seen them. Any chance of an addendum?

William Ingram — June 11, 2015

Yes, a discussion of particular encoding tags and attributes will likely show up in a future post about EMMO. You may wish to check out the link to the Text Encoding Initiative (TEI) for examples and explanations of some of the tags mentioned, such as “<ex>” for expansion or “<del>” for deletion. We are using tags that will be compliant with TEI-P5.

Paul Dingman — June 11, 2015
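
For readers who have not seen such tags, here is a small hedged illustration of the two kinds just mentioned, using the TEI P5 elements `<ex>` (letters supplied when expanding an abbreviation) and `<del>` (text deleted in the manuscript). The sample line is invented for illustration and is not taken from L.b.315 or any EMMO transcription, and the `reading_text` helper is a hypothetical name, not part of Dromio. The sketch shows how a lightly tagged line can still yield a plain reading text.

```python
import xml.etree.ElementTree as ET

# Invented sample line: one deletion and one expanded abbreviation.
SAMPLE_LINE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    "accounts of the <del>tentes </del>tents and reuell<ex>es</ex>"
    "</p>"
)


def reading_text(elem, skip=("del",)):
    """Join the text of elem, omitting any child element named in skip."""
    parts = [elem.text or ""]
    for child in elem:
        local_name = child.tag.split("}")[-1]  # drop the namespace prefix
        if local_name not in skip:
            parts.append(reading_text(child, skip))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)


line = ET.fromstring(SAMPLE_LINE)
print(reading_text(line))           # deletions left out
print(reading_text(line, skip=()))  # deletions kept
```

Because the deletion and the expansion are marked rather than silently resolved, the same transcription can serve both a clean reading view and a more diplomatic one.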

There is no real way of pre-determining the “right” or “sufficient” or “superfluous” amount of tagging. In my consulting practice on knowledge management (which is a different animal than purer information management!) I always argue that “this lies in the eyes of the beholder”. That is, some researcher in some distant future might want to write his/her thesis about “smudges over the centuries”. And maybe the information would not be readily available at the time. But maybe the researcher then does an all-encompassing field study, develops specific metrics, etc. Wouldn’t it be great if he/she could then add his/her information (knowledge) to the existing corpus? This is what a learning organization would do – provide a basic structure to begin with (certainly author and title were always indispensable, even in the Alexandrian Library) but then ALLOW and ENCOURAGE other experts to add their expertise and preserve it “down the ages”. HTML, DOI and other markups now allow this to be done technically – all we need to do is make it desirable and possible.

Darragh McCurragh — June 12, 2015
