First, I'll need to explain a little about TEI encoding. The Text Encoding Initiative (TEI) is "a consortium which collectively develops and maintains a standard for the representation of texts in digital form." Essentially, TEI encoding is a set of standards for taking a text--say the poems of Emily Dickinson or a sixteenth-century Polish fencing manual--and turning it into a robust, searchable XML document, which can then be displayed in a number of different formats, loaded into a database for corpus linguistics analysis, and so on. Basically, once the text is in this standardized XML form, you can do just about anything with it.
[If you're reading this on a web browser there's a pretty good chance you have an idea of what XML is, but if you don't, you can go and educate yourself here.]
The Menota Project
Now, there are a lot of TEI encoding projects out there (including some which are pretty useful for medievalists and classicists everywhere, like the Perseus project), but the one that is important to this project is the Medieval Nordic Text Archive (Menota). Menota is basically a network of institutions working to do for the Scandinavian languages (mostly Old Icelandic and Old Norwegian at this point, though there are some Old Swedish texts as well) what the Perseus project has done for Greek and Latin. Menota has laid out a process as well as a series of standards for encoding these Old Norse manuscripts at three different levels of text representation:
Facsimile: At this level, the manuscript is transcribed character by character, line by line, retaining all abbreviations and allographic variations found in the original text. This is the closest level of reading possible short of actually handling the manuscript.
Diplomatic: Certain allographic variants are normalized (for instance, the long s (ſ) is frequently changed to the modern "short" s). Abbreviations are also expanded, and the expansions usually italicized.
Normalized: Spelling and word forms are standardized to conform to grammars and dictionaries for the language in question. For our purposes, this means altering the words to match what you'd find in a dictionary like Zoega's or a grammar like A New Introduction to Old Norse. This level is useful for newer students of the language, who might be confused by the inconsistencies in orthography and even morphology found in an unedited manuscript. It is also the level at which most critical editions tend to be produced. Old English texts, for instance, tend to be normalized to a particular West Saxon literary style, which can often give the misleading impression that writing in Anglo-Saxon England conformed to a fairly homogeneous standard.
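To make the three levels concrete, here's a minimal sketch of what a single word encoded at all three levels might look like in Menota-style XML. The `<choice>` wrapper and the `me:facs`/`me:dipl`/`me:norm` elements follow the multi-level convention described in the Menota handbook; the word forms themselves are hypothetical examples, not taken from the Hauksbók text:

```xml
<!-- One word, three readings. Assumes the TEI and Menota (me:) namespaces
     are declared on the document root. Forms are illustrative only. -->
<w lemma="konungr">
  <choice>
    <me:facs>konvngr</me:facs>  <!-- character-by-character transcription -->
    <me:dipl>konungr</me:dipl>  <!-- allographs normalized, abbreviations expanded -->
    <me:norm>konungr</me:norm>  <!-- dictionary spelling -->
  </choice>
</w>
```

A stylesheet can then pick out just one of the three children per word to render a single reading level.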
How normalized spelling is determined for a dead language is a subject for another post, but for now it's enough to know that this...
[Image: The beginning of Hervararkviða from G. Turville-Petre's critical edition of Hervarar saga.]
[Image: The same passage from folio 74r of the Hauksbók manuscript.]
Of course, doing this means you're encoding every single word three times. Which is what I've done. For the entire poem. Yep, it took me a while. And I'm still not done.
Once it's all in there, though, you can use an XSLT stylesheet (a topic for another post) to render any single reading level. That means that every Menota XML document has the potential to contain readings on all three levels, not to mention a wide variety of information about each word: lexical citation forms, base forms, morphology, syntax, the type of word (noun, proper name, verb, etc.). Although this kind of secondary information wouldn't typically be displayed by your stylesheet, it might be accessible via a specialized web page or app. So in the final edition of the Digital Hervararkviða, the student should be able to mouse over a word and see its case and number (for nouns) or tense, mood, and number (for verbs), as well as the lexical citation form that they can look up in the dictionary. The student should also be able to toggle among the facsimile, diplomatic, and normalized views at will and compare them to photographs of the manuscript. And of course, since all of this information is stored in XML, it has the potential to be used in a database for corpus-level analytics.
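As a rough sketch of what that level-selection step does, here's the same idea in Python's standard-library ElementTree instead of XSLT. The one-word sample and its forms are made up, and the `me:` namespace URI below is a placeholder for illustration; real Menota files declare the authoritative one:

```python
import xml.etree.ElementTree as ET

# Namespace URIs: TEI's is the standard one; the Menota URI below is a
# placeholder for illustration -- use the one declared in real Menota files.
NS = {
    "tei": "http://www.tei-c.org/ns/1.0",
    "me": "http://www.menota.org/ns/1.0",
}

# A single hypothetical word element carrying all three reading levels,
# plus a lemma (citation form) attribute for the mouse-over lookup.
SAMPLE = """\
<w xmlns="http://www.tei-c.org/ns/1.0"
   xmlns:me="http://www.menota.org/ns/1.0"
   lemma="konungr">
  <choice>
    <me:facs>konvngr</me:facs>
    <me:dipl>konungr</me:dipl>
    <me:norm>konungr</me:norm>
  </choice>
</w>
"""

def reading(word, level):
    """Return one reading level ('facs', 'dipl', or 'norm') of a <w> element."""
    el = word.find(f"tei:choice/me:{level}", NS)
    return "".join(el.itertext()).strip() if el is not None else None

word = ET.fromstring(SAMPLE)
print(reading(word, "facs"))   # konvngr
print(reading(word, "norm"))   # konungr
print(word.get("lemma"))       # konungr
```

An XSLT stylesheet does the same walk declaratively, matching each `<w>` and emitting only the child for the level you asked for.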
In the next post, I'll talk about the process used for creating the manuscript facsimile. We'll jump into the deep end of the pool of Old Norse paleography, and I'll recount the roadblocks and frustrations I encountered as I attempted to figure out just how people decided how things should be put down on paper more than 600 years ago.