Digital Express

[posted by Gavin Robinson, 8:00 pm, 8 February 2008]

Having decided to leave my 5th Lincolnshire First World War project for a while, I got an offer I couldn’t refuse: someone from the Great War Forum sent me a transcript of the battalion’s medal citations from the regimental archive so that I could publish them on my site and link them in to the index of people that I’d created for the book. The document contains information that can’t be found elsewhere, as although awards of the Military Medal were listed in the London Gazette, full citations were not normally published. There are also three awards not mentioned in Sandall’s list, and citations for 10 people who were recommended for awards but turned down.

I received the list as a Word file with no semantic markup on Wednesday morning, started working on it on Thursday morning, and published it on the web this afternoon. It looks very basic but it’s not bad for two days, and it’s all linked in to the index of people for Sandall’s book. First of all I copied the text into jEdit and used Find and Replace to insert some basic TEI XML markup. Then I pasted it into a new TEI document in oXygen. With the automatic validation it was easy to track down and correct errors in the markup, so by lunchtime I had a completely valid TEI file. In the afternoon I spent three or four hours linking records by inserting key attributes into <persName> tags. In most cases I already had the keys that I used for linking names in Sandall, but sometimes I had to change them in the light of new evidence from the citations, such as the full names of people I previously knew only by their initials. This also allowed me to clear up some ambiguities. This morning I finished the linkage by creating new keys for the 13 people not mentioned by Sandall, then got started on writing some XSLT. That was easy as I could copy or adapt a lot of the code from the style sheet for Sandall. As well as generating the HTML version of the citations, this XSLT generates an extra JSON file which is imported into the Sandall index of people so that each person’s entry can link to their citations. Again this only required some minor adjustments to the Exhibit page. After some testing and corrections I had a live site up this afternoon.
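For readers who want a feel for the linking step, here is a minimal sketch of the idea in Python. It is not the XSLT used on the site. It assumes a TEI P5 namespace, a hypothetical layout with one <div> per citation carrying an xml:id, and invented keys such as smith_j. The "citations" property name and the "Person" type are also made up for illustration. Only the top-level "items" list with a "label" on each item follows the general shape of an Exhibit data file.

```python
import json
import xml.etree.ElementTree as ET

# Assumption: the file uses the TEI P5 namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample: names, keys and ids are invented for illustration.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="citation" xml:id="c1">
      <p><persName key="smith_j">Pte. J. Smith</persName> carried messages.</p>
    </div>
    <div type="citation" xml:id="c2">
      <p><persName key="jones_a">Sgt. A. Jones</persName> and
         <persName key="smith_j">Pte. Smith</persName> held the post.</p>
    </div>
  </body></text>
</TEI>"""


def citations_by_key(tei_xml):
    """Map each persName key to the ids of the citation divs that mention it."""
    root = ET.fromstring(tei_xml)
    index = {}
    for div in root.iter(f"{{{TEI_NS}}}div"):
        div_id = div.get(XML_ID)
        for name in div.iter(f"{{{TEI_NS}}}persName"):
            key = name.get("key")
            # Skip unkeyed names; record each citation once per person.
            if key and div_id not in index.setdefault(key, []):
                index[key].append(div_id)
    return index


def to_exhibit_json(index):
    """Serialise the index as an Exhibit-style {"items": [...]} document."""
    items = [
        {"label": key, "type": "Person", "citations": ids}
        for key, ids in sorted(index.items())
    ]
    return json.dumps({"items": items}, indent=2)


if __name__ == "__main__":
    print(to_exhibit_json(citations_by_key(SAMPLE_TEI)))
```

The point of the sketch is that once every name carries a stable key, joining two documents is just a lookup: the same key that identifies a person in the index for Sandall’s book also picks out their citations.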

This demonstrates the potential value of the techniques I’ve been using for marking up texts, but it also raises some problems for digital history. I decided to trust a transcript from a random person off the internet. I have no way of knowing how accurate the transcript is, or even if the source document really exists! It could be Hugh Trevor-Roper and the “Hitler Diaries” all over again. Therefore I’m going to think more carefully before putting myself in this situation again. There’s also a possibility that I’ve miscalculated the copyright situation. Based on internal evidence and comparison with other documents, my best guess is that the list was created by the army and is therefore under Crown Copyright (and, being unpublished and available for inspection in a public record repository, should come under the waiver of Crown Copyright), but without seeing the original it’s hard to be sure. I might be wrong, and even if I’m right the holders of the manuscript might not agree. So technology makes some things easier, but there are other problems that it can’t solve.
