Unexpected Progress

[posted by Gavin Robinson, 5:35 pm, 11 July 2007]

It’s been a long time since I wrote anything about my First World War digitization projects, but I now have some progress to report: today I published an interim version of Sandall’s History of 5th Lincolnshire Regiment. It’s still a work in progress, and there’s a lot more to be done, but you can see it here. It’s just a plain HTML version (and not strictly valid HTML), with the whole text on one page (at least that makes it easy to search with your browser’s Find feature!). There’s no name linkage yet, no page images online, and no mechanism for submitting corrections. However, even in this form it should be useful to people who are researching the battalion and can’t get hold of the original book. More details on what I’ve done and how I’ve done it below.

(more…)

XML Tagging: Phase 1

[posted by Gavin Robinson, 1:04 pm, 26 February 2007]

Having proofread and corrected the digital text captured from Sandall’s history of 1/5th Lincolnshire (corrected to an adequate standard anyway; I can’t claim that it’s perfect), I was ready to start inserting XML tags. The first phase of markup uses TEI XML tags to describe the basic structure of the text. There was nothing too difficult here, and a lot of it could be done automatically rather than by reading through the text and manually inserting a tag at every feature. Before I started I had to decide which tags to use and where to use them, then make sure I applied them consistently. This post gives more details of the tags I used, what I used them for, and how I got them into the text with minimal effort.
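
As a rough illustration of what automatic structural tagging can look like (a minimal sketch, not necessarily the method described in the full post), the Python below wraps blank-line-separated paragraphs in TEI `<p>` tags inside a single `<div>` with a `<head>`. The function name `tag_paragraphs`, the `type="chapter"` attribute value, and the assumption that paragraphs in the proofread text are separated by blank lines are all mine, for illustration only.

```python
import re
from xml.sax.saxutils import escape


def tag_paragraphs(text, heading):
    """Wrap blank-line-separated paragraphs in TEI <p> tags inside one <div>.

    Hypothetical helper: assumes paragraphs in the plain text are
    separated by one or more blank lines.
    """
    # Split on blank lines, then collapse line breaks inside each paragraph.
    blocks = [" ".join(block.split()) for block in re.split(r"\n\s*\n", text)]
    paragraphs = [block for block in blocks if block]

    # escape() converts &, < and > so the output stays well-formed XML.
    lines = ['<div type="chapter">', f"  <head>{escape(heading)}</head>"]
    lines += [f"  <p>{escape(p)}</p>" for p in paragraphs]
    lines.append("</div>")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "First line\nwrapped.\n\nSecond & last."
    print(tag_paragraphs(sample, "Chapter I"))
```

Anything less regular than paragraphs and headings (names, dates, lists of casualties) would still need a human decision about which tag applies, which is why the tagging rules have to be settled first.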

(more…)

Proofreading

[posted by Gavin Robinson, 12:01 pm, 23 February 2007]

In my last project update I described how I used FineReader to OCR the text of Sandall’s History of 5th Lincolnshire Regiment. Since then I’ve manually proofread the text and inserted some basic XML markup. Proofing and basic tagging have given me a more detailed understanding of the text and the features in it, and I’ve been noting potential issues as I go. I’ll post more about how I’m using XML later, but this post is a more detailed description of the process of proofreading.

(more…)

Digital History Project: Update

[posted by Gavin Robinson, 7:24 pm, 15 February 2007]

Another project update. Things have been slightly delayed because I have an article to rewrite (which means I’m slightly closer to getting published) but I’ve still been making some progress. This weekend I’ll be proofreading Sandall’s book. When that’s done I’ll be able to export the text and start tagging it with XML. But first I’ve been looking through the TEI guidelines, picking out the tags I think I’ll need, and working out how I’m going to use them. This is crucial because there are often different ways to mark up the same text and it’s important to be consistent. It’s also important to apply only tags which will actually be useful to users, because there’s an awful lot of potential to waste time marking up text in microscopic detail that no-one has any use for. As I do the proofreading I’ll also be looking at the structure of the text and the features in it that will need marking up, and revising the provisional tagging guidelines if necessary. Once I’m happy with the tag set and the guidelines for using it I’ll post it all (but be warned: it won’t be very interesting!). Even then I expect to run into situations I hadn’t anticipated once I start trying to insert the tags.

Digital History Projects: OCR

[posted by Gavin Robinson, 8:11 pm, 7 February 2007]

Now that I’ve got all the theoretical agonising out of the way, I can actually do something about digitizing the text. This week I’m carrying out OCR and proofreading on the text of Sandall’s History of 5th Battalion the Lincolnshire Regiment. As soon as I got to work I encountered issues that I hadn’t thought of, and found that subjective decisions had to be made even earlier than I’d anticipated. This just shows that the only way to learn how to do something is to do it.

(more…)

Digital History Projects: Progress Report

[posted by Gavin Robinson, 3:57 pm, 24 January 2007]

This is a progress report on the First World War digitization projects I outlined previously in my post on planning.

(more…)
