Google Base and Great War Soldiers

[posted by Gavin Robinson, 1:08 pm, 27 December 2007]

I’ve just been looking into Google Base, which lets you upload structured data in XML format and make it searchable on Google (although so far Base pages don’t seem to show up in the standard web search). The data is described using item-types and attributes, and although Google provides recommended types and attributes you can also make up your own, for just about any purpose you want. This kind of semantic markup gives the potential for much more specific and accurate search results than a normal web search.

Now I’m wondering if this could be a possible solution to a big problem that I’ve been thinking about for a while: pulling together a list of all the British soldiers who served in the First World War and everything that’s known about them. This would be in the region of several million names. Many of the details are already available online in various places but they’re not linked together.

The CWGC has a more or less complete database of personnel who meet their criteria of having died as a result of the war, although new names are discovered every so often (their own search engine can only search by name, not by regiment or service number). Surviving service records (only about 30 to 40% survived the Blitz!) are being put online by Ancestry, although it’s subscription only and the indexing and transcription are reputed to be really terrible.

The UK National Archives has made the medal index cards available online (I’ve seen several transcription errors in the index but it’s apparently not as bad as Ancestry). This collection contains nearly 5.5 million records and should mention every soldier who qualified for a campaign medal by serving overseas (although there are unsubstantiated rumours on the Great War Forum that some cards were lost in transit). The medal cards also include men who were awarded a Silver War Badge for being discharged as unfit for service, even if they hadn’t served overseas. Officers are more problematic because if they survived the war they had to apply for their campaign medals and there are many known examples of officers who didn’t make a claim and so have no medal card.

Commissions and gallantry medals are shown in the London Gazette, which is available online, but its search engine is notoriously difficult to use. Then there are various personal websites of people who are researching their families or a particular unit. And there’s the Great War Forum, which contains a huge number of posts on individual soldiers, often pulling together information from many different sources, from the most well known online databases to obscure local newspapers and family collections.
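
To make the "not linked together" problem concrete, here is a rough Python sketch of one way records from different sources might be grouped. It assumes, purely for illustration, that regiment plus service number is a usable key, which may well not hold for real records. The function names, field names and sample records are all hypothetical placeholders, not real data or any existing tool.

```python
from collections import defaultdict


def link_key(record: dict) -> tuple:
    """Normalise regiment and service number into a comparison key (assumed key)."""
    regiment = " ".join(record["regiment"].lower().split())
    number = record["service_number"].replace(" ", "").upper()
    return (regiment, number)


def link_records(records: list) -> dict:
    """Group records from different sources that share the same key."""
    linked = defaultdict(list)
    for record in records:
        linked[link_key(record)].append(record)
    return dict(linked)


# Placeholder records, not real data; "source" only names the kind of source.
records = [
    {"source": "medal index card", "regiment": "Example  Regiment", "service_number": "12 34"},
    {"source": "casualty database", "regiment": "example regiment", "service_number": "1234"},
    {"source": "forum post", "regiment": "Other Regiment", "service_number": "1234"},
]
linked = link_records(records)
for key, group in linked.items():
    print(key, [r["source"] for r in group])
```

Matching on an exact key like this would miss transcription errors of the kind mentioned above, so a real attempt would probably need fuzzier matching and human checking.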

In theory something like Google Base could help to pull all this stuff together and make it easier to find information on specific people. For example, you could create an item type for soldiers and give it attributes like name, rank, regiment, battalion, service number etc. First of all a lot of thought and consultation would need to go into defining the item and attributes to make it as useful as possible to as many people as possible. This is definitely something to think about for the future.
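
As a minimal sketch of what such an item might look like, the Python below defines a soldier record with the attributes suggested above and writes it out as generic XML. The element names, the `type="soldier"` attribute and the sample values are assumptions made up for illustration; this is not Google Base's actual feed format, and a real schema would need the consultation described above.

```python
from dataclasses import dataclass, asdict
import xml.etree.ElementTree as ET


@dataclass
class Soldier:
    # Attribute names follow the suggestion in the post; not an agreed schema.
    name: str
    rank: str
    regiment: str
    battalion: str
    service_number: str


def soldier_to_xml(soldier: Soldier) -> str:
    """Serialise one record as a generic XML item (hypothetical layout)."""
    item = ET.Element("item", {"type": "soldier"})
    for attribute, value in asdict(soldier).items():
        ET.SubElement(item, attribute).text = value
    return ET.tostring(item, encoding="unicode")


# Placeholder values only, not a real person.
example = Soldier(
    name="Example, A.",
    rank="Private",
    regiment="Example Regiment",
    battalion="1st",
    service_number="0000",
)
xml_text = soldier_to_xml(example)
print(xml_text)
```

Keeping each attribute in its own element is what would let a search be restricted to, say, regiment or service number rather than matching free text.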

However, there are some limitations which mean it isn’t going to happen soon. The biggest problem I can see is that you have to manually upload the records to your account, and edit them whenever they change. You can use the API to automate this but I think it would be much better if you could just embed Google Base metadata in a webpage and let Google’s spiders pull it out automatically. Another thing is that there doesn’t seem to be any scope for collaboration: once you’ve uploaded your data no-one else can edit it. This is quite disappointing because sharing is a big part of Google Docs. In my experience many expert Great War researchers do not have advanced IT skills, so we need things to be as simple as possible, and we need an easy way for other people to help less IT-literate researchers by editing their records directly. The Your Archives wiki has shown that this can work really well: it doesn’t matter if people haven’t formatted their pages properly or don’t know how to insert a link. As long as you put up some relevant information, someone else can sort it out.

But these are changes that Google could make in the future, so it’s something to watch out for. There must be lots of other ways that historians could use Google Base. It’s already good enough for smaller data sets which have already been compiled by one person, so I might be able to put up some of my English Civil War data.
