Medieval Soldier Database

[posted by Gavin Robinson, 9:27 am, 14 May 2008]

While trawling (not trolling) for more posts that I can include in the next MHC, I found something interesting via Muhlberger’s Early History:

The Soldier in Later Medieval England is a major research project directed by Anne Curry (who was my personal tutor when I was an undergraduate at Reading). They now have a pilot database online (with free access) with details of thousands of soldiers who fought in the Hundred Years War. This should be really useful for anyone interested in medieval military history, not least because the financial records that the data comes from give much more accurate figures for army sizes than the estimates in chronicles.

You’re del.icio.us

[posted by Gavin Robinson, 5:00 pm, 29 January 2008]

I’ve just finished clearing out my bookmarks folder and putting my links onto del.icio.us. I don’t know why I haven’t done this before as it’s much better than keeping over 300 links in my Firefox bookmarks. From now on bookmarks are only for the few sites that I access most regularly. Zotero is for pages that need to be dealt with in more detail, with snapshots, notes, and annotations, or which need to be kept together with bibliographies for projects that I’m working on. And everything else goes on del.icio.us. While I was rearranging everything I took the opportunity to add some of the best sites to History Nexus.

Some other cool things:

Operator is a Firefox plugin which detects and displays Microformats. Microformats are a simple way of embedding metadata in web pages using only HTML.
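
To give a rough idea of what a tool like Operator is looking for, here is a minimal sketch in Python that pulls two hCard properties out of a page fragment. The sample HTML, the person and the regiment are all made up for illustration, and the parser only handles the simplest case (one flat card, text directly inside the tagged element). It is not how Operator itself works.

```python
from html.parser import HTMLParser

# Hypothetical page fragment marked up with the hCard microformat.
# The name and organisation are invented for this example.
SAMPLE_HTML = """
<div class="vcard">
  <span class="fn">John Smith</span>,
  <span class="org">Example Regiment</span>
</div>
"""


class HCardParser(HTMLParser):
    """Collects the text of elements whose class is a known hCard property."""

    # Only two hCard properties are handled in this sketch.
    PROPERTIES = {"fn", "org"}

    def __init__(self):
        super().__init__()
        self.current = None  # property currently being read, if any
        self.card = {}

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.PROPERTIES:
                self.current = name

    def handle_endtag(self, tag):
        # Assumes properties are not nested inside one another.
        self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.card[self.current] = data.strip()


def extract_hcard(html):
    """Return a dict of the hCard properties found in the HTML."""
    parser = HCardParser()
    parser.feed(html)
    return parser.card


print(extract_hcard(SAMPLE_HTML))
```

The point is that the metadata lives in ordinary class attributes, so any program that can read HTML can pick it up without a separate data file.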

Firebug is another Firefox plugin, a bit like the Web Developer toolbar but much more powerful. It lets you inspect the code of a webpage with expanding and collapsing tags, highlights the current element on the page, displays all CSS styles which apply to an element, debugs JavaScript, and even lets you rewrite the code on the fly! I’ve found it very useful for developing Exhibit pages, and it would also make it a lot easier to design or modify WordPress themes.

Yahoo Pipes is a set of tools for data mining and mashups. It’s more or less what I was wishing for from Google in a previous post. It looks complicated, but it’s still easier than programming from scratch, and very powerful. I’ll be trying it out whenever I get time.

Marking Up Names: Part 2

[posted by Gavin Robinson, 3:01 pm, 19 January 2008]

My digital edition of Sandall’s History of 1/5th Lincolnshire Regiment now has a new index of people. In my last post I described how names were marked up in the text. This post is about how I linked them together.

(more…)

What I really, really want

[posted by Gavin Robinson, 7:05 pm, 3 January 2008]

I’ve been playing with Google Custom Search and although it’s good it would be much better if it could recognize metadata in microformats, RDF, or any other formats that metadata might be found in. And if it could also scrape data off web pages using regular expressions (sort of like Feed43 but better). And if you could create custom search fields and define how they map to Google Base fields, Freebase fields, metadata tags, and scraped data.

And I want the moon on a stick.

Google Base and Great War Soldiers

[posted by Gavin Robinson, 1:08 pm, 27 December 2007]

I’ve just been looking into Google Base, which lets you upload structured data in XML format and make it searchable on Google (although so far Base pages don’t seem to show up in the standard web search). The data is described using item-types and attributes, and although Google provides recommended types and attributes you can also make up your own, for just about any purpose you want. This kind of semantic markup gives the potential for much more specific and accurate search results than a normal web search.

Now I’m wondering if this could be a solution to a big problem that I’ve been thinking about for a while: pulling together a list of all the British soldiers who served in the First World War and everything that’s known about them. This would be in the region of several million names. Many of the details are already available online in various places but they’re not linked together.

The CWGC has a more or less complete database of personnel who meet their criteria of having died as a result of the war, although new names are discovered every so often, and their own search engine can only search by name, not by regiment or service number. Surviving service records (only about 30 to 40% survived the Blitz!) are being put online by Ancestry, although it’s subscription only and the indexing and transcription are reputed to be really terrible.

The UK National Archives has made the medal index cards available online (I’ve seen several transcription errors in the index but it’s apparently not as bad as Ancestry). This collection contains nearly 5.5 million records and should mention every soldier who qualified for a campaign medal by serving overseas (although there are unsubstantiated rumours on the Great War Forum that some cards were lost in transit). The medal cards also include men who were awarded a Silver War Badge for being discharged as unfit for service, even if they hadn’t served overseas. Officers are more problematic: if they survived the war they had to apply for their campaign medals, and there are many known examples of officers who didn’t make a claim and so have no medal card.

Commissions and gallantry medals are shown in the London Gazette, which is available online, but its search engine is notoriously difficult to use. Then there are various personal websites of people who are researching their families or a particular unit. And there’s the Great War Forum, which contains a huge number of posts on individual soldiers, often pulling together information from many different sources, from the best-known online databases to obscure local newspapers and family collections.

In theory something like Google Base could help to pull all this stuff together and make it easier to find information on specific people. For example, you could create an item type for soldiers and give it attributes like name, rank, regiment, battalion, service number etc. First of all a lot of thought and consultation would need to go into defining the item and attributes to make it as useful as possible to as many people as possible. This is definitely something to think about for the future.
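
As a very rough sketch of what such an item type might look like, here is a Python version using the attributes listed above. The element names, the `type="soldier"` attribute and the sample soldier are all my own invention for illustration. This is not Google Base’s actual schema or upload format, just a way of showing the idea of one record with named attributes serialized as XML.

```python
from dataclasses import dataclass, asdict
import xml.etree.ElementTree as ET


@dataclass
class Soldier:
    """Hypothetical item type; the attribute names are assumptions."""
    name: str
    rank: str
    regiment: str
    battalion: str
    service_number: str


def soldier_to_xml(soldier):
    """Serialize one soldier as a simple XML item (made-up format)."""
    item = ET.Element("item", {"type": "soldier"})
    # One child element per attribute, in the order the fields are declared.
    for attribute, value in asdict(soldier).items():
        ET.SubElement(item, attribute).text = value
    return ET.tostring(item, encoding="unicode")


# Invented example record, not a real soldier.
example = Soldier(
    name="John Smith",
    rank="Private",
    regiment="Lincolnshire Regiment",
    battalion="1/5th",
    service_number="12345",
)

print(soldier_to_xml(example))
```

Even in a toy like this, the hard part is obvious: deciding which attributes to have and what to call them, which is where the thought and consultation would need to go.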

However, there are some limitations which mean it isn’t going to happen soon. The biggest problem I can see is that you have to manually upload the records to your account, and edit them whenever they change. You can use the API to automate this but I think it would be much better if you could just embed Google Base metadata in a webpage and let Google’s spiders pull it out automatically. Another thing is that there doesn’t seem to be any scope for collaboration. Once you’ve uploaded your data no-one else can edit it. This is quite disappointing because sharing is a big part of Google Docs. In my experience many expert Great War researchers do not have advanced IT skills, so we need things to be as simple as possible, with easy ways for other people to help the less IT-literate by editing their stuff directly. The Your Archives wiki has shown that this can work really well: it doesn’t matter if people haven’t formatted their pages properly or don’t know how to insert a link. As long as you put up some relevant information, someone else can sort it out.

But these are changes that Google could make in the future, so it’s something to watch out for. There must be lots of other ways that historians could use Google Base. It’s already good enough for smaller data sets that have been compiled by one person, so I might be able to put up some of my English Civil War data.