Your Archives: an obituary

[posted by Gavin Robinson, 9:17 am, 11 December 2011]

This week the UK National Archives announced that they will be closing the Your Archives wiki in September 2012. Existing content will be preserved as HTML snapshots and kept available on the government web archive, but it won’t be running on MediaWiki so search, edit and export won’t work. Along with TNA’s other online resources, Your Archives will be replaced by the new Discovery service (now in beta), which will integrate the Catalogue, DocumentsOnline and user-created content, along with a powerful search engine and an API so that third parties can query the data (so no more need for Python scripts to scrape data out of the HTML). It’s not yet clear exactly what kind of content they will and won’t let us add, and I suspect that the scope will be narrower than Your Archives, but better integration should make up for that. One of the biggest problems with Your Archives was that getting incoming links from the Catalogue was very clunky and getting incoming links from DocumentsOnline was impossible (so people browsing DocumentsOnline had no easy way of knowing if a transcript of the document was available). This was a limitation of the Catalogue and DocumentsOnline as much as a limitation of MediaWiki, but in any case it’s good that they’ve solved it.

The announcement claims that ‘online technologies have changed rapidly in that time, and the expectations of our users have also changed’ but I’d say it’s more that TNA’s attitude to user-created content has changed. Back in 2007 they still seemed to be suspicious of it and had to keep it quarantined away from their official website. Now they want to bring it into the catalogue so that everyone can find it more easily. I think Your Archives must have played a part in bringing about that change of policy by showing that user-created content is nothing to be scared of, and that closer integration of all TNA online resources is absolutely necessary. If that’s the case then Your Archives has been a successful experiment. TNA also seem to be getting better at open access and re-usability. In 2007 I complained that the terms of use were too restrictive because they didn’t allow re-use of content, but now they seem to be moving towards putting material under the Open Government Licence, which is pretty much the same as a Creative Commons attribution licence (see draft terms of use for the new service).

I’ve been contributing to Your Archives on and off for over four years. According to the log of my contributions, the first page I created was a transcript of a prisoner of war report on 27 October 2007. Up to now I’ve made 3,410 edits, including creating the third most popular page (which has had over 80,000 views – my ‘proper’ academic publications will never be that widely read). Now as a community moderator I’ll be helping to manage the transition by tidying up existing content and ensuring that it will be as accessible as possible in the archived snapshot version. I’ll also be exploring the possibilities of MediaWiki outside Your Archives. It’s still an immensely powerful and useful piece of software. I used it to draft my book and it worked really well for that, which shows that wiki doesn’t have to mean letting just anyone edit, or even any kind of collaboration at all. I really want to find out how to use Semantic MediaWiki and what it can do. It is kind of sad that Your Archives is coming to an end, but that’s just sentimentality. If things don’t change they’ll stay as they are, and who’d want that?

Multiple Indemnity

[posted by Gavin Robinson, 10:04 am, 20 July 2010]

As part of the research for my book (saying that still feels a bit weird, but I’m sure I’ll get used to it) I’m going through indemnity cases in class SP 24 in the UK National Archives (aka the PRO). The Indemnity Committee was set up by parliament in 1647 to protect soldiers and officials from prosecution for actions that they had carried out under the authority of parliament, such as requisitioning things for the army or arresting royalists. It also dealt with disputes over sequestered rents and debts, and helped to enforce parliament’s order that apprentices who joined the army should be allowed to count military service towards their term of apprenticeship. If someone was prosecuted in court for acts which were covered by the Indemnity Ordinance (and many were despite the Ordinance banning people from bringing cases of this kind) the defendant could send a petition to the Indemnity Committee asking for protection. In SP 24 there are 58 boxes of petitions and other papers relating to cases, such as depositions and lists of expenses. Unlike some classes these are quite well sorted: papers relating to each case are grouped together and sorted in roughly alphabetical order of the plaintiff’s name (although confusingly the plaintiff in an indemnity case is the defendant in the corresponding criminal prosecution). I’m particularly interested in cases relating to horse requisitioning. According to Ian Gentles, about 30% of the military cases involve horses, although from what I’ve seen so far military cases seem to be a minority as many cases are disputes between civilians over payment of rents and debts due to sequestered estates. It usually takes me less than an hour to skim through a box, look at the first petition in each case to see if it’s about horses, and photograph the relevant cases. Sometimes I get cases that look interesting for other reasons, but I try not to wander too far off topic too often. 
Since I’m photographing these papers for my research, and since the National Archives allow document images to be uploaded to Flickr, that’s just what I’m doing. I’m also putting transcripts or summaries of the documents, along with links to the images, on the Your Archives wiki. You can see what I’ve done so far, and follow my progress in future, via a Flickr collection and Your Archives category.

So far I’ve uploaded cases from the first 2 boxes. I have another 16 boxes ready to be uploaded, but I’m working on some Python scripts to automate the process. The trial run on the first two boxes proved that doing it all manually is quite labour intensive. First I copied the image files from my camera and sorted them into directories for each box. The directory structure is based on the archival reference, so there’s a directory called “SP 24” with sub-directories called “30”, “31” etc. Then I went into each of these directories and made sub-directories for each case, so it looks like this:

  • SP 24
    • 30
      • 1 Abeary vs Windebanke
      • 1 Adams vs Haughton
      • 2 Alford vs King
      • etc
    • 31

And the path to a particular case would be:

SP 24/30/2 Alford vs King

Which looks quite similar to the archival reference.

The numbers at the start of the case name are the part number (each box usually contains three folders called part 1, part 2 and part 3 but I decided not to make directories for these). Up to here it has to be done manually as arranging cases into directories involves looking at the documents to see where a new case begins and to check the names. But from here a lot of it can be automated.
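
Because the directory layout mirrors the archival reference, a script can recover the reference and the names of the parties from the path alone. The following is only a minimal sketch of that idea, not the script described in this post: the `CaseRef` type, the function names and the photoset title format are all assumptions made for illustration, and it assumes every case directory is named "part number, plaintiff, vs, defendant" as in the listing above.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple


class CaseRef(NamedTuple):
    """Hypothetical container for the pieces of one case path."""
    series: str      # e.g. "SP 24"
    box: str         # e.g. "30"
    part: int        # folder (part) number within the box
    plaintiff: str
    defendant: str


# Case directories look like "2 Alford vs King": part number, then the parties.
CASE_DIR = re.compile(r"^(?P<part>\d+)\s+(?P<plaintiff>.+?)\s+vs\s+(?P<defendant>.+)$")


def parse_case_path(path: str) -> CaseRef:
    """Split a path such as 'SP 24/30/2 Alford vs King' into its components."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3:
        raise ValueError(f"expected series/box/case, got {path!r}")
    series, box, case_dir = parts[-3:]
    match = CASE_DIR.match(case_dir)
    if match is None:
        raise ValueError(f"unrecognised case directory name: {case_dir!r}")
    return CaseRef(series, box, int(match["part"]),
                   match["plaintiff"], match["defendant"])


def set_title(ref: CaseRef) -> str:
    """Build a photoset title that echoes the path (the format is an assumption)."""
    return f"{ref.series}/{ref.box}/{ref.part} {ref.plaintiff} vs {ref.defendant}"
```

Raising an error on a directory name that doesn't fit the pattern is deliberate: a misnamed directory is easier to fix before anything has been uploaded than afterwards.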

Each directory containing one case needs to have its own photoset on Flickr. I used Postr to upload one case at a time and then used Desktop Flickr Organizer to create a set and add photos to it (I got both of these applications from the Ubuntu repository – if you’re on Windows then… stop using Windows!). Then I used the Organizr on the Flickr website to drag each set into the “SP 24 Indemnity Cases” collection. Once the Flickr photos and sets were in place I went to the web page for each set, manually created a Zotero item for the case, and attached a link to the page. Finally I created a Your Archives page for each case and attached a link to it in Zotero. This includes a template that I made for indemnity cases which gives some basic information in a standardized form and includes a link to the relevant Flickr set. Doing all this manually for each case is quite tedious and takes a long time, so I’m working on some Python scripts to automate the process. What I want the scripts to do is:

  1. Upload photos from multiple directories
  2. Create a separate photoset for each directory, with a name based on the directory name and path
  3. Get the ID of each set and write the IDs and names to a CSV file
  4. (At this point I’ll manually edit the CSV file to add data that will be needed for Your Archives and Zotero and which can only be got by looking at the document images, eg full names of plaintiffs and defendants, date of the petition, summary of the case, categories/tags)
  5. Use the data from the CSV file to construct a wiki page with the correct template and upload to Your Archives through the MediaWiki API
  6. Export an XML file which can be imported into Zotero
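
Steps 3 and 4 hinge on a CSV file that the script starts and a human finishes. As a rough sketch using only Python's standard `csv` module, the file could be written with the set ID and name filled in and the remaining columns left empty for manual editing. The column names here are assumptions, loosely based on the data listed in step 4, and not the layout actually used.

```python
import csv
import io

# Columns the script can fill in, followed by columns left blank for a human.
AUTO_FIELDS = ["set_id", "set_name"]
MANUAL_FIELDS = ["plaintiff", "defendant", "petition_date", "summary", "tags"]


def write_sets_csv(sets, out):
    """Write (set_id, set_name) pairs to `out`, leaving the manual columns empty."""
    writer = csv.DictWriter(out, fieldnames=AUTO_FIELDS + MANUAL_FIELDS,
                            lineterminator="\n")
    writer.writeheader()
    for set_id, set_name in sets:
        # Fields missing from the dict are written as empty strings.
        writer.writerow({"set_id": set_id, "set_name": set_name})


def read_sets_csv(src):
    """Read the (possibly hand-edited) CSV back as a list of dicts."""
    return list(csv.DictReader(src))
```

Both functions take an open file object rather than a filename, so they can be tried out on an `io.StringIO` buffer before being pointed at a real file.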

So far I’ve written a Flickr upload script which does the first three steps and more or less works. Rather than working directly with the Flickr API I’m using the Python Flickr API library, which makes things very easy. It provides a flickr class with methods to handle API calls and authentication. Before using it you have to go to the App Garden and request an API key, but that doesn’t take long to do. App pages can be kept private, which is what I’m doing in this case as I don’t really have the time or skills to make my scripts fit for public consumption. The next step is to add error handling as the script only works as long as nothing goes wrong. In the real world, there are lots of things that could go wrong. The library throws an exception if it gets an error response from the API. Until I add some exception handling this means that the script just stops on an error. The script will need to keep track of what has and hasn’t been done (photos uploaded, sets created, photos added to sets) so that I can run it again if anything was left undone, and so that it doesn’t try to do the same thing again if it’s already been done. One annoying thing about Flickr’s public API is that it provides no way to create a collection or add sets to a collection. I assumed I’d be able to automate that part of the process but it looks like I’ll still have to do it manually.
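
One way to get both the exception handling and the record of what has already been done is to wrap every API call in a small task log. This is a sketch under stated assumptions rather than the actual upload script: `TaskLog` and its JSON file are invented for illustration, each task is passed in as a plain callable so that no particular Flickr library call is assumed, and results (such as a photo or set ID) must be values that JSON can store.

```python
import json
from pathlib import Path


class TaskLog:
    """Remember which tasks have finished so that a rerun can skip them."""

    def __init__(self, path):
        self.path = Path(path)
        self.done = {}
        if self.path.exists():
            self.done = json.loads(self.path.read_text(encoding="utf-8"))

    def run(self, key, action):
        """Run `action` unless `key` is already recorded, and return its result.

        Returns None if the action raised an exception. The key is then left
        unrecorded, so the next run of the script will try it again.
        """
        if key in self.done:
            return self.done[key]
        try:
            result = action()
        except Exception as exc:  # a real script would catch the library's own error type
            print(f"failed: {key}: {exc}")
            return None
        self.done[key] = result
        # Save after every success so that a crash loses at most one task.
        self.path.write_text(json.dumps(self.done, indent=2), encoding="utf-8")
        return result
```

A call might then look like `log.run("upload:IMG_0001.jpg", lambda: upload_photo(...))`, where `upload_photo` stands for whatever function wraps the real API call. Keys such as `upload:...`, `create_set:...` and `add_to_set:...` would keep the three kinds of task apart.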

For step 5 I’ll be using the Pywikipediabot library. I’ve already done some simple tests on a local MediaWiki installation and it seems quite easy to create a page. Once I’ve finished the script and thoroughly tested it I can ask for a bot account on Your Archives. Step 6 will involve learning a bit more about Zotero RDF. The easiest way to find out how to generate the right code is to export some similar existing items and look at the results.
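
Before anything is sent to the wiki, the page text has to be assembled from a row of the CSV file. The sketch below shows only that string-building step and makes no calls to Pywikipediabot or the MediaWiki API. The template name `Indemnity case`, its parameter names, the CSV column names and the semicolon-separated tags are all hypothetical stand-ins for the real template, and `flickr_user` is a placeholder for the account name in the set URL.

```python
def wiki_escape(value: str) -> str:
    """Stop a literal pipe in the data being read as a parameter separator."""
    return value.replace("|", "&#124;")


def render_case_page(row: dict, flickr_user: str) -> str:
    """Build wikitext for one case from a CSV row (all field names are hypothetical)."""
    set_url = f"https://www.flickr.com/photos/{flickr_user}/sets/{row['set_id']}/"
    params = {
        "plaintiff": row["plaintiff"],
        "defendant": row["defendant"],
        "date": row["petition_date"],
        "flickr_set": set_url,
    }
    # Template call: one named parameter per line.
    lines = ["{{Indemnity case"]
    lines += [f"|{name}={wiki_escape(value)}" for name, value in params.items()]
    # The summary goes below the template as ordinary page text.
    lines += ["}}", "", row["summary"], ""]
    # Tags are assumed to be separated by semicolons in the CSV file.
    tags = [tag.strip() for tag in row["tags"].split(";") if tag.strip()]
    lines += [f"[[Category:{tag}]]" for tag in tags]
    return "\n".join(lines)
```

Keeping the rendering separate from the upload means the generated wikitext can be printed and checked by eye, or pasted into a local test wiki, before a bot account ever touches the live site.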

So just because I’m writing a monograph it doesn’t mean I’ve abandoned digital history. I’ll still be using lots of digital tricks in the background, but they won’t necessarily be obvious in the text of the book. New technology is certainly making my research quicker and cheaper than it used to be. The stuff that I’ve written about above isn’t exactly revolutionary: it saves labour but it doesn’t offer new insights that couldn’t have been found before. But later in the project I’m planning to do some text mining which I hope will show me things that I couldn’t otherwise have found. I’ll also be revisiting phonetic algorithms for place name identification. And if I can’t think of anything else to blog about, there are likely to be some interesting stories in the indemnity cases.

Links

[posted by Gavin Robinson, 12:59 pm, 4 June 2010]

  • The latest Military History Carnival is up at Wig-Wags.
  • The Institute of Historical Research is carrying out a survey to find out what people think about the possibility of podcasting/vidcasting research seminars. Go and tell them what you think. Their digital seminars project also has its own blog.
  • Ross Mahoney linked to a UK National Archives project which involves post blogging the Second World War on Twitter using cabinet papers: @ukwarcabinet
  • Meanwhile, the National Archives wiki Your Archives is starting a project to create a glossary of historical terms. See the current list of wanted terms, sign up and add what you know.

How To Make A Bookmarklet

[posted by Gavin Robinson, 10:55 am, 13 February 2010]

Knowing how to program can save you from tedious repetitive tasks, such as inserting templates into a wiki page. Recently I’ve been spending more time editing the UK National Archives wiki Your Archives. I created a category for women’s wills, and while I was adding pages to it, I found that a lot of them didn’t have the correct template. Wills that were proved in the Prerogative Court of Canterbury are held by the National Archives and can be downloaded from their DocumentsOnline service. Transcripts of these wills can be posted on Your Archives, and we have a template for them which automatically creates a link back to DocumentsOnline based on an ID code, and formats some key data (testator’s name, dates, catalogue reference) in a standard form. Most of the data which goes into the template can be found in the DocumentsOnline index. We used to copy and paste each value manually, which was not the best use of a human’s time. Faced with the prospect of doing this an awful lot, I decided to write a program to do it automatically. First I threw together a Python script, which was alright for me but no use for people who don’t have Python and BeautifulSoup (and I also wrote it in such a way that it relied on Linux with xclip installed). So then I decided to rewrite it in JavaScript, so that other people could use it in their browsers. You can find the finished version and documentation on the PCC Will Bookmarklet page. Below is a walk through of how I did it.

(more…)

UK National Archives on Flickr

[posted by Gavin Robinson, 1:15 pm, 16 July 2009]

There has been some bad news for historians recently: the RHS Bibliography of British and Irish History has lost its direct government funding and is being privatised in a move disturbingly reminiscent of PFI (and to add insult to injury the IHR claims to be “delighted” about this!); the UK National Archives (or PRO to most of us who use it) can no longer afford to open on Mondays or offer free parking.

But it’s not all bad. There’s also some good news from the National Archives which has got much less attention than the bad news – in fact I’m not even sure exactly when it happened. They are now allowing and encouraging users to upload photos of public records held at Kew to Flickr and similar photo sharing sites. Crown Copyright had already been waived to allow republication of the text of public records but previously publishing images of documents didn’t appear to be allowed. Now it’s confirmed that uploading images to Flickr is allowed (provided that you’ve taken them yourself – this doesn’t cover documents bought from DocumentsOnline or Ancestry). This is a win for everyone, because these documents will be made freely available without it costing the archives anything – a major advantage when budgets and funding are being cut drastically.

The NA has its own Flickr account, and a group for visitors. Combined with the Your Archives wiki this could lead to some really exciting stuff. Some people are already using Flickr and Your Archives to publish Metropolitan Police leavers’ registers. The possibilities are endless. I’m certainly going to upload all the photos I take in the course of my research. To start with I’ve put up the service record of my ancestor Tom Wenham from the First World War (photographed from the screen of a microfilm reader).


Still to come are some indemnity cases from SP24, and sooner or later I’ll have loads of SP28 to share. It would be fantastic if other archives would do this too, although some will probably be too conservative to try it. The British Library still doesn’t allow digital cameras, which just makes me not want to bother with BL manuscripts.

Digital Microfilm

[posted by Gavin Robinson, 6:20 pm, 15 October 2008]

The UK National Archives (or PRO if you’re old-skool like me) has announced a new project called Digital Microfilm. This involves scanning existing microfilms of original documents and making the whole reel available as a single (very big!) PDF file. These files are free to download. The aim is to eventually digitize all the microfilm records held by TNA/PRO and get rid of the microfilm readers at Kew. I think this is a great idea as it’s a quick and easy way of making these records more widely available without the time and cost involved in indexing individual documents. Users can post their own indexes and transcripts on the Your Archives wiki. Although the quality of the scans obviously won’t be any better than the microfilm that they came from (and I know from experience that full colour high-resolution digital photos are much easier to work with), PDFs will still be more convenient than using a microfilm reader – no more holding a camera up to the screen to get a copy of the microfilm! I’m not sure whether this project will include records that have already been (badly) indexed and made available through DocumentsOnline and Ancestry, such as WWI service records and medal cards, but I assume records which aren’t currently available anywhere online will be the highest priority.

Saddlers Wills

[posted by Gavin Robinson, 2:30 pm, 10 August 2008]

Way back in October 2006 (when this blog was all shiny and new) I wrote about female saddlers in London during the English Civil War. My work on saddlers and harness makers (male as well as female) is quite open-ended. I don’t know exactly where I’m going with it, so I’m just trying to find out as much as I can about these individuals and their families when I get the chance. A while ago I searched the records of the Prerogative Court of Canterbury for wills of people I was interested in. These are available through DocumentsOnline, but I found it cheaper to print out copies while I was at the PRO (20p per sheet as opposed to £3.50 per will). I didn’t find a will for everyone (some might have had their wills proved in other courts) but I came up with a lot of hits. Recently I finally got round to transcribing them (which was good palaeography practice) and publishing the transcripts on Your Archives.

Although wills tend to come in a standard form, that structure can contain a lot of variety. They can tell us about people’s wealth, business activities, and families, and contain all kinds of incidental details which shed some light on their lives. Below is a selection of some of the more interesting things I found, with links to the full transcripts.

(more…)

Public records and public knowledge

[posted by Gavin Robinson, 10:25 am, 24 February 2008]

Do academic historians or PRO staff have a better knowledge of the public records? For records of the civil wars I suspect that academics have the upper hand. SP28 is not very well catalogued and sorted. Only researchers who have spent years working on it really know what’s there, and even now the source hasn’t been used to its full potential. Things are different with records of First World War soldiers. Amateur researchers seem to know far more about these than either academics or archive staff.

PRO/NA staff are increasingly aware that other people know more about their records than they do. One way they have responded is by launching Your Archives, a website running on wiki technology which allows anyone who has specialist knowledge of archival sources in the UK to contribute what they know. The site first opened to the public in spring 2007 and has continued to grow since then. I first started using it in October, and I’ve noticed an increase in activity in recent months. As well as the First World War stuff that I mentioned before, I’ve created a British Civil Wars category and started to populate it with my PhD notes, mostly taken from SP28.

There’s far more information that could be added, by me and by other people. Although contributions have been steadily increasing, the number of regular contributors is still relatively small. I managed to encourage a few people from the Great War Forum to get involved, but not very many. Maybe one of the problems is that contributors need an unusual combination of specialist knowledge of archives, IT skills, and confidence with Web 2.0 ways of thinking. Or maybe Wikipedia has given all wikis a bad name that they don’t deserve.

If anyone who is reading this has relevant knowledge of PRO documents but hasn’t contributed to Your Archives, what would make you more likely to contribute?

Sandall: The End of the Beginning

[posted by Gavin Robinson, 4:14 pm, 1 February 2008]

Having made good progress with my project to digitize Sandall’s History of 5th Lincolnshire Regiment in the last month I’m going to leave it for a while. This month I haven’t read any books or articles, haven’t written anything other than blog posts and computer code, and have only occasionally thought about historiography and theory. I kind of like it like that but I have other things to get on with now.

I’ve made some small changes since the last post. Dates now have tool tips, so if you hover over them you can see the full date. The place name index is a bit more user-friendly. I’ve replaced the hash values with query strings in the incoming links so that the Exhibit page filters the list down to the place passed in the query instead of displaying a box with the details. This means that you just have to click on “Map” to go straight to map view with only that place displayed. Once you’re there you can easily take the filter off again to see all the other places. The map view is also zoomed out further by default so that you can see Britain and Egypt. That means that you have to zoom in a long way to get to France and Flanders but I think it’s less confusing than not being able to see Grimsby or Alexandria unless you zoom out.

So the site is now in a satisfactory condition with lots of cool features, and now that I’ve worked out how to do everything I could probably get another book to the same stage within a few weeks. But there are still lots of features that could, and probably should, be added. See below for more details. (more…)

Further Adventures in Your Archives

[posted by Gavin Robinson, 4:58 pm, 5 November 2007]

Over the last week I’ve been exploring the possibilities of Your Archives, the wiki-based site set up by the UK National Archives where users can contribute their own knowledge and transcripts of documents. The site has huge possibilities, and so far I feel like I’ve only scratched the surface. To start with I’ve been mostly concentrating on First World War records, as the Great War Forum provides both an immediate audience and lots of potential contributors. Getting these people involved could make a very big difference to the project. I think it’s going to take time to get a critical mass of GWF regulars using Your Archives regularly, but I’m trying to lead by example. It turns out that I’m not the first forum member to contribute to YA as another member had submitted some information about Labour Corps medal rolls a few months ago. However, that didn’t lead to lots of other people contributing. Can we change that?

(more…)
