Random news

[posted by Gavin Robinson, 9:18 am, 5 February 2012]

I’m planning to finish my Winter in Windsor series of posts while it’s still winter, but in the meantime here are some links:

  • My book is going to be published on 21 August 2012, and you can already read the blurb. Just proofreading and indexing to go.
  • Andrew Hickey has written a brilliant short story about Shakespeare which skewers the snobbery of Oxfordian conspiracy theories.
  • Ben Brumfield reports on the 2012 American Historical Association conference from a software developer’s perspective.
  • History SPOT has a podcast of Ben Worthy’s IHR seminar paper on the impact of the Freedom of Information Act.
  • Zotero 3.0 has been released. It can now run as a standalone program as well as a Firefox extension and has lots of new features. I couldn’t have written my book as quickly (or at all?) without Zotero to manage my bibliography and citations.
  • The latest version of the Spotify client crashes whenever I search for Kim Carnes. Bug or feature?

Your Archives: an obituary

[posted by Gavin Robinson, 9:17 am, 11 December 2011]

This week the UK National Archives announced that they will be closing the Your Archives wiki in September 2012. Existing content will be preserved as HTML snapshots and kept available on the government web archive, but it won’t be running on MediaWiki so search, edit and export won’t work. Along with TNA’s other online resources, Your Archives will be replaced by the new Discovery service (now in beta), which will integrate the Catalogue, DocumentsOnline and user-created content, along with a powerful search engine and an API so that third parties can query the data (so no more need for Python scripts to scrape data out of the HTML). It’s not yet clear exactly what kind of content they will and won’t let us add, and I suspect that the scope will be narrower than Your Archives, but better integration should make up for that. One of the biggest problems with Your Archives was that getting incoming links from the Catalogue was very clunky and getting incoming links from DocumentsOnline was impossible (so people browsing DocumentsOnline had no easy way of knowing if a transcript of the document was available). This was a limitation of the Catalogue and DocumentsOnline as much as a limitation of MediaWiki, but in any case it’s good that they’ve solved it.

The announcement claims that ‘online technologies have changed rapidly in that time, and the expectations of our users have also changed’ but I’d say it’s more that TNA’s attitude to user-created content has changed. Back in 2007 they still seemed to be suspicious of it and had to keep it quarantined away from their official website. Now they want to bring it into the catalogue so that everyone can find it more easily. I think Your Archives must have played a part in bringing about that change of policy by showing that user-created content is nothing to be scared of, and that closer integration of all TNA online resources is absolutely necessary. If that’s the case then Your Archives has been a successful experiment. TNA also seem to be getting better at open access and re-usability. In 2007 I complained that the terms of use were too restrictive because they didn’t allow re-use of content, but now they seem to be moving towards putting material under the Open Government Licence, which is pretty much the same as a Creative Commons attribution licence (see draft terms of use for the new service).

I’ve been contributing to Your Archives on and off for over four years. According to the log of my contributions, the first page I created was a transcript of a prisoner of war report on 27 October 2007. Up to now I’ve made 3,410 edits, including creating the third most popular page (which has had over 80,000 views – my ‘proper’ academic publications will never be that widely read). Now as a community moderator I’ll be helping to manage the transition by tidying up existing content and ensuring that it will be as accessible as possible in the archived snapshot version. I’ll also be exploring the possibilities of MediaWiki outside Your Archives. It’s still an immensely powerful and useful piece of software. I used it to draft my book and it worked really well for that, which shows that wiki doesn’t have to mean letting just anyone edit, or even any kind of collaboration at all. I really want to find out how to use Semantic MediaWiki and what it can do. It is kind of sad that Your Archives is coming to an end, but that’s just sentimentality. If things don’t change they’ll stay as they are, and who’d want that?

Valentine Stuckey, a life

[posted by Gavin Robinson, 12:26 pm, 17 July 2011]

A couple of weeks ago I posted about a building (or buildings) called the White Bear in Cornhill, London. This post is about one of the people who lived and worked there. It starts with the same entry in the list of horses contributed to the Earl of Essex’s army (TNA: PRO SP 28/131 part 3 f. 55r, 16 August 1642):

Valentine Stuckly of the white Beare in Cornwall vint[ner] listed one browne bay geldinge, his rider John Courtnye armed wth Carabine a Case of pistolls a buffe Coate and a sword valued in all at £21

I’ve found that his name was spelled lots of different ways, but he seems to have preferred Valentine Stuckey. This narrative of his life is still hypothetical because the record linkage isn’t absolutely certain. I might well have conflated details of two or more men with the same name, but what I’ve written seems probable, and at the very least it makes a good story.

The White Bear

[posted by Gavin Robinson, 10:14 am, 3 July 2011]

Nick Poyntz is right about the serendipity of digital searches. This weekend chasing up a fairly minor point for my book took me on a web search adventure with lots of interesting tangents. It all started with an entry in the lists of people who contributed horses to the Earl of Essex’s army, dated 16 August 1642 (TNA: PRO SP 28/131 part 3 f. 55r):

Valentine Stuckly of the white Beare in Cornwall vint[ner] listed one browne bay geldinge, his rider John Courtnye armed wth Carabine a Case of pistolls a buffe Coate and a sword valued in all at £21

I’ve always assumed that it means Cornhill in London, not the county of Cornwall, but some proof would be nice. These days names like the White Bear are associated with pubs, but in the seventeenth century pretty much any kind of business premises could be identified with a sign like this. Kathleen M. O’Brien has compiled a list of sign names from seventeenth century tradesmen’s tokens, including ones which combine a colour and an animal. The list mentions three White Bears, but not in Cornhill. It seems to be a very common name: the horse lists also include White Bears in Bread Street, Fenchurch Street, Distaff Lane and Lombard Street. The one in Lombard Street apparently later became the famous Lloyd’s coffee house.

The earliest record I can find of a White Bear in Cornhill is in the early 1620s, when the printer Thomas Jenner was based there (and he sometimes spelt it Cornewall). By 1624 he had moved to the Royal Exchange, at the west end of Cornhill on the north side of the street. The exchange was destroyed by fire in 1666 and 1838 but the current version was rebuilt on the same site and with the same layout. Jenner still sometimes called his new premises the White Bear, or sometimes just gave his address as the ‘South Entrance of the Royal Exchange’ (perhaps it was on the very spot where Agent Provocateur now stands). Jenner stayed at the exchange until his death in 1673, after which John Garrett took over the business and premises.

The idea that Jenner moved out of the original White Bear could be supported by an Ordinance of Parliament passed in 1649, which lists property confiscated from the dean and chapter of Westminster Abbey. Under Birchin Lane in the parish of St Michael Cornhill it lists:

George Dawson, for the White Bear, Two shillings six pence.

Birchin Lane runs from Lombard Street in the south to Cornhill in the north, coming out just to the east of the exchange. Even if this building wasn’t actually on the street called Cornhill, it was in the parish of St Michael Cornhill and in Cornhill ward, so could plausibly be described as ‘the White Bear in Cornhill’. And as I found with George Willingham, early-modern London addresses could be quite fuzzy. The entrance of the exchange would probably have been a more desirable location, which could explain why Thomas Jenner would want to move his business around the corner.

Samuel Pepys wrote in his diary for Saturday 8 October 1664, ‘after dinner abroad, and among other things contracted with one Mr. Bridges, at the White Bear on Cornhill, for 100 pieces of Callico to make flaggs’. From internal evidence it’s not clear whether Bridges had his premises there or whether they met in a tavern to discuss the deal, but it doesn’t seem to be Thomas Jenner’s print shop. Specifying ‘on Cornhill’ could imply that it’s not the same as the White Bear in Birchin Lane (unless it was on the corner), or it could be referring to the actual hill rather than the street named after it.

A collection of documents in the Buckinghamshire archives includes a marriage settlement from 1781 which mentions the ‘Pensilvania and Carolina Coffee House (formerly the White Bear) in Birchin Lane, Cornhill, London’.

That’s all I’ve found so far. There could be up to three buildings called the White Bear in the same parish at the same time, and there was almost certainly one other than Jenner’s new address at the exchange. If only they’d had geocoding in the seventeenth century…

Coming soon: a brief biography of Valentine Stuckly, which will raise as many questions as it answers. Also on Sunday 10 July I’ll be posting an interview with Andrew Hickey about his experiences with self-publishing.

Text-mining tips

[posted by Gavin Robinson, 10:27 am, 12 June 2011]

These are some insights from the text-mining that I’ve been doing this week:

Stop and think about stop words

One of the first rules of text-mining should be: always make your own list of stop words. Nothing absolutely and objectively is or isn’t a stop word. Which words are and aren’t meaningful depends on your research questions. For example, pronouns are often included in lists of stop words, but I’m very interested in gender so I want to know the frequencies of gendered words like ‘he’ and ‘she’. If you use someone else’s list without thinking about it you’ll probably inherit various biases and assumptions. The kind of text you’re working with also makes a difference. In the proceedings of parliament words like ‘ordered’, ‘resolved’ and ‘committee’ occur too regularly to be much use to most people. If you don’t define your stop words until after you’ve calculated frequencies for every word you can get a better idea of which words are getting in the way and which ones are interesting.
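
A minimal sketch of that workflow, assuming Python 3 and a made-up sample sentence: count every word first, look at the most frequent ones, and only then decide what goes into your own stop list. The names `text`, `candidate_stops`, `keep` and `my_stop_words` are invented for illustration.

```python
from collections import Counter

# Made-up sample text, standing in for a real corpus.
text = ("ordered that the committee do meet and that he and she "
        "be heard by the committee and it was resolved that she be paid")

# Step 1: count every word before deciding on any stop words.
frequencies = Counter(text.lower().split())

# Step 2: inspect the most frequent words as stop word candidates.
candidate_stops = [word for word, count in frequencies.most_common(5)]

# Step 3: build your own list, keeping words that matter to your question
# (here the gendered pronouns) even when they are very frequent.
keep = {"he", "she"}
my_stop_words = {word for word in candidate_stops if word not in keep}

# Step 4: drop the chosen stop words from the counts.
filtered = {word: count for word, count in frequencies.items()
            if word not in my_stop_words}
```

The point of doing it in this order is that the stop list comes out of the corpus and the research question, rather than being inherited from someone else.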

BeautifulSoup is not always the answer

The Python library BeautifulSoup is really useful for extracting data from HTML pages, but maybe I got into the habit of using it too much. This week I was trying to work out how to get some data from pages that didn’t have a very good semantic structure. Doing it with BeautifulSoup looked like it would be really complicated, but then I realised that in this case regular expressions would be much easier.
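
As an illustration of the kind of case where a regular expression can be simpler, here is a minimal sketch using only the standard `re` module. The HTML fragment and the pattern are invented for this example, and the approach only suits pages where the data itself follows a predictable textual pattern.

```python
import re

# Invented fragment: no classes or ids to hang a parser query on,
# but the text itself follows a regular pattern.
html = """
<p>SP 28/131 part 3 f. 55r<br>16 August 1642<br>
SP 28/131 part 3 f. 56v<br>17 August 1642</p>
"""

# Capture a folio reference and the date that follows it.
pattern = re.compile(
    r"(SP \d+/\d+ part \d+ f\. \d+[rv])<br>\s*(\d{1,2} \w+ \d{4})"
)

# findall returns one (reference, date) tuple per match.
records = pattern.findall(html)
for reference, date in records:
    print(reference, date)
```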

Have sets

Python includes a built-in collection type called a set. Strictly speaking it isn’t a sequence, because it is unordered and can’t be indexed, but you can iterate over it like a list and it supports the operations of a mathematical set, which makes it incredibly useful for text-mining scripts. Turning a list into a set automatically gets rid of duplicates. For example, suppose you’ve split some text into a list of separate words:

>>> wordlist = 'it was the best of times it was the worst of times'.split()
>>> wordlist
['it', 'was', 'the', 'best', 'of', 'times', 'it', 'was', 'the', 'worst', 'of', 'times']
>>> wordset = set(wordlist)
>>> wordset
set(['of', 'it', 'times', 'worst', 'the', 'was', 'best'])

Now we have a set of unique words which we can iterate through using a for loop, counting the occurrences of each word in the list:

for word in wordset:
    wordcount = wordlist.count(word)

Then we can do whatever we want with wordcount (print it to the screen, put it in a tuple or a dictionary, write it to a file).

You can also do mathematical operations on sets, which can be really useful for removing stop words.

Suppose we have a set of stopwords:

>>> stopwordset = set(['of', 'it', 'the'])

We can deduct that from the set of words before we iterate through it:

>>> wordset = wordset - stopwordset
>>> wordset
set(['was', 'worst', 'best', 'times'])

Now the stop words in wordlist are completely ignored, and we don’t even have to do an if test at every iteration.
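
The interpreter output above is in Python 2 style. A minimal sketch of the same steps in Python 3, using the same sample sentence, might look like this. A set has no fixed order, so the words are sorted before counting; `counts` is a name introduced here for the result.

```python
wordlist = 'it was the best of times it was the worst of times'.split()
stopwordset = {'of', 'it', 'the'}

# Set difference removes the stop words and the duplicates in one step.
wordset = set(wordlist) - stopwordset

# Count each remaining word in the original list.
counts = {word: wordlist.count(word) for word in sorted(wordset)}
print(counts)
```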

A dictionary is a bit like a database

Python dictionaries can be thought of as very simple databases. Obviously they can’t do everything that a database can do, but you don’t have to worry about connections or cursors either. When counting words across multiple files it’s easy to keep a running total of each word by updating a dictionary at every iteration. If the word is already in the dictionary, add to the existing count; if it isn’t, add a new key/value pair.

This is how I do it:

>>> wordcount = dict()

(Then iterate through each file, open and read it etc.)

for word in wordset:
    if word in wordcount:
        wordcount[word] = wordcount[word] + wordlist.count(word)
    else:
        newword = [(word, wordlist.count(word))]
        wordcount.update(newword)
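
For comparison, the standard library’s `collections.Counter` is a dictionary subclass that keeps the running total for you. This is a sketch of an alternative, not the method shown above; the two strings stand in for the contents of separate files, and `texts` and `totals` are names made up for the example.

```python
from collections import Counter

# Stand-ins for the text read from two separate files.
texts = [
    'it was the best of times',
    'it was the worst of times',
]

totals = Counter()
for text in texts:
    # update() adds the counts from this text to the running totals;
    # words not seen before are added automatically.
    totals.update(text.split())

print(totals['times'])
```

Because a Counter returns 0 for a missing key, there is no need for the if/else test on each word.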

PhD Theses and The Postmodern Condition

[posted by Gavin Robinson, 7:55 am, 5 June 2011]

A few weeks ago I ordered a PhD thesis from EThOS. A few days later they got back to me to say that the university in question wouldn’t supply the thesis for digitization because they didn’t have the author’s permission. In some ways that was a relief because it saved me the time that I would have used up reading it and the £40 digitization fee. Arguably the university and author have lost more because they’ve just missed out on a citation which would have marginally contributed to their reputation and given their research more ‘impact’. Maybe one consequence of digital history will be that material that isn’t easily available on the web might as well not exist. Which is exactly what Lyotard predicted in The Postmodern Condition in 1979 (p. 4):

The nature of knowledge cannot survive unchanged within this context of general transformation. It can fit into the new channels, and become operational, only if learning is translated into quantities of information. We can predict that anything in the constituted body of knowledge that is not translatable in this way will be abandoned and that the direction of new research will be dictated by the possibility of its eventual results being translatable into computer language. The “producers” and users of knowledge must now, and will have to, possess the means of translating into these languages whatever they want to invent or learn. Research on translating machines is already well advanced. Along with the hegemony of computers comes a certain logic, and therefore a certain set of prescriptions determining which statements are accepted as “knowledge” statements.

We may thus expect a thorough exteriorization of knowledge with respect to the “knower”, at whatever point he or she may occupy in the knowledge process. The old principle that the acquisition of knowledge is indissociable from the training (Bildung) of minds, or even of individuals, is becoming obsolete and will become ever more so.

This book should be one of the foundational texts of digital history, but apparently it isn’t.

  1. Jean-François Lyotard, The Postmodern Condition: A Report on Knowledge, trans. G. Bennington and B. Massumi (Manchester, 1984).

 

Where is Siegfried Sassoon’s medal card?

[posted by Gavin Robinson, 10:39 am, 14 November 2010]

While I was taking advantage of free access to Ancestry this week, I decided to look for the medal index card of Siegfried Loraine Sassoon, the famous poet. I couldn’t find it. These medal index cards show entitlement to campaign medals for British soldiers who fought in the First World War. Since Sassoon served on the Western Front as an officer in the Royal Welsh Fusiliers, he would definitely have been eligible for campaign medals. Although campaign medals were issued automatically to Other Ranks (or their next of kin if they were dead), officers had to apply for their medals and it isn’t certain that all of them did. If they didn’t then there probably wouldn’t be an index card for them. Sassoon is famous for becoming an opponent of the war and throwing away the ribbon of his Military Cross, so maybe he didn’t claim his campaign medals. But he changed his mind about the war more than once, and went back to the front after his protest, so maybe he did claim them. I already know whether he did or didn’t have a medal card but I’m saving that for later. First, here’s some background about campaign medals and related documents.

There were several campaign medals which were issued for taking part in the war, with different criteria for each one. The most common were the British War Medal, for anyone who served overseas, and the Victory Medal, for anyone who served in a theatre of war. Those who served in the early years of the war could qualify for the 1914 Star or the 1914-15 Star (more details and pictures). The Army Medal Office recorded entitlement to these medals on medal rolls. Despite being called rolls, these are actually books, containing lists of eligible soldiers arranged by regiment. The War and Victory medals are recorded together in the same rolls, and there are separate rolls for each of the stars. These rolls are now held by the UK National Archives in class WO 329, and are not available online. The Medal Office also created a card index to pull together details of each soldier from the different rolls. Every eligible soldier should have at least one medal index card showing name, rank, regiment, service number (except for officers, who didn’t have numbers), campaign medal entitlement and references to the relevant medal rolls. Some soldiers have more than one card, especially if they also won a gallantry medal (although cards for some kinds of gallantry medal are recorded elsewhere and not included in this collection; Sassoon’s Military Cross award card wouldn’t be here as these are in a different class and can only be seen on microfilm at TNA) or qualified for a Silver War Badge by being discharged as unfit for duty. These medal index cards were also transferred from the Medal Office to the Public Record Office (now the UK National Archives) and put into class WO 372. They were arranged as follows:

  • WO 372/1 to WO 372/22: British Army campaign medals A-Z
  • WO 372/23: Women’s Services, Distinguished Conduct Medals and Military Medals
  • WO 372/24: Mentions in Despatches, Meritorious Service Medals and Territorial Force Efficiency Medals
  • WO 372/25 to WO 372/29: Indian Army campaign medals

The cards were microfilmed and the originals put into storage. The microfilm was in black and white, and only the fronts of the cards were filmed. It’s usually reckoned that about 5% of the cards have something written on the back. This information became completely inaccessible. At some point (I think in the early 2000s) the microfilm was digitized and PDF files of the cards were made available for download through TNA’s DocumentsOnline service. These low resolution scans of black and white microfilm were not easy to read, and the information on the back was still inaccessible. The collection was indexed so that individual cards can be found by searching for name, rank, number or regiment. There are some transcription errors, so the index isn’t completely reliable. For example, the card for Arthur Evans shows that he was in the Lincolnshire Regiment, but the DocumentsOnline index wrongly gives this as 32nd London Regiment.

Thanks to DocumentsOnline we can see that Siegfried Sassoon did have a medal card, which can be downloaded here (ref WO 372/17, image 27085). The Medal Office had incorrectly written his name as Siefried Lorraine Sassoon, but as a captain in the 3rd Royal Welsh Fusiliers with a Military Cross, it’s got to be him. There’s even a note saying that his MC was cited in the London Gazette on 27 July 1916 (view page as PDF; the Gazette also incorrectly spells his middle name as Lorraine). The card shows that he was awarded the British War Medal, the Victory Medal and the 1914-15 Star. Although there is a note saying he was eligible on 20 February 1919, the medals don’t appear to have been issued until July 1985. This is nearly 20 years after Sassoon died, so it looks like he didn’t claim his medals himself and that they were claimed later by his family (no more claims for First World War campaign medals are possible now, and all unclaimed medals have been destroyed). The card is in a form usually used for Silver War Badge awards rather than the normal campaign medal styles. The box for date of discharge is blank, but “11/3/19” is written at the top of the card, which ties in with Sassoon resigning his commission through ill health (the London Gazette gives 12 March 1919). The most frustrating thing is that there’s a “PTO” at the bottom of the card, but we can’t turn it over and see what’s on the back.

In 2005 it was announced that the original cards would be destroyed to save storage space, but they were saved at the last minute. The Imperial War Museum took the women’s cards, and the Western Front Association took the rest (you can follow the story on this thread at the Great War Forum; there’s also a report at Your Family Tree magazine). For a while the WFA offered a service where they’d copy both sides of a card in return for a donation. Then they agreed to let Ancestry scan the cards and make them available online to subscribers. Ancestry scanned both sides of the cards in colour, making them much more legible than the TNA versions and making the backs available for the first time since the cards were microfilmed. But Ancestry’s indexing is notoriously bad. The Great War Forum has a whole thread dedicated to showing up the worst examples. It looks very much like Ancestry has done the transcription on the cheap by outsourcing it to people whose first language isn’t English and who know very little about British history and geography. There doesn’t even seem to have been a checklist of regiment or county names, or very much quality control. For example, T. E. Sandall, commanding officer and historian of the 1/5th Lincolnshire Regiment is shown on Ancestry’s medal card index as belonging to “1/5th Essex Tegt” [sic]. Ancestry has about 4.8 million medal cards compared to 5,482,260 on DocumentsOnline, but this seems to be accounted for by the fact that Ancestry hasn’t scanned the women’s cards from the IWM (WO 372/23), or the Indian Army cards (WO 372/25 to WO 372/29).

I can’t find Siegfried Sassoon’s medal card on Ancestry. Given their bad indexing it’s possible that the card is there but can’t be found because the name is completely wrong. But I’ve tried lots of different variants, and Ancestry has a fuzzy search which picks up similar names, and still I can’t find it. Maybe they haven’t scanned it, but it should be with the cards that they have scanned. Fortunately, there’s a way we can check this in more detail. When the cards were microfilmed, they were photographed in batches of six, arranged in two columns and three rows on the same image. When you download a card from DocumentsOnline, you get a whole page showing all six cards. These are the ones which come with Sassoon’s, shown in the order that they appear:

Reginald Ellice Sassoon, Capt., Irish Guards | Sassoon Joseph Sassoon? [full name not clear], Capt., Inniskilling Dragoons
Ronald Edward David Sassoon, Lt., KRRC | Suleman Sassoon, Railway Dept
Siefried Lorraine Sassoon [sic], Capt., Royal Welsh Fusiliers [ie the poet] | B Sassounian, Interpreter, XXI Army Corps

Knowing this, I searched for the other five men on Ancestry. Their cards are all there, and their names are all spelt correctly.

Name on MIC | TNA record | Ancestry record
Reginald Ellice Sassoon | Present and correct | Present and correct
S. J. Sassoon | Present as Joseph Sassoon | Present and correct (gives possible variants)
Ronald Edward David Sassoon | Present and correct | Present and correct
Suleman Sassoon | Present as Sassoon Suleman | Present and correct
Siefried Lorraine Sassoon | Present as S Lorraine Sassoon | Can’t find
B Sassounian | Present and correct | Present and correct

Siegfried Sassoon is the only one of the six whose card can’t be found on Ancestry. This tends to suggest that this isn’t down to Ancestry not scanning the card. They clearly have scanned the batch where it should be. It would need to have moved a long way in the filing system to end up among the cards that haven’t been scanned, and it’s hard to see how that could have happened by accident. That leaves two possibilities:

  1. Sassoon’s medal card has been scanned by Ancestry but so badly mis-transcribed that it can’t be found
  2. The original card was removed some time between TNA’s microfilming and Ancestry’s scanning

Either would be quite embarrassing for all the organizations involved. This post has been a cautionary tale about some of the problems with digitization of historical records. There’s a real danger that archives can use digitization as an excuse to destroy original documents, even when the digital copies aren’t adequate substitutes. When private companies digitize records for profit their cost-cutting can result in poor quality transcription, paradoxically making records harder to find. Keeping public records behind pay walls is also elitist. In Ancestry’s case this is pure economics: if you can afford the subscription you’re in, if you can’t you’re out (although they do offer free trial periods). Early English Books Online takes elitism to a whole new level: they only deal with libraries and won’t even sell you an individual subscription. Meanwhile, if anyone does find Sassoon’s medal card, please let me know.

Digital images: how do you manage?

[posted by Gavin Robinson, 3:09 pm, 23 October 2010]

Back in July I posted about a Python script I was working on to help with organizing photos of archival documents. I didn’t think it would be all that interesting to many other people, but a comment from Chris Williams made me realize that there’s potentially quite a lot of demand for something like this. Digital photography in archives doesn’t seem to be much of a sexy buzz topic among digital historians, but it’s something that lots of researchers do even if they’re not into digital history (although Melissa Terras’s latest book seems to cover it). As far as I know there aren’t any tools specifically designed to help with organizing large numbers of document images. The Python script I’m working on is just a stopgap thing which is mostly specific to what I’m doing and how I work, and is never likely to be very user friendly. Maybe what we need is a Firefox extension that plugs into Zotero, or maybe image management features in Zotero itself. Some features that might be useful:

  • Browse a directory of images in Firefox (I used to use MozImage for this, as I was reminded when I found this old post)
  • Mark a page image as being the first or last in a document (this is the really crucial thing, and I’m not aware of any image browsers that can currently do it)
  • Create sub-directories for documents and move images into them based on first and last markers
  • Create Zotero items for marked documents, maybe with some fields pre-filled in a standard form which can be applied to all documents in a directory. For example, if I’m working through box SP 24/30 from the National Archives, set Repository to “TNA” and Loc in Archive to “SP 24/30”.
  • Upload images to Flickr and create sets for them, maybe based on associated Zotero items; attach Flickr links to relevant Zotero items
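
The marker-based grouping in the list above could work roughly like this. This is only a sketch under assumptions: it uses first-page markers alone (no last-page markers), the filenames are invented, and `group_into_documents` is a hypothetical helper, not part of the script mentioned above. It works on a list of names and doesn’t touch the filesystem.

```python
def group_into_documents(filenames, first_pages):
    """Split an ordered list of image filenames into documents.

    A new document starts at each filename listed in first_pages;
    any images before the first marker are grouped together too.
    """
    documents = []
    for name in filenames:
        if name in first_pages or not documents:
            documents.append([])
        documents[-1].append(name)
    return documents


# Invented filenames, in the order the photos were taken.
images = ['img001.jpg', 'img002.jpg', 'img003.jpg',
          'img004.jpg', 'img005.jpg']
# Images marked as the first page of a document.
marked_first = {'img001.jpg', 'img003.jpg', 'img004.jpg'}

documents = group_into_documents(images, marked_first)
for number, pages in enumerate(documents, start=1):
    print(number, pages)
```

Each inner list could then become a sub-directory, with the images moved into it.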

I’m not in a position to do this myself right now, but I need to learn how to make Firefox extensions sooner or later. Apart from image management stuff, I also need a word count extension (I usually draft most of my writing in a private wiki instead of a word processor; having Firefox count the words for me is much easier than pasting into Open Office just to see how much I’ve written). The one I used to use isn’t compatible with Firefox 3.6 and the author hasn’t updated it for a long time. Counting words can’t be that hard can it? Or maybe it is.

So, does anyone have any thoughts on image management? If you take lots of photos in the archives, how do you deal with them once you get them home? Is there any software I don’t know about which would do what I need? What features would make your life easier?

Baywatch will continue

[posted by Gavin Robinson, 8:36 am, 16 October 2010]

It’s now four years since I started blogging. Last year I said I might stop today, but I’m not going to now. I need a blog to promote my forthcoming book, I’m not ready to do anything completely different yet, and blogging is still a useful way of trying out new ideas and keeping in touch with people. I’ve somehow gone for nearly three months without posting anything because I’ve been so busy. Before I can even start writing the book I have to work on a chapter for an edited collection and also finish building a roof. And there’s an article which is probably going to get revise and resubmit soon. Posts should get more regular from now on, but in the meantime, here are some links and news:

  • Bench Grass is a new military history blog, with some great posts on armoured warfare. One of the few people who really gets cavalry.
  • At Airminded Brett Holman has finished (for now) post-blogging the Battle of Britain and the Blitz. One of the many surprises thrown up by his experiment is that there wasn’t a clear division between the two at the time. The press seem to have been more optimistic than the present myth of The Few would suggest (and it was a big shock to discover that Churchill was mostly talking about bombers in that speech), and some people wanted the Germans to try and invade Britain because they knew it would fail. Despite knowing that German bombs wouldn’t defeat them, the British seem to have massively over-estimated the effectiveness of their own bombing of Germany. Meanwhile Daily Mail readers, then as now obsessed with impractical and morally dubious solutions to exaggerated problems, demanded more reprisal bombings of German civilians.
  • The Institute of Historical Research has launched a digital consultancy service and announced a digital editing system called ReScript.
  • PhDork at The Pursuit of Harpyness looks at “An Anti-Suffrage Monologue”, in which American suffragette Marie Jenney Howe mercilessly exposed anti-feminist hypocrisy by putting contradictory arguments against equal voting rights next to each other, ostensibly so that readers could pick the one they preferred. This kind of hypocrisy hasn’t gone away. Early-modern women’s historians are faced with Lawrence Stone’s objection that elite women are not worth studying because they’re not typical, and David Starkey’s objection that ordinary women are not worth studying because they had no power. Opponents of women serving in combat roles say that a woman wouldn’t be strong enough to drag her wounded male comrades to safety, and that male soldiers would spend too much time looking after their female comrades instead of fighting.
  • Pink Parts is a webcomic set in a strip club and written by Katherine Skipper, who used to work as a stripper. It’s intelligent, honest, funny and really has something to say. Good to see a stripper’s point of view being put across in a medium which is far too dominated by privileged white men. It ties in well with Catherine M. Roach’s book about stripping, which I reviewed last year.
  • Comic genius Kate Beaton gives her own interpretations of courtly love and King Lear.
  • PEP! is a magazine about comics, music, politics, Doctor Who and other things, edited by my friend Andrew Hickey. It even includes some articles by me. I tried to push myself to do something different from my blogging and academic writing, which wasn’t entirely successful but I’m all about failing better. In issue 1 (available as free PDF download or expensive print on demand) I gave an argument in favour of political extremism (from a feminist and postmodern angle) which made some good points and one bad point which went up a blind alley to do with Zeno’s paradoxes, but since it provoked a rebuttal from the editor I must have done something right. In issue 2 (PDF; print version available soon) I took a long and exhausting (but nowhere near exhaustive) look at lazy journalism, bad science and gender ideology relating to spatial reasoning abilities. Since I wrote it in March it’s been superseded by some other things (especially Cordelia Fine’s new book Delusions of Gender, and a new report which disproves gender differences in maths ability) but I’m still pleased that I managed to write something outside my comfort zone.
  • Andrew has also written a book about the Beatles. I found the blog posts that this grew out of really interesting, even though I don’t like the Beatles.
  • And finally, you can have minutes of fun looking for film and TV locations on Google Street View. Here are Baywatch headquarters near Santa Monica and Baywatch Hawaii headquarters at Haleiwa.

Multiple Indemnity

[posted by Gavin Robinson, 10:04 am, 20 July 2010]

As part of the research for my book (saying that still feels a bit weird, but I’m sure I’ll get used to it) I’m going through indemnity cases in class SP 24 in the UK National Archives (aka the PRO). The Indemnity Committee was set up by parliament in 1647 to protect soldiers and officials from prosecution for actions that they had carried out under the authority of parliament, such as requisitioning things for the army or arresting royalists. It also dealt with disputes over sequestered rents and debts, and helped to enforce parliament’s order that apprentices who joined the army should be allowed to count military service towards their term of apprenticeship. If someone was prosecuted in court for acts which were covered by the Indemnity Ordinance (and many were despite the Ordinance banning people from bringing cases of this kind) the defendant could send a petition to the Indemnity Committee asking for protection. In SP 24 there are 58 boxes of petitions and other papers relating to cases, such as depositions and lists of expenses. Unlike some classes these are quite well sorted: papers relating to each case are grouped together and sorted in roughly alphabetical order of the plaintiff’s name (although confusingly the plaintiff in an indemnity case is the defendant in the corresponding criminal prosecution). I’m particularly interested in cases relating to horse requisitioning. According to Ian Gentles, about 30% of the military cases involve horses, although from what I’ve seen so far military cases seem to be a minority as many cases are disputes between civilians over payment of rents and debts due to sequestered estates. It usually takes me less than an hour to skim through a box, look at the first petition in each case to see if it’s about horses, and photograph the relevant cases. Sometimes I get cases that look interesting for other reasons, but I try not to wander too far off topic too often. 
Since I’m photographing these papers for my research, and since the National Archives allow document images to be uploaded to Flickr, that’s just what I’m doing. I’m also putting transcripts or summaries of the documents, along with links to the images, on the Your Archives wiki. You can see what I’ve done so far, and follow my progress in future, via a Flickr collection and Your Archives category.

So far I’ve uploaded cases from the first 2 boxes. I have another 16 boxes ready to be uploaded, but I’m working on some Python scripts to automate the process. The trial run on the first two boxes proved that doing it all manually is quite labour intensive. First I copied the image files from my camera and sorted them into directories for each box. The directory structure is based on the archival reference, so there’s a directory called “SP 24” with sub-directories called “30”, “31” etc. Then I went into each of these directories and made sub-directories for each case, so it looks like this:

  • SP 24
    • 30
      • 1 Abeary vs Windebanke
      • 1 Adams vs Haughton
      • 2 Alford vs King
      • etc
    • 31

And the path to a particular case would be:

SP 24/30/2 Alford vs King

Which looks quite similar to the archival reference.

The numbers at the start of the case name are the part number (each box usually contains three folders called part 1, part 2 and part 3 but I decided not to make directories for these). Up to here it has to be done manually as arranging cases into directories involves looking at the documents to see where a new case begins and to check the names. But from here a lot of it can be automated.
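Because the directory names follow a fixed pattern, a script can recover the archival reference, part number and party names from a path alone. The following is a minimal sketch of that idea, not the script I'm actually using: the function name, the regular expression and the dictionary keys are all assumptions made up for illustration, and it assumes every case directory is named "part number, plaintiff, vs, defendant" exactly as in the listing above.

```python
import re
from pathlib import PurePosixPath

# Assumed naming pattern: "<part number> <plaintiff> vs <defendant>"
CASE_DIR = re.compile(r"^(?P<part>\d+) (?P<plaintiff>.+?) vs (?P<defendant>.+)$")


def parse_case_path(path):
    """Split a path such as 'SP 24/30/2 Alford vs King' into its components."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3:
        raise ValueError(f"expected class/box/case, got {path!r}")
    series, box, case_dir = parts[-3:]
    match = CASE_DIR.match(case_dir)
    if match is None:
        raise ValueError(f"unrecognised case directory name: {case_dir!r}")
    return {
        "reference": f"{series}/{box}",  # archival reference of the box
        "part": int(match["part"]),      # folder (part) within the box
        "plaintiff": match["plaintiff"],
        "defendant": match["defendant"],
    }


case = parse_case_path("SP 24/30/2 Alford vs King")
print(case["reference"], case["part"], case["plaintiff"], case["defendant"])
```

Raising an error on names that don't match is deliberate: a misnamed directory is easier to fix by hand before anything has been uploaded.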

Each directory containing one case needs to have its own photoset on Flickr. I used Postr to upload one case at a time and then used Desktop Flickr Organizer to create a set and add photos to it (I got both of these applications from the Ubuntu repository – if you’re on Windows then… stop using Windows!). Then I used the Organizr on the Flickr website to drag each set into the “SP 24 Indemnity Cases” collection. Once the Flickr photos and sets were in place I went to the web page for each set, manually created a Zotero item for the case, and attached a link to the page. Finally I created a Your Archives page for each case and attached a link to it in Zotero. Each page uses a template that I made for indemnity cases, which gives some basic information in a standardized form and includes a link to the relevant Flickr set. Doing all this manually for each case is tedious and slow. What I want the scripts to do is:

  1. Upload photos from multiple directories
  2. Create a separate photoset for each directory, with a name based on the directory name and path
  3. Get the ID of each set and write the IDs and names to a CSV file
  4. (At this point I’ll manually edit the CSV file to add data that will be needed for Your Archives and Zotero and which can only be got by looking at the document images, eg full names of plaintiffs and defendants, date of the petition, summary of the case, categories/tags)
  5. Use the data from the CSV file to construct a wiki page with the correct template and upload to Your Archives through the MediaWiki API
  6. Export an XML file which can be imported into Zotero
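
The parts of steps 1 to 3 that don't touch Flickr can be sketched with the standard library alone. This is only an illustration under some assumptions: the function names and CSV column names are invented here, images are assumed to be JPEG files, and the set IDs are assumed to have been obtained already from whatever code talks to Flickr.

```python
import csv
import os


def find_case_dirs(root):
    """Yield (relative path, sorted image names) for each directory holding images."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit boxes and cases in a predictable order
        images = sorted(
            name for name in filenames if name.lower().endswith((".jpg", ".jpeg"))
        )
        if images:
            yield os.path.relpath(dirpath, root), images


def write_set_csv(rows, csv_path):
    """Write (set_id, set_name) pairs, leaving blank columns to fill in by hand."""
    fields = [
        "set_id", "set_name",  # filled in by the script (step 3)
        "plaintiff", "defendant", "petition_date", "summary", "tags",  # step 4
    ]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields, restval="")
        writer.writeheader()
        for set_id, set_name in rows:
            writer.writerow({"set_id": set_id, "set_name": set_name})
```

Writing the empty columns up front means the manual editing in step 4 is just a matter of filling in cells in a spreadsheet, and the later steps can read the same file back with csv.DictReader.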

So far I’ve written a Flickr upload script which does the first three steps and more or less works. Rather than working directly with the Flickr API I’m using the Python Flickr API library, which makes things very easy. It provides a flickr class with methods to handle API calls and authentication. Before using it you have to go to the App Garden and request an API key, but that doesn’t take long to do. App pages can be kept private, which is what I’m doing in this case as I don’t really have the time or skills to make my scripts fit for public consumption. The next step is to add error handling as the script only works as long as nothing goes wrong. In the real world, there are lots of things that could go wrong. The library throws an exception if it gets an error response from the API. Until I add some exception handling this means that the script just stops on an error. The script will need to keep track of what has and hasn’t been done (photos uploaded, sets created, photos added to sets) so that I can run it again if anything was left undone, and so that it doesn’t try to do the same thing again if it’s already been done. One annoying thing about Flickr’s public API is that it provides no way to create a collection or add sets to a collection. I assumed I’d be able to automate that part of the process but it looks like I’ll still have to do it manually.
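
One way the bookkeeping and error handling described above might look is a small journal file that records each completed step, so a rerun skips what has already been done. This is a rough sketch rather than my actual script. The Journal class, process_case and the key format are all hypothetical, and upload, create_set and add_to_set are stand-in callables for the real API calls, each assumed to return a plain string ID or nothing. A real version would probably catch the library's own exception type instead of a bare Exception.

```python
import json
import os


class Journal:
    """Remember which steps have completed so that a rerun skips them."""

    def __init__(self, path):
        self.path = path
        self.done = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as handle:
                self.done = json.load(handle)

    def run_once(self, key, action):
        """Run action() unless key is already recorded; return the stored result."""
        if key in self.done:
            return self.done[key]
        result = action()  # if this raises, nothing is recorded
        self.done[key] = result
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as handle:
            json.dump(self.done, handle, indent=2)
        os.replace(tmp, self.path)  # swap in the new journal in one step
        return result


def process_case(journal, case_name, photos, upload, create_set, add_to_set):
    """Upload one case's photos and group them in a set; return (set_id, errors)."""
    errors, photo_ids = [], []
    for photo in photos:
        try:
            photo_ids.append(
                journal.run_once(f"upload:{case_name}/{photo}", lambda: upload(photo))
            )
        except Exception as err:  # stand-in for the library's error type
            errors.append(f"{photo}: {err}")
    if errors or not photo_ids:
        return None, errors  # wait until every photo is up before making the set
    try:
        set_id = journal.run_once(
            f"set:{case_name}", lambda: create_set(case_name, photo_ids[0])
        )
        for photo_id in photo_ids[1:]:
            def add(photo_id=photo_id):
                add_to_set(set_id, photo_id)
                return True
            journal.run_once(f"add:{set_id}:{photo_id}", add)
    except Exception as err:
        return None, [f"{case_name}: {err}"]
    return set_id, []
```

Passing the API calls in as arguments keeps the bookkeeping separate from Flickr itself, which also makes it possible to try the logic out with fake functions before risking a real upload.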

For step 5 I’ll be using the Pywikipediabot library. I’ve already done some simple tests on a local MediaWiki installation and it seems quite easy to create a page. Once I’ve finished the script and thoroughly tested it I can ask for a bot account on Your Archives. Step 6 will involve learning a bit more about Zotero RDF. The easiest way to find out how to generate the right code is to export some similar existing items and look at the results.
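
The page-building half of step 5 is plain string handling and can be sketched without any wiki library. Everything specific here is an assumption for illustration: the template name "Indemnity case", its parameter names, the CSV column names, the semicolon-separated tags and the example_user account name are all placeholders, not the real template on Your Archives. The sample row uses dummy values, and sending the text to the wiki is left out entirely.

```python
import csv
import io


def build_page(row, flickr_user="example_user"):
    """Return (page title, wikitext) for one CSV row describing a case."""
    title = f"{row['plaintiff']} vs {row['defendant']}"  # hypothetical title scheme
    set_url = f"https://www.flickr.com/photos/{flickr_user}/sets/{row['set_id']}/"
    lines = ["{{Indemnity case"]  # hypothetical template name
    for field in ("plaintiff", "defendant", "petition_date"):
        lines.append(f"| {field} = {row[field]}")
    lines.append(f"| flickr_set = {set_url}")
    lines.append("}}")
    lines.append("")
    lines.append(row["summary"])
    # Tags are assumed to be separated by semicolons in the CSV file
    for tag in filter(None, (t.strip() for t in row["tags"].split(";"))):
        lines.append(f"[[Category:{tag}]]")
    return title, "\n".join(lines)


# Dummy row standing in for a line of the hand-edited CSV file
sample = io.StringIO(
    "set_id,set_name,plaintiff,defendant,petition_date,summary,tags\n"
    "0000,SP 24/30/2 Alford vs King,Alford,King,DATE,SUMMARY,Horses\n"
)
for row in csv.DictReader(sample):
    title, text = build_page(row)
    print(title)
    print(text)
```

Keeping the text generation separate from the upload means the output can be checked by eye, or pasted into a local MediaWiki installation, before a bot account is involved.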

So just because I’m writing a monograph it doesn’t mean I’ve abandoned digital history. I’ll still be using lots of digital tricks in the background, but they won’t necessarily be obvious in the text of the book. New technology is certainly making my research quicker and cheaper than it used to be. The stuff that I’ve written about above isn’t exactly revolutionary: it saves labour but it doesn’t offer new insights that couldn’t have been found before. But later in the project I’m planning to do some text mining which I hope will show me things that I couldn’t otherwise have found. I’ll also be revisiting phonetic algorithms for place name identification. And if I can’t think of anything else to blog about, there are likely to be some interesting stories in the indemnity cases.
