More progress with Sandall

[posted by Gavin Robinson, 3:26 pm, 5 January 2008]

My project to digitize T. E. Sandall’s history of the 1/5th Lincolnshire regiment in the First World War has made very good progress this week. I’ve now uploaded a new HTML version. This features links to page images and a working index: if you click on a page number in the index it takes you to the corresponding part of the text. The whole book is still on one page as I haven’t worked out how to split it yet but it’s an improvement over the previous interim version. Below are more details of what I’ve done and how I’ve done it.
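As a rough illustration of the index linking (a minimal sketch, not the code actually used for the edition), the page numbers in an index entry can be wrapped in links to in-page anchors. The function name `link_index_pages` and the `#page-N` anchor scheme are assumptions made for this example, and it expects only the page-number part of an entry, not the heading.

```python
import re

def link_index_pages(locators):
    """Wrap every page number in a link to an in-page anchor.

    `locators` is the page-number part of an index entry, e.g. "45, 47".
    The "#page-N" anchor naming is hypothetical; the text itself would
    need matching anchors such as <a id="page-45"></a>.
    """
    def make_link(match):
        page = match.group()
        return '<a href="#page-{0}">{0}</a>'.format(page)

    # Replace each run of digits with a link that points at its page anchor.
    return re.sub(r"\d+", make_link, locators)


print(link_index_pages("45, 47"))
```

Because the whole book is on one page, plain fragment links like these are enough; splitting the text across several files would mean putting a filename in front of each `#page-N`.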


New Old Money

[posted by Gavin Robinson, 5:02 pm, 24 December 2007]

In my last post I shared my first attempt at writing Python code to do calculations with pre-decimal currency. With a lot of help from Ben Brumfield I’ve rewritten it so that it now does a lot more with less code. The classes and functions have been completely rearranged, everything is easier to read, and there is more scope for dealing with uncertainty. This is yet another example of the benefits of blogging. Without Ben’s input I’d still be using some pretty mediocre code, but by posting my first attempt on the blog and brainstorming with readers I’ve made a vast improvement in only a couple of days. More details below.



Old Money

[posted by Gavin Robinson, 5:18 pm, 22 December 2007]

And more adventures with Python programming. One of the trickiest problems in British history is dealing with pre-decimal currency. Until 1971 British currency was a bit strange to say the least. There were 12 pence in a shilling and 20 shillings in a pound (so a pound had 240 pence). This is obviously not something that most off-the-shelf software can deal with, but doing calculations on old money is something that historians need to do quite a lot. During my PhD, when I was using Access databases, I had to decimalize amounts of money before I could do anything with them. That was awkward because some values in the pence column (I seem to remember that 4 and 8 were particularly annoying) gave a recurring fraction. To make things easier I arbitrarily rounded the pence values to the nearest multiple of 3, which meant that my figures were less exact than they could have been, but in practice I could live with it.

These days I can do better. Below are some technical details of how I approached the problem in Python (I like traaaainspotting…).
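To make the arithmetic concrete, here is a minimal sketch (not the code from the full post) of one common way round the recurring-fraction problem: convert everything to whole pence, do the sums on integers, and convert back. The function names and the (pounds, shillings, pence) tuple layout are assumptions made for this example.

```python
PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20
PENCE_PER_POUND = PENCE_PER_SHILLING * SHILLINGS_PER_POUND  # 240

def to_pence(pounds, shillings, pence):
    """Convert an amount in pounds, shillings and pence to whole pence."""
    return (pounds * SHILLINGS_PER_POUND + shillings) * PENCE_PER_SHILLING + pence

def from_pence(total):
    """Convert whole pence back to a (pounds, shillings, pence) tuple."""
    pounds, rest = divmod(total, PENCE_PER_POUND)
    shillings, pence = divmod(rest, PENCE_PER_SHILLING)
    return pounds, shillings, pence

def add_money(a, b):
    """Add two (pounds, shillings, pence) amounts without any rounding."""
    return from_pence(to_pence(*a) + to_pence(*b))


# £1 15s 8d + £2 7s 6d
print(add_money((1, 15, 8), (2, 7, 6)))  # (4, 3, 2), i.e. £4 3s 2d
```

Working in integer pence means that values like 4d (1/60 of a pound) never have to be written as a decimal fraction of a pound, so nothing needs rounding.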


Zotero, XML, Python, and SP28

[posted by Gavin Robinson, 7:43 pm, 20 December 2007]

Since my last post I’ve been doing some more experiments to see how Zotero can be used for cataloguing previously uncatalogued administrative records from the English Civil War. I’ve now put some more of my ideas into practice in demo form and they seem to work. Linking images to Zotero items and adding metadata went very smoothly. The idea of adding extra data by putting XML tags in notes also works, although this is just a stopgap until they implement custom fields. Once you have data in Zotero it’s very easy to export it as XML and do whatever you want with it. More details below, but it gets a bit technical and even includes some sample code (formatting code in WordPress is hard, and it’ll probably screw up the layout for some people). If you’re not A. Nerd and you’re not doing the shopping for your mum you might want to stop reading now.
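As a hedged sketch of the XML-tags-in-notes idea (not the sample code from the full post), extra fields can be pulled back out of a note's text with Python's standard `xml.etree.ElementTree`. The tag names `regiment` and `folio` and the example note are invented for illustration, and the sketch assumes the note is plain text containing well-formed tags with no stray `&` or `<` characters.

```python
import xml.etree.ElementTree as ET

def extract_custom_fields(note_text):
    """Return a dict mapping each embedded tag name to its text.

    The note is wrapped in a dummy <note> root so that free text and
    several sibling tags can be parsed as one XML document.
    """
    root = ET.fromstring("<note>" + note_text + "</note>")
    return {child.tag: (child.text or "").strip() for child in root}


# Hypothetical note text; the tags stand in for fields Zotero lacks.
note = "Warrant for payment. <regiment>Example Regiment</regiment> <folio>12</folio>"
print(extract_custom_fields(note))
```

If a tag is repeated in one note, this version keeps only the last value; collecting the values into a list per tag would be a small change.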


Digital Things

[posted by Gavin Robinson, 8:55 pm, 31 July 2007]

Another not quite proper post – just a round-up of some things I’ve been doing.

The most important thing is that I’ve more or less finished the switch to Zotero. I managed to fix the bug in the MODS translator, which allowed me to import the 1,000+ records (along with associated notes) from my old database without any trouble. That success encouraged me to have a go at writing an Adlib XML translator so I could scrape records from the RHS Bibliography. It’s actually not as difficult as I had thought. I managed to get a working demo but then I gave up because the XML that the RHS site outputs isn’t very good. First, Adlib XML isn’t as detailed as MODS XML, and second, the RHS people don’t seem to have applied the tags very consistently. That means that any records scraped into Zotero would still need quite a bit of manual adjustment.

Today I tried getting some new records from the RHS without a scraper, using the links to COPAC and getCopy which appear on most records. Although this was slower than scraping the records directly off the page it worked reasonably well. Books are no problem because they can nearly always be found on COPAC with one click. Journal articles are more hit and miss. Sometimes getCopy leads to a page that Zotero can scrape, sometimes it doesn’t. Essays in collections are the worst as they have to be entered manually. Today’s test was just a simple keyword search for “animals” which only returned 250 results. Over the rest of the week I need to find everything I can about the causes and outbreak of the English Civil War. I already have several hundred ECW-related works on file from my PhD, but there will still be a lot of stuff which wasn’t relevant to that which I need to track down now.

Meanwhile over at Early Modern Notes, Sharon noted the death of bookmarks. I still use bookmarks a lot more than some people, but it is true that I’m using them less than I used to. RSS has played a big part in this decline. I use WizzRSS to subscribe to the blogs that I read regularly. Zotero is also taking over from bookmarks as it’s a much more powerful way of keeping track of webpages – you can keep a snapshot of the page (or several snapshots taken at different times), tag it, add it to collections, attach notes, relate it to other items.

I also got excited about the release of CommentPress, a WordPress theme which allows paragraph level comments. One thing I’d like to use this for is putting my PhD thesis online. I could just let people download a PDF, but apart from giving readers the chance to comment on it, I’d like to comment on it myself first! It might also be useful as a feedback mechanism for the digital edition of Sandall’s history of 5th Lincs that I’m working on. I really want some way for readers to be able to suggest corrections, and something like CommentPress would be easier than programming something myself. So I downloaded it to try it out on my local server setup, but I couldn’t get it to work! It might be something to do with Windows, so tomorrow I’ll try to run it on my web host, which is on Linux.

I’ve had more success with Python. Getting to grips with it has been on my to-do list for a long time, but I finally got round to downloading and installing the Python interpreter. I haven’t done much with it yet, but it looks like a good language. I used to be prejudiced against it because it doesn’t have curly braces (which are the mark of a “proper” programming language!) but its syntax is actually more concise than that, and nothing like the horrors of Visual Basic. I should be having lots of Python-based fun over the coming weeks.
