Places

[posted by Gavin Robinson, 12:01 pm, 28 January 2008]

Following on from adding an interactive index of people to my digital edition of Sandall’s history of 5th Lincs, I’ve now added a similar feature for place names. It works in exactly the same way as the person index, but it also has a map view. Again this uses the Exhibit API, which makes it very easy to mash up data with Google Maps without even having to know anything about the Google Maps API. The map view is a bit slower than the normal view, especially if the list isn’t filtered, but that’s an inherent limitation of using maps.

One of the many cool things about the map is that it strikingly illustrates the allied advances in the last months of the First World War. If you go into the map view and click “The Beginning of the Great Advance” on the list of chapters, you’ll see the battalion holding the line in Flanders, then moving behind the lines for rest near Amiens, then moving up to the front line at Saint-Quentin. Then click on each of the following chapters in turn and watch the markers surge forward as 46th Division breaks through the Hindenburg Line and pushes towards Belgium.

Adding the place index was mostly similar to adding the person index: I added a unique id to each <placeName> tag using a Python script, pulled out the place names into an SQLite database, identified/disambiguated them and added a regularized name, then used another Python script to pull the regularized names out of the database and put them into the key attributes in the XML file. Identifying the places was easier than identifying people, and took a couple of days, although there are a few that I couldn’t find. As with people I added some code to the XSLT to generate a JSON file of all the places.

Then following the map view tutorial I used the Exhibit API to pull latitude and longitude co-ordinates from Google Maps and put them into another JSON file. This turned out to be a bit unreliable as about 10 per cent of the places had their co-ordinates missing. It seems to be random, as running the script again with the same set of data produced a similar error rate but with different places. I had to take the missing places from the output file, put them into another input file and run the script over them again, which produced a similar 10 per cent error rate, but the remaining few co-ordinates could be put in manually. Once I had a JSON file with all the correct geocodes it was easy to copy code from the tutorial to add a map view to the Exhibit page.

In a few cases it turned out that Google had given me the wrong co-ordinates. Mostly this was because there are two or more places with the same name and it had picked the wrong one. I thought I’d put in enough information from my manual searches to disambiguate them but it seems that the results of a Google Maps search can be a bit unpredictable, and don’t necessarily give you the full address of a place.
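For anyone wanting to try something similar, the id / database / key round trip described above might look roughly like the sketch below. This is not the script used for the edition: the sample XML, the `pl0001`-style id format, the `place` table and its columns, and the function names are all made up for illustration, and it assumes the `placeName` elements have no XML namespace (with a namespaced TEI file the tag name would need the namespace prefix).

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample text; the real edition's markup will differ.
SAMPLE = """<text>
  <p>The battalion moved from <placeName>Amiens</placeName> to
  <placeName>Saint-Quentin</placeName>, then back to <placeName>Amiens</placeName>.</p>
</text>"""


def add_place_ids(root, prefix="pl"):
    """Give every placeName element a unique id; return (id, name) pairs."""
    rows = []
    for n, elem in enumerate(root.iter("placeName"), start=1):
        place_id = f"{prefix}{n:04d}"  # assumed id format, e.g. pl0001
        elem.set("id", place_id)
        rows.append((place_id, "".join(elem.itertext()).strip()))
    return rows


def store_places(conn, rows):
    """Put the extracted names into an SQLite table (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS place "
        "(id TEXT PRIMARY KEY, name TEXT, regularized TEXT)"
    )
    conn.executemany("INSERT INTO place (id, name) VALUES (?, ?)", rows)
    conn.commit()


def apply_keys(root, conn):
    """Copy regularized names from the database into key attributes."""
    keys = dict(
        conn.execute(
            "SELECT id, regularized FROM place WHERE regularized IS NOT NULL"
        )
    )
    for elem in root.iter("placeName"):
        key = keys.get(elem.get("id"))
        if key:
            elem.set("key", key)


root = ET.fromstring(SAMPLE)
rows = add_place_ids(root)

conn = sqlite3.connect(":memory:")
store_places(conn, rows)

# Stand-in for the manual identification step: regularize one place by hand.
conn.execute(
    "UPDATE place SET regularized = 'Amiens, Somme, France' WHERE name = 'Amiens'"
)
apply_keys(root, conn)

print(ET.tostring(root, encoding="unicode"))
```

Places that haven’t been identified yet simply keep their id and get no key attribute, so the script can be re-run as more of them are tracked down.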

I’ve now done most of what I planned to do in this phase. There are still some features that could be added, especially a feedback mechanism, but I’ll be giving this project a rest soon so I can do some English Civil War work.

Marking Up Names: Part 2

[posted by Gavin Robinson, 3:01 pm, 19 January 2008]

My digital edition of Sandall’s History of 1/5th Lincolnshire Regiment now has a new index of people. In my last post I described how names were marked up in the text. This post is about how I linked them together.

(more…)

Marking Up Names: Part 1

[posted by Gavin Robinson, 3:55 pm, 15 January 2008]

On to the next stage of digitizing Sandall’s History of 5th Lincolnshire Regiment. Having marked up the structure of the text and written XSLT to split the book into several HTML pages with working internal links, I could move on to Phase 2: marking up names, dates, and abbreviations.

(more…)

More progress with Sandall

[posted by Gavin Robinson, 3:26 pm, 5 January 2008]

My project to digitize T. E. Sandall’s history of the 1/5th Lincolnshire regiment in the First World War has made very good progress this week. I’ve now uploaded a new HTML version. This features links to page images and a working index: if you click on a page number in the index it takes you to the corresponding part of the text. The whole book is still on one page as I haven’t worked out how to split it yet but it’s an improvement over the previous interim version. Below are more details of what I’ve done and how I’ve done it.

(more…)

New Old Money

[posted by Gavin Robinson, 5:02 pm, 24 December 2007]

In my last post I shared my first attempt at writing Python code to do calculations with pre-decimal currency. With a lot of help from Ben Brumfield I’ve rewritten it so that it now does a lot more with less code. The classes and functions have been completely rearranged, everything is easier to read, and there is more scope for dealing with uncertainty. This is yet another example of the benefits of blogging. Without Ben’s input I’d still be using some pretty mediocre code, but by posting my first attempt on the blog and brainstorming with readers I’ve made a vast improvement in only a couple of days. More details below.

(more…)

Old Money

[posted by Gavin Robinson, 5:18 pm, 22 December 2007]

And more adventures with Python programming. One of the trickiest problems in British history is dealing with pre-decimal currency. Until 1971 British currency was a bit strange to say the least. There were 12 pence in a shilling and 20 shillings in a pound (so a pound had 240 pence). This is obviously not something that most off-the-shelf software can deal with, but doing calculations on old money is something that historians need to do quite a lot. During my PhD, when I was using Access databases, I had to decimalize amounts of money before I could do anything with them. That was awkward because some values in the pence column (I seem to remember that 4 and 8 were particularly annoying) gave a recurring fraction. To make things easier I arbitrarily rounded the pence values to the nearest multiple of 3, which meant that my figures were less exact than they could have been, but in practice I could live with it.
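To show why whole pence make life easier than decimalized pounds, here is a minimal sketch of the arithmetic. It is not the code from the full post, just the simplest version of the idea: convert everything to pence, do the sums as integers, and convert back. The function names are invented for the example.

```python
PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20
PENCE_PER_POUND = PENCE_PER_SHILLING * SHILLINGS_PER_POUND  # 240


def to_pence(pounds=0, shillings=0, pence=0):
    """Convert a pounds/shillings/pence amount to a whole number of pence."""
    return pounds * PENCE_PER_POUND + shillings * PENCE_PER_SHILLING + pence


def from_pence(total):
    """Convert a whole number of pence back to (pounds, shillings, pence)."""
    pounds, rest = divmod(total, PENCE_PER_POUND)
    shillings, pence = divmod(rest, PENCE_PER_SHILLING)
    return pounds, shillings, pence


def add_lsd(*amounts):
    """Add any number of (pounds, shillings, pence) tuples."""
    return from_pence(sum(to_pence(*amount) for amount in amounts))


# £1 15s 8d + £2 7s 4d
total = add_lsd((1, 15, 8), (2, 7, 4))
print(total)  # (4, 3, 0), i.e. £4 3s 0d

# The decimalizing problem: 4d is 4/240 of a pound, a recurring fraction.
print(4 / PENCE_PER_POUND)
```

Because everything stays as integers until the final conversion, nothing has to be rounded, which is exactly what the decimalized Access columns couldn’t manage.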

These days I can do better. Below are some technical details of how I approached the problem in Python (I like traaaainspotting…).

(more…)

Zotero, XML, Python, and SP28

[posted by Gavin Robinson, 7:43 pm, 20 December 2007]

Since my last post I’ve been doing some more experiments to see how Zotero can be used for cataloguing previously uncatalogued administrative records from the English Civil War. I’ve now put some more of my ideas into practice in demo form and they seem to work. Linking images to Zotero items and adding metadata went very smoothly. The idea of adding extra data by putting XML tags in notes also works, although this is just a stopgap until they implement custom fields. Once you have data in Zotero it’s very easy to export it as XML and do whatever you want with it. More details below, but it gets a bit technical and even includes some sample code (formatting code in WordPress is hard, and it’ll probably screw up the layout for some people). If you’re not A. Nerd and you’re not doing the shopping for your mum you might want to stop reading now.

(more…)

Digital Things

[posted by Gavin Robinson, 8:55 pm, 31 July 2007]

Another not quite proper post – just a round-up of some things I’ve been doing.

The most important thing is that I’ve more or less finished the switch to Zotero. I managed to fix the bug in the MODS translator, which allowed me to import the 1,000+ records (along with associated notes) from my old database without any trouble. That success encouraged me to have a go at writing an Adlib XML translator so I could scrape records from the RHS Bibliography. It’s actually not as difficult as I thought before. I managed to get a working demo but then I gave up because the XML that the RHS site outputs isn’t very good. First, Adlib XML isn’t as detailed as MODS XML, and second, the RHS people don’t seem to have applied the tags very consistently. That means that any records scraped into Zotero would still need quite a bit of manual adjustment. Today I tried getting some new records from the RHS without a scraper, using the links to COPAC and getCopy which appear on most records. Although this was slower than scraping the records directly off the page it worked reasonably well. Books are no problem because they can nearly always be found on COPAC with one click. Journal articles are more hit and miss. Sometimes getCopy leads to a page that Zotero can scrape, sometimes it doesn’t. Essays in collections are the worst as they have to be entered manually. Today’s test was just a simple keyword search for “animals” which only returned 250 results. Over the rest of the week I need to find everything I can about the causes and outbreak of the English Civil War. I already have several hundred ECW related works on file from my PhD, but there will still be a lot of stuff which wasn’t relevant to that which I need to track down now.

Meanwhile over at Early Modern Notes, Sharon noted the death of bookmarks. I still use bookmarks a lot more than some people, but it is true that I’m using them less than I used to. RSS has played a big part in this decline. I use WizzRSS to subscribe to the blogs that I read regularly. Zotero is also taking over from bookmarks as it’s a much more powerful way of keeping track of webpages – you can keep a snapshot of the page (or several snapshots taken at different times), tag it, add it to collections, attach notes, relate it to other items.

I also got excited about the release of CommentPress, a WordPress theme which allows paragraph level comments. One thing I’d like to use this for is putting my PhD thesis online. I could just let people download a PDF, but apart from giving readers the chance to comment on it, I’d like to comment on it myself first! It might also be useful as a feedback mechanism for the digital edition of Sandall’s history of 5th Lincs that I’m working on. I really want some way for readers to be able to suggest corrections, and something like CommentPress would be easier than programming something myself. So I downloaded it to try it out on my local server setup, but I couldn’t get it to work! It might be something to do with Windows, so tomorrow I’ll try to run it on my web host, which is on Linux.

I’ve had more success with Python. Getting to grips with it has been on my to do list for a long time, but I finally got round to downloading and installing the Python interpreter. I haven’t done much with it yet, but it looks like a good language. I used to be prejudiced against it because it doesn’t have curly braces (which are the mark of a “proper” programming language!) but its syntax is actually more concise than that, and nothing like the horrors of Visual Basic. I should be having lots of Python based fun over the coming weeks.

Bibliography Databases

[posted by Gavin Robinson, 10:31 pm, 30 October 2006]

Time to start filling up the “Information Technology” category then. Anyone who isn’t interested in SQL should probably look away now. I’ll be posting some thoughts on Zotero sooner or later, but this post is about my own attempts at making bibliographical databases. I’ve always preferred doing it myself to using off the shelf solutions, which can have advantages and disadvantages.

(more…)
