<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Investigations of a Dog &#187; copyright</title>
	<atom:link href="http://www.investigations.4-lom.com/tag/copyright/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.investigations.4-lom.com</link>
	<description>Failing better at understanding the past</description>
	<lastBuildDate>Sun, 05 Feb 2012 09:18:46 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>UK National Archives on Flickr</title>
		<link>http://www.investigations.4-lom.com/2009/07/16/uk-national-archives-on-flickr/</link>
		<comments>http://www.investigations.4-lom.com/2009/07/16/uk-national-archives-on-flickr/#comments</comments>
		<pubDate>Thu, 16 Jul 2009 13:15:30 +0000</pubDate>
		<dc:creator>Gavin Robinson</dc:creator>
				<category><![CDATA[History]]></category>
		<category><![CDATA[copyright]]></category>
		<category><![CDATA[digital history]]></category>
		<category><![CDATA[flickr]]></category>
		<category><![CDATA[online sources]]></category>
		<category><![CDATA[pro]]></category>
		<category><![CDATA[wikis]]></category>
		<category><![CDATA[ww1]]></category>
		<category><![CDATA[your archives]]></category>

		<guid isPermaLink="false">http://www.investigations.4-lom.com/?p=639</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=UK+National+Archives+on+Flickr&amp;rft.aulast=Robinson&amp;rft.aufirst=Gavin&amp;rft.subject=History&amp;rft.source=Investigations+of+a+Dog&amp;rft.date=2009-07-16&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.investigations.4-lom.com/2009/07/16/uk-national-archives-on-flickr/&amp;rft.language=English"></span>
There has been some bad news for historians recently: the RHS Bibliography of British and Irish History has lost its direct government funding and is being privatised in a move disturbingly reminiscent of PFI (and to add insult to injury the IHR claims to be &#8220;delighted&#8221; about this!); the UK National Archives (or PRO to [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=UK+National+Archives+on+Flickr&amp;rft.aulast=Robinson&amp;rft.aufirst=Gavin&amp;rft.subject=History&amp;rft.source=Investigations+of+a+Dog&amp;rft.date=2009-07-16&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.investigations.4-lom.com/2009/07/16/uk-national-archives-on-flickr/&amp;rft.language=English"></span>
<p>There has been some bad news for historians recently: the <a href="http://www.rhs.ac.uk/bibl/dataset.asp">RHS Bibliography of British and Irish History</a> has lost its direct government funding and is being privatised in a move disturbingly reminiscent of PFI (and to add insult to injury the <a href="http://www.history.ac.uk/news/browse/ihr#bbih">IHR</a> claims to be &#8220;delighted&#8221; about this!); <a href="http://www.nationalarchives.gov.uk/news/stories/325.htm?WT.hp=nf-37377">the UK National Archives</a> (or PRO to most of us who use it) can no longer afford to open on Mondays or offer free parking.</p>
<p>But it&#8217;s not all bad. There&#8217;s also some good news from the National Archives which has got much less attention than the bad news &#8211; in fact I&#8217;m not even sure exactly when it happened. They are now allowing and encouraging users to upload photos of public records held at Kew to Flickr and similar photo sharing sites. Crown Copyright had already been waived to allow republication of the text of public records but previously publishing images of documents didn&#8217;t appear to be allowed. Now it&#8217;s confirmed that uploading images to Flickr <em>is</em> allowed (provided that you&#8217;ve taken them yourself &#8211; this doesn&#8217;t cover documents bought from DocumentsOnline or Ancestry). This is a win for everyone, because these documents will be made freely available without it costing the archives anything &#8211; a major advantage when budgets and funding are being cut drastically.</p>
<p>The NA has its own <a rel="nofollow" href="http://www.flickr.com/photos/nationalarchives/">Flickr account</a>, and a <a rel="nofollow" href="http://www.flickr.com/groups/nationalarchives/">group for visitors</a>. Combined with the <a rel="nofollow" href="http://yourarchives.nationalarchives.gov.uk/">Your Archives wiki</a> this could lead to some really exciting stuff. Some people are already using Flickr and Your Archives to publish Metropolitan Police leavers&#8217; registers. The possibilities are endless. I&#8217;m certainly going to upload all the photos I take in the course of my research. To start with I&#8217;ve put up the <a href="http://www.flickr.com/photos/wenham5thlincs/sets/72157621415851961/">service record</a> of my ancestor Tom Wenham from the First World War (photographed from the screen of a microfilm reader).</p>
<p><a title="IMG_0020 by 5th Lincs Wenham, on Flickr" href="http://www.flickr.com/photos/wenham5thlincs/3725564363/"><img src="http://farm4.static.flickr.com/3444/3725564363_0167e0e381_t.jpg" alt="IMG_0020" width="75" height="100" /></a></p>
<p>Still to come are some indemnity cases from SP24, and sooner or later I&#8217;ll have loads of SP28 to share. It would be fantastic if other archives would do this too, although some will probably be too conservative to try it. The British Library <em>still</em> doesn&#8217;t allow digital cameras, which just makes me not want to bother with BL manuscripts.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.investigations.4-lom.com/2009/07/16/uk-national-archives-on-flickr/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>New blog and CSPD online</title>
		<link>http://www.investigations.4-lom.com/2008/04/23/new-blog-and-cspd-online/</link>
		<comments>http://www.investigations.4-lom.com/2008/04/23/new-blog-and-cspd-online/#comments</comments>
		<pubDate>Wed, 23 Apr 2008 09:35:12 +0000</pubDate>
		<dc:creator>Gavin Robinson</dc:creator>
				<category><![CDATA[History]]></category>
		<category><![CDATA[copyright]]></category>
		<category><![CDATA[digital history]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[online sources]]></category>

		<guid isPermaLink="false">http://www.investigations.4-lom.com/2008/04/23/new-blog-and-cspd-online/</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=New+blog+and+CSPD+online&amp;rft.aulast=Robinson&amp;rft.aufirst=Gavin&amp;rft.subject=History&amp;rft.source=Investigations+of+a+Dog&amp;rft.date=2008-04-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.investigations.4-lom.com/2008/04/23/new-blog-and-cspd-online/&amp;rft.language=English"></span>
Mercurius Politicus linked to Gilbert Mabbott, a new blog about print culture in the English Civil Wars and Interregnum. From this blog I discovered that Calendar of State Papers Domestic is starting to appear on Google Books. There&#8217;s a James I volume available with full access. I&#8217;m hoping that the rest of the series, particularly [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=New+blog+and+CSPD+online&amp;rft.aulast=Robinson&amp;rft.aufirst=Gavin&amp;rft.subject=History&amp;rft.source=Investigations+of+a+Dog&amp;rft.date=2008-04-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.investigations.4-lom.com/2008/04/23/new-blog-and-cspd-online/&amp;rft.language=English"></span>
<p><a href="http://mercuriuspoliticus.wordpress.com/2008/04/22/gilbert-mabbott/">Mercurius Politicus</a> linked to <a href="http://gilbertmabbott.wordpress.com/">Gilbert Mabbott</a>, a new blog about print culture in the English Civil Wars and Interregnum. From this blog I discovered that Calendar of State Papers Domestic is starting to appear on Google Books. There&#8217;s a <a href="http://books.google.co.uk/books?id=3xISAAAAYAAJ">James I volume</a> available with full access. I&#8217;m hoping that the rest of the series, particularly the Charles I volumes, will follow soon. There&#8217;s no reason why they shouldn&#8217;t as they&#8217;re all in the public domain. Since the original documents were under Crown Copyright and the calendars were published by HMSO in the 19th century the copyright must have expired by now.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.investigations.4-lom.com/2008/04/23/new-blog-and-cspd-online/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Digital Express</title>
		<link>http://www.investigations.4-lom.com/2008/02/08/digital-express/</link>
		<comments>http://www.investigations.4-lom.com/2008/02/08/digital-express/#comments</comments>
		<pubDate>Fri, 08 Feb 2008 20:00:56 +0000</pubDate>
		<dc:creator>Gavin Robinson</dc:creator>
				<category><![CDATA[History]]></category>
		<category><![CDATA[5th lincs]]></category>
		<category><![CDATA[copyright]]></category>
		<category><![CDATA[digital history]]></category>
		<category><![CDATA[exhibit api]]></category>
		<category><![CDATA[sandall]]></category>
		<category><![CDATA[tei]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://www.investigations.4-lom.com/2008/02/08/digital-express/</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Digital+Express&amp;rft.aulast=Robinson&amp;rft.aufirst=Gavin&amp;rft.subject=History&amp;rft.source=Investigations+of+a+Dog&amp;rft.date=2008-02-08&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.investigations.4-lom.com/2008/02/08/digital-express/&amp;rft.language=English"></span>
Having decided to leave my 5th Lincolnshire First World War project for a while, I got an offer I couldn&#8217;t refuse: someone from the Great War Forum sent me a transcript of the battalion&#8217;s medal citations from the regimental archive so that I could publish them on my site and link them in to the [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Digital+Express&amp;rft.aulast=Robinson&amp;rft.aufirst=Gavin&amp;rft.subject=History&amp;rft.source=Investigations+of+a+Dog&amp;rft.date=2008-02-08&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.investigations.4-lom.com/2008/02/08/digital-express/&amp;rft.language=English"></span>
<p>Having decided to leave my 5th Lincolnshire First World War project for a while, I got an offer I couldn&#8217;t refuse: someone from the Great War Forum sent me a transcript of the battalion&#8217;s medal citations from the regimental archive so that I could publish them on my site and link them in to the index of people that I&#8217;d created for the book. The document contains information that can&#8217;t be found elsewhere, as although awards of the Military Medal were listed in the London Gazette, full citations were not normally published. There are also three awards not mentioned in Sandall&#8217;s list, and citations for 10 people who were recommended for awards but turned down.</p>
<p>I received the list as a Word file with no semantic markup on Wednesday morning, started working on it on Thursday morning, and <a href="http://www.4-lom.com/citations/">published it on the web</a> this afternoon. It looks very basic but it&#8217;s not bad for two days, and it&#8217;s all linked in to the <a href="http://www.4-lom.com/sandall/people-index.html">index of people</a> for Sandall&#8217;s book. First of all I copied the text into jEdit and used Find and Replace to insert some basic TEI XML markup. Then I pasted it into a new TEI document in oXygen. With the automatic validation it was easy to track down and correct errors in the markup, so by lunch time I had a completely valid TEI file. In the afternoon I spent about 3 or 4 hours on linking records by inserting key attributes into &lt;persName&gt; tags. In most cases I already had the keys that I used for linking names in Sandall, but sometimes I had to change them in the light of new evidence from the citations, such as full names of people who I previously only knew by their initials. This also allowed me to clear up some ambiguities. This morning I finished the linkage by creating new keys for the 13 people not mentioned by Sandall, then got started on writing some XSLT. That was easy as I could copy or adapt a lot of the code from the style sheet for Sandall. As well as generating the HTML version of the citations, this XSLT generates an extra JSON file which is imported into the Sandall index of people to allow linking the citations. Again this only required some minor adjustments to the Exhibit page. After some testing and corrections I had a live site up this afternoon.</p>
<p>This demonstrates the potential value of the techniques I&#8217;ve been using for marking up texts, but it also raises some problems for digital history. I decided to trust a transcript from a random person off the internet. I have no way of knowing how accurate the transcript is, or even if the source document really exists! It could be Hugh Trevor-Roper and the &#8220;Hitler Diaries&#8221; all over again. Therefore I&#8217;m going to think more carefully before putting myself in this situation again. There&#8217;s also a possibility that I&#8217;ve miscalculated the copyright situation. Based on internal evidence and comparison with other documents my best guess is that the list was created by the army and is therefore under Crown Copyright (and being unpublished and available for inspection in a public record repository should come under waiver of Crown Copyright), but without seeing the original it&#8217;s hard to be sure. I might be wrong, and even if I&#8217;m right the holders of the manuscript might not agree. So technology makes some things easier, but there are other problems that it can&#8217;t solve.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.investigations.4-lom.com/2008/02/08/digital-express/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>TEI Update</title>
		<link>http://www.investigations.4-lom.com/2008/01/02/tei-update/</link>
		<comments>http://www.investigations.4-lom.com/2008/01/02/tei-update/#comments</comments>
		<pubDate>Wed, 02 Jan 2008 11:29:18 +0000</pubDate>
		<dc:creator>Gavin Robinson</dc:creator>
				<category><![CDATA[History]]></category>
		<category><![CDATA[5th lincs]]></category>
		<category><![CDATA[copyright]]></category>
		<category><![CDATA[digital history]]></category>
		<category><![CDATA[sandall]]></category>
		<category><![CDATA[tei]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://www.investigations.4-lom.com/2008/01/02/tei-update/</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=TEI+Update&amp;rft.aulast=Robinson&amp;rft.aufirst=Gavin&amp;rft.subject=History&amp;rft.source=Investigations+of+a+Dog&amp;rft.date=2008-01-02&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.investigations.4-lom.com/2008/01/02/tei-update/&amp;rft.language=English"></span>
Nearly a year ago I started a project to digitize T. E. Sandall&#8217;s history of the 1/5th Lincolnshire regiment in the First World War, and in summer I published an interim version. I would&#8217;ve finished it a long time ago if I didn&#8217;t have anything else to do, but original work in peer reviewed printed [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=TEI+Update&amp;rft.aulast=Robinson&amp;rft.aufirst=Gavin&amp;rft.subject=History&amp;rft.source=Investigations+of+a+Dog&amp;rft.date=2008-01-02&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.investigations.4-lom.com/2008/01/02/tei-update/&amp;rft.language=English"></span>
<p>Nearly a year ago I started a project to digitize T. E. Sandall&#8217;s history of the 1/5th Lincolnshire regiment in the First World War, and in summer I published an <a href="http://www.4-lom.com/sandall/">interim version</a>. I would&#8217;ve finished it a long time ago if I didn&#8217;t have anything else to do, but original work in peer reviewed printed journals has to come first because it&#8217;ll look better on my CV. Now I&#8217;ve got time to do some more work on it, and having had a break from it I can reassess what I&#8217;m trying to do. Below is an update on what&#8217;s new.</p>
<p><span id="more-163"></span>First of all I discovered from searching the Times Online archive that Major Teall died in 1939, so I&#8217;ll be able to publish his epilogue to the book when it comes into the public domain in 2010. I&#8217;m still not sure about the photos. I&#8217;ve fought my way through the 1988 Copyright Act, which seems to say that copyright in any photos published before 1988 or taken before 1957 exists until it would have expired under the 1956 Copyright Act, which is 50 years after publication for published works, or 50 years after the death of the author for unpublished works. If I&#8217;ve understood correctly that would put the photos from this book (published in 1922) in the public domain. I&#8217;m still going to leave them out for now because it&#8217;s an extra complication, and because they didn&#8217;t scan very well.</p>
<p>Since I last did some work on this project, the new version of the TEI guidelines (<a href="http://www.tei-c.org/Guidelines/P5/">P5</a>) has been released. Not everyone will need to upgrade from P4 to P5, but I want to use some of the new features. There are now elements and attributes designed for linking to page images, which will be very useful. The upgrade was quite easy as most existing elements haven&#8217;t changed much. P5 is based on a schema rather than a DTD. <a href="http://www.oxygenxml.com/">oXygen</a> 9.1 (which I&#8217;ve just bought a licence for &#8211; only $48 for the academic version, and well worth it) comes with TEI P5 schemas, style sheets, and templates pre-loaded. Because I need to use the modules for names and dates, and digital facsimiles, I need to generate my own schema, but this is easy using <a href="http://tei.oucs.ox.ac.uk/Roma/">Roma</a>, a user-friendly online tool.</p>
<p>As well as making the move to P5, I decided to make some other changes to the XML before moving on to the next stage. I removed the original line breaks, which were marked up with &lt;lb&gt; elements, and soft hyphens, which were represented with entity references. This means I&#8217;ve lost something from the original text which won&#8217;t be easy to get back, but I don&#8217;t really have any use for these line breaks and they were causing more trouble than they were worth. Rather than define a new element to represent soft hyphens just in case someone somewhere might want to see the text with original line breaks, I decided to get rid of the problem. Anyone who does want to see the original layout will be able to look at the page images anyway. Removing these elements was easy using Find and Replace with regular expressions in jEdit.</p>
<p>I also decided to get rid of the various elements which were used to represent double quotes. These were &lt;term&gt;, &lt;distinct&gt;, &lt;foreign&gt;, and &lt;soCalled&gt;. I already knew that I wasn&#8217;t going to be using these elements for anything. There are so many possibilities in TEI that it&#8217;s easy to get carried away and start marking up features just because you can. Now I&#8217;ve decided to only use markup which is required by the TEI guidelines, or which has a specific purpose for the resource I&#8217;m trying to create.</p>
<p>My <a href="http://www.investigations.4-lom.com/2007/07/11/unexpected-progress/">first attempt</a> at transforming the text to HTML using the <a href="http://www.tei-c.org/Tools/Stylesheets/">TEI style sheets</a> wasn&#8217;t very promising. I thought then that I just needed to play around with the parameters to make things better, but now, having played around with the parameters quite a lot, I don&#8217;t think it&#8217;s going to be much use. I was always going to have to write some custom XSL to deal with people, places, and dates, but it looks like the TEI XSL won&#8217;t even give me a basic HTML version split into separate files for each chapter. Although it supposedly does this, it looks like it&#8217;s based on the assumption that internal links will use query strings, whereas I just want plain old relative links to plain old HTML files at this stage. It also puts a lot of junk into the HTML whatever parameters you set. For example, if you want an id for each paragraph it doesn&#8217;t add an id attribute to the &lt;p&gt; tag, it inserts an empty anchor at the start of every paragraph! And every &lt;a&gt; tag has an XML namespace declaration, even if you&#8217;ve asked for HTML 4! So I&#8217;ll have to get better at XSL and write my own style sheet from the ground up.</p>
<p>The next step is to generate id and facs attributes based on page numbers for every &lt;pb&gt; element, and put in &lt;ref&gt; tags in the index to point to these page numbers. I also need to decide whether to keep the original table of contents or generate one automatically using XSLT. After that I can either write some XSLT and publish another interim version, or move straight on to marking up people, places, and dates.</p>
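<p>The first part of that step could be sketched roughly as follows. This is Python rather than XSLT, and it rests on assumptions that aren&#8217;t settled above: that every &lt;pb&gt; carries its page number in an n attribute, that ids take the form page-N, and that page images live at images/page-N.jpg. The function name, id scheme, and image path are all hypothetical.</p>

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Serialise TEI as the default namespace instead of an ns0: prefix.
ET.register_namespace("", TEI_NS)


def add_page_attributes(root: ET.Element, image_dir: str = "images") -> int:
    """Give every <pb n="..."> an xml:id and a facs pointer; return the count."""
    count = 0
    for pb in root.iter(f"{{{TEI_NS}}}pb"):
        n = pb.get("n")
        if n is None:
            continue  # no page number to build the attributes from
        # An xml:id must not start with a digit, hence the "page-" prefix
        # (a hypothetical naming scheme).
        pb.set(XML_ID, f"page-{n}")
        pb.set("facs", f"{image_dir}/page-{n}.jpg")
        count += 1
    return count


# Made-up sample fragment, not taken from the real file.
sample = (
    f'<text xmlns="{TEI_NS}"><body>'
    '<pb n="1"/><p>First page.</p><pb n="2"/><p>Second page.</p>'
    "</body></text>"
)
root = ET.fromstring(sample)
print(add_page_attributes(root))
print(ET.tostring(root, encoding="unicode"))
```

<p>Deriving both attributes from the same page number keeps the ids and the image file names in step, so the &lt;ref&gt; tags in the index only need to know the page number.</p>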
]]></content:encoded>
			<wfw:commentRss>http://www.investigations.4-lom.com/2008/01/02/tei-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Unexpected Progress</title>
		<link>http://www.investigations.4-lom.com/2007/07/11/unexpected-progress/</link>
		<comments>http://www.investigations.4-lom.com/2007/07/11/unexpected-progress/#comments</comments>
		<pubDate>Wed, 11 Jul 2007 17:35:56 +0000</pubDate>
		<dc:creator>Gavin Robinson</dc:creator>
				<category><![CDATA[History]]></category>
		<category><![CDATA[5th lincs]]></category>
		<category><![CDATA[copyright]]></category>
		<category><![CDATA[digital history]]></category>
		<category><![CDATA[sandall]]></category>
		<category><![CDATA[tei]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://www.investigations.4-lom.com/2007/07/11/unexpected-progress/</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Unexpected+Progress&amp;rft.aulast=Robinson&amp;rft.aufirst=Gavin&amp;rft.subject=History&amp;rft.source=Investigations+of+a+Dog&amp;rft.date=2007-07-11&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.investigations.4-lom.com/2007/07/11/unexpected-progress/&amp;rft.language=English"></span>
It&#8217;s been a long time since I wrote anything about my First World War digitization projects, but I now have some progress to report: today I published an interim version of Sandall&#8217;s History of 5th Lincolnshire Regiment. It&#8217;s still a work in progress, and there&#8217;s a lot more to be done, but you can see [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Unexpected+Progress&amp;rft.aulast=Robinson&amp;rft.aufirst=Gavin&amp;rft.subject=History&amp;rft.source=Investigations+of+a+Dog&amp;rft.date=2007-07-11&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.investigations.4-lom.com/2007/07/11/unexpected-progress/&amp;rft.language=English"></span>
<p>It&#8217;s been a long time since I wrote anything about my First World War digitization projects, but I now have some progress to report: today I published an interim version of Sandall&#8217;s History of 5th Lincolnshire Regiment. It&#8217;s still a work in progress, and there&#8217;s a lot more to be done, but you can see it <a href="http://www.4-lom.com/sandall/">here</a>. It&#8217;s just a plain HTML version (and not strictly valid HTML), and the whole text is on one page (at least it makes it easy to search the whole text with your browser&#8217;s Find feature!), there&#8217;s no name linkage yet, no page images online, and no mechanism for submitting corrections. However, even in this form it should be useful to people who are researching the battalion and can&#8217;t get hold of the original book. More details on what I&#8217;ve done and how I&#8217;ve done it below.</p>
<p><span id="more-99"></span>That I haven&#8217;t posted about this project since February might suggest that it&#8217;s been slow and difficult, but actually I was just busy with other things (writing articles, applying for jobs, organising the Military History Carnival). Until then, things were going surprisingly smoothly and quickly. I was hoping that I&#8217;d be able to get everything finished before my Oxygen free trial finished, but because I was doing other things it&#8217;s now expired, and although a licence doesn&#8217;t cost much, I need the money for other things (especially work on peer reviewed things which might get published in a &#8220;proper&#8221; journal and look better on my CV than a self-published digital edition). Fortunately the last thing I did on the project in February was try an XSLT transform to make a test HTML version of the text. Initially this just showed me that I needed to play around with the transform parameters to get what I wanted, but I&#8217;ve now decided that it&#8217;s good enough to form the basis of an interim edition. (In theory I could use the free version of Saxon to do more transforms, but it&#8217;s a scary command line application!)</p>
<p>The HTML file that came out of the transform was over 900K but I cleaned it up using Find/Replace and regular expressions in jEdit, getting it down to 385K. That might still take a while to download if you&#8217;re on dial-up but it&#8217;s not too bad for a whole book. Every paragraph element had been given a name attribute, which isn&#8217;t necessary for this version as only chapters are linked to from the contents, so I stripped them all out. There were also some xmlns attributes which didn&#8217;t appear to be serving any purpose and must have added a huge amount to the file size, and a huge amount of superfluous white space.</p>
<p>As well as getting the file size down, I needed to make some other adjustments. Some of these were down to the transform settings that I&#8217;d used in Oxygen for the test run, but others showed up some possible limitations of TEI/XML. The master XML document preserves the original line breaks and hyphenation. I also left these in during the test transform but in practice an HTML version without line breaks is more useful so I used jEdit to take out all the &lt;br&gt; elements. Here I encountered a potential problem. Following the advice of the TEI guidelines I used the &amp;shy; entity to represent soft hyphens. In practice this isn&#8217;t very helpful. If the HTML version is to keep the original line breaks then the soft hyphens need to be converted to hard hyphens so that they display properly, but if the original line breaks aren&#8217;t kept then the hyphens need to be removed along with the breaks. I&#8217;m not sure if XSLT can actually do this. My understanding is that it only deals with XML tags, but I could be wrong there. If it can&#8217;t do anything with entity references then there would be a need for some extra finding and replacing, but there might be anyway depending on how good I can get the output of the XSLT. Maybe it would be better to have an XML element which represents a soft hyphen, or add an attribute to the &lt;lb&gt; elements to indicate that they&#8217;re preceded by a soft hyphen. If the line breaks and hyphens could be dealt with properly by XSLT then it would make sense to not have a space before any &lt;lb&gt; elements in the source XML.</p>
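<p>The two ways of handling soft hyphens in the HTML output can be sketched like this. It is Python rather than XSLT or jEdit, and it assumes (hypothetically) that each soft hyphen appears either as the U+00AD character or as a literal entity reference directly before a &lt;br&gt; tag; the function name and sample sentence are made up for illustration.</p>

```python
import re

# Soft hyphen as a raw character (U+00AD) or as a literal entity reference,
# followed by optional whitespace and an HTML line break.
SOFT_BREAK = re.compile(r"(?:\u00ad|&shy;)\s*<br\s*/?>\s*")
# Any remaining line break (one with no soft hyphen before it).
PLAIN_BREAK = re.compile(r"\s*<br\s*/?>\s*")


def fix_breaks(html: str, keep_breaks: bool) -> str:
    """Normalise soft hyphens, with or without the original line breaks."""
    if keep_breaks:
        # Keep the layout: show a visible hard hyphen before the break.
        return SOFT_BREAK.sub("-<br>\n", html)
    # Drop the layout: rejoin hyphenated words, turn other breaks into spaces.
    joined = SOFT_BREAK.sub("", html)
    return PLAIN_BREAK.sub(" ", joined)


# Made-up sample text with one soft-hyphenated word and one plain break.
sample = "The bat\u00ad<br>\ntalion moved<br>\nforward."
print(fix_breaks(sample, keep_breaks=True))
print(fix_breaks(sample, keep_breaks=False))
```

<p>Handling the soft hyphen and its line break as a single unit is what lets one pass cover both cases; a word that genuinely contains a hard hyphen at a line end is left alone in either mode.</p>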
<p>I found some erroneous paragraph breaks in the middle of sentences. These turned out to be mistakes in the master XML file. I&#8217;m not sure how they came about, but they&#8217;re easy to track down and fix with a regexp (just find any &lt;p&gt; element immediately preceded by a comma or a letter). The XSLT had used &lt;em&gt; tags to highlight words and phrases marked up with &lt;term&gt; or &lt;distinct&gt; but I changed these to double quotes to match the text. In future (if I ever digitize another book) I&#8217;m intending to leave in the original quotes rather than marking up their meaning.</p>
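<p>A search of that kind might look something like the sketch below, written in Python rather than as a jEdit search. The exact pattern is an assumption: it treats a paragraph as suspect when the last character before its closing &lt;/p&gt; tag is a comma or a letter and another &lt;p&gt; follows straight away. The function name and sample text are hypothetical.</p>

```python
import re

# A comma or letter, then the end of one paragraph and the start of the next.
# Assumption: paragraphs are written as <p>...</p> with only whitespace
# between them.
SUSPECT = re.compile(r"[A-Za-z,]\s*</p>\s*<p\b[^>]*>")


def suspect_breaks(xml_text: str) -> list[tuple[int, str]]:
    """Return (line number, snippet) for each break that falls mid-sentence."""
    hits = []
    for m in SUSPECT.finditer(xml_text):
        # Line number of the character that precedes the break.
        line_no = xml_text.count("\n", 0, m.start()) + 1
        hits.append((line_no, m.group(0).replace("\n", " ")))
    return hits


# Made-up sample: the first break is mid-sentence, the second is genuine.
sample = (
    "<p>The battalion moved up to the line,</p>\n"
    "<p>and halted for the night.</p>\n"
    "<p>Next day was quiet.</p>\n"
)
for line_no, snippet in suspect_breaks(sample):
    print(line_no, snippet)
```

<p>This only reports candidates. Paragraphs that legitimately end without punctuation would also be flagged, so each hit still needs checking against the page before the break is removed.</p>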
<p>It was quite ironic to discover that the XSLT had automatically inserted a copyright statement at the end of the text! (Copyright is more than just a law: it&#8217;s an insidious ideology.) I corrected it manually to make it clear that the work is in the public domain. Other manual adjustments included moving the contents to before the preface (more convenient for readers) and adding some CSS to limit the width of the text divs.</p>
<p>So now there&#8217;s a basic version of the text online in a form which should be of some use to at least some people. It&#8217;s certainly an improvement on taking requests on the Great War Forum and posting the excerpts that people ask for. There&#8217;s still a lot to be done but it&#8217;s good to be making some tangible progress again.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.investigations.4-lom.com/2007/07/11/unexpected-progress/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Some Random Things</title>
		<link>http://www.investigations.4-lom.com/2007/06/22/some-random-things/</link>
		<comments>http://www.investigations.4-lom.com/2007/06/22/some-random-things/#comments</comments>
		<pubDate>Fri, 22 Jun 2007 13:07:29 +0000</pubDate>
		<dc:creator>Gavin Robinson</dc:creator>
				<category><![CDATA[History]]></category>
		<category><![CDATA[copyright]]></category>
		<category><![CDATA[digital history]]></category>
		<category><![CDATA[games]]></category>

		<guid isPermaLink="false">http://www.investigations.4-lom.com/2007/06/22/some-random-things/</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Some+Random+Things&amp;rft.aulast=Robinson&amp;rft.aufirst=Gavin&amp;rft.subject=History&amp;rft.source=Investigations+of+a+Dog&amp;rft.date=2007-06-22&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.investigations.4-lom.com/2007/06/22/some-random-things/&amp;rft.language=English"></span>
The latest early-modern edition of Carnivalesque is up at Blogging The Renaissance. I&#8217;ve turned off the comment timeout plugin, so comments on most old posts are open again, and should stay open as long as they don&#8217;t attract huge amounts of spam. I&#8217;ll be manually closing comments on posts which are getting spammed too much [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Some+Random+Things&amp;rft.aulast=Robinson&amp;rft.aufirst=Gavin&amp;rft.subject=History&amp;rft.source=Investigations+of+a+Dog&amp;rft.date=2007-06-22&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.investigations.4-lom.com/2007/06/22/some-random-things/&amp;rft.language=English"></span>
<p>The latest early-modern edition of Carnivalesque is up at <a href="http://bloggingtherenaissance.blogspot.com/2007/06/carnivalesque-28-eebo-edition.html">Blogging The Renaissance</a>.</p>
<p>I&#8217;ve turned off the comment timeout plugin, so comments on most old posts are open again, and should stay open as long as they don&#8217;t attract huge amounts of spam. I&#8217;ll be manually closing comments on posts which are getting spammed too much but I hope most of them will stay open.</p>
<p>Good news: <em>Calendar of State Papers Domestic</em>, one of the most important printed sources for British history, will be available online later this year. Bad news: it&#8217;s a paid subscription service. It remains to be seen how much it costs, but it&#8217;s particularly annoying because the project is funded by a charity, and the material is probably in the public domain, having been published by HMSO more than 50 years ago. More details at the <a href="http://www.history.ac.uk/newsihr.html#cal">IHR website</a>.</p>
<p><a href="http://www.gamebase64.com/game.php?id=679&amp;d=18&amp;h=0">Battle Through Time</a> was a computer game released for the Commodore 64 in 1984. It featured a time-travelling car and levels based on World War 1, World War 2, Korea, and Vietnam. Just another example to bring up when lazy journalists say there aren&#8217;t any WW1/Korea games, or that WW2/Vietnam games didn&#8217;t start to be made until this century. And to emphasise the links between cinema and gaming, the background music included &#8220;Suicide Is Painless&#8221; for the Korea level and &#8220;Ride of the Valkyries&#8221; for Vietnam.</p>
<p>And I&#8217;m still looking for Military History Carnival Hosts for September and afterwards. If you&#8217;re interested, <a href="http://www.investigations.4-lom.com/e-mail-me/">e-mail</a> me or leave a comment.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.investigations.4-lom.com/2007/06/22/some-random-things/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Digital History Projects: Progress Report</title>
		<link>http://www.investigations.4-lom.com/2007/01/24/digital-history-progress/</link>
		<comments>http://www.investigations.4-lom.com/2007/01/24/digital-history-progress/#comments</comments>
		<pubDate>Wed, 24 Jan 2007 15:57:01 +0000</pubDate>
		<dc:creator>Gavin Robinson</dc:creator>
				<category><![CDATA[History]]></category>
		<category><![CDATA[5th lincs]]></category>
		<category><![CDATA[copyright]]></category>
		<category><![CDATA[digital history]]></category>
		<category><![CDATA[military history]]></category>
		<category><![CDATA[sandall]]></category>
		<category><![CDATA[ww1]]></category>

		<guid isPermaLink="false">http://www.investigations.4-lom.com/2007/01/24/digital-history-progress/</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Digital+History+Projects%3A+Progress+Report&amp;rft.aulast=Robinson&amp;rft.aufirst=Gavin&amp;rft.subject=History&amp;rft.source=Investigations+of+a+Dog&amp;rft.date=2007-01-24&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.investigations.4-lom.com/2007/01/24/digital-history-progress/&amp;rft.language=English"></span>
This is a progress report on the First World War digitization projects I outlined previously in my post on planning. Copyright: I now have definite proof that the work of T. E. Sandall is in the public domain, so I can proceed with digitizing the main text of the book without fear of ending up [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Digital+History+Projects%3A+Progress+Report&amp;rft.aulast=Robinson&amp;rft.aufirst=Gavin&amp;rft.subject=History&amp;rft.source=Investigations+of+a+Dog&amp;rft.date=2007-01-24&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.investigations.4-lom.com/2007/01/24/digital-history-progress/&amp;rft.language=English"></span>
<p>This is a progress report on the First World War digitization projects I outlined previously in my <a href="http://www.investigations.4-lom.com/2007/01/10/digital-history-planning/" title="Investigations of a Dog: Digital History Projects: Planning">post on planning</a>.</p>
<p><span id="more-48"></span></p>
<h3>Copyright:</h3>
<p>I now have definite proof that the work of T. E. Sandall is in the public domain, so I can proceed with digitizing the main text of the book without fear of ending up in court. His death certificate confirms that Thomas Edward Sandall MD, Deputy Commissioner of Medical Services at the Ministry of Pensions, died at 2 Montalt Road, Woodford Green, Essex, on 31st May 1930. A notice of death in <em>The Times</em> carried the same information, also noting that he was a former CO of the 1/5th Lincolnshire Regiment, tying everything together nicely (I already knew that he was a doctor before the war, as well as being a Territorial officer). Since UK copyright in a literary work lasts for 70 years from the end of the year in which the author died, the main text of the battalion history has been out of copyright in the UK since 1st January 2001.</p>
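<p>The date arithmetic can be checked with a minimal sketch. It assumes the general UK rule of the author&#8217;s life plus 70 years, running to the end of the calendar year; the function name is made up for illustration, and this is not legal advice.</p>

```python
from datetime import date

# Assumption: the term is the author's life plus 70 years, and it runs
# to the end of the calendar year in which the 70th anniversary falls.
TERM_YEARS = 70


def public_domain_from(death: date, term_years: int = TERM_YEARS) -> date:
    """Return the first day on which the work is out of copyright."""
    # Protection lasts until 31 December of (death year + term),
    # so the work is free from 1 January of the following year.
    return date(death.year + term_years + 1, 1, 1)


sandall_death = date(1930, 5, 31)
print(public_domain_from(sandall_death))  # 2001-01-01
```

<p>The day and month of death make no difference under this rule; only the year matters.</p>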
<p>I still have no idea when Major George Harris Teall died, and since he was born in 1880 he could potentially have lived into the 1980s. If I can&#8217;t find out when he died, or if his work turns out to be still in copyright, it can easily be omitted. The postscript which he wrote deals only with the reconstitution of the Territorials in 1920, and while that&#8217;s probably an under-researched topic, it&#8217;s also likely to be less interesting to most Great War enthusiasts. The two works are treated separately on the title page and each is the sole work of its respective author, so I don&#8217;t believe that anyone could plausibly claim that the book as a whole is a joint work.</p>
<p>I was going to omit the plates, but there is now a slim possibility that I won&#8217;t have to. Steve Bramley and Chris Bailey, who are writing a new history of the battalion, would also like to use some of the photos in their book, so we&#8217;ve been discussing copyright issues. They might be contacting the publishers of Sandall&#8217;s book (Blackwell, so at least they&#8217;re easy to track down) to see if there is any record of the copyright holders of the illustrations. There might not be any record, or the copyright holders might require a fee for reproduction, but we can still hope that copyright has expired.</p>
<p>My contact with Chris and Steve has also shed new light on the family photos of my great-grandad. We need to look into copyright law in more detail, but it has been suggested that copyright in unpublished photographs created before 1957 expires 70 years after creation, regardless of the life or death of the author. If this turns out to be true, it automatically removes concerns about these photos, which were all taken in or before 1918. If it turns out to be false, it won&#8217;t be a complete disaster. The photos taken at Cottbus are credited to Paul Tharan, the camp photographer. He must have taken thousands of photos of PoWs, some of which turn up on eBay from time to time, but details of his life are surprisingly obscure. So far a request on the Great War forum, usually a mine of esoteric information, hasn&#8217;t produced anything. If I can&#8217;t determine copyright status I&#8217;m still prepared to publish the photos but take them down if a copyright holder appears.</p>
<p>Two of the wartime photos are almost certainly not by Paul Tharan. One has been identified (thanks to Chris Bailey) as a photo of the 5th Battalion football team, taken on the day of their match against the Grimsby Chums at Blundell Park on 22nd October 1914 (<a href="/images/070115/full/wenham_football.jpg" title="Photo of 5th Lincolnshire football team">view photo</a>). The other is of my great-grandad in khaki uniform (<a href="/images/061205/full/001.jpg" title="Portrait of William Wenham in uniform">view photo</a>). It was sent home after he was captured, but I can&#8217;t be sure when or where it was taken. In both cases there are no photographers&#8217; names on the postcards. Although this makes it impossible to prove that they are definitely out of copyright (unless the 1957 rule is true), it also makes it impossible for anyone to prove that they are the copyright holder, so there probably isn&#8217;t much to worry about there.</p>
<h3>Scanning:</h3>
<p>All images have now been captured. Sandall&#8217;s book was scanned in monochrome at 300dpi, two pages at a time. This resulted in 132 images, taking about two to three hours. They are saved as JPEGs which take up about 1.5 to 2.5MB each, depending on the amount of text on a page. I&#8217;m probably going to split them into separate files for each page, which won&#8217;t take long using batch processing as the position of the centre of the book is always more or less the same in every scan.</p>
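<p>Because the gutter sits in roughly the same place in every scan, the split comes down to two fixed crop rectangles. The sketch below only works out those rectangles; the scan size and gutter position are made-up numbers, and the function name is hypothetical. The boxes use the (left, upper, right, lower) order that an image library such as Pillow expects for cropping, which is an assumption about tooling rather than a description of the software actually used.</p>

```python
def split_boxes(width, height, gutter_x, overlap=0):
    """Return crop boxes for the left and right page of a two-page scan.

    Each box is (left, upper, right, lower) in pixels. `overlap` keeps a
    few extra pixels either side of the gutter in case the book shifted
    slightly between scans.
    """
    left_page = (0, 0, min(width, gutter_x + overlap), height)
    right_page = (max(0, gutter_x - overlap), 0, width, height)
    return left_page, right_page


# Hypothetical scan of 3500 x 2480 pixels with the gutter at x = 1750.
left, right = split_boxes(3500, 2480, 1750)
print(left)   # (0, 0, 1750, 2480)
print(right)  # (1750, 0, 3500, 2480)
```

<p>Applying the same two boxes to every file is what makes batch processing quick. A small overlap is a cheap safeguard against the centre of the book drifting between scans.</p>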
<p>I decided to scan the family letters and photographs in full colour at 600dpi, keeping archive copies as uncompressed TIFFs. Photos can benefit from being scanned at the highest resolution possible, because enlarging them on screen can bring out detail which is invisible to the naked eye on the original prints. At the <a href="http://1914-1918.invisionzone.com/forums/" title="Great War forum">Great War forum</a> there&#8217;s a whole sub-culture devoted to identifying units from internal evidence in photos.</p>
<p>Here are a couple of examples taken from the photos I linked to above. In this one I&#8217;ve enlarged the portrait of my great-grandad in uniform to show that he&#8217;s wearing a pocket watch in his left pocket, with what looks like a small cross hanging from the lanyard:</p>
<p><img src="/images/070124/wenham_detail.jpg" title="Detail showing William Wenham's pocket watch and cross" height="276" width="267" /></p>
<p>This is from the photo of the battalion football team at Blundell Park. You can see a uniformed soldier in the crowd, with what could be a Lincolnshire Regiment cap badge:</p>
<p><img src="/images/070124/football_detail.jpg" title="Detail showing soldier of Lincolnshire Regiment at Blundell Park" height="207" width="251" /></p>
<p>The only potential drawback with 600dpi TIFFs is file size: a scan of one side of a postcard is over 20MB! In this case the small size of the collection means there are no problems, but scaling the same technique up to a larger collection might be problematic. I&#8217;m not sure if it was strictly necessary to scan the letters at the same resolution as the photos, as all the writing would be just as legible at 300dpi and blowing it up beyond a certain point doesn&#8217;t really add anything. On the other hand, I found it convenient to keep the same scanner settings and scan the whole collection as one batch. This took about two and a half hours, resulting in 62 images.</p>
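<p>The file size follows directly from the pixel count. Here is a rough sketch of the arithmetic, assuming a postcard of about 5.5 by 3.5 inches (an assumed size, not a measurement of these cards) and 24-bit colour at 3 bytes per pixel with no compression:</p>

```python
def uncompressed_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Approximate size of an uncompressed scan, ignoring file headers."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# Assumed postcard size: 5.5 x 3.5 inches.
postcard_600 = uncompressed_bytes(5.5, 3.5, 600)
postcard_300 = uncompressed_bytes(5.5, 3.5, 300)

print(postcard_600 / 1_000_000)  # 20.79
print(postcard_300 / 1_000_000)  # 5.1975
```

<p>Halving the resolution quarters the file size, which is why 300dpi would be the obvious economy for the letters in a larger collection.</p>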
<h3>Next Steps:</h3>
<p>I want to do more background reading before I go too much further. I&#8217;ll also be playing around with various software, finding out what it can do and how to use it. And tomorrow I&#8217;ll be spending a whole day away from my computer as I&#8217;m going down to London for a seminar at the IHR on Cromwellian studs (it&#8217;s obvious what I meant).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.investigations.4-lom.com/2007/01/24/digital-history-progress/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Digital History Projects: Planning</title>
		<link>http://www.investigations.4-lom.com/2007/01/10/digital-history-planning/</link>
		<comments>http://www.investigations.4-lom.com/2007/01/10/digital-history-planning/#comments</comments>
		<pubDate>Wed, 10 Jan 2007 19:30:02 +0000</pubDate>
		<dc:creator>Gavin Robinson</dc:creator>
				<category><![CDATA[History]]></category>
		<category><![CDATA[5th lincs]]></category>
		<category><![CDATA[copyright]]></category>
		<category><![CDATA[digital history]]></category>
		<category><![CDATA[ocr]]></category>
		<category><![CDATA[tei]]></category>
		<category><![CDATA[theory]]></category>
		<category><![CDATA[ww1]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://www.investigations.4-lom.com/2007/01/10/digital-history-planning/</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Digital+History+Projects%3A+Planning&amp;rft.aulast=Robinson&amp;rft.aufirst=Gavin&amp;rft.subject=History&amp;rft.source=Investigations+of+a+Dog&amp;rft.date=2007-01-10&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.investigations.4-lom.com/2007/01/10/digital-history-planning/&amp;rft.language=English"></span>
In my New Year post, I mentioned that I&#8217;m thinking about carrying out a couple of digital history projects in connection with my First World War research. These projects are very small and should be relatively easy to carry out on my own, but there will almost certainly be challenges. Overcoming these will give me [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Digital+History+Projects%3A+Planning&amp;rft.aulast=Robinson&amp;rft.aufirst=Gavin&amp;rft.subject=History&amp;rft.source=Investigations+of+a+Dog&amp;rft.date=2007-01-10&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.investigations.4-lom.com/2007/01/10/digital-history-planning/&amp;rft.language=English"></span>
<p>In my New Year post, I mentioned that I&#8217;m thinking about carrying out a couple of digital history projects in connection with my First World War research. These projects are very small and should be relatively easy to carry out on my own, but there will almost certainly be challenges. Overcoming these will give me more experience of carrying out a digital history project (this is starting to sound like a job application again!), and produce useful resources. After that, I can move on to consider some more advanced issues, such as collaborating with other people, and dealing with seventeenth-century manuscripts. To make the experience even more useful, I&#8217;m trying to blog it as I go. This post is an outline of my plans so far. Now that I&#8217;ve published my plans I&#8217;ll have to carry them out!</p>
<p><span id="more-43"></span></p>
<h3>Background:</h3>
<p>Before starting any digital history project, you should read <a href="http://chnm.gmu.edu/digitalhistory/" title="Digital History">Digital History</a> by Dan Cohen and Roy Rosenzweig which is available online free of charge. It&#8217;s an easy to understand introduction which will tell you about different approaches to digital history, costs, benefits, potential problems, and generally help you to think clearly about what you want to achieve and how you can achieve it. The authors ask aspiring digital historians to get the right balance between caution and risk-taking. While a certain amount of planning is always necessary, there is also a risk that academic historians will spend too much time thinking rather than doing.</p>
<p>Having read the book and thought about the potential problems to be overcome, I think I&#8217;m in a good position to put my IT experience to use. I&#8217;ve been running my band&#8217;s website since April 2000, so I&#8217;m a veteran HTML coder. There have been many changes in web technology in that time, and I&#8217;ve always kept up with new developments while being careful not to jump on the latest bandwagon too soon. I have a good knowledge of (X)HTML, CSS, PHP, MySQL, image editing, Apache server management, and web accessibility issues.</p>
<p>I&#8217;ve studied the <a href="http://www.oldbaileyonline.org" title="Old Bailey Proceedings">Old Bailey Proceedings</a>, a large collection of digitized source material marked up with XML, got acquainted with the <a href="http://www.tei-c.org/" title="Text Encoding Initiative">Text Encoding Initiative</a> standards, and have experimented with XML myself. Recently I volunteered my services to <a href="http://www.pgdp.net/c/" title="Distributed Proofreaders">Distributed Proofreaders</a>, and have already proofread over 100 pages of text. I&#8217;ll be continuing with this, and once I&#8217;ve done 300 pages I can get experience of the next stages of the process. I&#8217;ll also be reading <a href="http://www.digitalhumanities.org/companion/" title="A Companion to Digital Humanities">A Companion to Digital Humanities</a>, another free online book.</p>
<p>I&#8217;m breaking my own rule that you should never rely on just one person when creating digital texts, but this is an experiment to see how much I can do on my own. I also want to avoid the added complications of managing a collaborative project. If things go well, I can start to look at ways of working as a team on unofficial and unfunded projects.</p>
<h3>Project Outlines:</h3>
<p>Project Sandall: This project will produce a digital edition of T. E. Sandall, <em>A History of 5th Battalion Lincolnshire Regiment</em> (Oxford, Blackwell, 1922). The text will be marked up with TEI compliant XML and published on the web. It will include a hyperlinked index of people, places, and organizations.</p>
<p>Project Wenham: This project will digitize and publish the correspondence of my great-grandfather, William A. Wenham, relating to his experiences as a prisoner of war during the First World War. The collection consists of a number of letters and postcards sent to his family from prison camps in Germany, some with photographs. The text will be transcribed and marked up with TEI compliant XML, and published on the web, along with background information written by me. There will be an index of people, and possibly places. Another optional extra will be selections of relevant documents from other sources, such as battalion war diaries.</p>
<h3>Theory:</h3>
<p>One of the great things about digitization projects is that you can avoid some of the more difficult theoretical controversies. The most relevant theory is information theory, and Shannon pointed out that meaning is irrelevant to information. The aim of text digitization and basic markup is to represent the information in the original document as accurately as possible, minimising the noise which can be introduced as part of the process (see my post on <a href="http://www.investigations.4-lom.com/2006/11/21/historical-information-noise/" title="Investigations of a Dog: Historical Information and Noise">Historical Information and Noise</a>). Once the text is published, individual users can decide what it means to them and how far, if at all, it relates to the reality of the past.</p>
<p>However, record linkage brings problems of meaning and epistemology into play. This is likely to be worse in early modern manuscripts, which are noisier to begin with. For example, the same name might be spelt several different ways. Dealing with 20th-century sources is easier because of standardized spelling, but linkage still involves taking an epistemological position. One advantage of TEI XML is that it has a built-in mechanism for representing uncertainty and quantifying epistemic probabilities using the &lt;certainty&gt; element. Furthermore, XML preserves the original text. Editorial decisions made during the tagging process have little or no impact on the integrity of the text itself. If the XML code is made freely available to users under the GPL, they can download the source files and edit them according to their own needs. Therefore, users who don&#8217;t agree with an editorial decision can easily change it in their own local copy of the text.</p>
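<p>To make the linkage problem concrete, here is a minimal sketch that scores spelling variants against a list of known names using Python&#8217;s standard difflib module. The variant &#8220;Sandal&#8221;, the threshold of 0.8, and the function names are all invented for illustration. The score is only a measure of string similarity, not an epistemic probability; any certainty value recorded in the markup would still be an editorial judgement.</p>

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity between two spellings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_candidates(name, known_names, threshold=0.8):
    """Return (known_name, score) pairs at or above the threshold, best first."""
    scored = [(known, similarity(name, known)) for known in known_names]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical variant spelling checked against three known surnames.
matches = link_candidates("Sandal", ["Sandall", "Teall", "Wenham"])
for known, score in matches:
    print(known, round(score, 2))
```

<p>A list of candidates like this is a prompt for a human decision, which fits the principle that identity should not be assumed automatically.</p>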
<h3>Copyright:</h3>
<p>I believe that T. E. Sandall&#8217;s work is out of copyright in most of the world, since the book was published in 1922 and the author died in 1931, but I will be looking for more definite proof before I proceed with the project. The copyright status of the epilogue written by G. H. Teall is unknown, but it can be omitted if I can&#8217;t determine that it&#8217;s definitely out of copyright. The plates are more problematic and will probably be omitted.</p>
<p>William Wenham&#8217;s letters are more straightforward. There is no possibility of anyone outside the family having a claim on the copyright and no possibility of anyone being able to make money out of their publication. The status of the photographs taken at Cottbus is unknown. They might be out of copyright or they might be orphan works which are still under copyright. In this case I&#8217;m prepared to risk publishing them, but will remove them if a genuine copyright holder objects to their publication.</p>
<h3>Image Capture:</h3>
<p>I have a cheap A4 size flatbed scanner which gives adequate quality. The main limitation is that it&#8217;s slow, but the quantity of images to be captured is relatively small. A digital camera can capture images faster, but the quality of text is lower than scanning and would most likely lead to more OCR errors. Sandall will be scanned as 300dpi monochrome JPEGs. There is no need to create a high quality archival copy for preservation purposes. The main aim of the project is to make the text more accessible and add value in the form of XML markup and indexing. The family papers will be scanned as full colour 300dpi TIFFs which will be kept as archival copies. These masters can be used to create 100dpi JPEGs for web publication. It is intended to make images of the documents freely available in addition to the digital text.</p>
<h3>Digitizing Text:</h3>
<p>The scanned images of Sandall will be converted to digital text using OCR software. I have two free OCR packages: IRIS (which came free with a scanner) and Microsoft Office Document Imaging, but both have limited features and would be inadequate for this project. ABBYY FineReader Pro has more advanced features and is recommended by Distributed Proofreaders. For this project I can use the free trial version, and if I want to do more projects in the future I can buy a licence for under £100. PC Pro magazine considers OmniPage to be superior, but it costs around £400! Since no OCR is perfect, it will be necessary to proofread and correct the entire text. FineReader asks for user input when it encounters a doubtful character, and its interface should allow manual proofreading alongside that.</p>
<p>There is no possibility of using OCR for handwritten text, so the letters will have to be transcribed. Although we have the originals to work from, there might be some need to work from digital images. Apart from saving wear and tear on the documents, digital images are more flexible. Difficult text can be enlarged on screen, and contrast can be adjusted to bring out faded text. However, there is the added problem of viewing an image and typing at the same time, which might require specialised software. I&#8217;ve found that Zotero notes can be very useful for transcribing text from images or PDFs and might be adequate to start with. I could also use HTML/PHP/MySQL to cobble together something like the Distributed Proofreaders interface for my own use. The front end is a simple web-based interface, and although I don&#8217;t know how their back end works, a local version just for my own use could be much simpler.</p>
<h3>XML Markup:</h3>
<p>XML can be daunting precisely because it&#8217;s so powerful and flexible: where do you start and how far do you go? Using TEI will give me a basic framework which covers the needs of the project and guarantees a certain amount of cross-compatibility. However, there are many different ways to implement TEI compliant XML and many subjective decisions to be made. At its most basic TEI markup need not be much more complex than HTML, but at the other extreme there are provisions for extremely detailed and complex markup, such as stylistic analysis of the structure of sentences. I&#8217;ve decided to split the markup into three phases:</p>
<p>First: mark up the basic structure of the text, such as paragraphs, headings, and page numbers. It might be possible to do some of this automatically, but manual checking and correction will be necessary.</p>
<p>Second: tag names of people, places, and organizations. For the printed text of Sandall it might be possible to automate some of this using regular expressions and Find and Replace functions. Proper nouns can be expected to have capital letters, and ranks and common referents (he, officer, man etc) should be easy to detect. Again manual checking and correction will be necessary. At this stage names will only be tagged to indicate that they are names. No assumptions about identity will be made until the third phase.</p>
<p>Third: link/index records by assigning ID numbers and/or regularized forms to the name tag attributes. At this point some subjective decisions might be necessary, and certainty will have to be evaluated and assigned.</p>
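<p>The second phase could be sketched roughly as follows. This is only an illustration of the regular expression idea, not the method actually used: the list of ranks is a short hypothetical sample, the sentence is invented, and the function merely lists candidates for manual checking rather than inserting any tags.</p>

```python
import re

# Hypothetical, deliberately short list of ranks; a real one would be longer.
# Longer forms come first so "Lieut.-Col." is tried before "Lieut." or "Col.".
RANKS = ["Lieut.-Col.", "Col.", "Major", "Capt.", "Lieut.", "Pte."]

# A rank, optional initials such as "T. E.", then a capitalised surname.
NAME_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(rank) for rank in RANKS) + r")\s+"
    r"((?:[A-Z]\.\s*)*)"
    r"([A-Z][a-z]+)"
)


def find_name_candidates(text):
    """Return (rank, initials, surname) for each candidate name in the text."""
    return [
        (m.group(1), m.group(2).strip(), m.group(3))
        for m in NAME_PATTERN.finditer(text)
    ]


# Invented sample sentence, not a quotation from the book.
sample = "Col. T. E. Sandall and Major G. H. Teall were present."
for candidate in find_name_candidates(sample):
    print(candidate)
# ('Col.', 'T. E.', 'Sandall')
# ('Major', 'G. H.', 'Teall')
```

<p>A pattern like this will miss names without a rank and will trip over unusual surnames, which is why the manual checking described above remains necessary.</p>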
<p>XML tagging can be done in any text editor, but a dedicated XML editor might be useful. I have previously used a free version of Altova XMLSpy for experimenting with XML, but some more advanced features might be necessary so I&#8217;ve downloaded a trial version of oXygen. If it proves useful I can buy an academic licence for $48.</p>
<h3>Web Publishing:</h3>
<p>The documents and supporting material will be published in HTML or XHTML. All authoring will be done in XML, and XSLT will be used to transform the source code into other formats. For this reason there is no immediate need to decide whether to use HTML or XHTML, and in practice there is very little difference between XHTML and good HTML (anyone who has problems with the transition to XHTML has written bad HTML to start with!). The XML source files will also be made available so that users can download and manipulate them. Ideally the site would be published in XML only, with XSLT style sheets transforming the code to (X)HTML on the client side, but this would exclude older browsers which can&#8217;t handle XML.</p>
<p>There will be little need for a database back-end as the sites will be small and the content will not be changed or added to frequently. A search engine which recognises XML fields (as at the OBP) would be useful, but as names will be linked to an index page, Google might be adequate for full text searches.</p>
<h3>Publicity:</h3>
<p>This is not much of an issue as the content will only appeal to a niche audience and the projects do not have to justify themselves to funding bodies. The main aim is to develop and demonstrate my digital history skills, so if I get it right it will be a useful demonstration of how to do it. The main avenues for promotion will be my blog, other people&#8217;s blogs, and the <a href="http://1914-1918.invisionzone.com/forums/" title="Great War forum">Great War forum</a>.</p>
<h3>Bibliography</h3>
<ol>
<li>Daniel J. Cohen and Roy Rosenzweig, <span style="font-style:italic;">Digital History: A Guide to Gathering, Preserving, And Presenting the Past on the Web</span> (University of Pennsylvania Press, November 2005). <span class="Z3988" title="url_ver=Z39.88-2004&amp;ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&amp;rft.genre=book&amp;rft.btitle=Digital%20History%3A%20A%20Guide%20to%20Gathering%2C%20Preserving%2C%20And%20Presenting%20the%20Past%20on%20the%20Web&amp;rft.publisher=University%20of%20Pennsylvania%20Press&amp;rft.aufirst=Daniel%20J.&amp;rft.aulast=Cohen&amp;rft.au=Daniel%20J.%20Cohen&amp;rft.au=Roy%20Rosenzweig&amp;rft.date=2005-11-30&amp;rft.pages=316"></span></li>
<li>Thomas Edward Sandall, <span style="font-style:italic;">A History of the 5th Batt. the Lincolnshire Regiment</span> (Basil Blackwell: Oxford, 1922), pp. vi, 221. <span class="Z3988" title="url_ver=Z39.88-2004&amp;ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&amp;rft.genre=book&amp;rft.btitle=A%20History%20of%20the%205th%20Batt.%20the%20Lincolnshire%20Regiment.%20by%20Colonel%20T.%20E.%20Sandall%2C%20Etc&amp;rft.place=pp.%20vi.%20221.%20Basil%20Blackwell%3A%20Oxford%2C%201922&amp;rft.aufirst=Thomas%20Edward&amp;rft.aulast=Sandall&amp;rft.au=Thomas%20Edward%20Sandall&amp;rft.pages=8"></span></li>
<li>Ray Siemens, John Unsworth, and Susan Schreibman, <span style="font-style:italic;">Companion to Digital Humanities (Blackwell Companions to Literature and Culture)</span> (Blackwell Publishing Professional: Oxford, December 2004). <span class="Z3988" title="url_ver=Z39.88-2004&amp;ctx_ver=Z39.88-2004&amp;rft_id=urn%3Aisbn%3A1405103213&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&amp;rft.genre=book&amp;rft.btitle=Companion%20to%20Digital%20Humanities%20(Blackwell%20Companions%20to%20Literature%20and%20Culture)&amp;rft.place=Oxford&amp;rft.publisher=Blackwell%20Publishing%20Professional&amp;rft.edition=Hardcover&amp;rft.series=Blackwell%20Companions%20to%20Literature%20and%20Culture&amp;rft.aufirst=Ray&amp;rft.aulast=Siemens&amp;rft.au=Ray%20Siemens&amp;rft.au=John%20Unsworth&amp;rft.au=Susan%20Schreibman&amp;rft.date=2004-12-12&amp;rft.isbn=1405103213"></span></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.investigations.4-lom.com/2007/01/10/digital-history-planning/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

