Wallington’s World! Party time! Excellent!

[posted by Gavin Robinson, 8:30 am, 19 June 2011]

[Just had an exhausting week in the archives but I found this old half-finished post on my hard drive:]

This week [actually last November] I’ve been reading Wallington’s World by Paul Seaver (probably no relation to the unknown stuntman). It’s all about Nehemiah Wallington (not to be confused with Nehemiah Wharton), a mid-seventeenth-century London wood turner who wrote lots of notebooks, some of which have survived. The notebooks are mostly about Wallington’s puritan faith, but they also include lots of incidental details of his life and family. Seaver analysed the surviving books to see what they could tell us about London tradesmen, puritanism and the English Civil War.

Today his approach looks quite dated, but maybe that’s not surprising for a book published in 1985. In the introduction there’s a lot about “inward thoughts” and Wallington’s “mental world”. Although there’s no direct mention of Collingwood, his idealism seems to be a big influence on Seaver’s assumptions: that historians can and should find out what people in the past “really” thought.

Stephen Greenblatt’s Renaissance Self-Fashioning had already been published five years earlier, but I don’t think it was required reading for historians at this time. Greenblatt discussed the difference between inward and outward selves, but also argued that the very idea of the authentic inner man was constructed through writing. Even writing a private diary is an external act which doesn’t necessarily give us access to the author’s mind. Dan Todman pointed out in The Great War: Myth and Memory that a person’s memories can change every time they’re rehearsed. Therefore the act of writing down our experiences can influence our memories of them rather than just neutrally recording them.

In my forthcoming book I’m trying to get away from worrying about what people “really” thought by concentrating almost entirely on external actions (which includes speech and writing). I’m using horses as a case study to show how material objects and actions could be used to construct parliamentarian identities, arguing that it was actions which made the civil wars happen and that opinions without actions aren’t all that important, even if we could find out about them.

Wallington makes an interesting case study here because his writings are all about the theory and practice of puritanism. By traditional definitions he was “a Puritan”. But he doesn’t seem to have done very much to help the parliamentary war effort other than paying his taxes. This was partly because he didn’t have much spare money and partly because he seems to have lacked the confidence and social skills to play an active role, but his writings don’t tend to advocate violent revolution. His puritanism seems to have been orthodox, conservative and introspective. While he criticized the cavaliers, he wrote that parliamentary armies were just as bad, and used phrases like “this uncivil war” and “world turned upside down”. His use of the latter phrase and his criticism of Independents and sectaries are surprisingly similar to John Taylor, whose writings were often conservative and favourable to the King. As Nick at Mercurius Politicus points out, trying to classify writers as royalist or parliamentarian can be tricky and counter-productive.

Wallington’s writings also suggest that puritanism wasn’t a straightforward cause of the English Civil War. Although Wallington eventually represented himself as assured of elect status, he never represented himself as God’s instrument in the way that Oliver Cromwell did. He took an obsessive interest in God’s punishment of sinners, but apart from a few passive-aggressive letters to his neighbours he didn’t take much direct action against sinners himself. Wallington’s notebooks make quite a contrast with militant preacher Stephen Marshall’s bloodthirsty sermon Meroz Cursed, in which he insisted that everyone had to fight against the enemies of the true church or be cursed.

[Apparently I was going to write something about gender and sexuality here but I can't remember what. Half my readers will be disappointed and the other half will be relieved!]

Party on, Nehemiah…

  1. Stephen Greenblatt, Renaissance Self-Fashioning: From More to Shakespeare, new edition (2005).
  2. Stephen Marshall, Meroz cursed, or, A sermon preached to the honourable House of Commons, at their late solemn fast, Febr. 23, 1641 by Stephen Marshall … (London, 1642).
  3. Paul S. Seaver, Wallington’s World: A Puritan Artisan in Seventeenth-Century London (London, 1985).
  4. Dan Todman, The Great War: Myth and Memory (London, 2007).

Text-mining tips

[posted by Gavin Robinson, 10:27 am, 12 June 2011]

These are some insights from the text-mining that I’ve been doing this week:

Stop and think about stop words

One of the first rules of text-mining should be: always make your own list of stop words. Nothing absolutely and objectively is or isn’t a stop word. Which words are and aren’t meaningful depends on your research questions. For example, pronouns are often included in lists of stop words, but I’m very interested in gender so I want to know the frequencies of gendered words like ‘he’ and ‘she’. If you use someone else’s list without thinking about it you’ll probably inherit various biases and assumptions. The kind of text you’re working with also makes a difference. In the proceedings of parliament words like ‘ordered’, ‘resolved’ and ‘committee’ occur too regularly to be much use to most people. If you don’t define your stop words until after you’ve calculated frequencies for every word you can get a better idea of which words are getting in the way and which ones are interesting.
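A minimal sketch of that order of work: count everything first, look at the most frequent words, and only then decide on a stop list. The sample text and the stop list are invented for illustration, and `collections.Counter` is just one convenient way of doing the counting.

```python
from collections import Counter

# Hypothetical sample text standing in for a real corpus.
text = ("ordered that the committee do meet and it is resolved that he "
        "and she shall attend the committee as ordered")

# Count every word before deciding what to exclude.
frequencies = Counter(text.lower().split())

# Inspect the most frequent words to see which ones get in the way.
for word, count in frequencies.most_common(5):
    print(word, count)

# A stop list chosen for this (assumed) research question:
# procedural words are dropped, but gendered pronouns are kept.
stop_words = {'ordered', 'resolved', 'committee', 'that', 'the',
              'and', 'do', 'it', 'is', 'as'}

filtered = {word: count for word, count in frequencies.items()
            if word not in stop_words}
```

Because the stop list is applied after counting, changing your mind about a word only means editing the list, not recounting.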

BeautifulSoup is not always the answer

The Python library BeautifulSoup is really useful for extracting data from HTML pages, but maybe I got into the habit of using it too much. This week I was trying to work out how to get some data from pages that didn’t have a very good semantic structure. Doing it with BeautifulSoup looked like it would be really complicated, but then I realised that in this case regular expressions would be much easier.
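As a rough illustration of the kind of case I mean, here is a sketch using the standard `re` module. The HTML fragment and the pattern are made up for the example (they are not the pages I was actually working with), and a regular expression like this only works when the text itself follows a predictable pattern.

```python
import re

# Hypothetical page fragment with no useful semantic markup:
# every entry is just text separated by <br> tags.
html = ("<p>3 May 1643: 12 horses listed<br>"
        "4 May 1643: 7 horses listed<br>"
        "9 May 1643: 20 horses listed</p>")

# Capture the date and the number from each entry.
pattern = re.compile(r'(\d{1,2} \w+ \d{4}): (\d+) horses')

# findall returns one (date, number) tuple per match.
entries = [(date, int(number)) for date, number in pattern.findall(html)]
```

Here the tags carry no information at all, so matching the text directly is simpler than navigating the parse tree.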

Have sets

Python includes a collection type called a set, which behaves like a mathematical set (it is unordered and holds each item only once) while still being iterable like a list, and is incredibly useful for text-mining scripts. Turning a list into a set automatically gets rid of duplicates, although the original word order is lost. For example, suppose you’ve split some text into a list of separate words.

>>> wordlist = 'it was the best of times it was the worst of times'.split()
>>> wordlist
['it', 'was', 'the', 'best', 'of', 'times', 'it', 'was', 'the', 'worst', 'of', 'times']
>>> wordset = set(wordlist)
>>> wordset
set(['of', 'it', 'times', 'worst', 'the', 'was', 'best'])

Now we have a set of unique words which we can iterate through using a for loop, counting the occurrences of each word in the list:

for word in wordset:
    wordcount = wordlist.count(word)

Then we can do whatever we want with wordcount (print it to the screen, add it to a tuple or a dictionary, write it to a file).

You can also do mathematical operations on sets, which can be really useful for removing stop words.

Suppose we have a set of stopwords:

>>> stopwordset = set(['of', 'it', 'the'])

We can deduct that from the set of words before we iterate through it:

>>> wordset = wordset - stopwordset
>>> wordset
set(['was', 'worst', 'best', 'times'])

Now the stop words in wordlist are completely ignored, and we don’t even have to do an if test at every iteration.
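Subtraction isn’t the only operation available. As a small sketch, here are intersection and union used to compare the vocabularies of two texts; the two word lists are invented, and `sorted()` is only there to give a stable order when printing.

```python
# Two hypothetical word lists, standing in for two different texts.
first = set('it was the best of times'.split())
second = set('it was the worst of times'.split())

shared = first & second        # words that appear in both texts
only_first = first - second    # words unique to the first text
combined = first | second      # every distinct word in either text

print(sorted(shared))
print(sorted(only_first))
print(sorted(combined))
```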

A dictionary is a bit like a database

Python dictionaries can be thought of as very simple databases. Obviously they can’t do everything that a database can do, but you don’t have to worry about connections or cursors either. When counting words across multiple files it’s easy to keep a running total of each word by updating a dictionary at every iteration. If the word is already in the dictionary, add to the existing count; if it isn’t, add a new key/value pair.

This is how I do it:

>>> wordcount = dict()

(Then iterate through each file, open and read it etc.)

for word in wordset:
    if word in wordcount:
        # Word already seen in an earlier file: add to the running total.
        wordcount[word] += wordlist.count(word)
    else:
        # First time we've seen this word: add a new key/value pair.
        wordcount[word] = wordlist.count(word)
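For comparison, the standard library’s `collections.Counter` is a dictionary subclass that handles the "already there or not" check for you. This is an alternative sketch rather than the method above; the two strings stand in for the contents of real files, and the stop list is the same toy one used earlier.

```python
from collections import Counter

# Hypothetical stand-ins for the text read from several files.
documents = [
    'it was the best of times',
    'it was the worst of times',
]

stop_words = {'of', 'it', 'the'}
totals = Counter()

for text in documents:
    # Drop stop words, then add this document's counts to the running total.
    words = [w for w in text.split() if w not in stop_words]
    totals.update(words)
```

Unlike `dict.update`, `Counter.update` adds to existing counts instead of replacing them, which is what makes the running total work.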

PhD Theses and The Postmodern Condition

[posted by Gavin Robinson, 7:55 am, 5 June 2011]

A few weeks ago I ordered a PhD thesis from EThOS. A few days later they got back to me to say that the university in question wouldn’t supply the thesis for digitization because they didn’t have the author’s permission. In some ways that was a relief because it saved me the time that I would have used up reading it and the £40 digitization fee. Arguably the university and author have lost more because they’ve just missed out on a citation which would have marginally contributed to their reputation and given their research more ‘impact’. Maybe one consequence of digital history will be that material that isn’t easily available on the web might as well not exist. Which is exactly what Lyotard predicted in The Postmodern Condition in 1979 (p. 4):

The nature of knowledge cannot survive unchanged within this context of general transformation. It can fit into the new channels, and become operational, only if learning is translated into quantities of information. We can predict that anything in the constituted body of knowledge that is not translatable in this way will be abandoned and that the direction of new research will be dictated by the possibility of its eventual results being translatable into computer language. The “producers” and users of knowledge must now, and will have to, possess the means of translating into these languages whatever they want to invent or learn. Research on translating machines is already well advanced. Along with the hegemony of computers comes a certain logic, and therefore a certain set of prescriptions determining which statements are accepted as “knowledge” statements.

We may thus expect a thorough exteriorization of knowledge with respect to the “knower”, at whatever point he or she may occupy in the knowledge process. The old principle that the acquisition of knowledge is indissociable from the training (Bildung) of minds, or even of individuals, is becoming obsolete and will become ever more so.

This book should be one of the foundational texts of digital history, but apparently it isn’t.

  1. Jean-François Lyotard, The Postmodern Condition: A Report on Knowledge, trans. G. Bennington and B. Massumi (Manchester, 1984).