The main impetus behind this project for the Museum of London was to investigate how social media can be collected as an object in its own right, if at all. There is a wider research agenda for capturing contemporary events in the future as they unfold on Twitter. But for the purposes of this post I’m going to focus on simply “can a Museum collect tweets?”
“If so, what do we get or what is the value?” [this has turned into a massive post, so I'll do a second post on "what is the value?"]
I’ve written briefly before, in an earlier post, on why we chose to focus on Twitter. To sum up: a large body of Londoners share information on what is going on in the Capital and reflect on this via Twitter. But ultimately, the London 2012 Olympic Games were billed as the first social media Summer Games or the first “Twitter Olympics”. It was expected that athletes, media and the public would Tweet voraciously about the Games. Of particular interest to our project team was the way Twitter would be used by some Londoners to share and gather information, feelings and views around the Olympics as events happened. But at the same time my internal museum curator voice was screaming “can we collect this?” “and what do we do with it?” “what is the object?”
So rather than turn tail and head for the hills for the duration of August I decided to square up to the challenge and see it as an opportunity to explore whether the Museum could collect tweets. There are precedents for collecting Tweets: the Library of Congress do it; the British Library archive their own Twitter web profile; and there are open source programmes that enable folks, museum communication teams included, to save their Tweets because tweets remain accessible for only a limited time. But I could not find other museums collecting tweets as objects (please do get in touch if you are out there!)
Indeed the Museum of London had previously worked on a Twitter project called London Wall with digital artists Thomson and Craighead. The object collected by the Museum at the end of that project was the artists’ interpretation of the tweets generated around the geographical site of the Museum. It includes copies of the A4 archive prints of the tweet texts that had been pasted to the foyer wall, plus the PDFs used to make the print outs. With #citizencurators I wanted to see if we could get closer to collecting the original or intangible Tweet.
So could we collect tweets?
In a word, yes. We set up our own project hashtag #citizencurators and recruited 16 volunteers who agreed to us harvesting any tweets they tagged with the project hashtag. We also publicised the hashtag and invited people to use it. We could have tried to harvest a trending hashtag but this would have posed all sorts of issues in terms of IP in tweets, associated media and capacity.
It turned out that actually harvesting tweets from Twitter with some of the available tools like Archivist is relatively straightforward. The big question was intellectual property rights (IPR). Archivist and other such programmes included clauses about third party rights in the content and its future use, limiting what we could potentially do with the content in the future.
I then stumbled across the open source Twitter Archiving Google Spreadsheet (TAGS), formerly called Twitteralytics, which was developed by Martin Hawksey. The template uses Google Spreadsheets as the data store and allows you to set up automatic collection of tweets that match the search terms you define. We set the search criteria around the project hashtag #citizencurators. As the programme is set up with the Twitter API, it also harvests the following metadata:
User ID number
Text of tweet
Date stamp (time and date)
To User ID
Profile Image URL
Using TAGS we harvested c.7000 tweets that used the #citizencurators hashtag. We gathered tweets from approx. 600 unique Twitter user accounts. The numbers are approximate because we have yet to refine the results to just the dates of the Olympics, so at the moment the archive includes our pre-project test tweets and post-Olympic come-down tweets.
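The refinement to just the dates of the Olympics could be done on an export of the spreadsheet. What follows is only a minimal sketch, not the method we used: it assumes the archive has been exported as rows with hypothetical column names `from_user_id`, `text` and `time`, that the timestamp is in day/month/year format, and that the window runs from the opening ceremony (27 July 2012) to the closing ceremony (12 August 2012).

```python
from datetime import date, datetime

# Assumed collecting window: opening ceremony to closing ceremony, inclusive.
GAMES_START = date(2012, 7, 27)
GAMES_END = date(2012, 8, 12)


def within_games(rows, start=GAMES_START, end=GAMES_END,
                 time_format="%d/%m/%Y %H:%M:%S"):
    """Keep only rows whose timestamp falls inside the window (inclusive).

    The "time" column name and the time_format are assumptions about the
    export, not something the TAGS template guarantees.
    """
    kept = []
    for row in rows:
        tweeted_on = datetime.strptime(row["time"], time_format).date()
        if start <= tweeted_on <= end:
            kept.append(row)
    return kept


def unique_users(rows):
    """Return the set of distinct user IDs that appear in the rows."""
    return {row["from_user_id"] for row in rows}


# Invented sample rows standing in for an exported archive.
sample = [
    {"from_user_id": "101", "text": "Testing #citizencurators", "time": "20/07/2012 09:15:00"},
    {"from_user_id": "101", "text": "Opening ceremony! #citizencurators", "time": "27/07/2012 21:00:00"},
    {"from_user_id": "202", "text": "Closing night #citizencurators", "time": "12/08/2012 22:30:00"},
    {"from_user_id": "303", "text": "Back to normal #citizencurators", "time": "20/08/2012 08:00:00"},
]

games_rows = within_games(sample)
print(len(games_rows), "tweets from", len(unique_users(games_rows)), "users")
```

Run against the real export, the same two functions would replace the approximate totals with figures for the Games period only.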
Here’s a brief synopsis of what we discovered:
Capacity: The programme was able to harvest up to 1500 tweets an hour. To be honest we never expected to exceed this and we didn’t. The tweets were uploading automatically and you could close the Google spreadsheet and the programme worked away in the background, updating the archive with new tweets every time it was opened. The only slight snag we came across was that we reached the maximum number of cells that could be used in one spreadsheet (c.400,000), which meant we needed to start a second spreadsheet and merge the two at the end of the project. We could have mitigated this by not harvesting all of the metadata.
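The trade-off between metadata and capacity is simple arithmetic: every tweet takes one row, and every metadata field takes one cell in that row. The sketch below is illustrative only. The column counts are invented examples rather than the actual number of columns in our archive, and the merge step assumes hypothetical column names `from_user_id`, `time` and `text` to spot rows that appear in both spreadsheets.

```python
CELL_LIMIT = 400_000  # approximate per-spreadsheet cell limit mentioned above


def max_rows(columns, cell_limit=CELL_LIMIT):
    """Number of whole tweet rows that fit if each tweet uses `columns` cells."""
    return cell_limit // columns


def merge_sheets(first, second, key_fields=("from_user_id", "time", "text")):
    """Combine two lists of rows, dropping rows already seen.

    Rows are treated as duplicates when all key_fields match; the field
    names are assumptions about the export.
    """
    seen = set()
    merged = []
    for row in list(first) + list(second):
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            merged.append(row)
    return merged


# Illustrative column counts only: fewer columns means more tweets per sheet.
for columns in (5, 18):
    print(columns, "columns ->", max_rows(columns), "tweets per spreadsheet")

# Invented rows: the second sheet repeats the last row of the first.
sheet_one = [
    {"from_user_id": "101", "time": "27/07/2012 21:00:00", "text": "a"},
    {"from_user_id": "202", "time": "28/07/2012 10:00:00", "text": "b"},
]
sheet_two = [
    {"from_user_id": "202", "time": "28/07/2012 10:00:00", "text": "b"},
    {"from_user_id": "303", "time": "29/07/2012 11:00:00", "text": "c"},
]
merged = merge_sheets(sheet_one, sheet_two)
print(len(merged), "rows after merging")
```

Keeping insertion order in the merge means the combined archive stays in the order the tweets were harvested, provided the first spreadsheet is passed first.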
Interaction between users: One of the features of the TAGS programme is that you can use a web interface to re-create the conversations based on the original tweet, reply and retweet. The visualisation allows you to see the interactions between users. To begin with this was really attractive and we hoped to use it as a display in the Museum foyer. But as we were collecting Tweets over the 17 days of the Olympics (gulp!) the visualisation of conversations became heavily congested and too cumbersome to operate.
The spreadsheet includes all of the metadata around these interactions and will allow data mining to be carried out to further explore the themes and relationships between the tweets and users.
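As one example of what that data mining might look like, the User ID and To User ID fields are enough to count who addressed whom. This is a rough sketch under assumptions, not an analysis we have carried out: the column names `from_user_id` and `to_user_id` are hypothetical, and rows with no To User ID are treated as tweets not directed at anyone.

```python
from collections import Counter


def interaction_counts(rows):
    """Count directed (sender, recipient) pairs across the rows.

    Rows without a recipient are skipped. Column names are assumptions
    about how the spreadsheet is exported.
    """
    edges = Counter()
    for row in rows:
        sender = row.get("from_user_id", "")
        recipient = row.get("to_user_id", "")
        if sender and recipient:
            edges[(sender, recipient)] += 1
    return edges


# Invented sample rows.
sample = [
    {"from_user_id": "101", "to_user_id": "202"},
    {"from_user_id": "101", "to_user_id": "202"},
    {"from_user_id": "202", "to_user_id": "101"},
    {"from_user_id": "303", "to_user_id": ""},
]

edges = interaction_counts(sample)
for (sender, recipient), count in edges.most_common():
    print(sender, "->", recipient, count)
```

A table of pairs like this is far less congested than the full conversation visualisation, and could be filtered to the most frequent pairs before anything is drawn.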
Retweets: RTs make up a large proportion of our tweet archive. From approx. 4100 unique tweets we generated 1913 RTs. One reason could be the small number of people we recruited to the project: they Retweeted each other’s tweets to their followers, which in turn generated more Retweets.
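Separating Retweets from original tweets can be approximated from the tweet text alone. The sketch below assumes a Retweet can be recognised by text beginning with "RT @", which was a common convention but will miss Retweets formatted differently, and it assumes a hypothetical `text` column. The sample rows are invented; the counts it prints are not our archive figures.

```python
def is_retweet(text):
    """Treat a tweet as a Retweet if its text starts with 'RT @' (an assumption)."""
    return text.lstrip().upper().startswith("RT @")


def retweet_summary(rows):
    """Return (originals, retweets, retweet share of all rows)."""
    retweets = sum(1 for row in rows if is_retweet(row["text"]))
    originals = len(rows) - retweets
    share = retweets / len(rows) if rows else 0.0
    return originals, retweets, share


# Invented sample rows.
sample = [
    {"text": "Queueing at the Olympic Park #citizencurators"},
    {"text": "RT @someone: Queueing at the Olympic Park #citizencurators"},
    {"text": "Quiet on the Tube today #citizencurators"},
    {"text": "Art, not RT: a tweet that only mentions RT @ later #citizencurators"},
]

originals, retweets, share = retweet_summary(sample)
print(originals, "originals,", retweets, "retweets,", f"{share:.0%} retweeted")
```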
Associated Media: The TAGS spreadsheet includes links to live URLs that take you back to the original tweet and associated media. This is interesting as it does mean you come closer to collecting the original tweet. But at the beginning of the project we acknowledged that any associated media or links to other websites embedded within Tweets using the project hashtag posed third party copyright and IP issues. In order to keep the project manageable it was agreed that the Museum would not collect any associated media e.g. images or links but welcomed the opportunity to explore these issues.
This has turned into a massive blog post. Apologies. So I will do a separate one to briefly sum up where my thinking is on the value of all this.