Thursday 21 November 2013

Twitter Analysis - Ashes 2013 (England vs Australia)

(May be slow to load, though speeds up when filters are applied)

Tweets related to the Ashes contest between England and Australia have been analysed and tagged by key terms.
Select a key term below to filter the stats and tweet texts by that term
(either a descriptor or descriptor group, and/or a participant or participant group).

Filtering can also be done by match day and time of day.



Suggested Filtered Views:
  • Select Descriptor '100' to view century makers (in Participant list)
    (Click '100' again for unfiltered view)
  • Select Participant Group 'Officials' (in inset box) to see the dominance of negative sentiment (in Descriptor list)
  • Select Descriptor 'Cheat' to see which player was viewed more negatively than the officials, and to read the most popular retweets
  • Select Test '1', Day '2' to see which debut player dominated Twitter on that day
  • Select Participant 'ICC' to see at what time and on which day they became the hot topic

Twitter commentary:
Due to limited resources, the list of tweet texts is restricted to:
  • 10% of original tweets (randomly selected)
  • All Retweets with a retweet count of at least 50
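As a rough illustration of that selection rule, here is a minimal Python sketch. It assumes each tweet arrives already flagged as original or retweeted and carries a retweet count; the function name and parameters are hypothetical, and only the 10% rate and the threshold of 50 come from the notes above.

```python
import random

# Only these two numbers come from the dashboard notes; the rest is illustrative.
SAMPLE_RATE = 0.10   # fraction of original tweets kept
MIN_RETWEETS = 50    # retweeted tweets need at least this many retweets


def keep_for_commentary(is_retweet: bool, retweet_count: int, rng: random.Random) -> bool:
    """Hypothetical filter: decide whether a tweet goes into the commentary list."""
    if is_retweet:
        # Retweeted tweets are kept only when they are popular enough.
        return retweet_count >= MIN_RETWEETS
    # Original tweets are kept at random, roughly 1 in 10.
    return rng.random() < SAMPLE_RATE
```

Passing in a seeded `random.Random` makes the 10% sample reproducible, which is handy if the list ever needs to be rebuilt.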

The quickest way to view a chronological summary of the commentary is to select the following from the list boxes:
  • 'Stats above & List below...' - 'Retweeted Tweets Only'
  • 'Sort List by' - 'Time'

Notes:
A Python script using the oauth2 library was written to record the Twitter stream of tweets containing 'Ashes' or 'TheAshes'.
Limitations on streaming and system resources meant that not every tweet was recorded, so counts of popular retweets will be lower than official Twitter figures.

Over 2.35 million tweets were recorded in total, 2.18 million of them during the playing-day periods of the 2013 Ashes series in England.
A playing day, as categorised in the dashboard's circle graph, runs from 6 am British Summer Time (3 pm Australian Eastern Time) on the day of play until 5:59 am BST the following day.
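The playing-day rule boils down to a small date calculation. This sketch (the function name is hypothetical) assumes timestamps are timezone-aware and uses a fixed UTC+1 offset, since the whole series fell within British Summer Time. Shifting local time back six hours puts every moment from 6:00 am to 5:59 am the next morning onto one calendar date.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: the series ran entirely within BST, so a fixed UTC+1 offset
# stands in for a full time-zone database.
BST = timezone(timedelta(hours=1), "BST")
DAY_START = timedelta(hours=6)  # a playing day starts at 6 am BST


def playing_day(ts: datetime) -> date:
    """Map an aware timestamp to the calendar date of its playing day.

    06:00 BST on day D up to 05:59 BST on day D+1 all belong to day D.
    """
    local = ts.astimezone(BST)
    return (local - DAY_START).date()
```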

Errors in recording the streaming data meant that no data was recorded during the following periods:
  • between 04:05 and 12:43 (BST) on 3/8/13 (3rd test, day 3, 1st session, Eng 1st innings 2/52 to 4/109).
  • between 16:21 and 23:24 (BST) on 23/8/13 (5th test, day 3, 3rd session, Eng 1st innings 3/181 to 4/257).
  • between 04:27 and 06:52 (BST) on 25/8/13 (5th test, day 5, before play).
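When charting counts by hour or session, it may help to flag timestamps that fall inside these gaps so that a dip is not mistaken for a quiet spell. A minimal sketch with hypothetical names; the gap times are taken from the list above, and treating each end time as exclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone

BST = timezone(timedelta(hours=1), "BST")

# Recording gaps from the notes, as half-open [start, end) intervals in BST.
GAPS = [
    (datetime(2013, 8, 3, 4, 5, tzinfo=BST), datetime(2013, 8, 3, 12, 43, tzinfo=BST)),
    (datetime(2013, 8, 23, 16, 21, tzinfo=BST), datetime(2013, 8, 23, 23, 24, tzinfo=BST)),
    (datetime(2013, 8, 25, 4, 27, tzinfo=BST), datetime(2013, 8, 25, 6, 52, tzinfo=BST)),
]


def in_recording_gap(ts: datetime) -> bool:
    """Return True if an aware timestamp falls inside a known recording gap."""
    # Aware datetimes in different zones compare correctly in Python.
    return any(start <= ts < end for start, end in GAPS)
```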

A Python script transferred the text files of tweets into a PostgreSQL database built to store the relevant tweet information. Tables and SQL queries were written to:
  • use regular expression functions to strip punctuation from tweet text and lower-case it (eg 'Clarke's best hundred. Wonderful!!' replaced with 'clarkes best hundred wonderful')
  • split tweet texts into individual words
  • view frequently occurring words and word pairs
  • create lists of relevant words or word pairs, grouped under a common term description
    (eg Participant Term: 'Clarke,M' is tagged to any tweet containing: '#clarke', '@mclark23', 'australian captain', 'australian skipper', 'clark', 'clarke', 'clarkey', 'michael clarke', 'pup')
  • identify words that commonly occur with 'ashes' but are not related to cricket (eg 'scatter' in 'scatter my ashes in the ocean') and eliminate tweets containing those words
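These steps were carried out with PostgreSQL regular expression functions and SQL queries; the following is only a rough Python approximation of the same idea, not the original code. It assumes '#' and '@' were kept during cleaning (the tag list for 'Clarke,M' includes '#clarke' and '@mclark23'), and the exclusion set holds just the single example word from the notes.

```python
import re
from collections import Counter

# Drop everything except word characters, whitespace, '#' and '@', so that
# hashtags and handles such as '#clarke' or '@mclark23' survive cleaning.
PUNCT = re.compile(r"[^\w\s#@]")

# Illustrative exclusion list: words seen with 'ashes' in non-cricket tweets.
EXCLUDE = {"scatter"}


def normalise(text: str) -> str:
    """Lower-case, strip punctuation and collapse runs of whitespace."""
    return " ".join(PUNCT.sub("", text.lower()).split())


def words(text: str) -> list[str]:
    return normalise(text).split()


def word_pairs(tokens: list[str]) -> list[str]:
    """Adjacent word pairs, e.g. ['michael', 'clarke'] -> ['michael clarke']."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def frequency_counts(tweets: list[str]) -> tuple[Counter, Counter]:
    """Count words and word pairs across tweets, skipping excluded tweets."""
    word_counts, pair_counts = Counter(), Counter()
    for tweet in tweets:
        tokens = words(tweet)
        if EXCLUDE.intersection(tokens):
            continue  # tweet is not about cricket
        word_counts.update(tokens)
        pair_counts.update(word_pairs(tokens))
    return word_counts, pair_counts
```

Sorting `word_counts.most_common()` and `pair_counts.most_common()` is then one way to surface candidate terms for the tag lists.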

Each Descriptor and Participant term is counted only once per tweet in which it appears.
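One way to tag tweets while honouring the once-per-tweet rule is to collect the matching terms for each tweet into a set before counting. This Python sketch assumes tweets are already lower-cased and split into words; the 'Clarke,M' phrase list is taken from the notes above, while the '100' list and the function names are made up purely for illustration.

```python
from collections import Counter

# 'Clarke,M' phrases come from the notes; the '100' list is hypothetical.
TERMS: dict[str, set[str]] = {
    "Clarke,M": {"#clarke", "@mclark23", "australian captain", "australian skipper",
                 "clark", "clarke", "clarkey", "michael clarke", "pup"},
    "100": {"100", "century", "hundred", "ton"},
}


def tag_tweet(tokens: list[str], terms: dict[str, set[str]]) -> set[str]:
    """Return every term whose words or word pairs appear in the tweet."""
    # Single words plus adjacent pairs, so 'michael clarke' can match as a phrase.
    grams = set(tokens) | {f"{a} {b}" for a, b in zip(tokens, tokens[1:])}
    return {term for term, phrases in terms.items() if grams & phrases}


def count_terms(tweets: list[list[str]], terms: dict[str, set[str]]) -> Counter:
    """Count tweets per term; a set per tweet means each term counts at most once."""
    counts: Counter = Counter()
    for tokens in tweets:
        counts.update(tag_tweet(tokens, terms))
    return counts
```

For example, `count_terms([["michael", "clarke", "clarke"]], TERMS)` returns `Counter({'Clarke,M': 1})`, even though Clarke is mentioned more than once in that tweet.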

Usage Terms:
Anyone is free to use and link to the dashboard as they wish.
However, permission and an acknowledgement are requested for any use of the underlying dataset.