A description of a very simple Twitter sentiment analyzer project. This system loads tweets on a user-specified topic, scores the sentiment of each word in each tweet, computes an overall sentiment for each tweet, and then tallies the positive and negative sentiments for the topic.

Overview

Twitter has become an international web phenomenon where people report their everyday ideas and opinions. Along these lines, sentiment analysis of tweets has been getting a lot of attention lately. There have been articles in Wired Magazine and Bloomberg about using Twitter to predict stock market trends. Work by economists at Technische Universitaet Muenchen (TUM, the Technical University of Munich) has even resulted in a website that gives free stock ticker predictions based on Twitter. All of these articles really interested me, and I think there will be more and more demand for the ability to mine social media data for opinions and sentiments. For these reasons I decided to see if I could make a basic Twitter sentiment analyzer, with the idea of making it more complex once I master the basics.

For my project I decided to use Ruby to access Twitter’s Search API. It turned out to be extremely easy to use and did not even require any type of registration or authentication. For the sentiment analysis, I used a simple word list that I found online (it turns out my project idea was also a class project at UMBC) that maps words to sentiment scores in the range [-1, 1].
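
To make the word list concrete, here is a minimal sketch of loading one into a Ruby Hash. The file layout is an assumption: one tab-separated "word, score" pair per line, which may not match the actual files. The method name parse_word_list and the sample words are hypothetical and only for illustration.

```ruby
# Parse a word list into a Hash of word => sentiment score.
# ASSUMPTION: each line is "word<TAB>score"; the real word list may be laid out differently.
def parse_word_list(text)
  scores = {}
  text.each_line do |line|
    word, score = line.strip.split("\t")
    next if word.nil? || score.nil? # skip blank or malformed lines
    scores[word.downcase] = score.to_f
  end
  scores
end

# Hypothetical sample data standing in for the contents of a word list file.
sample = "good\t0.6\nterrible\t-0.9\n\nfine\t0.2\n"
lexicon = parse_word_list(sample)
```

With a real file, the same method could be fed File.read on the word list's path. Lowercasing the keys means later lookups only need to lowercase the tweet text.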

The gist of my basic sentiment analysis algorithm was to gather all the tweets that matched the given search term (Twitter’s Search API pretty much took care of this) and, for each tweet, take the sum of the sentiment values of the words in the tweet (where a word's value is 0 if it doesn’t appear in my word list). If the sum was greater than some positive threshold (something like 0.00 or 0.40), the tweet was deemed to have positive sentiment. If the sum was less than the negative threshold (something like 0.00 or -0.40), the tweet was deemed to have negative sentiment. Anything else was deemed neutral. Once all the tweets have been classified as positive, negative, or neutral, you can use the ratio of positive to negative tweets to determine the general sentiment for the search term.
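
The scoring and classification steps described above can be sketched roughly as follows. This is an illustrative sketch, not the code from the repository: the LEXICON entries, the method names, the simple regex tokenizer, and the sample tweets are all assumptions, and the threshold of 0.4 is just one of the values mentioned above.

```ruby
# Hypothetical mini word list; real scores would come from the sentiment word files.
LEXICON = { "love" => 0.8, "great" => 0.6, "hate" => -0.8, "slow" => -0.3 }.freeze

# Sum the sentiment of every word in the tweet; unknown words count as 0.
def tweet_score(text, lexicon = LEXICON)
  words = text.downcase.scan(/[a-z']+/) # crude tokenizer, an assumption
  words.sum { |w| lexicon.fetch(w, 0.0) }
end

# Positive above +threshold, negative below -threshold, neutral otherwise.
def classify(score, threshold = 0.4)
  return :positive if score > threshold
  return :negative if score < -threshold
  :neutral
end

# Count how many tweets fall into each class.
def tally(tweets, threshold = 0.4)
  counts = { positive: 0, negative: 0, neutral: 0 }
  tweets.each { |t| counts[classify(tweet_score(t), threshold)] += 1 }
  counts
end

# Made-up tweets for illustration only.
tweets = [
  "I love this phone, great screen",
  "I hate how slow it is",
  "Is it out yet?"
]
counts = tally(tweets)
```

From counts, the overall sentiment for the search term is the ratio of counts[:positive] to counts[:negative], taking care with the case where no tweet is classified as negative.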

As a rough estimate of sentiment this algorithm works well enough, and its main appeal is that it is so easy to implement. For serious sentiment analysis you would probably want something more complex. This algorithm would do horribly with sarcasm, multi-subject tweets, tweets that do not express explicit opinions (questions, for example), etc. Now that I have a basic system to collect and analyze tweets, I plan to do future work to better analyze the sentiment and opinions found in these tweets.

Here is the Ruby code that I used for collecting tweets and performing this basic analysis, along with the word sentiment files:

Twitter Sentiment Analysis Code (github) (Mostly Ruby code)