Details of Sentiment Analysis

The Classifier

Words affect the sentiment of a sentence! To determine whether a tweet is positive, negative, or neutral, WordAffects uses a Naive Bayes classifier, a probabilistic classifier based on Bayes' theorem. Our approach relies on two simplifying assumptions:

1. Word position and punctuation have no effect on the sentiment of a tweet.
2. A tweet is exactly one of positive, negative, or neutral, never a combination.
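
To make the idea concrete, here is a minimal sketch of a Naive Bayes classifier that follows the two assumptions above. It is not the WordAffects code: the class name NaiveBayes, the tokenize helper, and the use of add-one (Laplace) smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption 1: treat the tweet as a bag of words, so punctuation
    # is dropped and word order is ignored.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Hypothetical minimal classifier, for illustration only."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of tweets
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def classify(self, text):
        # Assumption 2: return exactly one label, the most probable one.
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Work in log space to avoid underflow on long tweets.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                # Add-one smoothing so unseen words do not zero out a label.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on a handful of labeled tweets, classify("love sunny") picks whichever label has seen those words most often relative to its size. The real training set is far larger, but the scoring works the same way.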

Constructing the Classifier

Each classification compares a tweet against a training set of 270,000 tweets, 90,000 from each sentiment category. The positive tweets were selected because they contain a "happy" emoticon such as :) or :D
The negative tweets contain an "unhappy" emoticon such as :( or >:(
The neutral tweets were collected from news sources. The assumption that any tweet with a happy emoticon is positive is a simplification, but it holds in most cases and saves us from hand-classifying 270,000 tweets.
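
The labeling rule described above can be sketched as follows. The function name label_by_emoticon and the from_news_source flag are hypothetical, and skipping tweets that contain both kinds of emoticon is an assumption of this sketch, not something the site is known to do.

```python
HAPPY = (":)", ":D")
UNHAPPY = (":(", ">:(")


def label_by_emoticon(tweet, from_news_source=False):
    """Return a training label for a tweet, or None if it should be skipped."""
    if from_news_source:
        # Neutral examples come from news sources, not from emoticons.
        return "neutral"
    has_happy = any(e in tweet for e in HAPPY)
    has_unhappy = any(e in tweet for e in UNHAPPY)
    if has_happy and not has_unhappy:
        return "positive"
    if has_unhappy and not has_happy:
        return "negative"
    # Assumption: no emoticon, or mixed emoticons, means no usable label.
    return None
```

For example, label_by_emoticon("great game :D") gives "positive", while a tweet with no emoticon gives None and would be left out of the training set.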

How accurate?

Both of the classifier's assumptions can be false in certain situations. Punctuation and word order can have a significant impact on a sentence's meaning, and a tweet can start positive but end negative. However, Naive Bayes still correctly classifies about 75% of tweets and is simpler and faster to compute than other classifiers. Furthermore, in sentiment analysis, Naive Bayes has at times outperformed more complicated classifiers.
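
As a small illustration of where the first assumption breaks down, the sketch below shows two sentences with opposite meanings that look identical once word order and punctuation are discarded. The bag_of_words helper is hypothetical and stands in for whatever tokenizer the classifier actually uses.

```python
import re
from collections import Counter


def bag_of_words(text):
    # Lowercase, strip punctuation, and count words regardless of order.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


a = bag_of_words("Good, not bad!")
b = bag_of_words("Bad, not good.")

# Both sentences reduce to the same word counts, so a bag-of-words
# classifier cannot tell them apart.
same_features = (a == b)
```

Here same_features is True, which is exactly the kind of case the classifier is expected to get wrong.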

Site Features

Once signed in, users can save searches and track each search's sentiment over time. Every 15 minutes, we search Twitter for tweets containing the search term and classify as many as possible. What separates WordAffects from other websites that do similar sentiment tracking is the transparency of each search. Every tweet is labeled as positive, negative, or neutral, though with the classifier described above the expected accuracy is 80%. To improve on that, each tweet comes with a "thumbs up" or "thumbs down", which helps us tweak the classifier and push for greater accuracy.
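
One way to picture tracking over time is to reduce each 15-minute batch of classified tweets to the share of each sentiment and append it to a history. This is only a sketch: the names summarize and record_snapshot and the list-of-tuples history are assumptions, not the site's actual storage format.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def summarize(labels):
    """Turn a batch of per-tweet labels into the share of each sentiment."""
    counts = Counter(labels)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        # No classified tweets in this batch.
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


def record_snapshot(history, timestamp, labels):
    """Append one (timestamp, shares) entry for a saved search."""
    history.append((timestamp, summarize(labels)))
    return history
```

Calling record_snapshot once per polling run yields a series of sentiment shares that can be plotted for a saved search.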