Twitter Sentiment Analysis

Abstract

For a long time, sentiment analysis algorithms were applied mainly to long texts, because short texts were thought to carry too little information to give good results. Twitter is a widely used social media platform whose content consists of very short and mostly informal text snippets. Since tweets are posted in real time and some of them are geolocated, a sentiment analysis could attach feelings to places in real time, which makes this a very interesting problem.

This research focuses on collecting streaming data from Twitter (restricted to a given location) and running a sentiment analysis on it. This matters today because it gives an overview of people’s opinions about a specific topic, or simply of how people in a specific place are feeling. The applications of sentiment analysis are broad and powerful, and extracting insights from social data is a practice that organizations across the world are widely adopting.

Method

This sentiment analysis sample was written in Python 3.5. To access Twitter streaming data, it was necessary to register a Twitter application in order to obtain API credentials. The tweepy module was used, although twython is an alternative. For the sentiment analysis we used the TextBlob library, which provides a simple API for common natural language processing (NLP) tasks. In this research we use its sentiment analysis, which gives the polarity of a text on a scale from -1 [Negative Attitude] to 1 [Positive Attitude].
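As a minimal sketch of the polarity lookup described above, assuming the `textblob` package is installed. The helper name `polarity` is ours for illustration and is not taken from the original script.

```python
def polarity(text):
    """Return TextBlob's polarity for `text`, a float between -1.0 and 1.0."""
    # Imported lazily so this module still loads where textblob is missing.
    from textblob import TextBlob  # third-party: pip install textblob

    # `sentiment` is a (polarity, subjectivity) pair; only polarity is used here.
    return TextBlob(text).sentiment.polarity
```

Calling `polarity("I love this city")` should give a value above 0, and a clearly negative sentence a value below 0; the exact numbers depend on the installed TextBlob version and its lexicon.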

The Python script is designed to collect streaming data from specific coordinates. The data is saved in a JSON file that contains all the information for each tweet, not only the text. To run the sentiment analysis, we convert the JSON file into a CSV file, keeping only the fields we want. This could be just the text (the only field needed for the sentiment analysis), but it could also include the user, the country and other relevant information about the tweet that could be used for other purposes. Finally, we load the resulting CSV file to get the sentiment of each tweet.
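The JSON-to-CSV step could look roughly like the sketch below. It assumes the stream was saved with one JSON object per line and that each tweet uses the classic field names `text`, `lang`, `created_at`, `user.screen_name` and `place.country`. The column selection in `FIELDS` and the function names are illustrative choices, not the original script.

```python
import csv
import io
import json

# Assumed column selection; only "text" is needed for the sentiment step.
FIELDS = ["created_at", "user", "lang", "country", "text"]


def tweet_row(tweet):
    """Flatten one tweet dict into the columns listed in FIELDS."""
    user = tweet.get("user") or {}
    place = tweet.get("place") or {}  # "place" is null for many tweets
    return {
        "created_at": tweet.get("created_at", ""),
        "user": user.get("screen_name", ""),
        "lang": tweet.get("lang", ""),
        "country": place.get("country", ""),
        "text": tweet.get("text", ""),
    }


def tweets_to_csv(json_lines, out_file):
    """Write one CSV row per tweet and return the number of rows written."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the saved stream
        tweet = json.loads(line)
        if "text" not in tweet:
            continue  # skip messages that are not tweets
        writer.writerow(tweet_row(tweet))
        count += 1
    return count


if __name__ == "__main__":
    sample = [json.dumps({"text": "Hallo Berlin", "lang": "de",
                          "user": {"screen_name": "demo"},
                          "place": {"country": "Germany"}})]
    buffer = io.StringIO()
    tweets_to_csv(sample, buffer)
    print(buffer.getvalue())
```

With real files, the same function can be given an open JSON file as `json_lines` and a CSV file opened with `newline=""` as `out_file`.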

 

[Figure: method]

Results and Interpretation

Languages

Looking at the sample streaming data, we notice that even though the tweets came from German locations, not all of them were written in German. More than 30 languages were found. The most common were:

  • 37.96% of tweets in German (de)
  • 29.82% of tweets in English (en)
  • 6.52% of tweets in an undetermined language (und)
  • 5.13% of tweets in French (fr)
  • 3.67% of tweets in Dutch, Flemish (nl)
  • 2.74% of tweets in Czech (cs)
  • 2.54% of tweets in Spanish (es)

[Figure: languages graph]

This is a drawback, since TextBlob does not work for all languages, and with this much variety no single language can be chosen for the analysis.
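Language shares like the ones listed above can be computed from the `lang` code attached to each tweet. The following is a small sketch under that assumption; `language_shares` is a hypothetical helper name.

```python
from collections import Counter


def language_shares(lang_codes):
    """Return (code, percent) pairs, most frequent language first."""
    counts = Counter(lang_codes)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(code, round(100.0 * n / total, 2))
            for code, n in counts.most_common()]


if __name__ == "__main__":
    for code, share in language_shares(["de", "de", "de", "en", "en", "und"]):
        print("{:>4}: {:.2f}%".format(code, share))
```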

Sentiment Values

As mentioned above, the sentiment values that TextBlob returns range from -1 [Negative Attitude] to 1 [Positive Attitude].

[Figure: descriptive statistics]

[Figure: sentiment values graph]

  • Neutral Tweets: 79.05%
  • Positive Tweets: 16.46%
  • Negative Tweets: 4.49%

Most of the tweets show a neutral attitude.
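The split into neutral, positive and negative tweets can be reproduced from the polarity values. The sketch below assumes that a polarity of exactly 0 counts as neutral, anything above 0 as positive and anything below 0 as negative; the original thresholds are not stated, so this is an assumption.

```python
from collections import Counter


def label(polarity):
    """Map a polarity value to a coarse label (assumed thresholds at 0)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def label_shares(polarities):
    """Return the percentage of tweets per label."""
    counts = Counter(label(p) for p in polarities)
    total = sum(counts.values())
    return {
        name: (100.0 * counts[name] / total if total else 0.0)
        for name in ("neutral", "positive", "negative")
    }


if __name__ == "__main__":
    print(label_shares([0.0, 0.0, 0.5, -0.2]))
```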

Further Research

It is important to note that sentiment analysis is far from a perfect method, since human language is very complex.

Some limitations were found that could be improved:

Location

In this research a coordinate bounding box was used to get the tweets from a certain city; however, this is not very precise, since the box is rectangular and a city’s boundary is not. To improve this, the coordinates provided in the sample tweets could be imported into a Geo DBMS, where a spatial query against the city polygon would give a more precise location.
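To illustrate the difference between the box filter and a polygon query, here is a pure-Python sketch that uses ray casting as a stand-in for the spatial query a Geo DBMS would run. It treats longitude and latitude as flat x/y coordinates, which is a simplification, and the function names and the box order (west, south, east, north) are assumptions made for this example.

```python
def in_bounding_box(lon, lat, box):
    """True if the point lies inside box = (west, south, east, north)."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def in_polygon(lon, lat, polygon):
    """Ray casting: count how many edges a ray going east from the point crosses.

    `polygon` is a list of (lon, lat) vertices; an odd crossing count
    means the point is inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


if __name__ == "__main__":
    triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]  # toy "city" outline
    box = (0.0, 0.0, 4.0, 4.0)
    point = (3.0, 3.0)
    print(in_bounding_box(point[0], point[1], box))  # box accepts the point
    print(in_polygon(point[0], point[1], triangle))  # polygon rejects it
```

A point near a corner of the box can pass the box filter while lying outside the actual outline, which is the imprecision described above.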

Languages

According to the sample data gathered, regardless of the country the information comes from (and its official language), the tweets are written in a variety of languages, since we live in a globalized world. This makes a sentiment analysis tool that works in just one language inaccurate. For example, as seen above, most of the tweets showed a neutral attitude; however, this cannot be entirely true, since TextBlob cannot process most of the tweets because of their language. For those tweets TextBlob simply returns 0, which gives the impression that there are more neutral tweets than there really are. Code that first determines the language of each tweet and then analyzes it accordingly could be a way to improve this.
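One possible shape for that improvement is sketched below. It relies on the `lang` code already present in each tweet rather than running a separate language detector, and it returns `None` instead of 0 for languages the scorer does not cover, so that unsupported tweets are not counted as neutral. The set `SUPPORTED`, the function name and the pluggable `scorer` argument are assumptions made for illustration.

```python
# Assumption: the scorer passed in only handles English text.
SUPPORTED = frozenset(["en"])


def score_tweets(tweets, scorer, supported=SUPPORTED):
    """Yield (text, lang, polarity) for each tweet dict.

    `scorer` is any callable mapping text to a polarity value.
    Polarity is None when the tweet's language is not supported.
    """
    for tweet in tweets:
        lang = tweet.get("lang", "und")  # "und" = undetermined
        text = tweet.get("text", "")
        polarity = scorer(text) if lang in supported else None
        yield text, lang, polarity


if __name__ == "__main__":
    def toy_scorer(text):
        # Stand-in scorer for the demo; a real run would use TextBlob.
        return 0.5 if "good" in text else 0.0

    sample = [{"text": "good morning", "lang": "en"},
              {"text": "guten Morgen", "lang": "de"}]
    for row in score_tweets(sample, toy_scorer):
        print(row)
```

Tweets scored as `None` can then be left out of the neutral count, or routed to a scorer for their own language if one is available.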

Urban Mobility Improvement

Nowadays, the most typical application of sentiment analysis is to see how people feel about a certain brand or a new product a company has just launched. However, that is not the only use it can have.

Streams of tweets sent from and to the Twitter accounts of urban transport operators can help improve urban mobility. Analyzing these tweets for significant keywords to find relevant events, such as accidents, sudden traffic jams or service interruptions, would support more effective travel planning and enable smarter, personalized urban mobility management.
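A very simple form of this keyword analysis could look like the sketch below. The event categories follow the examples in the text, but the keyword lists and the function name are made up for illustration, and a real system would need per-language lists and more robust matching.

```python
import re

# Hypothetical keyword lists, one per event type mentioned in the text.
EVENT_KEYWORDS = {
    "accident": ["accident", "crash", "collision"],
    "traffic_jam": ["traffic jam", "congestion", "gridlock"],
    "service_interruption": ["cancelled", "suspended", "out of service"],
}


def detect_events(text, keywords=EVENT_KEYWORDS):
    """Return the sorted list of event types whose keywords occur in `text`."""
    lowered = text.lower()
    found = []
    for event, terms in keywords.items():
        for term in terms:
            # Whole-word match so "crash" does not fire inside other words.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                found.append(event)
                break
    return sorted(found)


if __name__ == "__main__":
    print(detect_events("Huge crash on the ring road, traffic jam everywhere"))
    print(detect_events("Line 2 is out of service until noon"))
```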

Likewise, a sentiment analysis of those tweets can help evaluate opinions about the quality of service (delays, inefficiencies, perceived security, etc.) in order to tune the mobility supply to commuters’ needs. Transport companies may use this information to monitor the sentiment about their mobility supply and, possibly, make effective and efficient modifications to the available options.
