Sentiment analysis on Twitter data

Feb 11, 2021

Major observations on public sentiment about the recent farmers' protest in New Delhi, India, and how sentiment analysis on a sample of Twitter data helped in understanding that sentiment.

For someone like me, who has not studied the farm laws of India in depth, it is hard to decide whether or not to support the farmers' protest. The most practical approach to this problem is to analyze public comments.

Here, I applied sentiment analysis to a sample of 10,000 tweets to gauge public sentiment. Treat this as a proof of concept (POC): you can look at the steps followed and the key observations from the sample data.

This blog focuses on analyzing the tweets, the steps followed, and why most of the tweets fall into the neutral category.

Data extraction
Pre-processing
Sentiment analysis
Word cloud for positive, negative and neutral tweets
Why 50% of the crowd is neutral?

1. Data extraction

Extracting the relevant tweets from the millions on Twitter comes down to finding the right keywords that users tag.

#FarmersProtest is one of the most common keywords users tagged. The data is extracted from Twitter using Tweepy. The programming language used for this study is Python.
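The extraction code itself is not shown in this post, so the following is only a minimal sketch of one way to pull hashtag tweets with Tweepy. It assumes Tweepy 4.x and the Twitter API v2 recent-search endpoint with a bearer token. The helper names `build_query` and `fetch_tweets`, and the `lang:en -is:retweet` filters, are illustrative assumptions and not necessarily what was used for this study.

```python
from typing import List


def build_query(hashtag: str, lang: str = "en") -> str:
    """Build a v2 search query: the hashtag, one language, no retweets."""
    return f"{hashtag} lang:{lang} -is:retweet"


def fetch_tweets(bearer_token: str, hashtag: str = "#FarmersProtest",
                 limit: int = 10_000) -> List[str]:
    """Return up to `limit` tweet texts matching the hashtag.

    Assumption: Tweepy 4.x. Needs valid API credentials and network access.
    """
    import tweepy  # imported here so build_query works without Tweepy installed

    client = tweepy.Client(bearer_token=bearer_token, wait_on_rate_limit=True)
    pages = tweepy.Paginator(
        client.search_recent_tweets,
        query=build_query(hashtag),
        max_results=100,  # per-request maximum for this endpoint
    )
    # flatten() walks the pages and yields individual tweets up to `limit`
    return [tweet.text for tweet in pages.flatten(limit=limit)]
```

Filtering by language in the query is one way to drop non-English tweets early. The same filtering could also be done later during pre-processing.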

2. Tweets Pre-Processing

Noise removal plays a major role in finding the correct sentiment of a tweet. Non-English tweets and emojis are not considered in this study. NLTK, TextBlob and regular expressions are used at various stages of pre-processing.

List of pre-processing techniques used:

Lower case conversion.
Remove Unicode strings (Ex: \xe2\x80\xa6).
Remove URLs and https links.
Remove @ symbols and additional white spaces.
Replace non-alphanumeric symbols with white spaces.
Remove the hashtag symbol (#) in front of words.
Remove special characters (Ex: {, [, ?, !, ^, *, :, ;).
Remove duplicate tweets.
Remove non-English words.
Apply spell corrector.
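
Most of the steps in the list above can be sketched with Python's `re` module alone. This is an illustrative sketch, not the exact code used for the study: the function names and regular expressions are assumptions, non-ASCII characters are dropped as a simple stand-in for removing Unicode strings and emojis, and the non-English word filter and spell corrector are left out (TextBlob's `correct()` method is one option for the latter).

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # URLs and https links
SYMBOL_RE = re.compile(r"[@#]")                 # @ and # symbols (the word is kept)
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")       # remaining special characters
SPACE_RE = re.compile(r"\s+")                   # runs of white space


def clean_tweet(text: str) -> str:
    """Apply the cleaning steps to a single tweet."""
    text = text.lower()
    # Drop anything outside ASCII (emojis, ellipsis characters, and so on)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = URL_RE.sub(" ", text)
    text = SYMBOL_RE.sub("", text)
    text = NON_ALNUM_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def clean_corpus(tweets):
    """Clean every tweet, dropping empty results and duplicates (order kept)."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        text = clean_tweet(tweet)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned
```

Duplicates are removed after cleaning, so two tweets that differ only in case, links or punctuation count as the same tweet.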

3. Sentiment Analysis

There are multiple libraries for computing sentiment scores; TextBlob is used in this study. In TextBlob, “polarity” is the value that represents the sentiment score, in the range of -1 to 1.

-1 represents negative sentiment, 0 represents neutral and 1 represents positive sentiment.

**Category of tweets based on sentiment scores**
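
The exact cut-offs used to place tweets into categories are not stated here, so the sketch below assumes the simplest rule: a polarity above 0 is positive, below 0 is negative, and exactly 0 is neutral. `label_polarity` and `score_tweet` are hypothetical helper names; `TextBlob(text).sentiment.polarity` is the TextBlob attribute that returns the score.

```python
def label_polarity(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a category (assumed thresholds)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def score_tweet(text: str) -> float:
    """Return TextBlob's polarity for one cleaned tweet."""
    from textblob import TextBlob  # imported here; requires `pip install textblob`

    return TextBlob(text).sentiment.polarity


def categorize(tweets):
    """Pair each tweet with its category label."""
    return [(tweet, label_polarity(score_tweet(tweet))) for tweet in tweets]
```

Under this rule, a tweet in which TextBlob finds no sentiment-bearing words scores 0.0 and is counted as neutral, which is worth keeping in mind when reading the size of the neutral group.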

4. Word Cloud for Positive, Negative and Neutral tweets

A word cloud is a representation of the common words used in tweets. Based on the sentiment scores and the categories made, three word clouds are created to represent the most common words in each category.

Word frequency graphs and word clouds are an easy way of presenting your text data to others.
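
As a rough sketch of how the word frequencies behind such graphs can be computed, the snippet below counts words with `collections.Counter`. The small `STOPWORDS` set and the function names are assumptions made for illustration (NLTK's stop word list would be a fuller alternative), and the optional `save_word_cloud` helper assumes the third-party `wordcloud` package is installed.

```python
from collections import Counter

# Tiny illustrative stop word list (assumption, not the one used in the study)
STOPWORDS = {"the", "a", "an", "and", "to", "of", "in", "for", "is", "are", "we", "with"}


def word_frequencies(tweets, top_n=10):
    """Return the `top_n` most common non-stop words as (word, count) pairs."""
    counts = Counter(
        word
        for tweet in tweets
        for word in tweet.split()
        if word not in STOPWORDS
    )
    return counts.most_common(top_n)


def save_word_cloud(frequencies, path):
    """Render (word, count) pairs as a word cloud image file."""
    from wordcloud import WordCloud  # imported here; requires `pip install wordcloud`

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(dict(frequencies))
    cloud.to_file(path)
```

Calling `word_frequencies` once per category (positive, negative, neutral) gives the three sets of counts, and each set can then be passed to `save_word_cloud`.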

Positive word cloud and word frequency:

The graph above shows that the most common words users chose to express their support for the farmers' protest are “thanks”, “support”, “protest”, “right” and “government”.

Negative word cloud and word frequency:

The negative word cloud indicates that “fake”, “democracy”, “propaganda”, “shah” and “inhumane” are the most common words in tweets with negative sentiment scores.

Neutral word cloud and word frequency:

The most common words observed here are “support”, “thank”, “stand” and “protest”. Some words are common across the word clouds. For example, “support” appears in both the positive and neutral word clouds, which indicates that both positive and neutral tweets contain this word.

There is nothing wrong with such overlaps, as the meaning of a word changes with the sentence it appears in.

5. Why are 50% of users neutral in their tweets?

Coming to this big question, we have various observations.

Major observations on why they are ‘Neutral’:

Deviation from the main topic:

As some international celebrities came into the picture in support of the farmers, the discussion deviated from the main topic to a sub-topic, “Is the involvement of international celebrities right or not?”, leaving the main topic behind.

Farmers' meetings with the government:

Most of us don’t want a protest when the terms can be settled with the government. These neutral tweets suggest the same. In short, they indicate that people don’t want any violence and want a settlement to be reached with the farmers.

6. Accuracy of results:

The sentiment analyzer was not trained as part of this study. TextBlob is used as an off-the-shelf sentiment analyzer for predicting the scores. Its default analyzer relies on a pre-built sentiment lexicon rather than on a model trained for this data, so the package can be used without any training, which is why it was chosen here.

Note: This study is part of an exercise in understanding and handling unstructured data. It is not done in favor of either side in the conflict.

References:

Lectures on “Handling Unstructured Data” by Akshay Kulkarni, REVA University.