
Sentiment Analysis Using the TextBlob Library in Python

by Chris Hanks

Jan 10, 2018

For many beginners like myself, learning to code for a data science project can be just as intimidating as flying a Boeing 757 for the first time. Fortunately, Python offers a myriad of powerful libraries to get us beginners quickly into the cockpit to do some cool analyses. One such library is TextBlob, which provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and much more. In this blog, we are going to learn how to write a simple Python program that performs sentiment analysis of Martin Luther King, Jr.'s 'I Have a Dream' speech and writes the results to a .csv file.

Before we hit the throttle and leave the runway, I want to give a brief background of myself and how Convergence Consulting Group (CCG) cultivated my interest in data science. I started at CCG fresh out of college in the summer of 2017. Since I had no professional experience in either consulting or data science, CCG put me through their consultant boot camp: a rigorous 3-week program that covers the ins and outs of consulting, data warehousing, cloud services, data modeling, advanced analytics, and many other tools and soft skills that every CCGer needs to succeed. The experience was immensely useful and enjoyable, but the course that really grabbed my attention was the one on advanced analytics. That course was my first introduction to Python as a data science language. Our instructor, Ahmed Sherif, gave us real data sets to work with and really pushed us to "think in Python". By the end of that 2-hour session, I knew that I wanted to do some more cool analyses with Python, which led me to write the program I'm about to walk you through.

So let’s get started, shall we?

First, I chose to use Notepad++ to write my code, but feel free to use whatever IDE or text editor you're comfortable with (FYI, Jupyter is great if you're a beginner).

[Notepad++ screenshot: lines 1-6]

In lines 4 and 5, we import the TextBlob and csv libraries. The former is how we will invoke the NLP sentiment analysis functions. The latter is how we will invoke the functions necessary to write our sentiment analysis results to a .csv file.

[Notepad++ screenshot: lines 8-10]

In lines 9 and 10, we declare two file path variables. file_path is the location of the "I Have a Dream" speech, and sentiment_csv_path is the eventual location of the sentiment analysis results .csv file.
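A sketch of those declarations, with hypothetical file names (substitute your own paths; the originals aren't shown in this post):

```python
file_path = "i_have_a_dream.txt"              # location of the speech text
sentiment_csv_path = "sentiment_results.csv"  # eventual location of the results
```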

[Notepad++ screenshot: lines 14-15]

In line 15, we create the array fieldnames, which will be used to populate the headers of our .csv file. A quick note about each of these 5 headers: Sentence_ID is a unique identifier number for each sentence of the speech. Polarity and Subjectivity hold the respective sentiment analysis scores for each uniquely identified sentence. Sentence is populated with the text of each uniquely identified sentence. Lastly, Strong Opinion? holds a Boolean value (0 = false, 1 = true) indicating whether the program determines a sentence to be a 'strong opinion'; I'll elaborate on this shortly.
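Based on the 5 headers described above, the array likely looks something like this:

```python
# Column headers for the results .csv file.
fieldnames = ["Sentence_ID", "Polarity", "Subjectivity", "Sentence", "Strong Opinion?"]
```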

[Notepad++ screenshot: lines 18-19]

In line 19, we declare the sentence_ID variable and set it to 0. It will later be incremented as the program traverses the text file and identifies each sentence.

[Notepad++ screenshot: lines 21-25]

In lines 22-25, we do three things. First, we create the .csv file in the file path location stored in sentiment_csv_path. Next, we write the headers to the .csv file using the array fieldnames. Finally, we close the file.
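Those three steps might look roughly like this (the path is a stand-in; the original isn't shown):

```python
import csv

fieldnames = ["Sentence_ID", "Polarity", "Subjectivity", "Sentence", "Strong Opinion?"]
sentiment_csv_path = "sentiment_results.csv"  # hypothetical path

# Create the .csv file, write the header row, then close the file.
sentiment_csv = open(sentiment_csv_path, "w", newline="")
writer = csv.writer(sentiment_csv)
writer.writerow(fieldnames)
sentiment_csv.close()
```

Passing `newline=""` when opening a csv file for writing avoids blank rows on Windows, per the csv module's documentation.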

[Notepad++ screenshot: lines 28-30]

Line 29 is where we open the speech file found in the location stored in file_path. Line 30 reads the open speech file into the TextBlob library.

[Notepad++ screenshot: lines 32-33]

Line 33 takes the open speech file and splits it up into sentences.

[Notepad++ screenshot: lines 35-56]

Now we get to the meat and potatoes of this program.  Line 38 is the beginning of a for loop that loops through each sentence of the speech file.  Lines 39-40 assign polarity and subjectivity scores to a sentence coming in from the speech file. 

Then, in lines 45-48, the program determines whether the sentence is a strong opinion based on the polarity and subjectivity scores: if the absolute value of the polarity is greater than 0.7 and the subjectivity is greater than 0.7, the sentence is assigned a value of 1 (true); otherwise it is assigned a value of 0 (false).
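That threshold check can be captured in a small helper function (a hypothetical refactor for clarity; the original program does this inline):

```python
def is_strong_opinion(polarity, subjectivity):
    """Return 1 if both scores clear the 0.7 thresholds, else 0."""
    if abs(polarity) > 0.7 and subjectivity > 0.7:
        return 1
    return 0
```

Polarity ranges from -1.0 (negative) to 1.0 (positive), so taking its absolute value treats strongly negative and strongly positive sentences alike; subjectivity ranges from 0.0 (objective) to 1.0 (subjective).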

Lines 51-54 open up that .csv file once more, then populate it with data corresponding to each of the 5 headers (Sentence_ID, Polarity, Subjectivity, Sentence, and Strong Opinion?).

In line 56, we reach the end of the loop. If there is another sentence to be processed, sentence_ID is incremented by 1. Once there are no more sentences left, the loop completes.

[Notepad++ screenshot: line 59]

Finally, once the program has looped through each sentence of the speech file, analyzed it through TextBlob, and published the results to a .csv file, we close the speech file in line 59. Although you'll still be able to view the complete results if you skip this step, closing the file is a programming best practice, especially if you need to access the speech file later in the program via a different library.


Once your program successfully completes, you should be able to open the .csv file (I opened mine in Excel) and see all the analysis results for easy consumption. Now that the results are neatly organized in a .csv file, there are a multitude of options to store, consume, and creatively visualize this data!

To those of you reading this post who are new to programming and programming for data science: I hope you've found this post as helpful and maybe just as inspiring as my first exposure to it in CCG's consultant boot camp.

CCG was proudly recognized in Gartner's Market Guide for Data Science and Machine Learning providers; click here to read about this amazing accomplishment. To read more on CCG's capabilities, click here, or contact a CCG representative at (813) 265-3239.