Analyzing Steve Jobs's 2005 Commencement Speech with Python and SPSS

by Ahmed Sherif

Jul 13, 2017

We recently celebrated the 10-year anniversary of the iPhone, so I thought it would be a good opportunity to look back at one of the more inspiring speeches given by the iPhone's creator, Steve Jobs. He gave this speech about a year after he was diagnosed with the cancer that ultimately took his life, and it is one of the few speeches in which he opens up about his family life. That made it a good candidate for some text analytics.

Before we can do any analysis, we first have to get the full text of his speech in a usable format. The full text of the speech is available on The Guardian's website.

What tools should we use?

Well, Python has many great libraries for scraping text from the web. We will use Python 3.6 inside a Jupyter Notebook to retrieve the speech. A word of caution: always read the fine print of a website regarding its policy on scraping its content.

Image 1
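One machine-readable part of that fine print is a site's robots.txt file. The sketch below is not the code from the screenshot; it uses only Python's standard library. `allowed_by_robots` checks a URL against robots.txt rules, and `fetch_html` downloads a page into a string similar to the `htmls` variable discussed next. Both function names and the user-agent string are hypothetical, and robots.txt rules do not replace a site's written terms of use.

```python
from urllib import robotparser
from urllib.request import Request, urlopen


def allowed_by_robots(robots_lines, url, user_agent="*"):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


def fetch_html(url, user_agent="speech-analysis-sketch"):
    """Download a page and return its HTML as text (hypothetical helper)."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Usage: real rules would normally be downloaded from the site's /robots.txt.
rules = ["User-agent: *", "Disallow: /private/"]
print(allowed_by_robots(rules, "https://example.com/private/page"))  # False
print(allowed_by_robots(rules, "https://example.com/speech"))        # True
```

Checking the rules before calling `fetch_html` keeps the polite-scraping step explicit rather than buried inside the download code.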

When we view the data retrieved into the variable htmls, we find that it is littered with web markup that is unnecessary for our analysis, as seen in the following screenshot.

Image 2

We can see the text of the commencement speech, but it is wrapped in <p> (paragraph) tags. Thankfully, we can use the BeautifulSoup library to extract only the text inside those tags and print it out, as seen in the following code.

Image 3
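A minimal sketch of that extraction step, assuming the beautifulsoup4 package is installed and that the speech sits in <p> tags as described above. The function name `extract_paragraphs` is hypothetical, and the code in the screenshot may differ.

```python
from bs4 import BeautifulSoup


def extract_paragraphs(html):
    """Return the whitespace-normalized text of every <p> tag, skipping empty ones."""
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = []
    for p in soup.find_all("p"):
        # get_text() joins text from nested tags such as <b> or <a>;
        # split/join collapses line breaks and repeated spaces.
        text = " ".join(p.get_text().split())
        if text:
            paragraphs.append(text)
    return paragraphs


sample = "<div>Menu</div><p>Thank you.</p><p>I never <b>graduated</b>\n from college.</p>"
for line in extract_paragraphs(sample):
    print(line)
```

Normalizing whitespace here means each exported line later arrives in SPSS as one clean sentence rather than a fragment with stray line breaks.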

The text now looks much more readable and is in a format we can use to begin our text analysis. The final piece of code we will run in Python filters out a few lines that are not part of the actual speech (they begin with ‘**’ and are editor comments) and then exports the text, line by line, to an MS Excel file that SPSS can read.

Image 4
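Here is a hedged sketch of the filter-and-export step, assuming pandas plus an Excel writer engine such as openpyxl are installed. The sheet name and column header both use the Speech naming described in this post; the helper names are hypothetical, and dropping blank lines is an extra assumption beyond the ‘**’ filter.

```python
import pandas as pd


def drop_editor_notes(lines):
    """Strip whitespace, then drop blank lines and editor comments starting with '**'."""
    cleaned = (line.strip() for line in lines)
    return [line for line in cleaned if line and not line.startswith("**")]


def speech_frame(lines):
    """Build a one-column DataFrame with one speech line per row under 'Speech'."""
    return pd.DataFrame({"Speech": drop_editor_notes(lines)})


def export_speech(lines, path="Speech.xlsx"):
    """Write the speech to a sheet named 'Speech', without the DataFrame index."""
    # Writing .xlsx files requires an installed engine such as openpyxl.
    speech_frame(lines).to_excel(path, sheet_name="Speech", index=False)


# Usage, where speech_lines is the list of extracted paragraph strings:
# export_speech(speech_lines)
```

Passing `index=False` keeps pandas from writing its row numbers as an extra column, so SPSS sees only the Speech column.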

We can now open our Excel file, Speech.xlsx, in our working directory to confirm that the speech text exported successfully.

Image 5

As we can see, each line of the speech exported to its own cell in Excel, exactly as we specified in Python, with a tab titled Speech and a column header also titled Speech. We are now ready to begin our text analysis with IBM SPSS.

We will be using SPSS Modeler v18.0. First, we need to create a new stream and add an Excel node from the Sources tab as our data source, as seen in the following screenshot.

Image 6

We can then edit the data source node and point it to the Excel file we exported from Python. To confirm that SPSS reads the file correctly, we can preview the first ten records, as seen in the following screenshot.

Image 7

Next, we will connect a Text Mining node from the IBM Text Analytics tab to the Excel data source.

Image 8

We can then configure the Text Mining node, under its Fields tab, to use the Speech column from the Excel file as the Text field, as seen in the following screenshot.

Image 9

We also want to use a specific model for sentiment analysis on our text, which requires selecting the Model tab and loading a specific Text Analysis Package.

Image 10

We want to load the Sentiments package under the English package, as seen in the following screenshot.

Image 11

Once the package has been selected, we can click the Run button to execute the model.

Image 13

When the model is executed, we can immediately view the different sentiment types by frequency.

Image 14
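To make "sentiment types by frequency" concrete, here is a deliberately simple lexicon lookup in Python. It is not how the SPSS Sentiments package works internally, and the tiny `LEXICON` is hypothetical. It treats each exported line as one document and reports the percentage of documents in which each sentiment type appears at least once, the same kind of document share discussed next.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon mapping words to sentiment types. The SPSS
# Sentiments package relies on its own, much larger linguistic resources.
LEXICON = {
    "great": "Positive",
    "love": "Positive",
    "excellent": "Positive",
    "dropped": "NegativeFunctioning",
    "quit": "NegativeFunctioning",
}


def sentiment_doc_share(docs, lexicon=LEXICON):
    """Percent of documents containing at least one word of each sentiment type."""
    counts = Counter()
    for doc in docs:
        words = re.findall(r"[a-z']+", doc.lower())
        # A set, so each sentiment type is counted at most once per document.
        counts.update({lexicon[w] for w in words if w in lexicon})
    return {stype: 100 * n / len(docs) for stype, n in counts.items()}


docs = ["I dropped out.", "It was great.", "I quit.", "Stay hungry."]
print(sentiment_doc_share(docs))  # {'NegativeFunctioning': 50.0, 'Positive': 25.0}
```

Counting per document rather than per word matches the idea that a figure like 16% describes the share of lines, not the share of all words.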

Each line (phrase) of the speech is treated as a document, or Doc. 39% of the documents fall under the concept excellent, with a Positive sentiment type. On the other hand, the word dropped appears in 16% of documents, with a NegativeFunctioning type. To investigate the negative documents further, we can highlight the <NegativeFunctioning> sentiment type and select the Display icon, as seen in the following screenshot.

Image 15

Once it is selected, we can view the documents associated with NegativeFunctioning sentiments that include dropped, as seen in the following screenshot.

Image 16

We can see that the word dropped is associated with another negative word, quit. The sentiment package is sophisticated enough to group these combinations of words together and extract a NegativeFunctioning sentiment from them.
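The dropped/quit association can be approximated with a plain co-occurrence count: for each document, record which pairs of terms from a chosen vocabulary appear together. This is a simplified stand-in, not the SPSS algorithm, and the vocabulary passed in is an assumption you would choose yourself.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrences(docs, vocabulary):
    """Count the documents in which each pair of vocabulary terms appears together."""
    vocab = {term.lower() for term in vocabulary}
    pairs = Counter()
    for doc in docs:
        # Sorting makes ('dropped', 'quit') and ('quit', 'dropped') the same key.
        present = sorted(set(re.findall(r"[a-z']+", doc.lower())) & vocab)
        pairs.update(combinations(present, 2))
    return pairs


docs = [
    "I dropped out of Reed College after the first 6 months, "
    "but then stayed around as a drop-in for another 18 months "
    "or so before I really quit.",
    "Never quit.",
]
print(cooccurrences(docs, ["dropped", "quit", "fired"]))
```

Because each document contributes a pair at most once, the counts read as "number of lines where both words occur", which is the view the Display step above provides.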

In addition to sentiment, SPSS Modeler can group the words of sentences into broader categories of concepts. We can do this by clicking the Build icon, as seen in the following screenshot.

Image 18

We can see that the top categories are Life, Family, and College. It is to be expected that Steve Jobs would focus on these concepts in a commencement address to graduating students at Stanford. We also see that he focused on medical procedures, the heart, and death. This makes sense as well, since he spent much of the speech talking about his medical condition and about living each day as if it were your last.
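SPSS Modeler builds categories with its own linguistic techniques. As a rough, hypothetical stand-in, a keyword dictionary can group lines into named categories and keep the matching phrases for each one, which also mimics expanding a category to see its phrases. The category names echo the results above, but the keyword lists are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists; SPSS derives its categories differently.
CATEGORIES = {
    "Life": {"life", "live", "living"},
    "Family": {"mother", "father", "parents", "family", "wife", "sister"},
    "College": {"college", "graduate", "graduation", "tuition", "classes"},
    "Death": {"death", "die", "dying"},
}


def categorize(docs, categories=CATEGORIES):
    """Map each category name to the documents (phrases) mentioning any of its keywords."""
    grouped = defaultdict(list)
    for doc in docs:
        words = set(re.findall(r"[a-z']+", doc.lower()))
        for name, keywords in categories.items():
            if words & keywords:
                grouped[name].append(doc)
    return dict(grouped)


lines = ["My mother was a college graduate student.", "Death is the best invention of life."]
for name, phrases in categorize(lines).items():
    print(name, phrases)
```

A document can land in several categories at once, which mirrors how a single line of the speech can touch both Family and College.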

We can expand each category to get deeper insight into the subcategories.  If we expand the Life category, we can view the phrases in his speech related to it.

Image 19

I hope you enjoyed this opportunity to dig through Steve Jobs's 2005 commencement speech at Stanford University and the different ways we could analyze his speech and categorize its words using IBM SPSS Modeler.
