Within seconds of a President turning off the autocue, political pundits stop trembling, wipe the drool from their lips, and spend the next 2 years talking incessantly about what was said. A single speech that clocks in at just under 6,500 words can single-handedly generate more web pages than the callipygian Kim Kardashian can generate page clicks. Since I’m a dude, you might think that this post is now about to become an excuse to share a picture of the ample Ms. Kardashian’s gluteus maximus in all its shiny glory – but you’d be wrong! What I’m actually more interested in doing is taking a more detailed look at the vocabulary that Barack Obama used, from the standpoint of corpus linguistics and with the help of concordance software. At this point, 90% of the guys who found this post by googling “Kim Kardashian’s ass” will leave. Sorry, dudes.
The data came from a transcript available from Time.com, which I then used as input for WordSmith 6.0 software, a corpus analysis tool. Of the many things this software will let you analyze, the ones we’ll look at here are word frequencies, keywords, and concordances.
Keywords are words that are used significantly more or less often in a sample than they are in the general population. In the case of WordSmith, the “general population” is a list known as the British National Corpus, a sample of some 100 million words used in British English (BrE).
The “teachable moment” here is to think about why I chose this sample. Now I know – because I have an ear for these things – that Barack Obama does not use British English; his accent is also a bit of a giveaway. However, for the purpose of this analysis, I don’t think the frequency differences between BrE and American English (AmE) are significant enough to warrant worrying about it. I could have used a different sample called the American National Corpus but that’s only good for 14 million words, which is much smaller than the BNC. Therefore, I chose to go for the larger corpus, knowing that there may be some variations between the two but not, in my opinion, enough to skew the analysis.
If we take a look at the most frequently used words in the speech, you’ll see that they are pretty much what you might expect on the basis of typical distributions. The word the is the most frequent in the English language and seeing it atop the President’s list is uninteresting. What is interesting is that the pronouns we and our are right up there above I and you. Pronouns regularly score high on frequency lists, and it’s one of the reasons practitioners in the field of Augmentative and Alternative Communication (AAC) should make sure these words are targeted. But the fact that we and our appear so high up the list (at #4 and #8 respectively) made me wonder: is this what we might expect to see in general? And that, my friends, is why we turn to a keyness list.
Take a look at that keyness column and notice how both we and our are way up there at #2 and #3. Ignoring for now the intricacies of how those keyness figures are calculated, what is significant is that the Pres is using those two pronouns significantly more than people use them in general, and that reflects a conscious effort to come across as one of “us” and not an “I” or “me” who is doing things. He’s appealing to a “Spirit of Unity.”
You can see more evidence for this appeal if you simply look at the keyness of #4 and #8 – America and Americans. He’s certainly using the words with more frequency than you’d find in a regular sample but we can perform one more kind of analysis in order to see just how he’s using them, and that’s to create a concordance.
A concordance is a list that shows instances of a word in context, along with the words that go before and after it. Below is a concordance for the word Americans as used alongside our:
Given that there were 19 instances of the word Americans being used in total, this pairing accounts for over 30% of them. So as well as using the pronouns themselves to paint a picture of unity, he’s yoking one of them with Americans to further that underlying message.
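If you don’t have WordSmith to hand, the basic idea of a concordance can be sketched in a few lines of Python. This is only a rough illustration, not how WordSmith does it: the function name, the crude tokenizer, and the sample sentences are all my own inventions, and the sample text is not taken from the transcript.

```python
import re

def concordance(text, target, width=4):
    """Return (left context, hit, right context) for every occurrence of target."""
    # Crude tokenizer (an assumption): runs of letters and straight apostrophes.
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == target.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits

# Invented sample text, for illustration only.
sample = "We owe it to our fellow Americans. Many Americans work hard for their families."

hits = concordance(sample, "Americans", width=2)
for left, hit, right in hits:
    print(f"{left:>20}  {hit}  {right}")

# Keep only the hits where "our" turns up in the left-hand context.
with_our = [h for h in hits if "our" in h[0].lower().split()]
print(len(with_our), "of", len(hits), "hits are paired with 'our'")
```

Counting how many hits share a neighbor such as our, and dividing by the total number of hits, is all it takes to get a percentage like the one quoted above.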
Casting your eyes just a few more lines down the keyword list you’ll see the words jobs and economy coming in at #11 and #12, not too far above families (#14) and childcare (#16). Here we see Obama invoking notions of family and economics, both of which are important to voters because we are all involved at some level with both! But take a look at the concordance for how the word family is being used and see if you can spot some familiar words:
Notice how our and American are also used along with families, further reinforcing that Spirit of Unity. In fact, Obama even makes that relationship between families and the United States explicit in the following few sentences:
“It is amazing,” Rebekah wrote, “what you can bounce back from when you have to…we are a strong, tight-knit family who has made it through some very, very hard times.” We are a strong, tight-knit family who has made it through some very, very hard times. America, Rebekah and Ben’s story is our story.
So not only do we hear this explicit appeal to family but by analyzing the words he uses throughout the speech using keywords and concordances, we can tease out those subliminal nods and pointers toward an underlying message: We are family.
 Callipygian is one of my favorite words and, like many of them, deserves to be used much more than it is. The Oxford English Dictionary defines the word as, “of, pertaining to, or having well-shaped or finely developed buttocks,” which in turn comes from the Greek words kalli meaning “beauty” and pygi meaning “buttocks or rump.” Incidentally, an old word for someone who engages in anal intercourse is a pygist, and the adjective dasypygal means “having hairy buttocks.” Try using the last one next time you want to insult folks – especially if they’re making asses of themselves!
 So for that one person out there who has less of a life than I have, you basically count the number of times your target word occurs out of a sample of X words in total, then match that against the number of times the same word occurs in your reference corpus of Y words in total. Here’s the word we in a little 2 x 2 box:
Because I always prefer an easy life when it comes to all things numerical, I used an online calculator to turn these figures into a “log-likelihood” figure – the “keyness” number. You can find that site here: http://sigil.collocations.de/wizard.html
When the site works its magic, you see the score expressed as G-Squared below:
Take a look at that G-Squared figure and then look back at Fig 4 and you’ll see the keyness figure is (almost) the same. You can try this with any of the values in Fig 4 and you’ll see that the online calculator scores match those of the WordSmith software.
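For anyone who would rather see the arithmetic than trust a web form, here is a minimal Python sketch of a log-likelihood (G-Squared) calculation over that kind of 2 x 2 box: the target word versus all other words, in the sample versus the reference corpus. I’m assuming the standard formula, twice the sum of observed × ln(observed / expected) over the four cells; the function name is mine, and the counts in the example are made up for illustration rather than taken from the speech or the BNC.

```python
import math

def log_likelihood(freq_sample, size_sample, freq_ref, size_ref):
    """G-squared for a 2 x 2 table: target word vs. other words, sample vs. reference."""
    observed = [
        [freq_sample, freq_ref],                            # the target word
        [size_sample - freq_sample, size_ref - freq_ref],   # every other word
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            # Expected count if the word were equally common in both corpora.
            e = row_totals[i] * col_totals[j] / grand_total
            if o > 0:  # an empty cell contributes nothing to the sum
                total += o * math.log(o / e)
    return 2 * total

# Made-up counts: a word used 200 times in a 6,500-word sample and
# 300,000 times in a 100-million-word reference corpus.
print(round(log_likelihood(200, 6500, 300_000, 100_000_000), 2))
```

One thing to keep in mind is that G-Squared on its own is never negative, so it says how surprising the difference is but not which way it goes. To tell whether a word is over-used or under-used, compare the two relative frequencies (count divided by corpus size).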
 It was the end of the 70s and tight spandex leggings were all the rage – for the ladies – and Sister Sledge had a monster hit with “We Are Family” from the album of the same name. Apparently the Sisters are still touring to this very day – although I’m not sure if they’re still wearing spandex.