Tag Archives: keywords

The State of the Union Address 2015: “We Are Family…”

Within seconds of a President turning off the autocue, political pundits stop trembling, wipe the drool from their lips, and spend the next 2 years talking incessantly about what was said. A single speech that clocks in at just under 6,500 words can single-handedly generate more web pages than the callipygian [1] Kim Kardashian can generate page clicks. Since I’m a dude, you might think that this post is now about to become an excuse to share a picture of the ample Ms. Kardashian’s gluteus maximus in all its shiny glory – but you’d be wrong! What I’m actually more interested in doing is taking a more detailed look at the vocabulary that Barack Obama used, from the standpoint of corpus linguistics and with the help of concordance software. At this point, 90% of the guys who found this post by googling “Kim Kardashian’s ass” will leave. Sorry, dudes.

The data came from a transcript available from Time.com, which I then used as input for WordSmith 6.0 software, a corpus analysis tool. Of the many things this software will let you analyze, the ones we’ll look at here are word frequencies, keywords, and concordances.

Keywords are those words that appear in a sample significantly more or less often than they are typically used in the general population. In the case of WordSmith, the “general population” is a list known as the British National Corpus, a sample of some 100 million words used in British English (BrE).

The “teachable moment” here is to think about why I chose this sample. Now I know – because I have an ear for these things – that Barack Obama does not use British English; his accent is also a bit of a giveaway. However, for the purpose of this analysis, I don’t think the frequency differences between BrE and American English (AmE) are significant enough to warrant worrying about. I could have used a different sample called the American National Corpus but that’s only good for 14 million words, which is much smaller than the BNC. Therefore, I chose to go for the larger corpus, knowing that there may be some variations between the two but not, in my opinion, enough to skew the analysis.

Fig 1: Top 25 words by frequency

If you take a look at the most frequently used words in the speech, you’ll see that they are pretty much what you might expect on the basis of typical distributions. The word the is the most frequent in the English language and seeing it atop the President’s list is uninteresting. What is interesting is that the pronouns we and our are right up there above I and you. Pronouns regularly score high on frequency lists, and it’s one of the reasons practitioners in the field of Augmentative and Alternative Communication (AAC) should make sure these words are targeted. But the fact that we and our appear so high up the list (at #4 and #8 respectively) made me wonder: is this what we might expect to see in general? And that, my friends, is why we turn to a keyness list.

Fig 2: Top 25 words by keyness

Take a look at that keyness column and notice how both we and our are way up there at #2 and #3. Ignoring for now the intricacies of how those keyness figures are calculated [2], what is significant is that the Pres is using those two pronouns significantly more than they are used in general, and that reflects a conscious effort to come across as one of “us” and not an “I” or “me” who is doing things. He’s appealing to a “Spirit of Unity.”

You can see more evidence for this appeal if you simply look at the keyness of #4 and #8 – America and Americans. He’s certainly using the words with more frequency than you’d find in a regular sample, but we can perform one more kind of analysis to see just how he’s using them, and that’s to create a concordance.

A concordance is a list that shows instances of a word in context, along with the words that go before and after it. Below is a concordance for the word Americans as used alongside we:

Fig 3: Concordance showing WE and AMERICANS

Given that there were 19 instances of the word Americans in total, this pairing with we accounts for over 30% of them. So as well as using the pronouns themselves to paint a picture of unity, he’s yoking one of them with Americans to further that underlying message.
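For anyone who wants to try this without WordSmith, here is a minimal sketch of a keyword-in-context concordance. The function name concordance, the width parameter, and the sample sentence are all my own inventions for illustration; the sentence is made up and is not taken from the speech transcript.

```python
import re

def concordance(text, target, width=4):
    """Return every hit of `target` with `width` words of context on each side."""
    words = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == target.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines

# Made-up sample text, NOT a quote from the speech.
sample = ("We believe that hardworking Americans deserve a fair shot. "
          "Americans know we can do better.")

for line in concordance(sample, "Americans"):
    print(line)
```

Each printed line puts the target word in square brackets with its neighbors on either side, which is roughly what a concordance table shows, minus the sorting and filtering options a dedicated tool gives you.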

Casting your eyes just a few more lines down the keyword list you’ll see the words jobs and economy coming in at #11 and #12, not too far above families (#14) and childcare (#16). Here we see Obama invoking notions of family and economics, both of which are important to voters because we are all involved at some level with both! But take a look at the concordance for how the word families is being used and see if you can spot some familiar words:

Fig 4: Concordance of the word FAMILIES

Notice how our and American are also used along with families, further reinforcing that Spirit of Unity. In fact, Obama even makes that relationship between families and the United States explicit in the following few sentences:

“It is amazing,” Rebekah wrote, “what you can bounce back from when you have to…we are a strong, tight-knit family who has made it through some very, very hard times.” We are a strong, tight-knit family who has made it through some very, very hard times. America, Rebekah and Ben’s story is our story.

So not only do we hear this explicit appeal to family but by analyzing the words he uses throughout the speech using keywords and concordances, we can tease out those subliminal nods and pointers toward an underlying message: We are family [3].

Notes
[1] Callipygian is one of my favorite words and, like many of them, deserves to be used much more than it is. The Oxford English Dictionary defines the word as, “of, pertaining to, or having well-shaped or finely developed buttocks,” which in turn comes from the Greek words kalli meaning “beauty” and pygi meaning “buttocks or rump.” Incidentally, an old word for someone who engages in anal intercourse is a pygist, and the adjective dasypygal means “having hairy buttocks.” Try using the last one next time you want to insult folks – especially if they’re making asses of themselves!

[2] So for that one person out there who has less of a life than I have, you basically count the number of times your target word occurs out of a sample of X words in total, then match that against the number of times the same word occurs in your reference corpus of Y words in total. Here’s the word we in a little 2 x 2 box:
Measure of usage of the word WE

Because I always prefer an easy life when it comes to all things numerical, I used an online calculator to turn these figures into a “log-likelihood” figure – the “keyness” number. You can find that site here: http://sigil.collocations.de/wizard.html

When the site works its magic, you see the score expressed as G-Squared below:

SOTUA2015 Log-Likelihood

Take a look at that G-Squared figure and then look back at Fig 2 and you’ll see the keyness figure is (almost) the same. You can try this with any of the values in Fig 2 and you’ll see that the online calculator scores match those of the WordSmith software.
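If you would rather not rely on a website, the same kind of 2 x 2 calculation can be sketched in a few lines. This is the standard G-squared (log-likelihood) statistic for a 2 x 2 table; I am assuming, rather than claiming, that it is the same sum the calculator and WordSmith perform. The function name log_likelihood is mine and the counts in the example are made up, not the real figures for we.

```python
import math

def log_likelihood(freq_sample, size_sample, freq_ref, size_ref):
    """G-squared for a 2 x 2 table: target word vs. all other words,
    in the sample vs. in the reference corpus."""
    observed = [
        [freq_sample, freq_ref],
        [size_sample - freq_sample, size_ref - freq_ref],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [size_sample, size_ref]
    total = size_sample + size_ref
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # an empty cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / total
                g2 += o * math.log(o / expected)
    return 2 * g2

# Hypothetical counts for illustration only.
print(round(log_likelihood(50, 1_000, 100, 10_000), 2))
```

A word used at exactly the same rate in both samples scores zero, and the score climbs as the two rates drift apart, which is all that “keyness” really means.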

[3] It was the end of the 70s and tight spandex leggings were all the rage – for the ladies – and Sister Sledge had a monster hit with “We Are Family” from the album of the same name. Apparently the Sisters are still touring to this very day – although I’m not sure if they’re still wearing spandex.

The Dudes Dissect “Closing the Gap” 2013: Day 1 – Of Words and Workshops

Regular readers of the Speech Dudes will know that when the “Dudes Do…” a conference, Day 1 is typically all about the travel experience, usually including some unfavorable comments about taxi cabs and hotel coffee. This time, though, I’m feeling charitable and, although not yet ready to “Hug a Cabbie,” I’ve decided to provide an overview of the preconference sessions, which I didn’t attend.

Now, you may think that not having attended a workshop might put me at a bit of a disadvantage with regard to reporting on content and offering a critique – and you would be right. On the other hand, what I can comment on is the contents of the preconference brochure, which everyone has access to before the actual event and which they use to decide on the workshops and sessions they want to attend.

So what you’re going to see is an example of corpus linguistics in action, dissecting the very words used to influence YOUR choices. In short, you’re about to learn about what words presenters and marketers use to make up your mind for you. Grab your coffee, hold on to your hats, and prepare to be amazed at what you didn’t know!

Methodology

The Dudes are big believers in the scientific method and the application of evidence-based practice. We strive for some objectivity where possible, although we acknowledge that our occasional rants may be just a tad subjective. We don’t expect our readers to take everything we say as gospel, so sharing the methodology of how we analyzed our data seems fair.

The raw data came straight from the official conference brochure, available for anyone to check at http://www.closingthegap.com/media/pdfs/conference_brochure.pdf. From that I extracted all the text in the following categories:

  • Preconference Workshop Titles
  • Preconference Workshop Course Descriptions
  • Conference Session Titles
  • Conference Session Descriptions
  • Exhibitor Descriptions

Technically, I simply cut and pasted the text from the PDF and then converted everything to TXT format because that’s the format preferred by the analysis software I use.

WordSmith 6 is a wonderful piece of software that lets you chop up large collections of text and make comparisons against other pieces of text. These comparisons can then show you interesting and fascinating details about how those words are being used. I’ve talked in more detail about WordSmith in our post, The Dudes Do ISAAC 2012 – Of Corpora and Concordances, so take a look at that if you want more details.

Once I have the TXT files, I can create a Word List that gives me frequency data, but I also use a Stop List to filter out common words. If you simply take any large sample of text and count how often words are used, you’ll find that the top 200 end up being the same – that’s what we call Core Vocabulary. And when you’re looking for “interesting” words, you really want to get rid of core because it’s… well… uninteresting! Hence a Stop List to “stop” those words appearing.[1]
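The Word List plus Stop List idea is simple enough to sketch outside of WordSmith. What follows is my own rough approximation, not a description of what WordSmith does internally: the names STOP_WORDS and word_list, the min_count parameter, and the sample text are all invented for illustration, and a real stop list would hold a couple of hundred core words rather than a dozen.

```python
import re
from collections import Counter

# A tiny illustrative stop list; a real one would be far longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "for",
              "with", "is", "will", "this", "how"}

def word_list(text, stop_words=STOP_WORDS, min_count=1):
    """Count words, skip anything on the stop list, and keep only
    words that occur at least `min_count` times."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return [(w, n) for w, n in counts.most_common() if n >= min_count]

# Made-up text in the style of a course description.
sample = ("In this workshop participants will learn how students use the iPad. "
          "Students with communication needs will use apps for learning.")

print(word_list(sample, min_count=2))
```

Setting min_count to 2 gives the same kind of “twice or more” cut-off that is handy when building a word cloud from a short text.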

Preconference Workshop Titles

The first opportunity you have to encourage folks to come to your session is to have a title that makes a reader want to find out more about what you have to offer. The title is, in fact, the door to your following content description. Of course, you have to find some balance between “catchy” and “accurate.” For example, a paper I presented at a RESNA (Rehabilitation Engineering and Assistive Technology Society of North America) conference entitled Semantic Compaction in the Dynamic Environment: Iconic Algebra as an Explanatory Model for the Underlying Process was, in all fairness, technically accurate, but from a marketing perspective it had all the appeal of a dog turd on crepe. [2]

Let’s therefore take a look at what seem to be the best words to use if you want to attract a crowd.

Preconference Sessions: Keywords in Titles

High frequency words in preconference titles

The Word Cloud here counts only words that appeared twice or more, and the size of the words is directly proportional to frequency, so it’s clear that students is a critical word to use, followed closely by iPad, technology, learning, and communication. On that basis, if you’re planning to submit a paper for 2014, here’s your best nine-word title bet for getting (a) accepted and (b) a crowd:

The implementation of iPad technology for learning and communication

In the event that the CTG review committee find themselves looking at multiple courses submitted with the same title, you’re going to have to consider how you describe your actual course contents – and luckily, we can help there, too!

Preconference Sessions: Keywords in Course Content

The actual highest frequency words were workshop and participants, which is something of an artificial construct because most people include phrases such as “in this workshop, participants will…” and so I removed these from my keyword analysis.

Frequent words in preconference sessions content

So to further enhance the pulling power of your course, you need to be talking a lot about students, how they use iPads and communication, along with using apps to learn, enhance learning, and any strategies that help meet needs. In fact, you need to include any of these Top Ten words:

Top Ten keywords in preconference session content

But wait, wait… there’s more

I’ve been using the word keywords to refer to those words that appear within a piece of text more frequently than you would expect based on comparing them to a large normative sample. If you perform a keyword analysis on the preconference contents sample, you find that the top five keywords are iPad, iPads, AAC, apps, and students. This suggests that we do an awful lot of talking about one very specific brand-name device – which is good news for the marketing department at Apple!

Top 15 words by Keyness score

The relevant score is the keyness value. The higher the keyness, the more “key” the word is, i.e. its frequency in the sample is significantly higher than you would expect to see in the normal population. So when you look at the table above, you’re not just seeing frequency scores but how significantly important words are. [3] As an example, the word iPads is used less frequently than the word communication (10 times as against 16) but iPads is almost twice as “key” as communication, i.e. it is significantly more important.
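To see how a word with fewer hits can still be more “key,” here is a small sketch using the chi-square statistic mentioned in note [3]. The two sample counts (10 and 16) come from the paragraph above; everything else, including the sample size of 5,000 words, the reference corpus size of 1,000,000 words, and both reference counts, is made up purely for illustration. The function name chi_square is my own.

```python
def chi_square(freq_sample, size_sample, freq_ref, size_ref):
    """Pearson chi-square for a 2 x 2 table (no continuity correction)."""
    a, b = freq_sample, freq_ref
    c, d = size_sample - freq_sample, size_ref - freq_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

# Sample counts are from the post; sizes and reference counts are HYPOTHETICAL.
SAMPLE_SIZE, REF_SIZE = 5_000, 1_000_000
ipads = chi_square(10, SAMPLE_SIZE, 5, REF_SIZE)            # rare in the reference
communication = chi_square(16, SAMPLE_SIZE, 400, REF_SIZE)  # common in the reference

print(ipads > communication)
```

With these invented reference figures, iPads outscores communication even though it turns up less often, because it is so much rarer in the comparison corpus. The real keyness values depend on the real reference counts, which I have not reproduced here.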

Now, as a final thought for folks who are working in the field of AAC (augmentative and alternative communication), I suggest that if you are developing vocabulary sets for client groups, using frequency studies is certainly a good start (and more scientific than the tragically common practice of picking the words “someone” thinks are needed) but if you then introduce a keyness analysis, you can improve the effectiveness of your vocabulary selection.

Coming next… The Dudes Dissect Closing The Gap 2013: Day 2 – Of Speech and Session. In which the Dudes present an analysis of the words used to describe conference session titles and contents. Find out how to improve your chances of getting your paper presented!

Notes
[1] In truth, there is more I could say about the methodology, and were this intended to be a peer-reviewed article for a prestigious journal, rest assured I’d go into much more detail about some of the finer points. However, this is simply a blog post designed to educate and entertain, so I ask you to allow me some leeway with regard to precision. I’m happy to share the raw data with folks who want to see it but all I ask is you don’t toss it around willy-nilly.

[2] Not only did it have a title that included the word “algebra” but it was scheduled for 8:00 am on the final day (a Saturday, no less) of the conference. Surprisingly, people showed up – which says more about the sort of folks who attend RESNA conferences than anything about my “pulling power” as a presenter.

[3] There is a mathematical formula for the calculation of keyness values. One way is to use the Chi-Square statistic; the other is to use a Log-likelihood score, which is something like a Chi-Square on steroids. As I’ve often said, I didn’t become an SLP because of my ability to handle math and statistics, so I admit to finding these things a strain on my brain. However, for the non-statistically inclined among us, the point is that both these measures simply compare the frequency value of a word from an experimental sample against the frequency value it has in a very large comparative sample (such as the British National Corpus or the Corpus of Contemporary American English), and then show you how similar or dissimilar they are. If their frequencies are very, very dissimilar, the word from the experimental sample is a keyword – like iPad and AAC in the examples above. Now feel free to pour yourself a drink and let your brain relax.

Small Object of Desire: The Monteverde Invincia Stylus fountain pen – and Keyword Vocabulary

Those who follow the Speech Dudes on Twitter (@speechdudes) may recall a mysterious tweet from December 28th, 2012, that referred to something called the Monteverde Invincia fountain pen.

Tweet from December

And those who are regular readers of this blog may vaguely recall that one of the Dudes has a passion for pens that marks him out as being either very old-fashioned, slightly quirky, or perhaps requiring of medication. But the Invincia is a pen of such style, charm, and delicious darkness that I’m guessing at least one of you out there will be ponying up the $75 just to get one of these wonderful objects of desire in your hand. Literally.

Monteverde Invincia Stylus

But first, because this is, after all, a blog written by SLPs for other SLPs, educators, language lovers, and all moms and dads with a curious bent, let’s talk a little bit about vocabulary.

In the field of augmentative and alternative communication, where the Dudes earn their daily crusts, it’s common to talk about words as being either core or fringe. Actually, up until five years ago, it wasn’t always that common but the proliferation of apps for tablets has seen the words core and fringe become almost essential to the marketing blurb of any of these apps – whether or not it’s true. Just tossing the words out doesn’t make an app a good communication tool, nor does copying what other folks have done and dropping it into a few pages make it any better. No, app creators need to learn what the words really mean before using them as sales jargon [1].

But if you are serious about creating a word-based solution, you can use the following definitions to help you in your quest:

Core Word: A word with a high frequency-of-use value that is also what you might expect to see statistically when you compare it to a large reference corpus.

Fringe Word: A word with a low frequency-of-use value that is also what you might expect to see statistically when you compare it to a large reference corpus.

Keyword: A word that has a higher frequency-of-use than what you might statistically expect when you compare it to a large reference corpus.

You’ll notice that I have purposely defined these as statistical phenomena and not as actual words that may be referred to as “useful,” “necessary,” “essential,” “uncommon,” or any other such subjectively nuanced adjectives. You’ve hopefully also picked up on the notion that there needs to be a “reference corpus” of some sort. The best reference corpus I suggest is one I like to call “the English language” because that is the thing that we all need to use in order to communicate with one another. So using the Corpus of Contemporary American English or the British National Corpus is fair game. And when it comes to core vocabulary, even if you look at the small vocabulary lists that have been collected in the AAC field from different age groups across different situations, you’ll find the same words are common to all [2].
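The three definitions above can be turned into a rough decision rule. This is only a sketch under loud assumptions: the function name classify, the cut-off of 1,000 occurrences per million for “high frequency,” and the ratio of 5 for “more than expected” are numbers I picked for illustration, and a plain ratio is a stand-in for a proper keyness statistic. The word counts are also invented.

```python
def classify(freq_sample, size_sample, freq_ref, size_ref,
             high_per_million=1000.0, key_ratio=5.0):
    """Label a word core, fringe, or keyword from its rate in a sample
    compared with its rate in a reference corpus."""
    sample_rate = freq_sample / size_sample * 1_000_000
    # Add 1 to the reference count so an unseen word cannot divide by zero.
    ref_rate = (freq_ref + 1) / size_ref * 1_000_000
    if sample_rate / ref_rate >= key_ratio:
        return "keyword"  # used far more often than the reference predicts
    return "core" if sample_rate >= high_per_million else "fringe"

# HYPOTHETICAL counts: a 10,000-word sample against a 100-million-word reference.
print(classify(600, 10_000, 6_000_000, 100_000_000))  # a very common word
print(classify(1, 10_000, 10_000, 100_000_000))       # a rare word, as expected
print(classify(40, 10_000, 300, 100_000_000))         # rare in general, common here
```

The point of the sketch is that all three labels fall out of the same two numbers, the rate in your sample and the rate in the reference corpus, with nobody’s opinion about which words are “useful” getting a vote.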

If you’re already working in AAC, you may not be familiar with the use of the term keyword but it’s taken from the world of Corpus Linguistics and I find it a very useful concept to apply. For example, in the world of education, when folks talk about “core words” in relation to Core Communication Standards, they are really talking about keywords; the word vertex is a “core word” in math but is a keyword from an AAC perspective.

Keywords are words which are significantly more frequent in a sample of text than would be expected, given their frequency in a large general reference corpus. (Stubbs, 2010) [3]

So, let’s go back to my encomium [4] on the Monteverde Invincia Stylus pen and see what we can learn about core words, fringe words, and keywords.

The first thing is that the world of pens and paper has specialized vocabulary – or more specifically uses some words in specialized ways. This would be keyword vocabulary within the domain of “Fine Writing.” Thus, the word nib is statistically a fringe word when compared to a general vocabulary but becomes a keyword within the context of discussing fountain pens. In essence, keywords are typically domain-specific items and a sub-set of fringe.

To give you a feel for what keywords you might find, I did a quick(ish) analysis based on a 10,000-word corpus created from a popular blog about fountain pens and their use. Using WordSmith 6 software, I created a word list based on the text from the blog, then used the KeyWord facility to determine the top 20 keywords in the sample, i.e. those words that were being used statistically more than you might expect when compared with a standard reference (and in this case, my standard reference is the British National Corpus).

The following “league table” illustrates keyword vocabulary in the domain of Fine Writing.

Keywords “Fine Writing”

The words fountain and pen appear separately but when you look at the concordance data, the two actually appear typically as fountain pen, so I wouldn’t regard fountain itself as a keyword – the keyword is the compound noun, fountain pen. If I’d taken a few more minutes, I could have put the singular and plural forms together so we wouldn’t see separate entries for pen(s), ink(s), cartridge(s), converter(s), and color(s).
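Putting singular and plural forms together is the sort of tidy-up that takes a few lines if you do it by hand. Here is a minimal sketch; the function name merge_forms is mine, the mapping covers only the five pairs mentioned above, and the counts are invented rather than taken from the table.

```python
from collections import Counter

def merge_forms(counts, variants):
    """Add each variant's count to its base form; other words pass through."""
    merged = Counter()
    for word, n in counts.items():
        merged[variants.get(word, word)] += n
    return merged

# The five plural forms named in the post, mapped to their singulars.
variants = {"pens": "pen", "inks": "ink", "cartridges": "cartridge",
            "converters": "converter", "colors": "color"}

# HYPOTHETICAL counts for illustration only.
counts = {"pen": 120, "pens": 45, "ink": 60, "inks": 22, "nib": 30}

print(merge_forms(counts, variants).most_common())
```

An explicit mapping is deliberately dumb: simply chopping a final s off every word would also mangle words like glass, so for a 20-item keyword list a hand-built table is the safer bet.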

Knowing about such keyword vocabulary is, in fact, very useful. My enthusiasm for my new pen can be explained to you much more succinctly if I can use the keywords. For example, I recommend that if you want one of these pens, you are better off with the medium-sized nib because that will spread the ink out to facilitate clear writing. Furthermore, since one of the great features of the pen is that it includes both a cartridge and a converter, knowing the words cartridge and converter is helpful! If I then explain that a converter is a small barrel that you can use to suck up ink from an ink bottle, you now know that by buying different inks you can choose which ink colors you’d like to have.

Vocabulary lesson aside, the pen is indeed a stylish addition to anyone’s fashion accessories. Its brushed metal, matt-black finish and fine ribbing give it a distinctive appearance with a hi-tech accent. Its darkness is reinforced by having a shiny black stainless-steel nib, which makes it look like the sort of pen Darth Vader might have used to sign the order authorizing the construction of the Death Star (“You don’t know the power of the Dark Side!”) or that Batman has somewhere on his utility belt (“Quick Robin, use the BatPen!”)

Pen showing internal converter

It writes smoothly and has the merest hint of a squeak as it glides across paper, which is not a bad thing in the world of fountain pens. It’s classed as a heavy pen (1.4 oz. or about 40 grams) and so has a much more solid feel than some cheap, plastic ballpoint.

Monteverde Invincia Stylus fountain pen nib

Even the nib is black!

To boost its hi-tech credentials even more, the cap is tipped with conductive rubber so it can be used with a capacitive touchscreen; in short, you can write on your favorite tablet device! I’ve tested it with the Galaxy Tab 7″ display, the 10″ display model (my favorite), the iPad 3, a Motorola Droid 3, and a Microsoft Surface, and all have worked just fine.

Conductive rubber tip

There is a white version of the pen available but that doesn’t appeal to me. It’s the blackness that makes it sharp! And with a retail price of $95, it may sound steep to those who are new to the world of fountain pens. But you can get it from Amazon for $75, and other Internet sources are quoting $65, so there are deals to be had.

Long term, there are lots of different inks to choose from. Monteverde offer a range of inks but you should check out Glenn’s Pens where there is a good article on Fountain Pen Ink along with a dizzying array of brands and color options [5]. Another great resource is The Goulet Pen Company, where you’ll also find videos related to pens and paper.

Oh, and if you do buy the pen, drop us a note – then we know who we won’t be able to impress by whipping out our Invincias!

Notes
[1] And while we’re at it, there is a special place in the nine circles of Hell (possibly the 8th) reserved for anyone who claims their app is “intuitive,” “ground-breaking,” or, heaven forbid, “game-changing.” If it takes me fifteen minutes and four or five keystrokes to find a word like already, and if there is no way for me to actually find it other than hitting key after key after key until I stumble across it, you have NO right to talk about “intuitive,” “ground-breaking,” or “game-changing” – unless the “change” in question is to set AAC back 10 years by providing sub-par sops that do nothing more than provide a 10-minute solution that then requires hours and hours of fiddling to add all the stuff that was missing in the first place.

Just sayin’…

[2] If you want a list of many vocabulary sources, there’s one available via this Dude Link: list of vocabulary articles.

[3] Stubbs, M. (2010). Three concepts of keywords. In M. Bondi and M. Scott (Eds.), Keyness in Texts: Studies in Corpus Linguistics. John Benjamins Publishing: Philadelphia. Available via this Dude Link: article on keywords.

[4] Here’s one of those wonderful words that deserves to be taken out of the box now and again, dusted down, polished up, and tossed into a sentence just to brighten up an otherwise lexically turgid day. The OED defines encomium as “a high-flown expression of praise.” It comes, via Latin, from the Greek enkomion (ἐγκώμιον) and ultimately eulogia (εὐλογία) or “eulogy,” which means “praise.” And yes, the logia element does mean “speaking” and is the same root as logos meaning “word.” Only the Dudes would bring you Classical Greek and make it interesting!

[5] My favorite ink at the moment is made by Diamine and called “Syrah,” a splendid dark red that looks particularly fetching against the ivory paper of my Quo Vadis Havana journal. I use it in my Cross Torero Bordeaux Croc, which is a broad-nibbed, red-colored pen that lives in my travel bag.

Cross Torero Croc red fountain pen