28 Words to Boost Your Client’s Vocabulary – Maximum Bang for Buck

When developing a vocabulary set for an augmentative and alternative communication (AAC) system – or indeed when deciding on what vocabulary to teach anyone – one of the most fundamental measures you can use is frequency count: how often is a word used in a language? No-one can predict with 100% accuracy which words will be “best” for an individual, but if you’re going to take bets, you’re pretty safe to assume that words such as that, want, stop, and what are going to be used by everyone from ages 2 to 200. By the same token, you’d not be missing much if you didn’t spend too much time on words like ambidextrous, decalogue, and postilion [1].

In the field of AAC, this type of high frequency vocabulary that is used (a) across populations and (b) across situations is referred to as core vocabulary and it’s often contrasted with the phrase fringe vocabulary, which refers to words that are typically (a) low in frequency and (b) specific to isolated activities or situations. For a refresher on core and fringe – and an introduction to keyword vocabulary – check out my article entitled Small Object of Desire: The Monteverde Invincia Stylus fountain pen – and Keyword Vocabulary from two years ago.

The core/fringe distinction is now so embedded in the world of augmentative communication that it is rare to see any new app appear on the market that doesn’t use the phrase “core vocabulary” somewhere in its marketing blurb – even if it isn’t actually making good use of the core! And as core vocabulary is, by definition, common across ages, activities, situations, and pathologies, it’s not surprising that many AAC software offerings look the same, particularly with regard to the words being encoded [2].

But it’s worth taking a look at another level of frequency measurement, and that’s at the phrase level. Specifically, one area of research that seems to me to offer some value to speech and language pathologists and educators working in vocabulary development is the study of how phrasal verbs (PVs) are distributed.

So what’s a phrasal verb? Well, simply put, it’s a phrase of two to three words that are yoked together and that includes a verb and a preposition and/or adverb. Examples include “I ran into Gretchen at the ATIA conference,” “I backed up my hard drive,” and “I came across an interesting article on phrasal verbs.” The English language is stuffed to the gills with these types of verbs, and a feature of them is that they tend to have multiple meanings.

To find out how polysemous a phrase can be, you can use the excellent WordNet online tool, a huge database of words and phrases that lets you check out noun, verb, adjective, and adverb meanings. For example, would you believe that the simple phrase “give up” has 12 different meanings? Or that “put down” has 8 variations? It’s not surprising that learners of English find phrasal verbs quite challenging.

The other fascinating feature of phrasal verbs is summarized in a 2007 paper by Gardner and Davies, who point out that if you look at the 100-million-word British National Corpus you find that:

…a small subset of 20 lexical verbs combines with eight adverbial particles (160 combinations) to account for more than one half of the 518,923 phrasal verb occurrences identified in the megacorpus. A more specific analysis indicates that only 25 phrasal verbs account for nearly one-third of all phrasal-verb occurrences in the British National Corpus, and 100 phrasal verbs account for more than one half of all such items. Subsequent semantic analyses show that these 100 high-frequency phrasal verb forms have potentially 559 variant meaning senses.

Read that again and see if you get the same tingle I did seeing those numbers. Over half of all the phrasal verb occurrences found in the corpus can be accounted for by combining 20 verbs with 8 particles. In short, if you learn just 28 words, you’ve learned the building blocks for over 50% of the phrasal verb occurrences you’re likely to meet.
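For anyone who likes to see the arithmetic, here’s a minimal Python sketch of how a handful of verbs and particles multiply into candidate phrasal verbs. The short lists are an illustrative subset I’ve picked from words mentioned in this post, not the full tables from the paper, and not every generated pairing is necessarily a real phrasal verb.

```python
from itertools import product

# Illustrative subset only: these verbs and particles all turn up in this post.
# The full lists are the 20 verbs and 8 particles from Gardner and Davies.
verbs = ["give", "put", "break", "point", "set"]
particles = ["up", "down", "out"]

# Pair every verb with every particle to get the candidate combinations.
combinations = [f"{verb} {particle}" for verb, particle in product(verbs, particles)]

print(len(combinations))  # 5 verbs x 3 particles = 15
print(20 * 8)             # the full lists give 160 candidate combinations
```

The point is simply that the number of combinations grows by multiplication while the number of words to learn grows by addition.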

Let’s take a look at those Top 20 verbs first:

Table 1: Top 20 Verbs in PVs

And now the Top 8 particles:

Table 2: Top 8 particles in PVs

All the verbs and prepositions as individual items are already high frequency, with the exception of perhaps the verbs point and set, which wouldn’t be on my list of “first words to teach.” However, the real bonus here is that not only do you get the benefit of teaching your client 28 high frequency words in isolation but if you then use them as phrasal verbs, your “bang for buck” is significant!

Here’s a link to a PDF of those 28 words: https://app.box.com/s/vng5hr2tctp87ufdjoyjvyv2ln8300yb

This frequency analysis of phrasal verbs by Gardner and Davies has recently been supported and extended by Dilin Liu (2011) and by Mélodie Garnier and Norbert Schmitt [3] (2014). In their paper, The PHaVE List: A pedagogical list of phrasal verbs and their most frequent meaning senses, Garnier and Schmitt point out that a limitation in Gardner and Davies’ analysis is that they failed to take into account the polysemy inherent in the phrases – like the 12 meanings of “give up.” In fairness to Gardner and Davies, they did, in fact, talk about the polysemous nature of PVs but didn’t offer any measure of the different frequencies with which the various meanings are used. They wrote that:

For instance, the list-high 19 senses of the PV break up … could be arranged from highest to lowest semantic frequency, thus prioritizing them for language learning. We acknowledge, however, that corpora of this nature are much easier talked about than constructed. (p.353).

Garnier and Schmitt are interested not just in identifying the frequency with which a phrasal verb occurs but also in the most common senses of those PVs. They say that:

…our main purpose for creating the PHaVE List, which is to reduce the total number of meaning senses to be acquired to a manageable number based on frequency criteria.

On a pragmatic level, they want a learner not to have to learn every meaning of each PV but just to focus on the most frequent, and therefore most useful, meanings. Using the original list from Gardner and Davies, along with additions by Liu (2011), and including data from the Corpus of Contemporary American English (Davies, 2008), the duo created the PHaVE List: a list of the 150 most frequently used phrasal verbs, and 280 of the most frequently used meanings. So of the 12 potential meanings for “give up,” they use the following:

16. GIVE UP
Stop doing or having something; abandon (activity, belief, possession) (80.5%)
Example: She had to give up smoking when she got pregnant.

The general entry starts with a rank (in this case, 16th out of 150); the basic phrasal verb; a definition; a percentage frequency; and a specific example use. The complete list is made available as a download from the Sage journals website [4]. If you can get access to it, it is well worth the read and the download. And all the articles referenced in this article are good examples of how we can use corpus linguistics to help guide our practice of developing the vocabulary of our clients with language challenges.
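If you wanted to hold entries like this in a program, the layout described above (rank, phrasal verb, definition, percentage, example) maps neatly onto a small record. What follows is only a sketch: the class name, field names, and the `format_entry` helper are my own inventions for illustration, not anything published with the PHaVE List, and it stores one record per meaning sense.

```python
from dataclasses import dataclass


@dataclass
class PhaveEntry:
    """One meaning sense of one phrasal verb (hypothetical layout)."""
    rank: int
    phrasal_verb: str
    definition: str
    percentage: float
    example: str


# The "give up" entry quoted in the post.
give_up = PhaveEntry(
    rank=16,
    phrasal_verb="GIVE UP",
    definition="Stop doing or having something; abandon (activity, belief, possession)",
    percentage=80.5,
    example="She had to give up smoking when she got pregnant.",
)


def format_entry(entry):
    """Lay the record out in the same order as the printed list."""
    return (
        f"{entry.rank}. {entry.phrasal_verb}\n"
        f"{entry.definition} ({entry.percentage}%)\n"
        f"Example: {entry.example}"
    )


print(format_entry(give_up))
```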

References
Davies, M. (2008-). The Corpus of Contemporary American English: 425 million words, 1990-present. Available online from Brigham Young University at http://corpus.byu.edu/coca

Gardner, D., & Davies, M. (2007). Pointing Out Frequent Phrasal Verbs: A Corpus-Based Analysis. TESOL Quarterly, 41(2), 339-359.

Garnier, M., & Schmitt, N. (2014). The PHaVE List: A pedagogical list of phrasal verbs and their most frequent meaning senses. Language Teaching Research, 1-22. Published online before print: http://ltr.sagepub.com/content/early/2014/12/08/1362168814559798.abstract

Liu, D. (2011). The Most Frequently Used English Phrasal Verbs in American and British English: A Multicorpus Examination. TESOL Quarterly, 45(4), 661-688.

Notes
[1] A postilion is someone who guides a horse-drawn carriage while riding one of the horses, rather than sitting on the carriage itself. The sentence “The postilion has been struck by lightning” is the basis of a wonderful little paper by the linguist David Crystal, published in 1995 in the journal Child Language Teaching & Therapy. Simply titled “Postilion Sentences,” the paper defines a postilion sentence as “one which has little or no chance of ever being useful in real life. It could be used, obviously, because it is grammatically well-formed; but the contexts in which it would be natural to use it are either so restricted or so adult that the chances of a child encountering it, or finding it necessary to use it, are remote.” In the design of AAC systems, using pre-stored sentences may have some limited value but many “pragmatic utterances” turn out to be nothing more than postilions: unlikely to be used. This is why teaching sentences is neither language nor therapy.

[2] The now-common practice of using core vocabulary also makes it much harder to prove plagiarism – or as we Lancastrians would say, “nicking someone else’s ideas.” People, of course, don’t “steal” ideas – they are “inspired” by the work of others. But such inspiration inevitably leads to systems appearing almost clone-like in their structure. It’s only when you get to the fine details of how words are organized and encoded that you can separate the wheat from the chaff. And there’s a lot of chaff out there.

[3] If I haven’t mentioned it before, Norbert is the author of an excellent book on vocabulary research methods. Here’s the full reference: Schmitt, N. (2010). Researching vocabulary : a vocabulary research manual. Houndmills, Basingstoke, Hampshire ; New York, NY: Palgrave Macmillan. It’s full of useful information and lots of web links worth exploring, and worth the $30 you’ll spend on Amazon US – or the £20.99 in the UK.

[4] Just a reminder to all members of the Royal College of Speech and Language Therapists that your membership benefits include access to a number of Sage journals online, and Language Teaching Research is one of those. In fact, you have access to over 700 (yes, count ’em!) titles, including my personal favorites Child Language Teaching and Therapy, Clinical Linguistics & Phonetics, English Today, and the riveting Scandinavian Journal of Occupational Therapy. OK, so I lied about the last one being a “favorite” 🙂

The State of the Union Address 2015: “We Are Family…”

Within seconds of a President turning off the autocue, political pundits stop trembling, wipe the drool from their lips, and spend the next 2 years talking incessantly about what was said. A single speech that clocks in at just under 6,500 words can single-handedly generate more web pages than the callipygian [1] Kim Kardashian can generate page clicks. As I’m a dude, you might think that this post is now about to become an excuse to share a picture of the ample Ms. Kardashian’s gluteus maximus in all its shiny glory – but you’d be wrong! What I’m actually more interested in doing is taking a more detailed look at the vocabulary that Barack Obama used, from the standpoint of corpus linguistics and concordance software. At this point, 90% of the guys who found this post by googling “Kim Kardashian’s ass” will leave. Sorry, dudes.

The data came from a transcript available from Time.com, which I then used as input for WordSmith 6.0 software, a corpus analysis tool. Of the many things this software will let you analyze, the ones we’ll look at here are word frequencies, keywords, and concordances.

Keywords are those words that are used significantly more or less often in a sample than they typically are in the general population. In the case of WordSmith, the “general population” is a list known as the British National Corpus, a sample of some 100 million words used in British English (BrE).

The “teachable moment” here is to think about why I chose this sample. Now I know – because I have an ear for these things – that Barack Obama does not use British English; his accent is also a bit of a giveaway. However, for the purpose of this analysis, I don’t think the frequency differences between BrE and American English (AmE) are significant enough to warrant worrying about them. I could have used a different sample called the American National Corpus but that’s only good for 14 million words, which is much smaller than the BNC. Therefore, I chose to go for the larger corpus, knowing that there may be some variations between the two but not, in my opinion, enough to skew the analysis.

Fig 1: Top 25 words by frequency

If we take a look at the most frequently used words in the speech, you’ll see that they are pretty much what you might expect on the basis of typical distributions. The word the is the most frequent in the English language and seeing it atop the President’s list is uninteresting. What is interesting is that the pronouns we and our are right up there above I and you. Pronouns regularly score high on frequency lists, and it’s one of the reasons practitioners in the field of Augmentative and Alternative Communication (AAC) should make sure these words are targeted. But the fact that we and our appear so high up the list (at #4 and #8 respectively) made me wonder: is this what we might expect to see in general? And that, my friends, is why we turn to a keyness list.

Fig 2: Top 25 words by keyness

Take a look at that keyness column and notice how both we and our are way up there at #2 and #3. Ignoring for now the intricacies of how those keyness figures are calculated [2], what is significant is that the Pres is using those two pronouns significantly more than how anyone else would use them in general, and that reflects a conscious effort to come across as one of “us” and not an “I” or “me” who is doing things. He’s appealing to a “Spirit of Unity.”

You can see more evidence for this appeal if you simply look at the keyness of #4 and #8 – America and Americans. He’s certainly using the words with more frequency than you’d find in a regular sample but we can perform one more kind of analysis in order to see just how he’s using them, and that’s to create a concordance.

A concordance is a list that shows instances of a word in context, along with the words that go before and after it. Below is a concordance for the word Americans as used alongside we:

Fig 3: Concordance showing WE and AMERICANS

Given that there were 19 instances of the word Americans in total, this pairing with we accounts for over 30% of its uses. So as well as using the pronouns themselves to paint a picture of unity, he’s yoking one of them with Americans to further that underlying message.
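If you don’t have concordance software to hand, the basic idea is easy to sketch in a few lines of Python. This is a bare-bones illustration, not how WordSmith does it: the `concordance` function, its `width` parameter, and the crude letters-only tokenizer are all my own assumptions, and the sample text is just one sentence quoted from the speech.

```python
import re


def concordance(text, keyword, width=4):
    """Return keyword-in-context lines: up to `width` words either side of each hit."""
    # Crude tokenizer: runs of letters and apostrophes, so "tight-knit" splits in two.
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}")
    return lines


sample = (
    "We are a strong, tight-knit family who has made it through "
    "some very, very hard times."
)
for line in concordance(sample, "family", width=3):
    print(line)  # strong tight knit [family] who has made
```

Run over a whole transcript, this gives you one line per occurrence, which is enough to eyeball which words keep turning up next to your target.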

Casting your eyes just a few more lines down the keyword list you’ll see the words jobs and the economy coming in at #11 and #12, not too far above families (#14) and childcare (#16). Here we see Obama invoking notions of family and economics, both of which are important to voters because we are all involved at some level with both! But take a look at the concordance for how the word families is being used and see if you can spot some familiar words:

Fig 4: Concordance of the word FAMILIES

Notice how our and American are also used along with families, further reinforcing that Spirit of Unity. In fact, Obama even makes that relationship between families and the United States explicit in the following few sentences:

“It is amazing,” Rebekah wrote, “what you can bounce back from when you have to…we are a strong, tight-knit family who has made it through some very, very hard times.” We are a strong, tight-knit family who has made it through some very, very hard times. America, Rebekah and Ben’s story is our story.

So not only do we hear this explicit appeal to family but by analyzing the words he uses throughout the speech using keywords and concordances, we can tease out those subliminal nods and pointers toward an underlying message: We are family [3].

Notes
[1] Callipygian is one of my favorite words and, like many of them, deserves to be used much more than it is. The Oxford English Dictionary defines the word as, “of, pertaining to, or having well-shaped or finely developed buttocks,” which in turn comes from the Greek words kalli meaning “beauty” and pygi meaning “buttocks or rump.” Incidentally, an old word for someone who engages in anal intercourse is a pygist, and the adjective dasypygal means “having hairy buttocks.” Try using the last one next time you want to insult folks – especially if they’re making asses of themselves!

[2] So for that one person out there who has less of a life than I have, you basically count the number of times your target word occurs out of a sample of X words in total, then match that against the number of times the same word occurs in your reference corpus of Y words in total. Here’s the word we in a little 2 x 2 box:
Measure of usage of the word WE

Because I always prefer an easy life when it comes to all things numerical, I fed these figures into an online calculator to get a “log-likelihood” figure – the “keyness” number. You can find that site here: http://sigil.collocations.de/wizard.html

When the site works its magic, you see the score expressed as G-Squared below:

SOTUA2015 LogLikelihood
Take a look at that G-Squared figure and then look back at Fig 2 and you’ll see the keyness figure is (almost) the same. You can try this with any of the values in Fig 2 and you’ll see that the online calculator scores match those of the WordSmith software.
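If you’d rather not rely on a website, the same kind of figure can be worked out in a few lines of Python. This is a minimal sketch of the standard log-likelihood (G-Squared) statistic for a 2 x 2 table, assuming that is what the calculator and WordSmith are both computing; the function name is mine and the counts in the example are made up for illustration, not taken from the speech.

```python
import math


def log_likelihood(count_sample, size_sample, count_ref, size_ref):
    """G-squared for a 2 x 2 table: target word vs. all other words, sample vs. reference."""
    observed = [
        [count_sample, count_ref],
        [size_sample - count_sample, size_ref - count_ref],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [size_sample, size_ref]
    grand_total = size_sample + size_ref

    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            if observed[i][j] > 0:  # a zero cell contributes nothing
                total += observed[i][j] * math.log(observed[i][j] / expected)
    return 2 * total


# Made-up counts: a word used 200 times in a 6,500-word sample,
# against 300,000 times in a 100-million-word reference corpus.
print(log_likelihood(200, 6_500, 300_000, 100_000_000))
```

When the word turns up at exactly the same rate in both samples the score is zero, and it climbs as the two rates pull apart.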

[3] It was the end of the 70s and tight spandex leggings were all the rage – for the ladies – and Sister Sledge had a monster hit with “We Are Family” from the album of the same name. Apparently the Sisters are still touring to this very day – although I’m not sure if they’re still wearing spandex.

The Dudes Dissect “Closing the Gap” 2013: Day 2 – Of Speech and Sessions

Having looked at the vocabulary used in the Closing the Gap 2013 preconference sessions, it’s time to cast a lexical eye on the over 200 regular presentations that took place over two-and-a-half days. For most attendees, these are the “bread and butter” of the conference and choosing which to attend is a skill in and of itself. It’s not uncommon [1] to have over ten sessions run concurrently, which means you’re only getting to attend a tenth of the conference at best!

So let’s take a look at the vocabulary used in the titles of all these presentations to get a flavor of the topics on offer.

Conference Presentations: Titles

The total number of different words used in the session titles was 629 after adjusting for the top 50 words used in English [2]. As a minor diversion, kudos to all who used the word use correctly instead of the irritatingly misused utilize. Only one title included utilizes – and it was used incorrectly; the rest got it right! For those who are unsure about use versus utilize, the simple rule is to use use and forget about utilize. The less simple rule is to remember that utilize means “to use something in a way in which it was never intended.” So, you use a pencil for drawing while you utilize it for removing wax from your ear; you use an iPad to run an application while you utilize it as a chopping board for vegetables; and you use a hammer to pound nails but utilize it to remove teeth. Diversion over.

Top 20 Most Frequent Words in Titles

No prizes for guessing that the hot topic is using iPad technology in AAC. Your best bet for a 10-word title for next year’s conference is:

How your students use/access iPad AAC apps as assistive technology

This includes the top 10 of those top 20 words so your chances of getting accepted are high.

Conference Presentations: Content Words

The total word count for the session descriptions text is 2,532 different words (excluding the Stop List), which is a sizable number to play with. And when I say “different words,” I mean that I am basically counting any text string that is different from another as a “word.” So I count use, uses, used, and using as four words, and iPad and iPads as two. A more structured analysis would take such groups and count them as one “item” – or what we call a LEMMA. We’d then have a lemma of <USE> to represent all the different forms of use, which lets us treat use/used/uses/using as one “word” that changes its form depending on the environment in which it is sitting [3].
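To make the difference between counting strings and counting lemmas concrete, here’s a small Python sketch. The lookup table is a hypothetical, hand-built one covering only the examples above; a real analysis would need a far bigger table or a proper lemmatizer, and this version ignores part of speech entirely.

```python
from collections import Counter

# Hypothetical hand-built lookup from word form to lemma (examples from the post only).
LEMMAS = {
    "use": "USE", "uses": "USE", "used": "USE", "using": "USE",
    "ipad": "IPAD", "ipads": "IPAD",
}


def count_lemmas(words):
    """Count words after collapsing known forms onto their lemma."""
    return Counter(LEMMAS.get(word.lower(), word.lower()) for word in words)


words = ["use", "uses", "used", "using", "iPad", "iPads", "apps"]

print(len({word.lower() for word in words}))  # 7 different text strings
print(len(count_lemmas(words)))               # 3 items once forms are grouped
```

Seven “words” shrink to three items, which is why a string-based count like mine overstates how varied the vocabulary really is.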

Top 50 Words By Frequency in Session Content

A 2,300-word graphic would be rather large so I opted to illustrate the top 50 most frequently used words. As you can see, the top words seem to be the same as those in the titles, which suggests that, on balance, presenters have done a good job of summarizing their presentation contents when creating their titles – which is exactly the strategy you should use.

Keywords in Content

Finally, let’s take a look at the keywords in the session content descriptions. Remember, the keywords are those that appear in a piece of text with a frequency much higher than you would expect in relation to the norm.

Top 20 words by Keyness score

Top of our list here are apps with the iPad coming in at three. Fortunately this fetish for technology is tempered by the inclusion in our top 20 of words like strategies, learn, how, and skills, all critical parts of developing success in AAC that are extra to the machinery. It’s good to think that folks are remembering that how we teach the use of tools is far, far more important than obsessing over the tools themselves.

Coming next… The Dudes Dissect Closing the Gap: Day 3 – Of Content and Commerce. In which the Dudes look at the marketing blurbs of the Closing the Gap exhibitors to discover the “hot button” words intended to make you want to buy!

Notes
[1] WordPress’s spell and grammar checker flagged the phrase “it’s not uncommon” as a double negative and told me that I should change it because, “Two negatives in a sentence cancel each other out. Sadly, this fact is not always obvious to your reader. Try rewriting your sentence to emphasize the positive.” Well, although I generally agree that you shouldn’t use no double negatives, the phrase “not uncommon” felt to me to be perfectly OK and not at all unusual. I therefore took a look at the Corpus of Contemporary American English and found that “it’s not uncommon” occurs 313 times while “it’s common” scores 392. This is as near to 50/50 as you get so I suggest to the nice people at WordPress that “it’s not uncommon” is actually quite common and thus quite acceptable – despite it being a technical double negative.

[2] For the curious among you, here are the contents of the Stop List I have been using, which is based on the top 50 most frequently used words in the British National Corpus (BNC): THE, OF, AND, TO, A, IN, THAT, IS, IT, FOR, WAS, ON, I, WITH, AS, BE, HE, YOU, AT, BY, ARE, THIS, HAVE, BUT, NOT, FROM, HAD, HIS, THEY, OR, WHICH, AN, SHE, WERE, HER, ONE, WE, THERE, ALL, BEEN, THEIR, IF, HAS, WILL, SO, NO, WOULD, WHAT, UP, CAN. This is pretty much the same as the top 50 for the Corpus of Contemporary American English, except that the latter includes the words about, do, and said instead of the BNC’s one, so, and their. Statistically, this isn’t significant so I suggest you don’t go losing any sleep over it.

[3] When you create and use lemmas, you also have to take into account that words can have multiple meanings and cross boundaries. In the example of use/used/uses/using, clearly we’re talking about a verb. But when we talk about a user and several users, we are now talking about nouns. So, we don’t have one lemma <USE> for use/used/user/users/uses/using but two lemmas <use(v)> and <use(n)> to mark this difference. It gets even more complicated when you have strings such as lights, which can be a verb in “He lights candles at Christmas” but a noun in “He turns on the lights when it’s dark.” When you do a corpus analysis of text strings, these sorts of things are a bugger!

The Dudes Dissect “Closing the Gap” 2013: Day 1 – Of Words and Workshops

Regular readers of the Speech Dudes will know that when the “Dudes Do…” a conference, Day 1 is typically all about the travel experience, usually including some unfavorable comments about taxi cabs and hotel coffee, but this time I’m feeling charitable and, although not yet ready to “Hug a Cabbie,” I’ve decided to provide an overview of the preconference sessions, which I didn’t attend.
Now, you may think that not having attended a workshop might put me at a bit of a disadvantage with regard to reporting on content and offering a critique – and you would be right. On the other hand, what I can comment on is the contents of the preconference brochure that everyone can have access to prior to the actual event and which they use to decide the workshops and sessions they want to attend.

So what you’re going to see is an example of corpus linguistics in action, dissecting the very words used to influence YOUR choices. In short, you’re about to learn about what words presenters and marketers use to make up your mind for you. Grab your coffee, hold on to your hats, and prepare to be amazed at what you didn’t know!

Methodology

The Dudes are big believers in the scientific method and the application of evidence-based practice. We strive for some objectivity where possible, although we acknowledge that our occasional rants may be just a tad subjective. We don’t expect our readers to take everything we say as gospel, so sharing the methodology of how we analyzed our data seems fair.

The raw data came straight from the official conference brochure, available for anyone to check at http://www.closingthegap.com/media/pdfs/conference_brochure.pdf. From that I extracted all the text in the following categories:

  • Preconference Workshop Titles
  • Preconference Workshop Course Descriptions
  • Conference Session Titles
  • Conference Session Descriptions
  • Exhibitor Descriptions

Technically, I simply cut and pasted from the PDF and then converted everything to TXT format because that’s the format preferred by the analysis software I use.

WordSmith 6 is a wonderful piece of software that lets you chop up large collections of text and make comparisons against other pieces of text. These comparisons can then show you interesting and fascinating details about how those words are being used. I’ve talked in more detail about WordSmith in our post, The Dudes Do ISAAC 2012 – Of Corpora and Concordances, so take a look at that if you want more details.

Once I have the TXT files, I can create a Word List that gives me frequency data, but I also use a Stop List to filter out common words. If you simply take any large sample of text and count how often words are used, you’ll find that the top 200 end up being the same – that’s what we call Core Vocabulary. And when you’re looking for “interesting” words, you really want to get rid of core because it’s… well… uninteresting! Hence a Stop List to “stop” those words appearing.[1]
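For readers who want to try this without specialist software, the Word List plus Stop List idea can be sketched in Python as follows. It’s only an illustration of the principle: the eight-word stop list is a tiny stand-in rather than the list I actually used, the tokenizer is deliberately crude, and the `word_list` function name is my own.

```python
import re
from collections import Counter

# A tiny stand-in stop list for illustration; a real one would be much longer.
STOP_LIST = {"the", "of", "and", "to", "a", "in", "for", "with"}


def word_list(text, stop_list=STOP_LIST):
    """Count each word in the text, skipping anything on the stop list."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stop_list)


title = "The implementation of iPad technology for learning and communication"
counts = word_list(title)

print(sum(counts.values()))  # 5 words survive the stop list
print(sorted(counts))
```

Point the same function at a whole TXT file and `counts.most_common(20)` gives you a rough-and-ready frequency table.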

Preconference Workshop Titles

The first opportunity you have to encourage folks to come to your session is to have a title that makes a reader want to find out more about what you have to offer. The title is, in fact, the door to your following content description. Of course, you have to find some balance between “catchy” and “accurate.” For example, a paper I presented at a RESNA (Rehabilitation Engineering and Assistive Technology Society of North America) conference entitled Semantic Compaction in the Dynamic Environment: Iconic Algebra as an Explanatory Model for the Underlying Process was, in all fairness, technically accurate, but from a marketing perspective it had all the appeal of a dog turd on crepe. [2]

Let’s therefore take a look at what seem to be the best words to use if you want to attract a crowd.

Pre-conference Sessions: Keyword in Titles

High frequency words in preconference titles

The Word Cloud here counts only words that appeared twice or more, and the size of the words is directly proportional to frequency, so it’s clear that students is a critical word to use, followed closely by iPad, technology, learning, and communication. On that basis, if you’re planning to submit a paper for 2014, here’s your best “10-word-title” bet for getting (a) accepted and (b) a crowd:

The implementation of iPad technology for learning and communication

In the event that the CTG review committee find themselves looking at multiple courses submitted with the same title, you’re going to have to consider how you describe your actual course contents – and luckily, we can help there, too!

Preconference Sessions: Keywords in Course Content

The actual highest frequency words were workshop and participants, which is something of an artificial construct because most people include phrases such as “in this workshop, participants will…” and so I removed these from my keyword analysis.

Frequent words in preconference sessions content

So to further enhance the pulling power of your course, you need to be talking a lot about students, how they use iPads and communication, along with using apps to learn, enhance learning, and any strategies that help meet needs. In fact, you need to include any of these Top Ten words:

Top Ten keywords in preconference session content

But wait, wait… there’s more

I’ve been using the word keywords to refer to those words that appear within a piece of text more frequently than you would expect based on comparing them to a large normative sample. If you perform a keyword analysis on the preconference contents sample, you find that the top five keywords that appear are iPad, iPads, AAC, apps, and students. This suggests that we do an awful lot of talking about one very specific brand-name device – which is good news for the marketing department at Apple!

Top 15 words by Keyness score

The relevant score is the keyness value. The higher the keyness, the more “key” the word is, i.e. its frequency in the sample is significantly higher than you would expect to see in the normal population. So when you look at the table above, you’re not just seeing frequency scores but how significantly important words are. [3] As an example, the word iPads is used less frequently than the word communication (10 times as against 16) but iPads is almost twice as “key” as communication, i.e. it is significantly more important.

Now, as a final thought for folks who are working in the field of AAC (augmentative and alternative communication), I suggest that if you are developing vocabulary sets for client groups, using frequency studies is certainly a good start (and more scientific than the tragically common practice of picking the words “someone” thinks are needed) but if you then introduce a keyness analysis, you can improve the effectiveness of your vocabulary selection.

Coming next… The Dudes Dissect Closing The Gap 2013: Day 2 – Of Speech and Sessions. In which the Dudes present an analysis of the words used to describe conference session titles and contents. Find out how to improve your chances of getting your paper presented!

Notes
[1] In truth, there is more I could say about the methodology, and were this intended to be a peer-reviewed article for a prestigious journal, rest assured I’d go into much more detail about some of the finer points. However, this is simply a blog post designed to educate and entertain, so I ask you to allow me some leeway with regard to precision. I’m happy to share the raw data with folks who want to see it but all I ask is you don’t toss it around willy-nilly.

[2] Not only did it have a title that included the word “algebra” but it was scheduled for 8:00 am on the final day (a Saturday, no less) of the conference. Surprisingly, people showed up – which says more about the sort of folks who attend RESNA conferences rather than anything about my “pulling power” as a presenter.

[3] There is a mathematical formula for the calculation of keyness values. One way is to use the Chi-Square statistic; the other is to use a Log-likelihood score, which is something like a Chi-Square on steroids. As I’ve often said, I didn’t become an SLP because of my ability to handle math and statistics, so I admit to finding these things a strain on my brain. However, for the non-statistically inclined among us, the point is that both these measures simply compare the frequency value of a word from an experimental sample against the frequency value it has in a very large comparative sample (such as the British National Corpus or the Corpus of Contemporary American English), and then show you how similar or dissimilar they are. If their frequencies are very, very dissimilar, the word from the experimental sample is a keyword – like iPad and AAC in the examples above. Now feel free to pour yourself a drink and let your brain relax.
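For the brave, here is roughly what the Chi-Square version of that comparison looks like in Python. Treat it as a sketch of the textbook Pearson statistic on a 2 x 2 table rather than a description of what any particular software package does internally; the function name and the example counts are invented for illustration.

```python
def chi_square(count_sample, size_sample, count_ref, size_ref):
    """Pearson chi-square for a 2 x 2 table: target word vs. other words, sample vs. reference."""
    observed = [
        [count_sample, count_ref],
        [size_sample - count_sample, size_ref - count_ref],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [size_sample, size_ref]
    grand_total = size_sample + size_ref

    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:  # skip empty rows to avoid dividing by zero
                total += (observed[i][j] - expected) ** 2 / expected
    return total


# Invented counts: a word used 16 times in a 3,000-word sample,
# against 20,000 times in a 100-million-word reference corpus.
print(chi_square(16, 3_000, 20_000, 100_000_000))
```

As with the log-likelihood score, a word used at the same rate in both samples scores zero, and the score grows as the sample rate drifts away from the reference rate.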