
28 Words to Boost Your Client’s Vocabulary – Maximum Bang for Buck

When developing a vocabulary set for an augmentative and alternative communication (AAC) system – or indeed when deciding on what vocabulary to teach anyone – one of the most fundamental measures you can use is frequency count: how often is a word used in a language? No-one can predict with 100% accuracy which words will be “best” for an individual, but if you’re going to take bets, you’re pretty safe to assume that words such as that, want, stop, and what are going to be used by everyone from ages 2 to 200. By the same token, you’d not be missing much if you didn’t spend too much time on words like ambidextrous, decalogue, and postilion [1].

In the field of AAC, this type of high frequency vocabulary that is used (a) across populations and (b) across situations is referred to as core vocabulary and it’s often contrasted with the phrase fringe vocabulary, which refers to words that are typically (a) low in frequency and (b) specific to isolated activities or situations. For a refresher on core and fringe – and an introduction to keyword vocabulary – check out my article entitled Small Object of Desire: The Monteverde Invincia Stylus fountain pen – and Keyword Vocabulary from two years ago.

The core/fringe distinction is now so embedded in the world of augmentative communication that it is rare to see any new app appear on the market that doesn’t use the phrase “core vocabulary” somewhere in its marketing blurb – even if it isn’t actually making good use of the core! And as core vocabulary is, by definition, common across ages, activities, situations, and pathologies, it’s not surprising that many AAC software offerings look the same, particularly with regard to the words being encoded [2].

But it’s worth taking a look at another level of frequency measurement, and that’s at the phrase level. Specifically, one area of research that seems to me to offer some value to speech and language pathologists and educators working in vocabulary development is the study of how phrasal verbs (PVs) are distributed.


So what’s a phrasal verb? Well, simply put, it’s a phrase of two to three words that are yoked together, which includes a verb and a preposition and/or adverb. Examples include, “I ran into Gretchen at the ATIA conference,” “I backed up my hard drive,” and “I came across an interesting article on phrasal verbs.” The English language is stuffed to the gills with these types of verbs, and a feature of them is that they tend to have multiple meanings.

To find out how polysemous a phrase can be, you can use the excellent WordNet online tool, a huge database of words and phrases that lets you check out noun, verb, adjective, and adverb meanings. For example, would you believe that the simple phrase “give up” has 12 different meanings? Or that “put down” has 8 variations? It’s not surprising that learners of English find phrasal verbs quite challenging.

The other fascinating feature of phrasal verbs is summarized in a 2007 paper by Gardner and Davies, who point out that if you look at the 100-million-word British National Corpus you find that:

…a small subset of 20 lexical verbs combines with eight adverbial particles (160 combinations) to account for more than one half of the 518,923 phrasal verb occurrences identified in the megacorpus. A more specific analysis indicates that only 25 phrasal verbs account for nearly one-third of all phrasal-verb occurrences in the British National Corpus, and 100 phrasal verbs account for more than one half of all such items. Subsequent semantic analyses show that these 100 high-frequency phrasal verb forms have potentially 559 variant meaning senses.

Read that again and see if you get the same tingle I did seeing those numbers. Over half of all the phrasal verb occurrences found in the corpus can be accounted for by combining 20 verbs with 8 particles. In short, if you learn just 28 words, you have the raw material for over 50% of all the phrasal verbs you’ll need to use.
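To make the arithmetic concrete, here is a minimal Python sketch. The short verb and particle lists are only illustrative picks from examples used elsewhere in this post, not the contents of Gardner and Davies’ tables, and the function name is my own invention.

```python
from itertools import product


def candidate_phrasal_verbs(verbs, particles):
    """Pair every verb with every particle to get candidate phrasal verbs."""
    return [f"{verb} {particle}" for verb, particle in product(verbs, particles)]


# Illustrative subset only (an assumption, not the real top-20 / top-8 lists).
sample_verbs = ["give", "put", "break"]
sample_particles = ["up", "down"]

print(candidate_phrasal_verbs(sample_verbs, sample_particles))

# The counts quoted from Gardner and Davies (2007):
n_verbs, n_particles = 20, 8
print(n_verbs * n_particles)  # 160 candidate verb + particle combinations
print(n_verbs + n_particles)  # 28 individual words to teach
```

Not every candidate pairing is necessarily a phrasal verb that people actually use; which ones occur, and how often, is exactly what the corpus counts tell you.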

Let’s take a look at those Top 20 verbs first:

20 most frequent verbs in phrasal verbs

Table 1: Top 20 Verbs in PVs

And now the Top 8 particles:

Eight most frequently used particles in phrasal verbs

Table 2: Top 8 particles in PVs

All the verbs and prepositions as individual items are already high frequency, with the exception of perhaps the verbs point and set, which wouldn’t be on my list of “first words to teach.” However, the real bonus here is that not only do you get the benefit of teaching your client 28 high frequency words in isolation but if you then use them as phrasal verbs, your “bang for buck” is significant!

Here’s a link to a PDF of those 28 words: https://app.box.com/s/vng5hr2tctp87ufdjoyjvyv2ln8300yb

This frequency analysis of phrasal verbs by Gardner and Davies has recently been supported and extended by Dilin Liu (2011) and by Mélodie Garnier and Norbert Schmitt [3] (2014). In their paper, The PHaVE List: A pedagogical list of phrasal verbs and their most frequent meaning senses, Garnier and Schmitt point out that a limitation in Gardner and Davies’ analysis is that it failed to take into account the polysemy inherent in the phrases – like the 12 meanings of “give up.” In fairness to Gardner and Davies, they did, in fact, talk about the polysemous nature of PVs but didn’t offer any measure of the different frequencies with which the various meanings are used. They wrote that:

For instance, the list-high 19 senses of the PV break up … could be arranged from highest to lowest semantic frequency, thus prioritizing them for language learning. We acknowledge, however, that corpora of this nature are much easier talked about than constructed. (p.353).

Garnier and Schmitt are interested not just in identifying the frequency with which a phrasal verb occurs but also in the most common senses of those PVs. They say that:

…our main purpose for creating the PHaVE List, which is to reduce the total number of meaning senses to be acquired to a manageable number based on frequency criteria.

On a pragmatic level, they want a learner not to have to learn every meaning of each PV but just to focus on the most frequent, and therefore most useful, meanings. Using the original list from Gardner and Davies, along with additions by Liu (2011), and including data from the Corpus of Contemporary American English (Davies, 2008), the duo created the PHaVE List: a list of the 150 most frequently used phrasal verbs, and 280 of the most frequently used meanings. So of the 12 potential meanings for “give up,” they list the following:

Stop doing or having something; abandon (activity, belief, possession) (80.5%)
Example: She had to give up smoking when she got pregnant.

The general entry starts with a rank (in this case, 16th out of 150); the basic phrasal verb; a definition; a percentage frequency; and a specific example use. The complete list is made available as a download from the Sage journals website [4]. If you can get access to it, it is well worth the read and the download. And all the articles referenced in this article are good examples of how we can use corpus linguistics to help guide our practice of developing the vocabulary of our clients with language challenges.
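If you wanted to keep entries like this in a form that software could sort and filter, the entry structure described above could be sketched as a small record. This is only a sketch: the class name, field names, and the helper function are my own hypothetical choices, not anything defined by Garnier and Schmitt, and the values are simply the “give up” entry quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaveSense:
    """One meaning sense of a phrasal verb (hypothetical field names)."""
    rank: int            # position of the phrasal verb in the list of 150
    phrasal_verb: str
    definition: str
    percent: float       # share of occurrences that carry this sense
    example: str


give_up = PhaveSense(
    rank=16,
    phrasal_verb="give up",
    definition="Stop doing or having something; abandon (activity, belief, possession)",
    percent=80.5,
    example="She had to give up smoking when she got pregnant.",
)


def teaching_line(sense):
    """Format a sense as a single line for a vocabulary handout."""
    return f"{sense.rank}. {sense.phrasal_verb.upper()}: {sense.definition} ({sense.percent}%)"


print(teaching_line(give_up))
```

With a list of such records, picking the senses to teach first is just a matter of sorting by rank and by percentage.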

Davies, M. (2008-). The Corpus of Contemporary American English: 425 million words, 1990-present. Available from Brigham Young University: http://corpus.byu.edu/coca

Gardner, D., & Davies, M. (2007). Pointing Out Frequent Phrasal Verbs: A Corpus-Based Analysis. TESOL Quarterly, 41(2), 339-359.

Garnier, M., & Schmitt, N. (2014). The PHaVE List: A pedagogical list of phrasal verbs and their most frequent meaning senses. Language Teaching Research, 1-22. Published online before print: http://ltr.sagepub.com/content/early/2014/12/08/1362168814559798.abstract

Liu, D. (2011). The Most Frequently Used English Phrasal Verbs in American and British English: A Multicorpus Examination. TESOL Quarterly, 45(4), 661-688.

[1] A postilion is someone who guides a horse-drawn carriage while riding one of the horses that pull it, rather than sitting on the carriage itself. The sentence “The postilion has been struck by lightning” is the basis of a wonderful little paper by the linguist David Crystal, published in 1995 in the journal Child Language Teaching & Therapy. In the paper, simply titled “Postilion Sentences,” Crystal defines a postilion sentence as “one which has little or no chance of ever being useful in real life. It could be used, obviously, because it is grammatically well-formed; but the contexts in which it would be natural to use it are either so restricted or so adult that the chances of a child encountering it, or finding it necessary to use it, are remote.” In the design of AAC systems, using pre-stored sentences may have some limited value but many “pragmatic utterances” turn out to be nothing more than postilions: unlikely to be used. This is why teaching sentences is neither language nor therapy.

Download Postilion sentences article


[2] The now-common practice of using core vocabulary also makes it much harder to prove plagiarism – or as we Lancastrians would say, “nicking someone else’s ideas.” People, of course, don’t “steal” ideas – they are “inspired” by the work of others. But such inspiration inevitably leads to systems appearing almost clone-like in their structure. It’s only when you get to the fine details of how words are organized and encoded that you can separate the wheat from the chaff. And there’s a lot of chaff out there.

[3] If I haven’t mentioned it before, Norbert is the author of an excellent book on vocabulary research methods. Here’s the full reference: Schmitt, N. (2010). Researching vocabulary : a vocabulary research manual. Houndmills, Basingstoke, Hampshire ; New York, NY: Palgrave Macmillan. It’s full of useful information and lots of web links worth exploring, and worth the $30 you’ll spend on Amazon US – or the £20.99 in the UK.

[4] Just a reminder to all members of the Royal College of Speech and Language Therapists that your membership benefits include access to a number of Sage journals online, and Language Teaching Research is one of those. In fact, you have access to over 700 (yes, count ’em!) titles, including my personal favorites Child Language Teaching and Therapy, Clinical Linguistics & Phonetics, English Today, and the riveting Scandinavian Journal of Occupational Therapy. OK, so I lied about the last one being a “favorite” 🙂

Efficacy or Effectiveness? How To Be A Word Detective

Late last week I was in a meeting with a chappie from the International Organization for Standardization, talking about the role of the research group I belong to and explaining how we measure our performance. This sort of thing is typical of any company that needs to maintain its ISO status [1] and having lists of procedures, processes, and parametrics is de rigueur for the whole shebang.

In the course of the discussion, I happened to talk about the challenge of measuring the efficacy of a department whose purpose is to generate speculative ideas, 80% of which are likely to be unfeasible. The examiner stopped me and asked me to repeat the word, which I did, and my colleague also offered a “translation” by saying “effectiveness.” That did the trick, and I chalked it up to my being an Englishman who is still struggling to learn American. [2]

But being me, I jotted the words down in my ever-present notebook with a view to investigating whether the efficacy/effectiveness split was, indeed, a transatlantic difference.

Of course, in this age of Evidence-Based Practice, the call for measures of how much effect therapy has on a client means that it’s common to talk about the “efficacy of treatment” or the “effectiveness of an approach.” Or is it? Do we say “efficacy” or “effectiveness?” Is there, in fact, a difference?

Well, the first thing I often do with questions like this is to use the Google search engine and get a Ghit measure. “Ghit” is short for “Google Hit” and appears in a search as a number under the search bar. [3] Here’s what comes up for efficacy and effectiveness:

Efficacy: 17,100,000 ghits
Effectiveness: 179,000,000 ghits

Whoa! Quite a difference there, by a factor of ten. Just to corroborate the difference, I did a Bhit count and a Yhit count (Bing Hits and Yahoo Hits, if you weren’t sure).

Efficacy: 52,400,000 Bhits and 52,600,000 Yhits
Effectiveness: 143,000,000 Bhits and 139,000,000 Yhits

So not ten times larger for effectiveness but still significantly more popular. But what about the notion that it’s a UK/US thing? It is possible that the high ghit count is masking it – after all, the percentages will always skew in favor of the US when it comes to number of speakers.
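For anyone who wants to check the sums, the ratios can be worked out with a few lines of Python. This is just a quick sketch using the hit counts quoted above; the variable and function names are my own.

```python
# Hit counts as quoted in this post. Yahoo's efficacy count is entered as
# 52,600,000, which is my reading of the figure given above.
hits = {
    "Google": {"efficacy": 17_100_000, "effectiveness": 179_000_000},
    "Bing": {"efficacy": 52_400_000, "effectiveness": 143_000_000},
    "Yahoo": {"efficacy": 52_600_000, "effectiveness": 139_000_000},
}


def ratio(counts):
    """How many times more hits effectiveness gets than efficacy."""
    return counts["effectiveness"] / counts["efficacy"]


for engine, counts in hits.items():
    print(f"{engine}: effectiveness gets {ratio(counts):.1f}x the hits of efficacy")
```

On these figures the ratio comes out at roughly 10.5 for Google, 2.7 for Bing, and 2.6 for Yahoo.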

This is when I turn to my trusty friend, the BYU-Corpus site, where we can play with the Corpus of Contemporary American English to check on how a word is used in the US, and also the British National Corpus to get a UK perspective. I did this for my previous post on the use of have versus take in relation to bathing – and this turned out to be most definitely a US/UK distinction. Here’s what we see:

Oh bugger! It doesn’t look like a BrE versus AmE difference after all. There is a 10% variation between the two but I’m pretty sure it’s not statistically significant. My choice to use efficacy puts me in the minority in both the States and the Isles.

Desperate for some validation, I dug a little deeper by looking at some historical data. Maybe I’m just old and the incidence of the words has changed since I was a lad. The British National Corpus isn’t much help as it only covers the period from the 1980s through to 1993, and I want to see older data than that.

The Oxford English Dictionary is a good source for historical information on word meaning, so I went to the bookshelf and did a little more research.

Efficacy as a noun dates from 1527 and is defined as the “(p)ower or capacity to produce effects.” It’s derived from the earlier Latin efficere meaning “to accomplish.” Its meaning hasn’t really changed since then and so we can call it a 16th century word – old enough.

Effectiveness as a noun is a little younger, with the OED identifying a first appearance in 1607, some 80 years after efficacy. It has a similar definition of, “(t)he quality of being effective.” Not surprisingly, it, too, can be traced back to the same Latin root as efficacy, efficere. However, it is a 17th century word so I can take some comfort (perhaps) in arguing that my use of efficacy is more “traditional.”

However, we can see something much more interesting if we take a peek at the Corpus of Historical American English, which covers the period 1810-2009, and that certainly goes back further than my birth!

Here’s the chart of the behavior of the word efficacy since 1810:

The history of the word efficacy

efficacy 1810-2009

Even before you click on the image to enlarge it, it’s clear that efficacy has been in a slow decline for decades. There’s been a modest upswing since the 1950s but it’s nowhere near its glory days. So the inevitable question is, what has pushed it aside?

History of the word effectiveness

effectiveness 1810-2009

Well, well, well, what a surprise! The usurper turns out to have been no more than the Pretender to the Throne, effectiveness! From out of the shadows, the word has slowly increased its popularity to the point that it now hogs the limelight and commands center stage. Alas, poor efficacy, I knew it, Horatio.

The story might end there, with my claiming to be simply the sort of dude who uses older words, and who also is victim to the invisible hand of lexical change that can overturn the fortunes of synonyms. But there is something else: Although for most of the world, efficacy and effectiveness are synonymous (and dictionaries typically say that) there is a field in which they are not synonymous: the Clinical World.

Ah, but that’s a story for another day…

[1] For some time, I took pleasure in pointing out that the “International Organization for Standards” was clearly guilty of failing to notice that the acronym should be IOS and not ISO. Alas, my mistake was to assume that ISO was an acronym, when in fact, it allegedly isn’t! The organization says that it’s derived from the Greek word isos, meaning “equal,” and that they did this so they wouldn’t have to use different acronyms in different countries based on the languages. For example, in France it would be Organisation internationale de normalisation (OIN), so ISO is international.

[2] When folks ask me if I speak more than one language, I say I’m bilingual and can speak both English AND American. One of the delights of being an Englishman Abroad is that not only have I had the chance to be immersed in the UK’s melange of dialects and accents for the first 30-something years of life but now I get to go through it all over again with the different flavors and recipes of American English. I’m comfortable with Fall, happy to spell tyres as tires, and say “to-MAY-toe” and not “to-MAH-toe.”

[3] The accuracy of using ghits as a measure of word use is always open to question but as a quick and dirty metric it’s used by linguists who want to get a feel for how the world of words is playing out. Arnold Zwicky used them in a recent blog about the prefix “telephon-” and Geoff Pullum has them in a post on “Assholocracy,” so I think I’m in pretty good company.