The Dudes Dissect “Closing the Gap” 2013: Day 2 – Of Speech and Sessions

Having looked at the vocabulary used in the Closing the Gap 2013 preconference sessions, it’s time to cast a lexical eye on the over 200 regular presentations that took place over two-and-a-half days. For most attendees, these are the “bread and butter” of the conference, and choosing which to attend is a skill in and of itself. It’s not uncommon [1] to have over ten sessions running concurrently, which means you only get to attend a tenth of the conference at most!

So let’s take a look at the vocabulary used in the titles of all these presentations to get a flavor of the topics on offer.

Conference Presentations: Titles

The total number of different words used in the session titles was 629 after excluding the top 50 words used in English [2]. As a minor digression, kudos to all who used the word use correctly instead of the irritatingly misused utilize. Only one title included utilizes – and it was used incorrectly; the rest got it right! For those who are unsure about use versus utilize, the simple rule is to use use and forget about utilize. The less simple rule is to remember that utilize means “to use something in a way in which it was never intended.” So, you use a pencil for drawing but utilize it for removing wax from your ear; you use an iPad to run an application but utilize it as a chopping board for vegetables; and you use a hammer to pound nails but utilize it to remove teeth. Diversion over.

Top 20 Most Frequent Words in Titles

No prizes for guessing that the hot topic is using iPad technology in AAC. Your best bet for a 10-word title for next year’s conference is:

How your students use/access iPad AAC apps as assistive technology

This includes the top 10 of those top 20 words so your chances of getting accepted are high.

Conference Presentations: Content Words

The session descriptions contain 2,532 different words (excluding the Stop List), which is a sizable number to play with. And when I say “different words,” I mean that I am basically counting any text string that differs from another as a “word.” So I count use, uses, used, and using as four words, and iPad and iPads as two. A more structured analysis would group such forms and count them as one “item” – what we call a LEMMA. We’d then have a lemma <USE> to represent all the different forms of use, which lets us treat use/used/uses/using as one “word” that changes its form depending on the environment in which it is sitting. [3]
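To show the difference between counting strings and counting lemmas, here is a minimal Python sketch. The lookup table (LEMMA_TABLE) and the function names are hypothetical and exist only for illustration. Real lemmatizers rely on large dictionaries and part-of-speech tagging. This toy table ignores part of speech, so it cannot separate <use(v)> from <use(n)>.

```python
from collections import Counter

# Hypothetical lemma table: maps each word form to its lemma.
# It ignores part of speech, so it cannot tell use (verb) from use (noun).
LEMMA_TABLE = {
    "use": "USE", "uses": "USE", "used": "USE", "using": "USE",
    "ipad": "IPAD", "ipads": "IPAD",
}


def count_forms(tokens):
    """Count every distinct text string as a separate 'word'."""
    return Counter(token.lower() for token in tokens)


def count_lemmas(tokens):
    """Collapse word forms into lemmas; forms not in the table count as themselves."""
    return Counter(LEMMA_TABLE.get(token.lower(), token.lower()) for token in tokens)


tokens = ["use", "uses", "used", "using", "iPad", "iPads"]
print(len(count_forms(tokens)))   # 6 different strings
print(len(count_lemmas(tokens)))  # 2 lemmas: USE and IPAD
```

The same six tokens give six “words” when counted as strings but only two items when counted as lemmas.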

Top 50 Words By Frequency in Session Content

A 2,532-word graphic would be rather large, so I opted to illustrate the 50 most frequently used words. As you can see, the top words seem to be the same as those in the titles. This suggests that, on balance, presenters have done a good job of summarizing their presentation contents when creating their titles, which is exactly the strategy you should use.

Keywords in Content

Finally, let’s take a look at the keywords in the session content descriptions. Remember, the keywords are those that appear in a piece of text with a frequency much higher than you would expect in relation to the norm.

Top 20 words by Keyness score

At the top of our list is apps, with iPad coming in at number three. Fortunately, this fetish for technology is tempered by the inclusion in our top 20 of words like strategies, learn, how, and skills. These are all critical parts of developing success in AAC that go beyond the machinery. It’s good to think that folks are remembering that how we teach the use of tools is far, far more important than obsessing over the tools themselves.

Coming next… The Dudes Dissect Closing the Gap: Day 3 – Of Content and Commerce. In which the Dudes look at the marketing blurbs of the Closing the Gap exhibitors to discover the “hot button” words intended to make you want to buy!

[1] WordPress’s spell and grammar checker flagged the phrase “it’s not uncommon” as a double negative and told me that I should change it because, “Two negatives in a sentence cancel each other out. Sadly, this fact is not always obvious to your reader. Try rewriting your sentence to emphasize the positive.” Well, although I generally agree that you shouldn’t use no double negatives, the phrase “not uncommon” felt to me to be perfectly OK and not at all unusual. I therefore took a look at the Corpus of Contemporary American English and found that “it’s not uncommon” occurs 313 times while “it’s common” scores 392. That is about as close to 50/50 as you’re likely to get, so I suggest to the nice people at WordPress that “it’s not uncommon” is actually quite common and thus quite acceptable – despite its being a technical double negative.

[2] For the curious among you, here are the contents of the Stop List I have been using, which is based on the top 50 most frequently used words in the British National Corpus (BNC): THE, OF, AND, TO, A, IN, THAT, IS, IT, FOR, WAS, ON, I, WITH, AS, BE, HE, YOU, AT, BY, ARE, THIS, HAVE, BUT, NOT, FROM, HAD, HIS, THEY, OR, WHICH, AN, SHE, WERE, HER, ONE, WE, THERE, ALL, BEEN, THEIR, IF, HAS, WILL, SO, NO, WOULD, WHAT, UP, CAN. This is pretty much the same as the top 50 for the Corpus of Contemporary American English, except that the latter includes the words about, do, and said instead of the BNC’s one, so, and their. Statistically, this isn’t significant so I suggest you don’t go losing any sleep over it.

[3] When you create and use lemmas, you also have to take into account that words can have multiple meanings and cross boundaries. In the example of use/used/uses/using, clearly we’re talking about a verb. But when we talk about a user and several users, we are now talking about nouns. So, we don’t have one lemma <USE> for use/used/user/users/uses/using but two lemmas, <use(v)> and <use(n)>, to mark this difference. It gets even more complicated when you have strings such as lights, which can be a verb in “He lights candles at Christmas” but a noun in “He turns on the lights when it’s dark.” When you do a corpus analysis of text strings, these sorts of things are a bugger!

The Dudes Dissect “Closing the Gap” 2013: Day 1 – Of Words and Workshops

Regular readers of the Speech Dudes will know that when the “Dudes Do…” a conference, Day 1 is typically all about the travel experience, usually including some unfavorable comments about taxi cabs and hotel coffee, but this time I’m feeling charitable and, although not yet ready to “Hug a Cabbie,” I’ve decided to provide an overview of the preconference sessions, which I didn’t attend.
Now, you may think that not having attended a workshop might put me at a bit of a disadvantage with regard to reporting on content and offering a critique – and you would be right. On the other hand, what I can comment on is the contents of the preconference brochure. Everyone can access it before the event, and they use it to decide which workshops and sessions to attend.

So what you’re going to see is an example of corpus linguistics in action, dissecting the very words used to influence YOUR choices. In short, you’re about to learn about what words presenters and marketers use to make up your mind for you. Grab your coffee, hold on to your hats, and prepare to be amazed at what you didn’t know!


The Dudes are big believers in the scientific method and the application of evidence-based practice. We strive for objectivity where possible, although we acknowledge that our occasional rants may be just a tad subjective. We don’t expect our readers to take everything we say as gospel, so sharing how we analyzed our data seems only fair.

The raw data came straight from the official conference brochure, available for anyone to check at http://www.closingthegap.com/media/pdfs/conference_brochure.pdf. From that, I extracted all the text in the following categories:

  • Preconference Workshop Titles
  • Preconference Workshop Course Descriptions
  • Conference Session Titles
  • Conference Session Descriptions
  • Exhibitor Descriptions

Technically, I simply cut and pasted the text from the PDF and then converted everything to TXT format, because that’s the format preferred by the analysis software I use.

WordSmith 6 is a wonderful piece of software that lets you chop up large collections of text and make comparisons against other pieces of text. These comparisons can then show you interesting and fascinating details about how those words are being used. I’ve talked in more detail about WordSmith in our post, The Dudes Do ISAAC 2012 – Of Corpora and Concordances, so take a look at that if you want more details.

Once I have the TXT files, I can create a Word List that gives me frequency data, but I also use a Stop List to filter out common words. If you take any large sample of text and count how often words are used, you’ll find that the top 200 end up being the same – that’s what we call Core Vocabulary. And when you’re looking for “interesting” words, you really want to get rid of core because it’s… well… uninteresting! Hence a Stop List to “stop” those words from appearing.[1]
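For readers who want to try this without WordSmith, the Word List plus Stop List step can be approximated in a few lines of Python. This is a sketch under simple assumptions, not a description of how WordSmith works internally:

- a “word” is any run of letters or apostrophes;
- everything is lowercased, so iPad becomes ipad;
- the stop list is the 50-word BNC list given in the Stop List footnote of the Day 2 post.

```python
import re
from collections import Counter

# Stop list: the 50 most frequent BNC words, as listed in the Day 2 footnote.
STOP_LIST = set("""
the of and to a in that is it for was on i with as be he you at by
are this have but not from had his they or which an she were her one
we there all been their if has will so no would what up can
""".split())


def word_list(text, stop_list=STOP_LIST):
    """Return (word, count) pairs, most frequent first, skipping stop-list words."""
    # Assumption: a "word" is a lowercased run of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(token for token in tokens if token not in stop_list)
    return counts.most_common()


sample = ("In this workshop, participants will learn how students use the iPad "
          "and how the iPad supports communication.")
for word, count in word_list(sample)[:3]:
    print(word, count)
```

Counting the distinct entries that survive the filter (`len(word_list(text))`) gives the kind of “different words after the Stop List” figure quoted in these posts.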

Preconference Workshop Titles

The first opportunity you have to encourage folks to come to your session is to have a title that makes a reader want to find out more about what you have to offer. The title is, in fact, the door to your following content description. Of course, you have to find some balance between “catchy” and “accurate.” For example, a paper I presented at a RESNA (Rehabilitation Engineering and Assistive Technology Society of North America) conference entitled Semantic Compaction in the Dynamic Environment: Iconic Algebra as an Explanatory Model for the Underlying Process was, in all fairness, technically accurate, but from a marketing perspective it had all the appeal of a dog turd on crepe. [2]

Let’s therefore take a look at what seem to be the best words to use if you want to attract a crowd.

Preconference Sessions: Keywords in Titles

High frequency words in Pre-conference titles

The Word Cloud here counts only words that appeared twice or more, and the size of each word is directly proportional to its frequency. So it’s clear that students is a critical word to use, followed closely by iPad, technology, learning, and communication. On that basis, if you’re planning to submit a paper for 2014, here’s your best title bet for getting (a) accepted and (b) a crowd:

The implementation of iPad technology for learning and communication
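The sizing rule behind the Word Cloud can be sketched as follows. It keeps words that appear at least twice and makes size directly proportional to frequency. The counts in the example are invented, and the 72-point maximum is an arbitrary assumption. Word-cloud tools each have their own scaling options.

```python
def cloud_sizes(freqs, min_count=2, max_size=72.0):
    """Return word -> font size, directly proportional to frequency.

    Words seen fewer than min_count times are dropped, mirroring the
    "twice or more" rule. max_size is an arbitrary assumption.
    """
    kept = {word: n for word, n in freqs.items() if n >= min_count}
    if not kept:
        return {}
    top = max(kept.values())
    return {word: max_size * n / top for word, n in kept.items()}


# Invented counts, for illustration only.
sizes = cloud_sizes({"students": 8, "ipad": 6, "technology": 4, "velcro": 1})
print(sizes)
```

Because the scaling is linear, a word used half as often as the top word is drawn at half the size.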

In the event that the CTG review committee find themselves looking at multiple courses submitted with the same title, you’re going to have to consider how you describe your actual course contents – and luckily, we can help there, too!

Preconference Sessions: Keywords in Course Content

The actual highest frequency words were workshop and participants, which is something of an artificial construct because most people include phrases such as “in this workshop, participants will…” and so I removed these from my keyword analysis.

Frequent words in preconference sessions content

So to further enhance the pulling power of your course, you need to talk a lot about students and how they use iPads and communication, along with using apps to learn and enhance learning, and any strategies that help meet needs. In fact, you should include as many of these Top Ten words as you can:

Top Ten keywords in Pre-conference session content

But wait, wait… there’s more

I’ve been using the word keywords to refer to those words that appear within a piece of text more frequently than you would expect based on comparing them to a large normative sample. If you perform a keyword analysis on the preconference contents sample, you find that the top five keywords are iPad, iPads, AAC, apps, and students. This suggests that we do an awful lot of talking about one very specific brand-name device – which is good news for the marketing department at Apple!

Top 15 words by Keyness score

The relevant score is the keyness value. The higher the keyness, the more “key” the word is; that is, the more its frequency in the sample exceeds what you would expect to see in the normal population. So when you look at the table above, you’re not just seeing frequency scores but how significant the words are. [3] As an example, the word iPads is used less frequently than the word communication (10 times as against 16), but iPads is almost twice as “key” as communication; that is, it is significantly more important.

Now, as a final thought for folks who are working in the field of AAC (augmentative and alternative communication), I suggest that if you are developing vocabulary sets for client groups, using frequency studies is certainly a good start (and more scientific than the tragically common practice of picking the words “someone” thinks are needed) but if you then introduce a keyness analysis, you can improve the effectiveness of your vocabulary selection.

Coming next… The Dudes Dissect Closing The Gap 2013: Day 2 – Of Speech and Sessions. In which the Dudes present an analysis of the words used in conference session titles and contents. Find out how to improve your chances of getting your paper presented!

[1] In truth, there is more I could say about the methodology, and were this intended to be a peer-reviewed article for a prestigious journal, rest assured I’d go into much more detail about some of the finer points. However, this is simply a blog post designed to educate and entertain, so I ask you to allow me some leeway with regard to precision. I’m happy to share the raw data with folks who want to see it; all I ask is that you don’t toss it around willy-nilly.

[2] Not only did it have a title that included the word “algebra,” but it was scheduled for 8:00 am on the final day (a Saturday, no less) of the conference. Surprisingly, people showed up – which says more about the sort of folks who attend RESNA conferences than about my “pulling power” as a presenter.

[3] There are mathematical formulas for calculating keyness values. One way is to use the Chi-Square statistic; the other is to use a Log-likelihood score, which is something like a Chi-Square on steroids. As I’ve often said, I didn’t become an SLP because of my ability to handle math and statistics, so I admit to finding these things a strain on my brain. However, for the non-statistically inclined among us, the point is that both these measures simply compare the frequency value of a word in an experimental sample against its frequency value in a very large comparative sample (such as the British National Corpus or the Corpus of Contemporary American English). They then show you how similar or dissimilar the two are. If their frequencies are very, very dissimilar, the word from the experimental sample is a keyword – like iPad and AAC in the examples above. Now feel free to pour yourself a drink and let your brain relax.
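For anyone who would rather read the math as code, here is a sketch of both measures for a single word. It compares a study sample with a reference corpus, using the common two-corpus formulation:

- log-likelihood summed over the two observed counts;
- Pearson chi-square over the full 2x2 table.

WordSmith’s own implementation may differ in its details, and the counts in the example are hypothetical. The 3.84 cutoff is the chi-square critical value for p < 0.05 with one degree of freedom.

```python
import math


def expected(a, b, c, d):
    """Expected counts of the word in the study and reference corpora.

    a, b: the word's counts in study and reference; c, d: corpus sizes.
    """
    total = c + d
    return c * (a + b) / total, d * (a + b) / total


def log_likelihood(a, b, c, d):
    """LL = 2 * sum(O * ln(O / E)) over the two observed counts."""
    e1, e2 = expected(a, b, c, d)
    ll = 0.0
    for obs, exp in ((a, e1), (b, e2)):
        if obs > 0:  # 0 * ln(0) is treated as 0
            ll += obs * math.log(obs / exp)
    return 2 * ll


def chi_square(a, b, c, d):
    """Pearson chi-square on the 2x2 table (this word vs other words) x (study vs reference)."""
    observed = [[a, c - a], [b, d - b]]
    n = c + d
    rows = [c, d]
    cols = [a + b, n - a - b]
    chi = 0.0
    for i in range(2):
        for j in range(2):
            exp = rows[i] * cols[j] / n
            chi += (observed[i][j] - exp) ** 2 / exp
    return chi


# Hypothetical counts: a word seen 10 times in a 2,000-word sample
# and 1,000 times in a 1,000,000-word reference corpus.
ll = log_likelihood(10, 1000, 2000, 1_000_000)
chi = chi_square(10, 1000, 2000, 1_000_000)
print(f"LL = {ll:.2f}, chi-square = {chi:.2f}")
print("key" if ll > 3.84 else "not key")
```

Note that both scores only say how *different* the rates are. To call a word a (positive) keyword, you also check that it is more frequent in the sample than expected.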

The Dudes Do ASHA 2012: Day 4 11/17/12

It was the last day of ASHA, and I had the special honor of closing the AAC strand for the convention. In short, I was last on the list of AAC presenters. In a curious twist of fate, my colleague from Germany opened the AAC strand in the first session on Thursday, so between us we’d bracketed the field!

ASHA at Georgia World Congress Center


A less charitable viewpoint might be that I had to present after lunch on the last day, when many folks were leaving to catch planes home or taking the opportunity to spend one last day in the wonderful city of Atlanta. So the fact that folks turned up, including one of my #slpeeps from the Twitterverse, was quite a relief. [2]

The topic was how to use the data generated by an AAC device to plan therapy sessions. A number of AAC technologies have the facility to track data, but few people seem to use it. The purpose of the presentation was to show folks that there is immense value in using such logging to help clients improve their communication skills.

Basically, automated data logging tracks events over time; you can see what someone is saying and when they are saying it. And with just these two pieces of information, you can provide a much better service to your clients. [1] You can gather information about:

  • Vocabulary – the words your client uses
  • Morphology – the way your client uses morphemes to indicate tense, number, intensity etc.
  • Syntax – how your client uses words in a systematic way along with other words
  • Function – how your client’s language is used (questions, imperatives, requests etc.)

To facilitate this, you can use the QUAD Profile, a paper-based checklist that provides guidelines on what to look for. Developed in 2005 as a quick and dirty evaluation tool, the QUAD is simple enough that you don’t have to be a specialist in AAC to use it [3]. You can click on the graphic below to download a copy.

Download the QUAD Profile

QUAD Profile

You can also take user-generated text data and analyze it using either Concordance or WordSmith. Both programs let you input large amounts of text and then measure word frequencies and type/token ratios, or find keywords – those words in a sample that occur more frequently than you would expect by chance. I’ve covered both of these, and discussed core versus fringe versus keywords, in The Dudes Do ISAAC 2012: Day 4 – Of Corpora and Concordances, so take a look there for more details.
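As a sketch of the “what and when” idea behind data logging, the following Python parses a device log into (timestamp, word) pairs. It then counts vocabulary (what is said) and groups words by hour (when it is said). The log format is entirely hypothetical. Every device exports its own format, so the hypothetical `parse_log` function would need adapting, and the sample entries are made up.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical log format: "<date> <time> <event-type> <text>".
# Real devices each use their own formats; adapt parse_log accordingly.
SAMPLE_LOG = """\
2012-11-17 09:02:11 WORD I
2012-11-17 09:02:14 WORD want
2012-11-17 09:02:20 WORD juice
2012-11-17 13:40:05 WORD I
2012-11-17 13:40:09 WORD want
2012-11-17 13:40:15 WORD go
"""


def parse_log(text):
    """Yield (timestamp, word) pairs for WORD events, skipping anything else."""
    for line in text.splitlines():
        parts = line.split(maxsplit=3)
        if len(parts) == 4 and parts[2] == "WORD":
            stamp = datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S")
            yield stamp, parts[3].lower()


events = list(parse_log(SAMPLE_LOG))
vocabulary = Counter(word for _, word in events)  # what is being said
by_hour = defaultdict(list)                       # when it is being said
for stamp, word in events:
    by_hour[stamp.hour].append(word)

print(vocabulary.most_common(2))
print(dict(by_hour))
```

Once the log is in this shape, the same word lists can be fed into any of the frequency or keyword analyses discussed in these posts.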

What I failed to spend any time talking about was the excellent BYU Corpora created by Mark Davies at Brigham Young University. If you’re wanting to find out how a particular word is used in contemporary American English – or slightly less contemporary British English – you could do a lot worse than using these corpora, especially the Corpus of Contemporary American English, or COCA [4]. As an example, I previously talked about the difference between “taking a bath/shower” and “having a bath/shower,” arguing that in British English you’d teach “having” whereas in American English you’d focus on “taking.” The key point is that you can use the COCA to quantify this difference. And quantifying is a step towards evidence-based practice.

Here’s another example of where using the COCA can help you decide on which words to teach: which should you teach first – look or see? Well, if you want to focus on bigger semantic bang-for-buck, you should go for see, which is used in speech twice as often as look. Or how about need and want? It turns out that want is three times more likely to be used than need, so want is much more useful.

Another thing the COCA does is show how words are used in context. This turns out to be very valuable knowledge when teaching language, because you can’t just teach a word in isolation. For example, if we go back to the word look, the COCA shows that it very frequently appears immediately before a preposition. Here are the specifics:

the word look with prepositions

“look” and PREP

So if you are going to teach look, think about look at followed by look for as contextual phrases because that’s how the word is used in real life! Here’s a link to download my slides and notes as a PDF handout.

DOWNLOAD: Using AAC device-generated client data to develop therapy sessions
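The look + preposition pattern can be approximated on a client’s own text by counting which words immediately follow look. This Python sketch is string-based. It is not the part-of-speech-tagged search that COCA runs. It only matches the exact form look (not looks or looked), and the PREPOSITIONS set is a small, hand-picked assumption rather than a complete list.

```python
import re
from collections import Counter

# Hand-picked set for illustration; not an exhaustive list of prepositions.
PREPOSITIONS = {"at", "for", "after", "around", "into", "like", "through", "on", "in"}


def next_word_counts(text, target="look", keep=PREPOSITIONS):
    """Count the words in `keep` that immediately follow `target`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(nxt for word, nxt in zip(tokens, tokens[1:])
                   if word == target and nxt in keep)


sample = "Look at this. Can you look for my keys? Look at me! I look like my dad."
print(next_word_counts(sample).most_common())
```

On a real client sample, the top of this list suggests which look phrases (look at, look for, and so on) are worth teaching as units.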

By 2:30, I was done. My target was to be in my room at 3:00 with my shoes off, feet up, and a coffee in my hand. And this turned out to be a success!

At 5:00, I left for an early dinner with a friend at the Sweet Georgia’s Juke Joint at 200 Peachtree Street. Being in the South, I plumped for fried chicken with collard greens, a peach cobbler for dessert, and a delicious Millionaires Mojito. To make the night breeze along, we were entertained by Nat George and the Nat George Players, a band so smooth you could spread ’em on toast.

The video doesn’t really do the band justice, but that’s all the more reason for you to put a trip to see them on your list of “Things to do in Atlanta” for your next trip out.

Another memorable night, and yet another reason why I could happily spend much more time exploring the city. But tomorrow it’s back home. Ah well. C’est la vie.

Downtown Atlanta

[1] At this point, you might wonder why I don’t leap into a discussion about privacy, security, and ethics. Well, that’s because if I’m going to do that, I’d rather spend an entire post on it. But the short answer is that in all the years I’ve worked with clients who have data logging capabilities, I have yet to have ONE tell me that I can’t see their data. After a short conversation about why I want to track their data and what I intend to do with it, they’ve been happy to allow me access. It’s important to have this discussion before turning on monitoring, and critical to explain the value, but once you do that, there’s no problem. Informed consent is a wonderful thing.

[2] OK, so it was @MeganPanatier – Thanks for stopping by and for tweeting some of my comments during the presentation!

[3] Cross, R.T. (2010). Developing Evidence-Based Clinical Resources, in Embedding Evidence-Based Practice in Speech and Language Therapy: International Examples (eds H. Roddam and J. Skeat), John Wiley & Sons, Ltd., Chichester, UK.

[4] The site includes the Corpus of Contemporary American English (450 million words), the British National Corpus (100 million words), the Corpus of Historical American English (400 million words), the Time Magazine corpus (100 million words) and the new Corpus of American Soap Operas (100 million words), which I have yet to test run!

The Dudes Do ISAAC 2012: Day 4 – Of Corpora and Concordances

Pittsburgh from Station Square

Marketing applies as much to conference presentations as it does to selling beans. Or coffee. Or bagels[1]. Picking a good title is more important than the presentation itself. Really, it is. Which explains why my “first-thing-in-the-morning” session was not exactly standing room only. The presentation title was the technically accurate but marketingly disastrous Using Concordance Software and online Corpora in AAC. A much better title would have been Using Velcro and a Free iPad for New Simple Gamechanging Therapy. You see, this has all the buzz words that people scan for when reading a conference program. Free is always a winner; iPad is currently sexy; new suggests you will be surprised and maybe first to do something in your part of the world; Velcro® is something ALL therapists relate to; and game-changer is an over-used, over-hyped, almost meaningless vogue word that can be applied to anything in order to make it sound impressive. People who use the word “game-changer” should be hanged, drawn, quartered, and made to read a thesaurus.

But I did, fortunately, hear from a number of folks who told me they wanted to come to the presentation but it clashed with another. It clashed, in fact, with several! Given a finite number of potential attendees spread across the available sessions, individual session attendance goes down as concurrency goes up. So for those who were unable to attend, I can at least give you a brief summary of what I was talking about. And for those folks who couldn’t make it to ISAAC 2012 in the first place, I’m also including this link to my PowerPoint files and Resources List via the Dudes’ Dropbox account.

The first thing I covered was the difference between core, fringe, and keyword vocabulary. In AAC, the use of core and fringe is now fairly common, but we need to make another distinction for something called keyword vocabulary. Here’s how these three terms can be defined:

Core word: A word with a high frequency of use, in line with what is statistically expected when compared to a large reference corpus.
Fringe word: A word with a low frequency of use, in line with what is statistically expected when compared to a large reference corpus.
Keyword: A word whose frequency of use is significantly higher than expected when compared to a large reference corpus.

Notice that these definitions do not include any notion of “importance.” A common mistake is for people to say things like, “but Tommy loves Transformers so ‘Optimus Prime’ is an important core word for him.” No, “Optimus Prime” is a keyword for him. It may seem like a trivial distinction, but it is useful. Sure, it may be an “important” word for him, but that still does not make it core. Thus, when people talk about a “personal core” for an individual, they are really talking about a person’s keyword set. It is much better to use the term keyword, because talking about a “personal core” seems to me to be confusing and changes the definition of core.
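To show how the three definitions fit together, here is a rough Python sketch that labels a word as keyword, core, or fringe. The assumptions are loud ones:

- Core is approximated as the top `core_size` words of the reference list (200 by default, echoing the “top 200” core figure mentioned earlier).
- A keyword is a word used more often than expected and significantly so, by a log-likelihood test at the 3.84 cutoff.
- Everything else is fringe.

The `classify` helper is hypothetical, and the toy counts, including Tommy’s “optimus,” are invented.

```python
import math


def log_likelihood(a, b, c, d):
    """Two-cell log-likelihood. a, b: word counts; c, d: corpus sizes."""
    e1 = c * (a + b) / (c + d)  # expected count in the study sample
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    return 2 * sum(o * math.log(o / e) for o, e in ((a, e1), (b, e2)) if o > 0)


def classify(word, study, ref, core_size=200, critical=3.84):
    """Label a word 'keyword', 'core', or 'fringe' (hypothetical helper).

    study, ref: dicts of word -> count. Their totals stand in for corpus
    sizes, which assumes each dict covers its whole corpus.
    """
    a, b = study.get(word, 0), ref.get(word, 0)
    c, d = sum(study.values()), sum(ref.values())
    # Keyword: used more often than expected, and significantly so.
    if a / c > b / d and log_likelihood(a, b, c, d) > critical:
        return "keyword"
    # Core: among the most frequent words of the reference corpus.
    top = sorted(ref, key=ref.get, reverse=True)[:core_size]
    return "core" if word in top else "fringe"


# Invented toy data; core_size is shrunk to 3 because the lists are tiny.
reference = {"the": 6000, "want": 1000, "go": 900, "truck": 50, "optimus": 1}
tommy = {"the": 60, "want": 10, "go": 9, "truck": 1, "optimus": 12}
for w in ("optimus", "want", "truck"):
    print(w, classify(w, tommy, reference, core_size=3))
```

The keyword test runs first. Under these definitions, a core word that a client uses far more than expected counts as one of that client’s keywords.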

The notion of keywords has been taken straight from the field of Corpus Linguistics:

Keywords are words which are significantly more frequent in a sample of text than would be expected, given their frequency in a large general reference corpus. (Stubbs, 2010) [2]

Corpus Linguistics uses large data samples, or corpora, to look for patterns in language. The larger the samples, the more reflective the data are of “real world” language use. Among the largest online sets of such corpora are those developed and maintained by Mark Davies at Brigham Young University in Utah. The Corpus of Contemporary American English [3] is based on a sample of 425 million words. It can provide frequency data for individual items, as well as contextual information on how these are typically used. This type of data can help the AAC practitioner determine which words to include in a system and answer questions about how a word may be used (e.g. is the word light used more as a noun than a verb?)

Another tool used by corpus linguists is concordance software. Such software allows investigators to input text and create output in the form of frequency lists, keyword lists, and keywords in context. The AAC practitioner can run client-generated data through concordance software to build personal vocabulary lists. It’s also possible to compare a client’s data with other samples, which can be very instructive for a clinician who wants to see how an individual’s use of language matches a “standard.”
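A bare-bones keyword-in-context (KWIC) display can be sketched like this; it is the core view of any concordancer. This is not the Concordance or WordSmith algorithm, just a plain illustration. It finds each occurrence of a node word and shows a few words on either side. That is enough to eyeball whether light is being used as a noun or a verb.

```python
import re


def kwic(text, node, width=3):
    """Return one keyword-in-context line per occurrence of `node`."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the node words line up.
            lines.append(f"{left:>30} [{token}] {right}")
    return lines


text = "Turn on the light please. The light is too bright. Can you light the candle?"
for line in kwic(text, "light", width=2):
    print(line)
```

Lining the node words up in one column is what makes patterns (the light versus you light) jump out when you scan many lines at once.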

Concordance software


Concordance is a flexible text analysis program which lets you gain better insight into e-texts and analyze language objectively and in depth. It lets you count words, make word lists, word frequency lists, and indexes.

You can select and sort words in many ways, search for phrases, do proximity searches, sample words, and do regular expression searches. You can also see statistics on your text, including word types, tokens, and percentages, type/token ratios, character and sentence counts and a word length chart.
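The type/token ratio mentioned above is simple to compute. Tokens are all the running words, and types are the distinct ones. Here is a minimal Python sketch, assuming a word is a lowercased run of letters or apostrophes. The `text_stats` name is made up for illustration.

```python
import re


def text_stats(text):
    """Return token count, type count, and type/token ratio for a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    types = set(tokens)
    return {
        "tokens": len(tokens),
        "types": len(types),
        "ttr": len(types) / len(tokens) if tokens else 0.0,
    }


stats = text_stats("I want juice I want more juice")
print(stats)
```

TTR falls as samples get longer, because common words keep repeating. So it is only fair to compare samples of similar length.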

WordSmith is a popular piece of word-analysis software that includes features to generate word lists, frequency lists, usage lists, and keyword lists.

It also has the option to download the British National Corpus word frequency list to use as a large comparative data set. This is a great tool for investigating keywords among small data sets.

Now, a number of commercial devices include this data-logging feature as an option, providing a record of events over time. With the client’s consent, being able to track such usage can be invaluable in helping clinicians and educators see exactly what the client is currently capable of doing. By extension, it helps them create teaching plans that will develop the client’s ability to use the device. But if you are prepared to clean up the raw data from an AAC device a little, you can drop it into concordance software and work some magic. You can see how a client’s use of vocabulary matches what you might expect; you can discover a client’s keyword vocabulary by filtering out core words; and you can look at how clients use vocabulary in context, e.g. where they use the word light and how it is being used.

In summary, what I’m suggesting is that using (a) large online corpora and (b) concordance software can enhance the way in which we develop and expand AAC systems. Both of these are based on actual language usage and not on some hypothetical construct of what we think is happening with vocabulary.

Enough of the academic stuff; I just want to alert you to an unmissable experience at Tonic Bar & Grill on the corner at 971 Liberty Avenue, just outside the David L. Lawrence Convention Center. Those of a nervous gastronomic disposition may want to stop reading now – as may folks who are on any diet other than the “Let’s See How Fat I Can Get Before My Arteries Explode” diet.

At any time of day, you should prop yourself up at the bar, order one of their small selection of draught beers, and place an order for Poutine Fries [4]. This is a heavenly bowl of hot potato fries, smothered in slippery, creamy cheese, and topped with a generous helping of tender braised short ribs. You can choose to experience this ambrosial feast either by eating it or by having a cardiologist smear it directly onto your arteries: we recommend the former. How we managed to eat just one bowl is still a mystery to us, but our hearts will undoubtedly thank us.

Poutine fries

[1] As it was an early presentation, I skipped breakfast, which meant that by the time I’d finished I was hungry. So a shout out to the good folks at Bruegger’s Bagels on Grant Street in downtown Pittsburgh who supplied me with their Breakfast Bagel, a mouth-watering treat of egg, cheese, and bacon on a crusty whole-wheat bagel. I’m pretty sure it’s not the healthiest of starts to the day but it sure is one of the tastiest.

[2] Stubbs, M. (2010). Three concepts of keywords. In M. Bondi and M. Scott (Eds.) Keyness in Texts: Studies in Corpus Linguistics. John Benjamins Publishing: Philadelphia.

[3] Davies, M. (2008-) The Corpus of Contemporary American English: 425 million words, 1990-present. Available online at http://corpus.byu.edu/coca/.

[4] As our Canadian friend will know, Poutine Fries originated in Quebec and therefore represent a form of biological warfare against America, the intent being to bring the country to its knees by making everyone too fat to get up off of them. Rest assured that on their next trip to Montreal, the Dudes will make sure they take advantage of sampling the local Poutine Fries and would encourage anyone taking a trip to Canada to do the same!

Efficacy or Effectiveness? How To Be A Word Detective

Late last week I was in a meeting with a chappie from the International Organization for Standardization, talking about the role of the research group I belong to and explaining how we measure our performance. This sort of thing is typical of any company that needs to maintain its ISO status [1], and having lists of procedures, processes, and parametrics is de rigueur for the whole shebang.

In the course of the discussion, I happened to talk about the challenge of measuring the efficacy of a department whose purpose is to generate speculative ideas, 80% of which are likely to be unfeasible. The examiner stopped me and asked me to repeat the word, which I did, and my colleague also offered a “translation” by saying “effectiveness.” That did the trick, and I chalked it up to my being an Englishman who is still struggling to learn American. [2]

But being me, I jotted the words down in my ever-present notebook with a view to investigating whether the efficacy/effectiveness split was, indeed, a transatlantic difference.

Of course, in this age of Evidence-Based Practice, the call for measures of how much effect therapy has on a client means that it’s common to talk about the “efficacy of treatment” or the “effectiveness of an approach.” Or is it? Do we say “efficacy” or “effectiveness”? Is there, in fact, a difference?

Well, the first thing I often do with questions like this is to use the Google search engine and get a Ghit measure. “Ghit” is short for “Google Hit” and appears in a search as a number under the search bar. [3] Here’s what comes up for efficacy and effectiveness:

Efficacy: 17,100,000 ghits
Effectiveness: 179,000,000 ghits

Whoa! Quite a difference there, by a factor of ten. Just to corroborate the difference, I did a Bhit count and a Yhit count (Bing Hits and Yahoo Hits, if you weren’t sure).

Efficacy: 52,400,000 Bhits and 52,600,000 Yhits
Effectiveness: 143,000,000 Bhits and 139,000,000 Yhits
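If you want to check the “factor of ten” for yourself, here’s a minimal Python sketch that divides the effectiveness count by the efficacy count for each search engine. The figures are the ones quoted above, with the Yahoo efficacy count taken to be 52,600,000; treating raw hit counts as comparable across engines is itself a quick-and-dirty assumption.

```python
# Hit counts quoted in the post; the Yahoo efficacy figure is taken to be 52,600,000.
HITS = {
    "Google": {"efficacy": 17_100_000, "effectiveness": 179_000_000},
    "Bing": {"efficacy": 52_400_000, "effectiveness": 143_000_000},
    "Yahoo": {"efficacy": 52_600_000, "effectiveness": 139_000_000},
}


def effectiveness_ratio(counts: dict) -> float:
    """Return how many times more hits 'effectiveness' gets than 'efficacy'."""
    return counts["effectiveness"] / counts["efficacy"]


if __name__ == "__main__":
    for engine, counts in HITS.items():
        print(f"{engine}: {effectiveness_ratio(counts):.1f}x")
```

On these numbers it prints 10.5x for Google but only 2.7x for Bing and 2.6x for Yahoo, so the factor of ten holds only for Google.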

So effectiveness isn’t ten times more popular on Bing and Yahoo, but it still gets more than two-and-a-half times as many hits. But what about the notion that it’s a UK/US thing? It is possible that the high ghit count is masking such a difference; after all, the percentages will always skew in favor of the US when it comes to number of speakers.

This is when I turn to my trusty friend, the BYU-Corpus site, where we can play with the Corpus of Contemporary American English to check on how a word is used in the US, and also with the British National Corpus to get a UK perspective. I did this for my previous post on the use of have versus take in relation to bathing, and that turned out to be most definitely a US/UK distinction. Here’s what we see:

Oh bugger! It doesn’t look like a BrE versus AmE difference after all. There is a 10% variation between the two but I’m pretty sure it’s not statistically significant. My choice to use efficacy puts me in the minority in both the States and the Isles.

Desperate for some validation, I dug a little deeper by looking at some historical data. Maybe I’m just old and the incidence of the words has changed since I was a lad. The British National Corpus isn’t much help, as it only covers the period from the 1980s through to 1993, and I want to see older data than that.

The Oxford English Dictionary is a good source for historical information on word meaning, so I went to the bookshelf and did a little more research.

Efficacy as a noun dates from 1527 and is defined as the “(p)ower or capacity to produce effects.” It’s derived from the earlier Latin efficere meaning “to accomplish.” Its meaning hasn’t really changed since then, so we can call it a 16th-century word. Old enough.

Effectiveness as a noun is a little younger, with the OED identifying a first appearance in 1607, some 80 years after efficacy. It has a similar definition: “(t)he quality of being effective.” Not surprisingly, it, too, can be traced back to the same Latin root as efficacy, efficere. However, it is a 17th-century word, so I can take some comfort (perhaps) in arguing that my use of efficacy is more “traditional.”

However, we can see something much more interesting if we take a peek at the Corpus of Historical American English, which covers the period 1810-2009, and that certainly goes back further than my birth!

Here’s the chart of the behavior of the word efficacy since 1810:

[Figure: The history of the word efficacy, 1810-2009]

Even before you click on the image to enlarge it, it’s clear that efficacy has been in a slow decline for decades. There’s been a modest upswing since the 1950s, but it’s nowhere near its glory days. So the inevitable question is: what has pushed it aside?

[Figure: History of the word effectiveness, 1810-2009]

Well, well, well, what a surprise! The usurper turns out to have been no more than the Pretender to the Throne, effectiveness! From out of the shadows, the word has slowly increased its popularity to the point that it now hogs the limelight and commands center stage. Alas, poor efficacy, I knew it, Horatio.

The story might end there, with my claiming to be simply the sort of dude who uses older words and who is also a victim of the invisible hand of lexical change that can overturn the fortunes of synonyms. But there is something else: although for most of the world efficacy and effectiveness are synonymous (and dictionaries typically treat them that way), there is a field in which they are not: the Clinical World.

Ah, but that’s a story for another day…

[1] For some time, I took pleasure in pointing out that the “International Organization for Standardization” was clearly guilty of failing to notice that the acronym should be IOS, not ISO. Alas, my mistake was to assume that ISO was an acronym when, in fact, it allegedly isn’t! The organization says that the name is derived from the Greek word isos, meaning “equal,” and that they chose it so they wouldn’t have to use different acronyms in different countries based on the languages. For example, in France it would be Organisation internationale de normalisation (OIN), so ISO is international.

[2] When folks ask me if I speak more than one language, I say I’m bilingual and can speak both English AND American. One of the delights of being an Englishman Abroad is that not only have I had the chance to be immersed in the UK’s melange of dialects and accents for the first 30-something years of my life, but now I get to go through it all over again with the different flavors and recipes of American English. I’m comfortable with Fall, happy to spell tyres as tires, and say “to-MAY-toe” and not “to-MAH-toe.”

[3] The accuracy of using ghits as a measure of word use is always open to question, but as a quick-and-dirty metric they’re used by linguists who want to get a feel for how the world of words is playing out. Arnold Zwicky used them in a recent blog post about the prefix “telephon-” and Geoff Pullum has them in a post on “Assholocracy,” so I think I’m in pretty good company.

Baths and Showers: “Taking” or “Having”?

In the 3rd century BCE, the philosopher Archimedes was taking a long bath and playing with his rubber duck. To be honest, it may not have been a rubber duck, but he was dunking something in and out of the water because, according to legend, he leapt out, ran down the street, and shouted “Eureka!” which is Greek for “I’ve found it!” [1] What he’d found was a way to decide whether a gold crown was actually made of gold without melting it down, which you can do by dropping it in water and measuring the amount of liquid that gets displaced. This became known as Archimedes’ Principle, but sadly he neglected to trademark the phrase or sell the slogan on togas, so he failed to make a fortune from this well-known piece of intellectual property.
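For the curious, the displacement trick boils down to a density check: mass divided by the volume of water pushed aside. Here’s a minimal Python sketch, assuming a gold density of about 19.3 g/cm³; the 2% tolerance and the example crowns are made up for illustration.

```python
# Density of pure gold is about 19.3 g/cm^3.
GOLD_DENSITY_G_PER_CM3 = 19.3


def density_from_displacement(mass_g: float, displaced_ml: float) -> float:
    """Density in g/cm^3; 1 mL of displaced water is 1 cm^3 of submerged volume."""
    return mass_g / displaced_ml


def looks_like_pure_gold(mass_g: float, displaced_ml: float, tolerance: float = 0.02) -> bool:
    """True if the measured density is within `tolerance` (a fraction) of gold's."""
    density = density_from_displacement(mass_g, displaced_ml)
    return abs(density - GOLD_DENSITY_G_PER_CM3) / GOLD_DENSITY_G_PER_CM3 <= tolerance


if __name__ == "__main__":
    # Hypothetical 1 kg crowns: one displacing 51.8 mL, one (alloyed) displacing 60 mL.
    print(looks_like_pure_gold(1000, 51.8))  # True
    print(looks_like_pure_gold(1000, 60))    # False
```

A crown of the same mass that has been padded out with a lighter metal such as silver takes up more room, so it displaces more water and comes out less dense.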

[Image: Archimedes shouts Eureka]

Having ideas in the bath is something with which most people are familiar. There’s clearly something about being submerged in warm water that gets the brain a-buzzing, doubtless supported by a slew of research studies that talk about expanded arteries, endorphins, and brain scans.

So this morning while I was in the shower, I got to thinking about how I actually talk about the process of showering, i.e., do I say “I’m going to take a shower” or “I’m going to have a shower”? Now, before you read any further, think about which of those two sentences sounds “right” to you.

If you’re American, I’m predicting you use “take” whereas if you’re British, I’m going to say you said “have.” If you’re Canadian, Australian, or a New Zealander, I’d be happy to hear from you because I’m less sure – but if I had to take a guess, I think you’re a “haver” not a “taker.”

The reason I can be so confident is that I checked out the incidence of the use of the verbs have and take in relation to bathing and showering using the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA). I’ve mentioned these corpora before, and I encourage you again to think about using them to help make decisions about real-world language usage. [2]

All I did was to search for the phrases “take a bath/shower” and “have a bath/shower” in each corpus and use a simple percentage score to create the following table:

[Table: “have” versus “take” as the verb with bath]
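The “simple percentage score” isn’t spelled out, so this sketch assumes it is each verb’s share of the combined have + take matches within one corpus. The function name is hypothetical, and the counts you feed it are whatever the BNC or COCA search returns; no corpus figures are reproduced here.

```python
def verb_percentages(have_count: int, take_count: int) -> dict:
    """Share of 'have' and 'take' among all have/take + a bath/shower matches, in percent.

    Both counts should come from the same corpus, so corpus size cancels out.
    """
    total = have_count + take_count
    if total == 0:
        raise ValueError("no matches for either phrase")
    return {
        "have": 100 * have_count / total,
        "take": 100 * take_count / total,
    }
```

Run it once per corpus and you have one row of the table; working with shares rather than raw counts is what makes the 100-million-word BNC comparable with the larger COCA.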

Feel free to perform a Chi-square analysis on this if you want but the figures look significant enough without whipping out the calculator. Notice that the have/take skew is much more pronounced for American English than British English but even the latter is pretty big.
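If you do fancy whipping out the calculator, here’s a minimal Python sketch of Pearson’s chi-square test for a 2×2 table (corpus by verb), without Yates’ correction. One caveat: the test needs the raw match counts from each corpus, not the percentages, and the row/column layout below is my assumption about how you’d arrange them.

```python
# Critical value of chi-square with 1 degree of freedom at p = 0.05.
CRITICAL_1DF_P05 = 3.841


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no Yates correction) for the table

                 have  take
        BNC        a     b
        COCA       c     d
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column needs at least one match")
    return n * (a * d - b * c) ** 2 / denominator


def is_significant(a: int, b: int, c: int, d: int) -> bool:
    """True if the have/take split differs between corpora at p < 0.05."""
    return chi_square_2x2(a, b, c, d) > CRITICAL_1DF_P05
```

The shortcut formula is equivalent to summing (observed − expected)² / expected over the four cells, and it is only trustworthy when each expected count is reasonably large (the usual rule of thumb is at least 5). SciPy’s scipy.stats.chi2_contingency does the same job; pass correction=False to match this sketch, since it applies Yates’ correction to 2×2 tables by default.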

Because I work primarily in AAC, I use this sort of information about real-world language use when developing systems. Such data are also critical for teaching communication strategies. It’s not enough to teach words as individual items, because words exist within the context of other words, and those relationships are critical to understanding. For example, given the data I’ve just shown, teaching the word bath along with take would make perfect sense if I’m working in the US, but back in the UK I’d be better served focusing on using have with bath.

Knowledge of word collocation can be tremendously useful when creating intervention plans, and tools such as the COCA and BNC interfaces make it easy to obtain. Staying with the word bath, I did a collocation search for the words that appear immediately before and after it. The words hot and bubble are the top two that go before bath, with water appearing both before and after in almost equal amounts. With this sort of collocation information, I can be confident in teaching the words hot, bubble, and water along with bath, which not only adds new words to my client’s lexicon but also provides real contextual information about how the word bath is used.
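The COCA and BNC interfaces do this counting for you, but if you want to see what a one-word-either-side collocation search amounts to on your own transcripts, here’s a minimal Python sketch. The tokenizer is deliberately crude (lowercase letters and apostrophes only), and the function names and sample text are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def neighbours(tokens: list, node: str):
    """Count the words immediately before (left) and after (right) each `node`."""
    left, right = Counter(), Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        if i > 0:
            left[tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            right[tokens[i + 1]] += 1
    return left, right


if __name__ == "__main__":
    sample = "A hot bath. A bubble bath. Hot bath water. The bath water was hot."
    left, right = neighbours(tokenize(sample), "bath")
    print(left.most_common(1))   # [('hot', 2)]
    print(right.most_common(1))  # [('water', 2)]
```

Because this tokenizer ignores sentence boundaries, the “a” after “bath.” is counted as a right-hand neighbour; corpus tools typically also let you rank collocates by an association measure such as mutual information rather than by raw frequency alone.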

For more about the COCA and BNC corpora, and others, go to Mark Davies’ corpus.byu.edu site and explore the interface. It’s a wonderful resource and much underused by speech pathologists, methinks.

[1] The Greek word εὑρίσκω means “I find” and εὕρηκα is the perfect form meaning “I have found.” Greek conjugations aside, Archimedes was clearly pretty excited about something.

[2] I’m aware that the COCA and BNC differ in relation to when they were created; the BNC data is from 1980-1993 whereas the COCA is more current with data from 1990-2011. However, given that this is a known variable, it’s still reasonable to make comparisons.