The Dudes Do ISAAC 2012: Day 4 – Of Corpora and Concordances

Pittsburgh from Station Square

Marketing applies as much to conference presentations as it does to selling beans. Or coffee. Or bagels[1]. Picking a good title is more important than the presentation itself. Really, it is. Which explains why my “first-thing-in-the-morning” session was not exactly standing room only. The presentation title was the technically accurate but marketingly disastrous Using Concordance Software and online Corpora in AAC. A much better title would have been Using Velcro and a Free iPad for New Simple Gamechanging Therapy. You see, this has all the buzzwords that people scan for when reading a conference program. Free is always a winner; iPad is currently sexy; new suggests you will be surprised and maybe first to do something in your part of the world; Velcro® is something ALL therapists relate to; and game-changer is an over-used, over-hyped, almost meaningless vogue word that can be applied to anything in order to make it sound impressive. People who use the word “game-changer” should be hanged, drawn, quartered, and made to read a thesaurus.

But I did, fortunately, hear from a number of folks who told me they wanted to come to the presentation but it clashed with another. It clashed, in fact, with several! Given a finite number of potential attendees spread across the available sessions, as concurrency goes up, individual session attendance goes down. So for those who were unable to attend, I can at least give you a brief summary of what I was talking about. And for those folks who couldn’t make it to ISAAC 2012 in the first place, I’m also including this link to my PowerPoint files and Resources List via the Dudes’ Dropbox account.

The first thing I covered was the difference between core, fringe, and keyword vocabulary. In AAC, the use of core and fringe is now fairly common but we need to make another distinction for something called keyword vocabulary. Here’s how these three terms can be defined:

Core word: A word with a high frequency of use that is in line with what is statistically expected when compared to a large reference corpus.
Fringe word: A word with a low frequency of use that is in line with what is statistically expected when compared to a large reference corpus.
Keyword: A word whose frequency of use is significantly higher than statistically expected when compared to a large reference corpus.

Notice that these definitions do not include any notion of “importance.” A common mistake is for people to say things like, “but Tommy loves Transformers so ‘Optimus Prime’ is an important core word for him.” No, “Optimus Prime” is a keyword for him. It may seem like a trivial distinction but it is useful. Sure, it may be an “important” word for him but that still does not make it core. Thus, when people talk about a “personal core” for an individual, they are really talking about a person’s keyword set. It is much better to use the term “keyword” because talking about a “personal core” seems to me to be confusing and changes the definition of core.

The notion of keywords has been taken straight from the field of Corpus Linguistics:

Keywords are words which are significantly more frequent in a sample of text than would be expected, given their frequency in a large general reference corpus. (Stubbs, 2010) [2]
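For the curious, here is a minimal sketch of how “significantly more frequent than expected” can be put into numbers. It uses the log-likelihood (G2) statistic, which is one common way of measuring keyness in corpus linguistics; I am not claiming it is the exact calculation any particular piece of software performs. The function names, the counts in the example, and the 3.84 cut-off (the usual 5% significance level for this kind of test) are all my own choices for illustration.

```python
import math


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) keyness score for one word.

    a = count of the word in the client sample
    b = count of the word in the reference corpus
    c = total words in the client sample
    d = total words in the reference corpus
    """
    # Expected counts if the word were equally common in both texts
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def is_keyword(a, b, c, d, threshold=3.84):
    """True if the word is over-used in the sample AND the difference
    clears the (assumed) significance threshold."""
    overused = (a / c) > (b / d)
    return overused and log_likelihood(a, b, c, d) > threshold


# Hypothetical counts: "prime" appears 30 times in a 1,000-word client
# sample but only 5 times in a 1,000,000-word reference corpus.
print(is_keyword(30, 5, 1000, 1_000_000))
```

A word that turns up at roughly the rate the reference corpus predicts scores close to zero, which is exactly what you would expect for core and fringe words under the definitions above.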

Corpus Linguistics uses large data samples, or corpora, to look for patterns in language. The larger the samples are, the more reflective the data is of “real world” language use. One of the largest online sets of such corpora is the one developed and maintained by Mark Davies at Brigham Young University in Utah. The Corpus of Contemporary American English [3] is based on a sample of 425 million words, and can provide frequency data for individual items, as well as contextual information on how these are typically used. This type of data can be useful for the AAC practitioner in determining which words to include in a system and in answering questions about how a word may be used (e.g. is the word light used more as a noun than a verb?).

Another tool used by corpus linguists is concordance software. Such software allows investigators to input text and create output in the form of frequency lists, key word lists, and key words in context. The AAC practitioner can use client-generated data and run it through concordance software to build personal vocabulary lists. It’s also possible to compare a client’s data with other samples, which can also be very instructive for a clinician who wants to see how an individual’s use of language matches with a “standard.”
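To give a feel for what concordance software is doing under the hood, here is a rough sketch of a frequency list, a key-word-in-context (KWIC) display, and a type/token ratio. This is a toy written for this post, not how any of the commercial packages are implemented; the function names and the very simple tokenizer (lowercase letters and apostrophes only) are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately crude rule)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def kwic(tokens, target, width=3):
    """Show each occurrence of target with `width` words either side.

    Context is taken from the flat token list, so it can run across
    sentence boundaries.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def type_token_ratio(tokens):
    """Number of different words divided by total number of words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


sample = "Turn on the light. The light is on."
tokens = tokenize(sample)
print(frequency_list(tokens))
for line in kwic(tokens, "light", width=2):
    print(line)
print(type_token_ratio(tokens))
```

The KWIC lines are the useful bit for a question like “how is light actually being used?” because you get to see the neighbors of the word rather than just a count.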

Concordance

Concordance is a flexible text analysis program which lets you gain better insight into e-texts and analyze language objectively and in depth. It lets you count words, make word lists, word frequency lists, and indexes.

You can select and sort words in many ways, search for phrases, do proximity searches, sample words, and do regular expression searches. You can also see statistics on your text, including word types, tokens, and percentages, type/token ratios, character and sentence counts and a word length chart.

WordSmith is a popular word-analysis program that includes features to generate word lists, frequency lists, usage lists, and keyword lists.

It also has the option to download the British National Corpus word frequency list to use as a large comparative data set. This is a great tool for investigating keywords among small data sets.

Now, a number of commercial AAC devices include a data-logging feature as an option, providing a record of events over time. With the client’s consent, being able to track such usage can be invaluable in helping clinicians and educators see exactly what the client is currently capable of doing and, by extension, create teaching plans that will develop their ability to use the device. But if you are prepared to clean up the raw data from an AAC device a little, you can drop it into concordance software and work some magic. You can see how a client’s use of vocabulary matches what you might expect; you can discover a client’s keyword vocabulary by filtering out core words; and you can look at how clients use vocabulary in context, e.g. where do they use the word light and how is it being used?
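Here is a small sketch of that “clean it up, then filter out the core” idea. Big caveat: log formats differ from device to device, so the format below (a timestamp, a space, then the word or phrase that was selected) is entirely made up for illustration, as are the function names and the tiny core word list. A real core list would come from a proper reference source, not from ten words I typed off the top of my head.

```python
from collections import Counter

# Illustrative only: a real core vocabulary list would be far longer.
CORE_WORDS = {"i", "you", "want", "go", "the", "a", "it", "more", "not", "on"}


def clean_log(raw_lines):
    """Strip timestamps from hypothetical 'HH:MM:SS entry' log lines.

    Blank or malformed lines are skipped. Multi-word entries such as
    'Optimus Prime' are kept together as one vocabulary item.
    """
    entries = []
    for line in raw_lines:
        parts = line.strip().split(maxsplit=1)
        if len(parts) < 2:
            continue
        entries.append(parts[1].lower())
    return entries


def candidate_keywords(entries, core=CORE_WORDS):
    """Count what is left after the core words are filtered out."""
    return Counter(e for e in entries if e not in core).most_common()


log = [
    "09:00:01 I",
    "09:00:02 want",
    "09:00:05 Optimus Prime",
    "",
    "09:01:10 Optimus Prime",
    "09:01:12 go",
]
print(candidate_keywords(clean_log(log)))
```

What falls out the bottom is only a list of candidates. Whether any of them counts as a keyword in the statistical sense still depends on comparing its frequency against a large reference corpus.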

In summary, what I’m suggesting is that using (a) large online corpora and (b) concordance software can enhance the way in which we develop and expand AAC systems, and that both of these are based on actual usage of language and not some hypothetical construct of what we think is happening with vocabulary.

Enough of the academic stuff; I just want to alert you to an unmissable experience at Tonic Bar & Grill on the corner at 971 Liberty Avenue, just outside the David L. Lawrence Convention Center. Those of a nervous gastronomic disposition may want to stop reading now – as may folks who are on any diet other than the “Let’s See How Fat I Can Get Before My Arteries Explode” diet.

At any time of day, you should prop yourself up at the bar, order one of their small selection of draught beers, and place an order for Poutine Fries [4]. This is a heavenly bowl of hot potato fries, smothered in slippery, creamy cheese, and topped with a generous helping of tender braised short ribs. You can choose to experience this ambrosial feast either by eating it or having a cardiologist smear it directly on to your arteries: we recommend the former. How we managed to eat just one bowl is still a mystery to us but our hearts will undoubtedly thank us.

Poutine Fries

Notes
[1] As it was an early presentation, I skipped breakfast, which meant that by the time I’d finished I was hungry. So a shout out to the good folks at Bruegger’s Bagels on Grant Street in downtown Pittsburgh who supplied me with their Breakfast Bagel, a mouth-watering treat of egg, cheese, and bacon on a crusty whole-wheat bagel. I’m pretty sure it’s not the healthiest of starts to the day but it sure is one of the tastiest.

[2] Stubbs, M. (2010). Three concepts of keywords. In M. Bondi and M. Scott (Eds.) Keyness in Texts: Studies in Corpus Linguistics. John Benjamins Publishing: Philadelphia.

[3] Davies, M. (2008-) The Corpus of Contemporary American English: 425 million words, 1990-present. Available online at http://corpus.byu.edu/coca/.

[4] As our Canadian friends will know, Poutine Fries originated in Quebec and therefore represent a form of biological warfare against America, the intent being to bring the country to its knees by making everyone too fat to get up off of them. Rest assured that on their next trip to Montreal, the Dudes will make sure they take advantage of sampling the local Poutine Fries and would encourage anyone taking a trip to Canada to do the same!

6 responses to “The Dudes Do ISAAC 2012: Day 4 – Of Corpora and Concordances”

  1. Is this a new take on poutine? The picture looks absolutely delicious, but I was under the impression that poutine implied cheese curds and gravy, not cheddar and ribs (as amazing a combination as that may be).

    • It may well be. Poutine is new to me and all I know is that it originated in Quebec – allegedly. Like most dishes, there are likely to be many variations on a theme, and many people claiming to know the contents of the “original.” But all I can say is that fries, cheese, and any meaty substance is going to work for me 😉 Frankly, I’d be up for testing as many variations as folks can give me.
