The Dudes Do ISAAC 2012: Day 4 – Of Corpora and Concordances

Pittsburgh from Station Square

Marketing applies as much to conference presentations as it does to selling beans. Or coffee. Or bagels[1]. Picking a good title is more important than the presentation itself. Really, it is. Which explains why my “first-thing-in-the-morning” session was not exactly standing room only. The presentation title was the technically accurate but marketingly disastrous Using Concordance Software and online Corpora in AAC. A much better title would have been Using Velcro and a Free iPad for New Simple Game-Changing Therapy. You see, this has all the buzzwords that people scan for when reading a conference program. Free is always a winner; iPad is currently sexy; new suggests you will be surprised and maybe first to do something in your part of the world; Velcro® is something ALL therapists relate to; and game-changer is an over-used, over-hyped, almost meaningless vogue word that can be applied to anything in order to make it sound impressive. People who use the word “game-changer” should be hanged, drawn, quartered, and made to read a thesaurus.

But I did, fortunately, hear from a number of folks who told me they wanted to come to the presentation but it clashed with another. It clashed, in fact, with several! So given a finite number of potential attendees divided by the number of sessions, as concurrency goes up, individual session attendance goes down. Therefore, for those who were unable to attend, I can at least give you a brief summary of what I was talking about. And for those folks who couldn’t make it to ISAAC 2012 in the first place, I’m also including this link to my PowerPoint files and Resources List via the Dudes’ Dropbox account.

The first thing I covered was the difference between core, fringe, and keyword vocabulary. In AAC, the use of core and fringe is now fairly common but we need to make another distinction for something called keyword vocabulary. Here’s how these three words can be defined:

Core word: A word that has a high frequency of use value that is statistically expected when compared to a large reference corpus.
Fringe word: A word that has a low frequency of use value that is statistically expected when compared to a large reference corpus.
Keyword: A word that has a frequency of use value that is significantly higher than statistically expected when compared to a large reference corpus.

Notice that these definitions do not include any notion of “importance.” A common mistake is for people to say things like, “but Tommy loves Transformers so ‘Optimus Prime’ is an important core word for him.” No, “Optimus Prime” is a keyword for him. It may seem like a trivial distinction but it is useful. Sure, it may be an “important” word for him but that still does not make it core. Thus, when people talk about a “personal core” for an individual, they are really talking about a person’s keyword set. It is much better to use the term keyword, because talking about a “personal core” seems to me to be confusing and changes the definition of core.

The notion of keywords has been taken straight from the field of Corpus Linguistics:

Keywords are words which are significantly more frequent in a sample of text than would be expected, given their frequency in a large general reference corpus. (Stubbs, 2010) [2]
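For anyone who wants to see what “significantly more frequent than expected” can look like in practice, here is a minimal sketch. It assumes a log-likelihood (G2) test, which is one statistic commonly used for keyness; the quote above does not prescribe a particular test, and the function names and the 3.84 cut-off (roughly p < 0.05) are choices made purely for illustration.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 score for a word seen `a` times in a sample of `c` tokens
    and `b` times in a reference corpus of `d` tokens."""
    expected_sample = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_sample)
    if b > 0:
        g2 += b * math.log(b / expected_reference)
    return 2 * g2


def keywords(sample_tokens, ref_counts, ref_size, threshold=3.84):
    """Return (word, score) pairs for words that are over-represented in
    the sample relative to the reference corpus, highest score first.
    The threshold is an illustrative assumption, not a fixed rule."""
    sample_counts = Counter(sample_tokens)
    sample_size = len(sample_tokens)
    found = []
    for word, a in sample_counts.items():
        b = ref_counts.get(word, 0)
        # Only words used relatively MORE often than in the reference qualify.
        if a / sample_size <= b / ref_size:
            continue
        score = log_likelihood(a, b, sample_size, ref_size)
        if score >= threshold:
            found.append((word, score))
    return sorted(found, key=lambda pair: pair[1], reverse=True)
```

Fed a client sample and the word counts from a large reference corpus, a word like “the” scores close to zero because it turns up about as often as expected, while a word like “optimus” floats to the top of the list.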

Corpus Linguistics uses large data samples, or corpora, to look for patterns in language. The larger the samples are, the more reflective the data is of “real world” language use. One of the largest online sets of such corpora is the collection developed and maintained by Mark Davies at Brigham Young University in Utah. The Corpus of Contemporary American English [3] is based on a sample of 425 million words, and can provide frequency data for individual items, as well as contextual information on how these are typically used. This type of data can be useful for the AAC practitioner in determining which words to include in a system and in answering questions about how a word may be used (e.g. is the word light used more as a noun than a verb?).

Another tool used by corpus linguists is concordance software. Such software allows investigators to input text and create output in the form of frequency lists, key word lists, and key words in context. The AAC practitioner can use client-generated data and run it through concordance software to build personal vocabulary lists. It’s also possible to compare a client’s data with other samples, which can also be very instructive for a clinician who wants to see how an individual’s use of language matches with a “standard.”
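To make the idea of frequency lists and key words in context (KWIC) a little more concrete, here is a rough sketch of the sort of processing a concordance program does. This is not how any particular product works internally; the tokenizer, function names, and context width are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(tokens, top=None):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common(top)


def kwic(tokens, target, width=3):
    """Return (left context, word, right context) for each hit on `target`."""
    lines = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


if __name__ == "__main__":
    text = "Turn on the light. The light is too bright. I light the candle."
    tokens = tokenize(text)
    print(frequency_list(tokens, top=3))
    for left, word, right in kwic(tokens, "light", width=2):
        print(f"{left:>12} [{word}] {right}")
```

Lining up each hit with its neighbours is what lets you eyeball whether a word like light is turning up as a noun or as a verb.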

Concordance software


Concordance is a flexible text analysis program which lets you gain better insight into e-texts and analyze language objectively and in depth. It lets you count words, make word lists, word frequency lists, and indexes.

You can select and sort words in many ways, search for phrases, do proximity searches, sample words, and do regular expression searches. You can also see statistics on your text, including word types, tokens, and percentages, type/token ratios, character and sentence counts and a word length chart.

WordSmith concordance software

WordSmith is a popular word-analysis program that includes features to generate word lists, frequency lists, usage lists, and keyword lists.

It also has the option to download the British National Corpus word frequency list to use as a large comparative data set. This is a great tool for investigating keywords among small data sets.

Now, a number of commercial AAC devices include a data-logging feature as an option, providing a record of events over time. With the client’s consent, being able to track such usage can be invaluable in helping clinicians and educators see exactly what the client is currently capable of doing and, by extension, create teaching plans that will develop their ability to use the device. But if you are prepared to clean up the raw data from an AAC device a little, you can drop it into a concordance program and work some magic. You can see how a client’s use of vocabulary matches what you might expect; you can discover a client’s keyword vocabulary by filtering out core words; and you can look at how clients use vocabulary in context, e.g. where do they use the word light and how is it being used?
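As a rough illustration of the “filter out the core words” step, here is a small sketch. It assumes the device log has already been cleaned into plain utterances, one per entry (real log formats vary by manufacturer and are not covered here), and the tiny core word list and the function name are invented purely for the example.

```python
from collections import Counter

# Illustrative stand-in for a real core vocabulary list (an assumption,
# not taken from any published list).
CORE_WORDS = {"i", "you", "want", "the", "a", "to", "go", "more", "it", "is", "on", "turn"}


def candidate_keywords(utterances, core_words=CORE_WORDS, min_count=2):
    """Count the non-core words in cleaned utterances and return the
    (word, count) pairs that occur at least `min_count` times."""
    counts = Counter(
        word
        for line in utterances
        for word in line.lower().split()
        if word not in core_words
    )
    return [(word, count) for word, count in counts.most_common() if count >= min_count]


if __name__ == "__main__":
    log = ["I want Optimus", "turn on the light", "Optimus go", "more light"]
    print(candidate_keywords(log))
```

Simply stripping out a core list is cruder than a proper statistical comparison against a reference corpus, but it is often enough for a first look at what a client is talking about.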

In summary, what I’m suggesting is that using (a) large online corpora and (b) concordance software can enhance the way in which we develop and expand AAC systems, and that both of these are based on actual usage of language and not some hypothetical construct of what we think is happening with vocabulary.

Enough of the academic stuff; I just want to alert you to an unmissable experience at Tonic Bar & Grill on the corner at 971 Liberty Avenue, just outside the David L. Lawrence Convention Center. Those of a nervous gastronomic disposition may want to stop reading now – as may folks who are on any diet other than the “Let’s See How Fat I Can Get Before My Arteries Explode” diet.

At any time of day, you should prop yourself up at the bar, order one of their small selection of draught beers, and place an order for Poutine Fries [4]. This is a heavenly bowl of hot potato fries, smothered in slippery, creamy cheese, and topped with a generous helping of tender braised short ribs. You can choose to experience this ambrosial feast either by eating it or having a cardiologist smear it directly on to your arteries: we recommend the former. How we managed to eat just one bowl is still a mystery to us but our hearts will undoubtedly thank us.

Poutine Fries

[1] As it was an early presentation, I skipped breakfast, which meant that by the time I’d finished I was hungry. So a shout out to the good folks at Bruegger’s Bagels on Grant Street in downtown Pittsburgh who supplied me with their Breakfast Bagel, a mouth-watering treat of egg, cheese, and bacon on a crusty whole-wheat bagel. I’m pretty sure it’s not the healthiest of starts to the day but it sure is one of the tastiest.

[2] Stubbs, M. (2010). Three concepts of keywords. In M. Bondi and M. Scott (Eds.) Keyness in Texts: Studies in Corpus Linguistics. John Benjamins Publishing: Philadelphia.

[3] Davies, M. (2008-) The Corpus of Contemporary American English: 425 million words, 1990-present. Available online at http://corpus.byu.edu/coca/.

[4] As our Canadian friends will know, Poutine Fries originated in Quebec and therefore represent a form of biological warfare against America, the intent being to bring the country to its knees by making everyone too fat to get up off of them. Rest assured that on their next trip to Montreal, the Dudes will make sure they take advantage of sampling the local Poutine Fries and would encourage anyone taking a trip to Canada to do the same!

The Dudes Do ISAAC 2012: Day 3 – On Dragons, Hammers, and Rooftops

Day 3 was really Day 1 in terms of the main conference, so I had to register like everyone else, even though I’d taught at the pre-conference session. I have to say that the signage at the David L. Lawrence Convention Center [1] for ISAAC 2012 deserves to be commended. The place is huge but I was able to find my way to registration easily, and it only took about a minute from saying my name to leaving the counter.

ISAAC 2012 Registration

Welcome to ISAAC 2012

By sheer coincidence, standing right next to me was #slpeeps’ very own @bronwynah, who also doubles up as a member of #slpeeps-downunder. It seems that we #slpeeps get everywhere!

We were there to make sure we got our seats for the Opening Ceremony, which turned out to be a rather splendid event with the right amount of gravitas and humor. Guest speakers included Al Condeluci, a Pittsburgh native, born and bred, who has had a lifetime of working with United Cerebral Palsy and is a tireless promoter of rights for all. We also had Peter Yarrow, the singer from the iconic folk trio of the 60’s and 70’s, Peter, Paul & Mary. Yarrow has always been politically active and talked about his involvement in the recent “Occupy” protests. He likened the struggle faced by many people with disabilities to those of the African-American Civil Rights Movement in the 60’s and of black South Africans led by Nelson Mandela in the 80’s.

Here he plays If I Had A Hammer accompanied by the ISAAC 2012 delegates.

Video courtesy of Teechkidz via YouTube.

Following the opening ceremonies there were a number of sessions during the afternoon, and the closing of the day was marked by the rooftop Welcome Reception on the North Terrace of the Convention Center. This was a ticket-only event based, presumably, on the fact that the number of folks registered would determine the amount of hors d’oeuvres available. Tragically, there may have been some gatecrashers because by the time I actually got to the food tables, all that was left were some gazpacho shooters: tiny cups of cold soup that were fighting a losing battle with the 90+ degree temperatures in Pittsburgh.

I admit, there may have been some fault here. When I got to the roof, I saw that the bar line consisted of one person whereas the food line was somewhat longer. It was, after all, almost 6:00 p.m. and so an ice-cold beer was far too tempting to turn down. By the time I’d schmoozed and chatted with folks, the line for the food was still long and the beer line was also long. However, if you have friends at the front of the beer line…

So by the time I made it to the food line, it was all gone. Except for the gazpacho. It was only through sheer chance that a colleague later managed to hunt down some brie and crackers, which served to stave off hunger until a group of us made our way to the unpretentious August Henry’s City Saloon on Penn Avenue. Here you can get a large White Russian for $7.00 and a dozen chicken wings for $10.00. Wing lovers might want to note that they offer a selection of dry rub wings in mesquite, ranch, and Caribbean Jerk flavors.

August Henry's City Saloon

August Henry’s

I spent a little time at the bar talking with an old colleague, Cliff Kushler, the inventor of T9 and Swype, about lexical disambiguation [2] and Swype Art. Swype Art, you ask? Well, those of you who use Swype as an input to a smart phone or tablet might like to try to trace out the word “infinity” and see what happens! Here’s a video that shows how the words “banana,” “infinity” and “circles” look on a Swype keyboard.

If you find any other words that have traced pictures that seem appropriate, be sure to share them with us.

[1] David L. Lawrence was mayor of Pittsburgh from 1946 to 1959 and then went on to become the 37th Governor of Pennsylvania. He was born in 1889 and died in 1966.

[2] Lexical disambiguation is the process that is going on behind systems such as T9, Swype, SlideIt, TouchPal, and other keyboard accessing programs. The underlying software looks at the key you are selecting and then the subsequent keys, then “guesses” what you might be looking for. For example, if you start with a “B” and move across to “R,” you actually touch either the F or the G keys on the way, but because the system “knows” there are no words in English that start with “Bf” or “Bg” it ignores those and assumes you are looking for a word that begins with “br.” If you then swipe across to the letter “A,” it ignores your sliding across “D,” “S” or “W” because there are no words with “brd,” “brs” or “brw” in English. There are, of course, other factors involved but the basic notion of lexical disambiguation is relatively simple to understand.
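For the programmers in the audience, the basic idea can be sketched in a few lines. This is a deliberately simplified toy, not how Swype or T9 actually work: it assumes the swipe has already been turned into the string of keys the finger passed over, that a word must start on the first key and end on the last, and that a tiny made-up word list stands in for a real lexicon. All of the names are invented for the example.

```python
def collapse_repeats(word):
    """Squash doubled letters ("hello" -> "helo"), since a finger resting
    on one key appears only once in the traced path."""
    return "".join(ch for i, ch in enumerate(word) if i == 0 or ch != word[i - 1])


def is_subsequence(word, path):
    """True if the letters of `word` appear in `path` in the same order."""
    remaining = iter(path)
    return all(ch in remaining for ch in word)


def candidates(path, lexicon):
    """Return the lexicon words that could explain a traced key path."""
    path = path.lower()
    if not path:
        return []
    matches = []
    for word in lexicon:
        squashed = collapse_repeats(word.lower())
        if (
            squashed
            and squashed[0] == path[0]
            and squashed[-1] == path[-1]
            and is_subsequence(squashed, path)
        ):
            matches.append(word)
    return matches


if __name__ == "__main__":
    # Swiping B -> R -> A also brushes G, F, D and S along the way.
    print(candidates("bgfrdsa", ["bra", "bad", "era", "boa"]))
```

The stray keys are never rejected explicitly; they simply drop out because no word in the lexicon uses them in that order. A real system would also weigh word frequency and the shape of the path when several words fit.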