But Why Are Irregulars “Irregular?”

The English language is a glorious bastard child. Like the English themselves, its words and grammar are the result of the promiscuous and incestuous interbreeding that has been going on since the Angles, Saxons, and Jutes decided that they’d like an island vacation rather than sprawl out topless on the beaches of 5th century Europe – just as their current descendants do. Add to the mix the vocabulary of the Picts and Scots, along with a smattering of the ancient Welsh and Irish, and you’ve got yourself a language that turns out to be more wanton and debauched than a Roman orgy hosted by Caligula in a particularly creative mood.

As a result of this linguistic licentiousness, speech pathologists and English Language teachers find themselves having to teach a host of irregular, eccentric, and downright capricious words and grammatical structures. And there’s no finer example of this than something we call “the irregular verbs.” In fact, the very name “irregular verbs” tells us all we need to know; that here is a bunch of words so odd that we’ve just given up on them and tossed them into a huge bucket labeled “irregular.”

Occasionally, you might hear the uncomfortable question…

“But Miss, Miss, Miss, why is it went and not goed? Why is it saw not seed? And why can’t I say taked instead of took?”

As pragmatists, 99% of us will just say, “Because it is” and then focus on the job at hand – teaching the exception to the rule. But 1% of us – and I count myself as “one of us” – really do wonder “But why IS it went and saw and took rather than goed and seed and taked?” After all, when we invent a new verb such as to google or to tweet, it only takes a few weeks until folks have googled and tweeted or maybe even Facebooked. We know the rules; we apply the rules; we’re done!

Well, much as we all like to think we are hip, modern, trendy, and capable of being innovative game-changers who think outside of the box and shake up current thinking, as far as language goes, we’re tied to our undeniable linguistic history – the ghosts of the philological past are still haunting our etymological present. And like prehistoric flies trapped in amber, some of the words we use are really just fossils from an earlier age.

Back in the mid 1990s, Eva Grabowski and Dieter Mindt published a paper [1] that listed the most frequently used irregular verbs. They didn’t just sit in an office and google “most frequently used irregular verbs” but went back to basics and used the data from two pretty big (at the time) corpora: The BROWN corpus of American English [2], and the LOB corpus of British English [3]. Using real data rather than the “best guesses” of lexicographers was a huge step forward. For those of you who like FREE STUFF, you can click below to get a PDF copy of the top 100 irregular verbs by frequency. And why would you want it? Well, if you’re going to teach irregulars, starting with those used most makes a lot of sense.

100 most frequently used irregular verbs

So let’s take the top of the list item, the verb to say, and crack open the amber to extract its etymological DNA.

Old English, and its Germanic predecessors, had more verb forms than modern English. Today, if you invent a new verb, such as to twerk, you only need to add three different endings to make it sound right: +s, +ing, or +ed.

“Miley can twerk, She twerks too much. Yesterday she twerked, I think she’s twerking too much.”
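For the programmers in the audience, the regular pattern is simple enough to sketch in a few lines of Python. This is a toy illustration only: the function name regular_forms is made up for this example, and the only spelling rule it knows is dropping a final “e” (so google gives googling, not googleing). Real English spelling has more wrinkles, such as consonant doubling, which this ignores.

```python
def regular_forms(base):
    """Return the +s, +ing, and +ed forms of a regular verb.

    Toy sketch: the only spelling adjustment is for a final "e",
    which is treated as silent (google -> googling, googled).
    Consonant doubling and other spelling rules are deliberately ignored.
    """
    if base.endswith("e"):
        stem = base[:-1]
        return {"s": base + "s", "ing": stem + "ing", "ed": base + "d"}
    return {"s": base + "s", "ing": base + "ing", "ed": base + "ed"}


for verb in ("twerk", "tweet", "google"):
    forms = regular_forms(verb)
    print(verb, "->", forms["s"], forms["ing"], forms["ed"])
```

Three endings, one tiny function: that is the whole of the “regular” system a new verb has to learn.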

But Old English was much tougher, with most verbs having around 14 different forms. And some verbs were strong while others were weak. It wasn’t that the strong ones would bully the weaker ones but the strong verbs would change their forms in a much more dramatic fashion than the punier weak ones. A strong verb would change its base form by muscling in new vowels. A commonly cited example of a strong verb is to sing, where you get sing, sang, and sung, with each form differing by the vowel [4]. Similarly ring, rang, and rung, or swim, swam, and swum. In contrast, weak verbs are so called because they merely add an ending to their base form rather than man-up and ram those new vowels between the consonants.

I’m over-simplifying a little. There’s something of a sliding scale from “very strong” to “milquetoast weak,” and Old English scholars talk about 7 classes of strong verbs and 3 classes of weak ones. You have to think that with such a complex system, being a grammar teacher back in the 5th century CE must have paid more than it does today.

Having just explained the distinction between strong and weak verbs [5], take a look again at the verb to say. Is it strong or weak? Well, it’s so weak I’m surprised it hasn’t locked itself in a bathroom for fear of being hassled by to begin and to go! All that happens is that a /d/ sound gets added to the base form of /seɪ/, and the vowel changes ever-so-slightly by getting a tad shorter to leave /sɛd/. It’s technically from an Old English Class 3 weak verb that began life as secgan, meaning “to say,” and now has the pitiful pair of say/said left.

Number two on the list of irregulars, to make, is really pretty similar to to say, and so we should skip hastily on to the much more interesting to go, which has the disarmingly bizarre went as one of its forms. Why not, indeed, *goed?

Well, Old English did, in fact, have a *goed: eode. But there was also another verb around in the 5th century that meant “to wander around or go slowly,” and that was wendan. You still hear people talk about “wending their way around” but other than that, the word wend is pretty rare. So between Old English and Middle English (that’s between the 5th and 15th centuries) the word eode got pushed out by wend, the past tense of wendan, and the devoicing of that final /d/ sound to a /t/ gave us the now-familiar went. For those who are geekily curious, this is called suppletion in the world of historical linguistics, and it’s where one word is used as the inflected form of another, but where both words come from different origins. Ever wonder why things go from bad to worse – or worst? Suppletion. Or why things go from good to better and best? Suppletion. Hey, it’s not just a verb thing!

Before I wind up this work and wend my weary way to bed, there’s one other question that might still be nagging at you; why is it these particular irregulars that are irregular and not others? Why say, and make, and go, and come, and take, and see…? It’s because of their frequency! When we started shifting from using those many different types of strong/weak verbs in Old English to the more relaxed syntax of “+s,” “+ing,” and “+ed,” the words that were used most often had a built-in inertia – a resistance to change. We very easily – and perhaps it’s better to say unconsciously – take new verbs like tweet and twerk and add those three endings to them, but if we wanted to change went to goed [6] or saw to seed, we’d have a harder time because it just sounds so wrong! So although we know that many new words are coined and used every day, there’s a core of thousands of other words that are protected from change by a lexical inertia that anchors them firmly into our language and presents a formidable resistance to change.

So next time you’re focusing on teaching the irregulars, just remember that you’re also providing a small but fascinating lesson on the history of the English language!

Notes
[1] Grabowski, Eva & Mindt, Dieter (1995). “A corpus-based learning list of irregular verbs in English.” ICAME Journal 19, 5-22.

[2] Francis, W. Nelson, Kucera, Henry, & Mackie, Andrew W. (1982). Frequency analysis of English usage : lexicon and grammar. Boston: Houghton Mifflin.

[3] Hofland, Knut, & Johansson, Stig (1982). Word frequencies in British and American English. The Norwegian Computing Center for the Humanities, Bergen, Norway: Longman.

[4] Using vowel variation as a form of morphology is called ablaut. It’s from the German prefix ab- meaning “out of” or “away from” and laut meaning “sound.” So it refers to that notion of taking a sound away and replacing it with another.

[5] In today’s era of political correctness with the insistence on not hurting anyone’s feelings – ever, I can see the day coming when there will be pressure to re-define strong and weak verbs as robust versus relaxed. In that way, verbs like to chant and to hum no longer have to feel threatened by to sing.

[6] As every parent knows, kids will, in fact, quite happily “regularize” irregular forms when they are learning to talk. It is not unusual for kids to actually use irregular forms like went before they use regular, but erroneous, forms like goed. This overregularization is, in a sense, a good thing because it shows that a kiddo is learning to apply the more common rules of morphology – even if the words are technically wrong.

Errata
Thanks to eagle-eyed reader Mark Durham, we made a couple of corrections to the original text on 5/14/15; in two instances, we originally published tweak and here instead of twerk and hear. Both of these illustrate that relying solely on the built-in WordPress spell checker has some risks. It is, of course, better than not using it at all, but because both tweak and here are “good” words, the spell checker happily leaves them alone. So the teachable moment is “treat your spell checker as a friend who offers suggestions but not necessarily all the answers.”

The Dudes Do ATIA 2015: Day 2 – Of Powwows and Portmanteaus

The day before the Dudes left for the Assistive Technology Industry Association (ATIA) conference happened to be Lewis Carroll’s birthday. Folks who know me well – and maybe some who just happened to have heard me in presentations – will be painfully aware that I recommend Carroll’s Alice’s Adventures in Wonderland and Through the Looking Glass to anyone with the slightest interest in language. In fact, both books should be on the required reading list for all Educators and Speech and Language Therapists/Pathologists – seriously. Read the following single sentence as spoken by the Duchess in Wonderland and savor the complexity:

Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise.

Now parse it. There’s glory for you [1]. The books are just overflowing with words, phrases, and sentences that can provide enough material for several seminars on morphology, syntax, semantics, and pragmatics.

Time for a Powwow

Coincidentally, or perhaps serendipitously, on the same day a Twitter colleague, @TactusTherapy, posted that she was about to take part in an appathon, which is clearly a blend of the words application and marathon. This is commonly referred to as a portmanteau word, a term first used by Carroll in Through the Looking Glass, when Humpty Dumpty is explaining what the words in the poem Jabberwocky mean:

“Well, slithy means ‘lithe and slimy.’ Lithe is the same as active. You see it’s like a portmanteau — there are two meanings packed up into one word.”

He then gives another example of a portmanteau with mimsy, which is a jamming together of miserable and flimsy. Linguists call these blends, or perhaps more specifically lexical blends – as distinct from, say, phonological blends where two or more sounds run together to end up as one. Other examples include positron (1933: positive + electron); guesstimate (1936: guess + estimate); skort (1951: skirt + shorts); modem (1958: modulator + demodulator); metrosexual (1994: metropolitan + hetero/homosexual); and hacktivist (1995: hacker + activist). My @TactusTherapy colleague also pointed out that she’d just come across a new portmanteau, listicle, to refer to one of those “5 Ways to Drive Your Lover Wild” or “10 Words Guaranteed to Get You a New Job” articles, where it’s basically a list modified into prose. Hence it’s a portmanteau of list and article.

Moving ahead to Day 2 of the conference, I spent some time over lunch with a group of AAC/AT folks who had at some time attended one of the Pittsburgh AAC Language Seminar Series, or PALSS [2]. It’s a good excuse to get together with a group of like-minded folks for an informal powwow. Curiously enough, the word powwow (or pow-wow) may be another example of a portmanteau except from a non-English source. It can be traced back to the Narragansett language and pawwaw meaning a priest, shaman, or healer. It’s suggested that this in turn came from an earlier language, Proto-Algonquian [3], and the phrase *pawe-wa, which means “he who dreams.” The two words were blended into one by the elision of the middle syllable, and became the portmanteau, powwow.

During this powwow, yet another new portmanteau made its way into the discussion: the spamference. It’s clearly derived from spam and conference, and represents a relatively new concept in the field of academia – the junk conference. Basically, it’s a conference created not for the “free exchange of ideas and research from leaders in the field” but “a way of generating revenue for conference organizers by way of inviting folks to exotic and faraway places for a good time.” The typical invite goes along the lines of:

“Dear Speech Dude

As a recognized leader/expert/authority in the field of AAC/Linguistics/Toad Husbandry, our panel of professionals invite you to chair a session at our upcoming prestigious conference in Maui/Maldives/Vegas/Fiji (insert name of any place in which you’d love to spend a week).

As a conference chair, your registration fees will be discounted by 75% and hotel rooms by 25%. You will also be acknowledged as an Editor/Reviewer in the conference proceedings.”

And so on, and so on. The first hint of bogosity is the unsolicited nature of the invitation from someone who you’ve probably never heard of, and also that slightly hard-to-avoid-but-it’s-probably-true realization that you are maybe not quite the leader/expert/authority that you’d like to think you are!

Of course, if you want to beef up your resume and can get someone to fund you for your trip to Hawaii for “the conference,” then there’s nothing actually illegal going on here. Nothing. Like the whole “Open Access Journals” discussion – where you can get published so long as you stump up some cash – it’s a fundamentally grey area with advocates both for and against.

But spamference is definitely a portmanteau.

Notes
[1] This comes from a discussion between Alice and Humpty Dumpty in Through the Looking Glass about unbirthday presents. It ends with a classic definition of “the word” that’s beloved by linguists around the globe:

`There’s glory for you!’

`I don’t know what you mean by “glory,”‘ Alice said.

Humpty Dumpty smiled contemptuously. `Of course you don’t — till I tell you. I meant “there’s a nice knock-down argument for you!”‘

`But “glory” doesn’t mean “a nice knock-down argument,”‘ Alice objected.

`When I use a word,’ Humpty Dumpty said in rather a scornful tone, `it means just what I choose it to mean — neither more nor less.’

`The question is,’ said Alice, `whether you can make words mean so many different things.’

`The question is,’ said Humpty Dumpty, `which is to be master – – that’s all.’

See what I mean about great seminar material?

[2] The Pittsburgh AAC Language Seminar Series is a 2-and-a-half day event run by Semantic Compaction Systems in, no surprise, Pittsburgh. Its focus is on implementing the Unity/Minspeak language system, with each seminar having a nationally recognized guest speaker. The seminars are monthly and registration is free but there are limited numbers – only 24 folks per seminar. It’s pretty cool because food and lodging are free AND you can get $150 towards your flight or mileage. Oh, and you get to meet me on Thursday morning – and that’s gotta be worth the trip! If you’re curious, here’s the link:
http://www.minspeak.com/PittsburghAACLanguageSeminarSeries.php

[3] A proto-language is one for which there is no direct evidence but which can be (re)constructed, hypothesized or inferred on the basis of the structure and behavior of words that are verifiable. Algonquian is a genus of languages spoken primarily by Native Americans in north-eastern regions of North America, and Proto-Algonquian is thought to be the version spoken around 3,000 years ago. Here’s a link to a map of the family of Algonquian should you be curious – and if you’re still reading, you are ;) THE ALGONQUIAN FAMILY

Valentine’s? President’s? Whose Day IS It?

On a singularly dull day in Hell, when the screams of tortured souls no longer gave Lucifer a thrill, he came up with a new form of torture: the apostrophe [1]. It’s a brilliant piece of evil engineering because it takes no more than the merest dab of ink to pop it onto a piece of parchment, yet placing it in the wrong place can wreak maximum havoc on the sensibilities of gentle readers. And over-worked copy editors. It’s possibly one of the most wickedly powerful dividers of nations Satan ever invented.

Within the space of one week, we’re about to experience the full force of an apostrophe debate that will also generate more examples of that malevolent little mark all over the internet. February 14 and 16 are all set to become a grammatical confluence of biblical proportions. Perhaps.

Let’s start with the easier one: the case of St. Valentine and a celebration of card sales, er, love. According to one version of the legend, St. Valentine was a priest who was martyred by the Roman emperor Claudius II for being a Christian, and for performing marriage rites. In one of the more lurid descriptions of his death, he was first stoned and clubbed but when that failed to kill him he was beheaded. I’m not sure that’s ever been part of a Valentine card illustration – though in the interest of accuracy, I think Hallmark need to consider it.

His performing of marriages seems to fit in with the idea of love, but oddly St. Valentine is also the patron saint of epilepsy, fainting, plague, and bee keepers. Again, potential new avenues of exploration for the folks at American Greetings.

Can you look after these bees for me, Val?

When we celebrate St. Valentine, we do so on St. Valentine’s Day, where the apostrophe comes before that final “s.” Why? Well, it’s because one of the accepted norms for using an apostrophe is that you use it before a final “s” to indicate the notion of possession; the idea that the succeeding noun belongs to the apostrophized previous thing. In this instance, this is a special day that “belongs” to St. Valentine. So you can have “the cat’s whiskers” because the whiskers belong to the cat; “the man’s coat,” because the coat belongs to the man; or “my brother’s wife,” because the wife belongs to my brother [2].

A second rule says that if you have more than one possessor, and the plural form ends with an “s,” you still put the apostrophe after the word but you can ignore a following “s.” Hence we can have “the dogs’ bone,” which is a bone shared by multiple canines; “the bishops’ fund,” which is a fund administered or used by a bench of bishops [3]; or “my brothers’ wives,” which is a clumsy way of referring to the collection of women owned by my brothers.
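Those two rules are mechanical enough to write down as a few lines of Python. This is only a sketch of the two rules described above; the function name possessive and its plural flag are invented for the illustration, and it makes no attempt to settle style-guide disputes such as singular names that already end in “s.”

```python
def possessive(noun, plural=False):
    """Attach a possessive apostrophe using the two rules above.

    Rule 1: add 's to the possessor (cat -> cat's).
    Rule 2: if the possessor is a plural that already ends in "s",
            add only the apostrophe (dogs -> dogs').
    """
    if plural and noun.endswith("s"):
        return noun + "'"
    return noun + "'s"


print(possessive("cat"), "whiskers")
print(possessive("dogs", plural=True), "bone")
print(possessive("Valentine"), "Day")
```

Note that the code has to be told whether the possessor is plural. That is exactly the decision the writer has to make, and it is where the trouble usually starts.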

Valentine’s Day is, therefore, a pretty easy one. There is only one Valentine; it’s a day that is in some sense “owned” by him; so the apostrophe can happily nestle itself between the “e” and the “s” and copy editors can sleep at night. Sanity 1 – Satan 0.

But the Prince of Darkness is not yet done with us. He’s fully aware that although some folks will have trouble with Valentine’s Day, those who find it relatively easy have been lulled into a false sense of security. Lurking in the wings – or in this case, two days later – there is the day that even such luminaries as the Chicago Manual of Style (CMS) and the Associated Press Stylebook (AP) disagree on; Presidents Day or Presidents’ Day. Sanity 1 – Satan 1.

I know that our readers don’t come here to be subjected to stress, pain, or irritation (other than the mild form suffered when we say something outrageous or wrong) so let me take away any worries you’re having about which form to use here and now. The Associated Press Stylebook says “Presidents Day” with no apostrophe; the Chicago Manual of Style says “Presidents’ Day” with the apostrophe right at the very end. So the Dudes say: go with the one you prefer!

So why the confusion – apart from Beelzebub’s delight in watching us all squabble and bicker? It’s really because of the way that nouns can, in some circumstances, behave as if they were adjectives. Specifically, it’s a type of noun called an attributive noun, which sounds like another Mephistophelian invention. For the most part, nouns are pretty solid, stalwart parts-of-speech, happy to be just what they are – low-frequency, limited meanings. A dog’s a dog, a cat’s a cat, and that’s about it. However, sometimes a noun will have the urge to buddy up to another noun to make a compound, and the one that goes first can change its behavior and act, temporarily, like an adjective.

Here are some examples of attributive nouns, where the first noun is being used to enhance the meaning of the second:

football player: Just using the noun player on its own may not be sufficient, so adding the noun football helps specify the type of player. Similarly we could have a baseball player, hockey player, and so on.

business lunch: Again, lunch on its own is OK in a generic sense but if you’re having lunch for the purpose of discussing business-related issues, then adding business as an attributive tightens up the meaning.

apple tree: Fairly obvious and by now needs no explanation.

If you want to do a quick check as to whether you’re seeing an attributive noun or an attributive adjective, try the following test:

Change “<WORD1> <WORD2>” to “The <WORD2> is <WORD1>”: does it make sense?

“The player is football,” “The lunch is business” and “The tree is apple” sound wrong. But if we had “aggressive player,” “free lunch,” and “tall tree,” applying the test would result in sensible sentences, therefore they are attributive adjectives, not attributive nouns.
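If you’d like to run the test on a batch of phrases, the rewording step can be automated, though the judgment cannot. The Python sketch below (the function name build_is_sentence is made up for this example) only builds the “The <WORD2> is <WORD1>” sentence; deciding whether it makes sense is still up to a human reader.

```python
def build_is_sentence(phrase):
    """Turn a two-word phrase "<WORD1> <WORD2>" into "The <WORD2> is <WORD1>."

    The function only rearranges the words; a human still has to decide
    whether the result sounds sensible (adjective) or wrong (noun).
    """
    word1, word2 = phrase.split()
    return f"The {word2} is {word1}."


for phrase in ("football player", "aggressive player", "apple tree", "tall tree"):
    print(phrase, "->", build_is_sentence(phrase))
```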

All of this brings us back to why Presidents/Presidents’ Day is a challenge. If it is a day that “belongs” to Presidents, then the apostrophe should be used to indicate possession and therefore needs to be included at the end of the word. But if it’s a day “about” or “for” Presidents [4], then it’s being used as an attributive noun descriptor to enhance the meaning of “day,” and so needs no apostrophe.

The distinction is fine, and so is the interpretation – hence the disagreement between CMS and AP. But it is an instructive example of how words can shift not only their meaning but function, and even a humble noun can aspire to adjectivehood!

Notes
[1] Apostrophe comes from the Greek ἡ ἀπόστροϕος meaning “of turning away, or elision.” Often the apostrophe is used to mark where something is missing (elided) such as in can’t for cannot, the poetic o’er for over, or singin’ as a colloquialism for singing. It’s this sense of “missing something” that gave rise to its name as a punctuation mark.

[2] You’re right to guess that I put that one in on purpose, knowing full well that it’s somewhat un-PC. I could, of course, have used “My sister’s husband” and explained it as “because the husband belongs to my sister,” but that wouldn’t be as forceful in showing how grammar and punctuation rules regarding “possession” don’t care for social norms. Doubtless there are folks out there who would be all for having us change the language so as to avoid that notion of “owning” someone but that’s not going to happen. Grammatical possession is a little different from social possession.

[3] The most frequently cited collective noun for bishops is, indeed, a bench. Others include a sea of bishops and a psalter of bishops.

[4] The Presidents in question are primarily George Washington and Abraham Lincoln, whose birthdays are Feb 22 and Feb 12 respectively. I say “primarily” because there is also the notion that it is a celebration of all US Presidents, and that this extended meaning is accepted by many people.

Erratum
1. Eagle-eyed reader, Trish, pointed out I used preceding rather than succeeding in the original sentence. Whoops!

28 Words to Boost Your Client’s Vocabulary – Maximum Bang for Buck

When developing a vocabulary set for an augmentative and alternative communication (AAC) system – or indeed when deciding on what vocabulary to teach anyone – one of the most fundamental of measures you can use is frequency count; how often is a word used in a language? No-one can predict with 100% accuracy which words will be “best” for an individual, but if you’re going to take bets, you’re pretty safe to assume that words such as that, want, stop, and what are going to be used by everyone from ages 2 to 200. By the same token, you’d not be missing much if you didn’t spend too much time on words like ambidextrous, decalogue, and postilion [1].
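For anyone who wants to try a frequency count themselves, here is a minimal Python sketch. The sample text is invented for the illustration and is far too small to tell you anything about real usage; a genuine count would be run over a large corpus.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Invented sample text, purely for illustration.
sample = "I want that. What is that? Stop that! I want to stop."
counts = word_frequencies(sample)

# List the words from most to least frequent.
for word, count in counts.most_common(5):
    print(word, count)
```

Even in a sample this tiny, the little workhorse words float to the top, which is the whole idea behind a frequency-based vocabulary.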

In the field of AAC, this type of high frequency vocabulary that is used (a) across populations and (b) across situations is referred to as core vocabulary and it’s often contrasted with the phrase fringe vocabulary, which refers to words that are typically (a) low in frequency and (b) specific to isolated activities or situations. For a refresher on core and fringe – and an introduction to keyword vocabulary – check out my article entitled Small Object of Desire: The Monteverde Invincia Stylus fountain pen – and Keyword Vocabulary from two years ago.

The core/fringe distinction is now so embedded in the world of augmentative communication that it is rare to see any new app appear on the market that doesn’t use the phrase “core vocabulary” somewhere in its marketing blurb – even if it isn’t actually making good use of the core! And as core vocabulary is, by definition, common across ages, activities, situations, and pathologies, it’s not surprising that many AAC software offerings look the same, particularly with regard to the words being encoded [2].

But it’s worth taking a look at another level of frequency measurement, and that’s at the phrase level. Specifically, one area of research that seems to me to offer some value to Speech and Language pathologists and Educators working in vocabulary development is in the study of how phrasal verbs (PVs) are distributed.

So what’s a phrasal verb? Well, simply put, it’s a phrase of two to three words that are yoked together, which include a verb and a preposition and/or adverb. Examples include, “I ran into Gretchen at the ATIA conference,” “I backed up my hard drive,” and “I came across an interesting article on phrasal verbs.” The English language is stuffed to the gills with these types of verbs, and a feature of them is that they tend to have multiple meanings.

To find out how polysemous a phrase can be, you can use the excellent WordNet online tool, a huge database of words and phrases that lets you check out noun, verb, adjective, and adverb meanings. For example, would you believe that the simple phrase “give up” has 12 different meanings? Or that “put down” has 8 variations? It’s not surprising that learners of English find phrasal verbs quite challenging.

The other fascinating feature of phrasal verbs is summarized in a 2007 paper by Gardner and Davies, who point out that if you look at the 100 million word British National Corpus you find that:

…a small subset of 20 lexical verbs combines with eight adverbial particles (160 combinations) to account for more than one half of the 518,923 phrasal verb occurrences identified in the megacorpus. A more specific analysis indicates that only 25 phrasal verbs account for nearly one-third of all phrasal-verb occurrences in the British National Corpus, and 100 phrasal verbs account for more than one half of all such items. Subsequent semantic analyses show that these 100 high-frequency phrasal verb forms have potentially 559 variant meaning senses.

Read that again and see if you get the same tingle I did seeing those numbers. Over half of all the phrasal verb occurrences found in the corpus can be accounted for by combining 20 verbs with 8 particles. In short, if you learn just 28 words, you’ve learned the building blocks for 50% of all the phrasal verbs you’ll need to use.
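The arithmetic behind that claim is easy to check. In this Python sketch, the short lists are just an illustrative handful of verbs and particles taken from phrasal verbs mentioned in this article, not the full lists from the paper; only the totals of 20 verbs and 8 particles come from Gardner and Davies.

```python
from itertools import product

# Totals reported by Gardner and Davies (2007).
NUM_VERBS = 20
NUM_PARTICLES = 8

print("Words to learn:", NUM_VERBS + NUM_PARTICLES)         # 20 + 8
print("Possible combinations:", NUM_VERBS * NUM_PARTICLES)  # 20 x 8

# Illustrative subset only; NOT the paper's full lists.
sample_verbs = ["give", "put", "break", "point", "set"]
sample_particles = ["up", "down", "out"]

# Pair every verb with every particle. Not every pairing is
# necessarily a phrasal verb in real use.
candidates = [
    f"{verb} {particle}"
    for verb, particle in product(sample_verbs, sample_particles)
]
print(len(candidates), "candidate pairings, e.g.", candidates[:4])
```

Learning grows by addition (20 + 8 words) while coverage grows by multiplication (20 x 8 combinations), and that is where the bang for buck comes from.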

Let’s take a look at those Top 20 verbs first:

Table 1: Top 20 Verbs in PVs

And now the Top 8 particles:

Table 2: Top 8 particles in PVs

All the verbs and prepositions as individual items are already high frequency, with the exception of perhaps the verbs point and set, which wouldn’t be on my list of “first words to teach.” However, the real bonus here is that not only do you get the benefit of teaching your client 28 high frequency words in isolation but if you then use them as phrasal verbs, your “bang for buck” is significant!

This frequency analysis of phrasal verbs by Gardner and Davies has recently been supported and extended by Dilin Liu (2011) and by Mélodie Garnier and Norbert Schmitt [3] (2014). In their paper, The PHaVE List: A pedagogical list of phrasal verbs and their most frequent meaning senses, Garnier and Schmitt point out that a limitation in Gardner and Davies’ analysis is that they failed to take into account the polysemy inherent in the phrases – like the 12 meanings of “give up.” In fairness to Gardner and Davies, they did, in fact, talk about the polysemous nature of PVs but didn’t offer any measure of the different frequencies with which the various meanings are used. They wrote that:

For instance, the list-high 19 senses of the PV break up … could be arranged from highest to lowest semantic frequency, thus prioritizing them for language learning. We acknowledge, however, that corpora of this nature are much easier talked about than constructed. (p.353).

Garnier and Schmitt are interested not just in identifying the frequency with which a phrasal verb occurs but also the most common senses of those PVs. They describe:

…our main purpose for creating the PHaVE List, which is to reduce the total number of meaning senses to be acquired to a manageable number based on frequency criteria.

On a pragmatic level, they want a learner not to have to learn every meaning of each PV but just focus on the most frequent, and therefore most useful meanings. Using the original list from Gardner and Davies, along with additions by Liu (2011), and including data from the Corpus of Contemporary American English (Davies, 2008), the duo created the PHaVE List: a list of the 150 most frequently used phrasal verbs, and 280 of the most frequently used meanings. So of the 12 potential meanings for “give up,” they use the following:

16. GIVE UP
Stop doing or having something; abandon (activity, belief, possession) (80.5%)
Example: She had to give up smoking when she got pregnant.

The general entry starts with a rank (in this case, 16th out of 150); the basic phrasal verb; a definition; a percentage frequency; and a specific example use. The complete list is made available as a download from the Sage journals website [4]. If you can get access to it, it is well worth the read and the download. And all the articles referenced in this article are good examples of how we can use corpus linguistics to help guide our practice of developing the vocabulary of our clients with language challenges.
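If you wanted to store such entries in software, say for a vocabulary-teaching tool, one plausible shape is sketched below in Python. The class and field names are my own invention, not anything defined by Garnier and Schmitt; the values are the “give up” entry shown above.

```python
from dataclasses import dataclass


@dataclass
class PhaveEntry:
    """One meaning sense of a phrasal verb, mirroring the entry layout above."""
    rank: int                # position in the list (1 = most frequent)
    phrasal_verb: str
    definition: str
    percent_of_uses: float   # the percentage frequency given for this sense
    example: str


give_up = PhaveEntry(
    rank=16,
    phrasal_verb="give up",
    definition="Stop doing or having something; abandon (activity, belief, possession)",
    percent_of_uses=80.5,
    example="She had to give up smoking when she got pregnant.",
)

print(f"{give_up.rank}. {give_up.phrasal_verb.upper()} ({give_up.percent_of_uses}%)")
print("Example:", give_up.example)
```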

References
Davies, M. (2008-). The Corpus of Contemporary American English: 425 million words, 1990-present. Available from Brigham Young University: http://corpus.byu.edu/coca

Gardner, D., & Davies, M. (2007). Pointing Out Frequent Phrasal Verbs: A Corpus-Based Analysis. TESOL Quarterly, 41(2), 339-359.

Garnier, M., & Schmitt, N. (2014). The PHaVE List: A pedagogical list of phrasal verbs and their most frequent meaning senses. Language Teaching Research, 1-22. Published online before print: http://ltr.sagepub.com/content/early/2014/12/08/1362168814559798.abstract

Liu, D. (2011). The Most Frequently Used English Phrasal Verbs in American and British English: A Multicorpus Examination. TESOL Quarterly, 45(4), 661-688.

Notes
[1] A postilion is the driver of a horse-drawn carriage, who sits posterior to the horses. The sentence “The postilion has been struck by lightning” is the basis of a wonderful little paper by the linguist David Crystal, published in 1995 in the journal Child Language Teaching & Therapy. Simply titled “Postilion Sentences,” Crystal defines a postilion sentence as “one which has little or no chance of ever being useful in real life. It could be used, obviously, because it is grammatically well-formed; but the contexts in which it would be natural to use it are either so restricted or so adult that the chances of a child encountering it, or finding it necessary to use it, are remote.” In the design of AAC systems, using pre-stored sentences may have some limited value but many “pragmatic utterances” turn out to be nothing more than postilions; unlikely to be used. This is why teaching sentences is neither language nor therapy.

[2] The now-common practice of using core vocabulary also makes it much harder to prove plagiarism – or as we Lancastrians would say, “nicking someone else’s ideas.” People, of course, don’t “steal” ideas – they are “inspired” by the work of others. But such inspiration inevitably leads to systems appearing almost clone-like in their structure. It’s only when you get to the fine details of how words are organized and encoded that you can separate the wheat from the chaff. And there’s a lot of chaff out there.

[3] If I haven’t mentioned it before, Norbert is the author of an excellent book on vocabulary research methods. Here’s the full reference: Schmitt, N. (2010). Researching vocabulary: a vocabulary research manual. Houndmills, Basingstoke, Hampshire; New York, NY: Palgrave Macmillan. It’s full of useful information and lots of web links worth exploring, and worth the $30 you’ll spend on Amazon US – or the £20.99 in the UK.

[4] Just a reminder to all members of the Royal College of Speech and Language Therapists that your membership benefits include access to a number of Sage journals online, and Language Teaching Research is one of those. In fact, you have access to over 700 (yes, count ’em!) titles, including my personal favorites Child Language Teaching and Therapy, Clinical Linguistics & Phonetics, English Today, and the riveting Scandinavian Journal of Occupational Therapy. OK, so I lied about the last one being a “favorite” :)

Fewer Hassles Means Less Hassle

There are two types of people standing in a supermarket check-out line: those who use the “10 items or less” aisle and worry about how many things they have in their trolley [1], and those who want to use a thick red marker pen to scribble out the word “less” and write “FEWER!!!” in large, capital letters.

We need fewer mistakes: CC license from Flickr

As a long-time sufferer of prescriptivism – that terrible affliction where you can’t help feeling that there is a right and a wrong way to use a language – I have to admit I’m getting better at ignoring such things and adopting a Zen-like calm at the checkout, murmuring internal mantras to keep my blood pressure down. The trick is to take a little time analyzing just why “10 items or less” can be seen as “wrong.” And it’s all to do with the nature of nouns and counting.

When it comes to nouns, one of the ways to categorize them is as either count or mass nouns. A count noun is one that – not surprisingly – can be counted. You can have one button or two buttons; one banana or three bananas; one mongoose or 24 mongeese. OK, so that last one was a lie – it’s mongooses [2]. The point is that the noun in question can be viewed as a discrete item and quantified.

Mongooses – or mongeese? CC license from Arpingstone.

The contrary is a mass noun, which refers to a thing that can vary in terms of quantity but you can’t really count it. You can have salt and then more salt; water and then more water; fun and then more fun.

Morphologically, count nouns typically add an –s to the end of the singular form of the noun whereas mass nouns stay the same. Some count nouns have irregular plural forms – hence the goose/geese distinction mentioned a few sentences ago – and a few don’t change at all, such as sheep, deer, and moose.

Now, just to make things more interesting, some adjectives that are used to pre-modify nouns don’t work with both mass and count nouns. This is the case with fewer and less, where the former works better with count nouns whereas the latter typically partners up with mass nouns. So you have less salt, less water, and less fun but fewer buttons, bananas, and mongooses.

Mass nouns can, in some situations, defect to the count noun camp, usually when the mass is in some way chopped up into smaller pieces. So if you have water poured into glasses, it’s perfectly normal to say things like, “There were several waters on the table.” Similarly, when visiting a bakery you might say, “There were lots of breads to choose from.” But in both these cases, the “countiness” is due to the fact that the mass has been artificially quantized.

Unlike words such as bread and water, which seem to spend most of their lives being mass nouns, or dog and cat, which sit squarely in the count corner, words like hassle appear to swing happily back and forth between mass and count. Thus having fewer hassles can, indeed, lead to less hassle. More specifically, the first hassle is in its count form whereas the second is the uncountable mass version. If you think about it, you can talk about having “lots of hassles” because you can in theory count each individual hassle, but if you’re talking about hassle in general, it’s a more amorphous mass of “hassle” so uncountable. If I suggested replacing that second hassle with harassment, the mass element becomes more obvious.

Now you get an idea why we poor prescriptivists suffer from bouts of toe curling when seeing “10 items or less.” It’s that the noun items is clearly a count noun (it takes an -s plural and is preceded by a number) but less is reserved for mass nouns.

Notes
[1] There’s a sub-group here of sociopaths who either cannot count and so trundle through with a cart positively overflowing with stuff, or are so egocentric and narcissistic that they couldn’t care less how many items they have – they just want the shortest and fastest line so they can get on with their terribly important and self-centered lives. In the world of self-carry laws for gun owners, it’s a surprise that there are so few gunfights at the Walmart corral.

[2] We like to think that the average Speech Dudes reader is not, in fact, average, and is more curious than a clowder of cats, and as such, may ask the obvious question as to precisely why the plural of mongoose isn’t mongeese. It’s because the word mongoose doesn’t actually have anything to do with the word goose in the first place. It actually derives from the Portuguese mangus, which in turn is from the Indian dialect Marathi word mungus, and then ultimately the Dravidian language Telugu and the word mungisa. Any tendency to use mongeese therefore comes from mistakenly assuming it’s a derivative of goose, which comes from Old English gos and can be traced back to Old Aryan *ghans.

All I Needed to Know About Adjectives I Learned at Starbucks

Language is an example of a moving target par excellence. Only today, I received a tweet that outlined a number of reasons why you should instantly wife your girlfriend. Wife her, I thought? Since when did wife switch teams and become a verb? Well, truth be told, it turns out that it became a verb in 1387, as evidenced by a quote from that popular 14th century pot-boiler Prolicionycion written by Ranulf Higden:

Þey..kepeþ besiliche here children, and suffreth hem nouȝt to wyfe wiþ ynne foure and twenty ȝere.

But for reasons unknown – as is often the case in etymology – the use of wife as a verb disappeared sometime during the early 18th century, leaving only the noun usage in common use [1]. After a brief dalliance with verbiness, the word settled back into its original home.

Let’s now go back to just last week during the 2014 Closing the Gap conference in Minneapolis. After standing in line for almost 15 minutes to get a Starbucks latte from the hotel’s coffee bar, I asked for a “tall skinny” and was then quizzed with, “Is that the short tall?”

A “short tall?” Dear Lord, how much more torture do we want to subject the English language to? Prescriptivists everywhere would be wailing in anguish and putting red pens to paper – or maybe tweeting their disgust in 140 characters or less!

However, it’s pretty clear what’s happening here. Just like wife in the 14th century, the word tall is getting bored with being a simple adjective and deciding that being a rambunctious noun is much better; “Noun Envy” as the psychoanalinguists might say [2].

Starbucks, for purposes of marketing and not linguistics, decided to ignore the more semantically accurate method of labeling coffee sizes by “small,” “medium,” “large,” and “freakin’ huge,” in favor of “tall,” “grande,” “venti” and “trenta.” But they created an element of cognitive dissonance in consumers’ minds by linking a word like tall, which is semantically typically opposed to short, with the word small, which is more likely to be balanced against large. So using a word like tall to describe something that is cognitively small just doesn’t jibe.

What our consciously unaware but unconsciously linguistic barista has done here is to overcome that dissonance by treating the word tall as a noun and using short as an attributive adjective. Pretty damn cool, eh? [3] I can easily imagine that at some point, various baristas [4] have uttered not only “Is that the short tall?” but also “Do you mean a medium grande?” or “Is that a large venti?”

So while I’m hanging out here with you all in our virtual Starbucks, something else you might be curious about is the whole “How do I order my coffee?” issue. Does one ask for “a skinny grande cappuccino” or “a grande skinny cappuccino?” And when you start adding caramel or extra shots, where on earth do you hang them?

Well, having castigated my good friends at Starbucks in relation to their idiosyncratic naming of drink sizes, I’ll offer them points for actually providing a “syntax” for budding baristas in order to make ordering easier. In a 2003 manual distributed to employees, the following generic ordering structure was recommended:

1. CUP: That’s hot, cold, iced, or “for here.”
2. SHOT and SIZE: No stipulation for which should be first.
3. SYRUP: For your caramel, raspberry, cinnamon etc.
4. MILK: Skimmed, 2%, soy, or whatever.
5. DRINK: Coffee, tea, mocha, or any other name.

My personal common order is for a “grande, non-fat latte,” which fits the rules of 2>4>5. During summer, I might order an “iced, grande, non-fat latte,” which again conforms with 1>2>4>5. My wife has a “grande non-fat, caramel macchiato” that follows the rules, and sometimes goes for the “iced, grande non-fat caramel macchiato,” which illustrates the full-blown 1>2>3>4>5 ordering.

Budding researchers [5] might want to spend an afternoon at their local Starbies armed with a pen and a notebook, jotting down as many orders as they can overhear – what researchers like to call “taking a sample.” After an hour of sampling both orders and coffee, they should be able to do some analysis to see how many people actually conform to the ordering paradigm. Remember, this is what research is all about; setting up a hypothesis about how we think folks will order coffee, and then testing it against observations of how they really order it!
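For the spreadsheet-averse, the analysis step can be sketched in a few lines of Python. This is only a rough illustration under some loud assumptions: the word lists in `SLOTS` are my own guesses rather than Starbucks’ vocabulary, the sample orders are made up rather than overheard, and each word is treated on its own, so a two-word drink name such as “caramel macchiato” would need to be handled as a single unit before this would score it fairly.

```python
# Slot numbers follow the five-step structure quoted above:
# 1 CUP, 2 SHOT/SIZE, 3 SYRUP, 4 MILK, 5 DRINK.
# The word lists are illustrative guesses, not an official vocabulary.
SLOTS = {
    "iced": 1, "hot": 1,
    "tall": 2, "grande": 2, "venti": 2, "double": 2,
    "caramel": 3, "raspberry": 3, "cinnamon": 3,
    "non-fat": 4, "skinny": 4, "soy": 4,
    "latte": 5, "cappuccino": 5, "mocha": 5, "macchiato": 5,
}


def slot_sequence(order: str) -> list[int]:
    """Map each recognized word in an order to its slot number."""
    words = order.lower().replace(",", " ").split()
    return [SLOTS[w] for w in words if w in SLOTS]


def conforms(order: str) -> bool:
    """True if the slot numbers never decrease from left to right."""
    seq = slot_sequence(order)
    return all(a <= b for a, b in zip(seq, seq[1:]))


# Made-up sample; a real study would use orders jotted down at the counter.
sample = [
    "iced, grande, non-fat latte",
    "grande skinny cappuccino",
    "skinny grande cappuccino",
    "caramel latte venti soy",
]

for order in sample:
    print(f"{order!r}: {conforms(order)}")

rate = sum(conforms(o) for o in sample) / len(sample)
print(f"{rate:.0%} of sampled orders conform")
```

The hypothesis is then simply that the conformity rate will be high, and the notebook full of real orders is what tests it.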

Outside the world of Starbucks, adjective ordering in English also has some rules. One of the most common ordering paradigms is as follows:

Order of adjectives

If we compare this with the Starbucks recommendations, we can see that the sequence CUP-SHOT/SIZE-SYRUP-MILK-DRINK corresponds to the generic OPINION-SIZE-MATERIAL-QUALIFIER-NOUN. So they’re pretty much on the syntactic ball here!

Doubtless our hundreds of “proxy Dudes” collecting real data at coffee bars across the world will find exceptions to the ordering rules, but language performance has always been variable. On the other hand, we’re unlikely to hear “macchiato iced grande caramel” or “caramel latte venti soy.”

Or are we?

Notes
[1] I suppose as a proponent of using evidence and data to support propositions, I did take a look at the Corpus of Contemporary American English and found no instances of wife as a verb in the 450 million word sample. Same for the British National Corpus (100 million word sample) and the Canadian Strathy Corpus (50 million words). Of course, absence of evidence is not evidence of absence, but I think I’m pretty confident in asserting that using wife as a verb is extremely rare and unlikely.

[2] Don’t rush out to your dictionary – even if YOUR dictionary is the Urban Dictionary – to find the word psychoanalinguist. It doesn’t exist. It’s only a “real word” in the sense that (a) I have just used it and (b) it can be understood within the context of this article.

[3] I suppose I need to appreciate that not everyone gets as excited about language change as I do. But this type of living example of how new meanings come about helps us all understand how important it is to be aware of the simple fact that languages are not, and never have been, static. I’m not suggesting that we allow some form of lexical anarchy where you can simply stick any old word anywhere but knowing that words can, and do, change meaning and category can, I believe, make us more aware clinicians.

[4] The word barista is, as you might know, Italian, so you might be tempted to point out to me that I should really be using the word baristi to mark the plural. However, the word baristas is perfectly acceptable because it’s an example of a word that’s been Anglicized i.e. taken into the English language, and the normal rule for making a plural word is simply to add an “s.” Hence baristas. I think I’ve talked about this before in relation to octopuses as being a wonderful plural, with octopi being fake Latin (octopus comes from Greek, not Latin, and if you wanted a Greek plural, it would really be octopodes!)

[5] It strikes me that a generous supervisor might be totally OK with letting a grad student work on a study such as, “Syntactic adjectival variability in coffee ordering.” And should that student be the recipient of a grant from Starbucks itself, it seems a bit of a no-brainer, don’t yah think?

ColorBrewer: Utilizing cartography software for color coding

It seems that I am getting a reputation for being a teensy-weensy bit doryphoric [1] and that may have some truth in it insofar as I hate – with a passion – the tendency for people to use the word “utilize” rather than “use” simply because the former sounds more erudite. It’s not, in fact, erudite; it’s just plain wrong. As I’ve said in previous posts, “utilize” means “to use something in a manner for which it was not intended.” So I can “use” a paper clip to hold a set of pages together; but I can “utilize” it to scoop wax out of my ears or stab a cocktail olive in my vodka martini (shaken, not stirred).

Colorado beetle

Doryphoric

So when I titled this post with “utilizing cartography software” I really do mean that and I’m not trying to sound clever by using a four-syllable word (utilizing) over the simpler two-syllable using. No siree, I say what I mean and I mean what I say: utilize. The software in question is online at ColorBrewer: Color Advice for Cartography and its original purpose was to help map makers choose colors that provide maximum contrast. Let’s create an example. Suppose you have a map of the US and you want to use colors to show the average temperatures as three data sets; below 50F, 51F-65F, and above 65F. You can use three colors in one of three different ways:

  • (a) Sequential: Three shades of a chosen color from light to dark to indicate low to high values.
  • (b) Diverging: Three colors that split the data equally in terms of the difference between the colors, but with the mid-range being related to a degree of difference between the extremes.
  • (c) Qualitative: Three colors that split the data into three distinct groups, such as apples/oranges/bananas or trains/boats/planes – or for the statisticians out there, any nominal level data.

For a map of temperature averages, you’d choose the sequential coding so as to show the degree of change. Here’s what such a map might look like:

Three data point colors

Compare this with a version whereby we chose to have six data points rather than three i.e. less than 45F; 46F-50F; 51F-55F; 56F-60F; 61F-65F; above 65F.

Six data points colors

What the software does that is interesting is that it automatically generates the colors such that they are split into “chromatic chunks” that are equally different. The lowest and highest color values for each map are the same but the shades of the intermediate colors are changed. If you were to choose a set of 10 data points, the software would split those up equally.
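To make the “same endpoints, different in-between shades” idea concrete, here is a small Python sketch. Be warned that it is a cruder stand-in and not what ColorBrewer does: it simply steps evenly between two colors in RGB, whereas the site’s palettes are chosen for how different the colors look to the eye. The function names and the two green endpoint codes are my own picks for illustration.

```python
def hex_to_rgb(code: str) -> tuple[int, ...]:
    """Convert '#RRGGBB' to three 0-255 integers."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb) -> str:
    """Convert three 0-255 integers back to '#RRGGBB'."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


def ramp(start: str, end: str, classes: int) -> list[str]:
    """Return `classes` colors stepping evenly (in RGB) from start to end."""
    if classes < 2:
        raise ValueError("need at least two classes")
    a, b = hex_to_rgb(start), hex_to_rgb(end)
    colors = []
    for step in range(classes):
        t = step / (classes - 1)  # 0.0 at the first color, 1.0 at the last
        mixed = tuple(round(x + (y - x) * t) for x, y in zip(a, b))
        colors.append(rgb_to_hex(mixed))
    return colors


# Hypothetical light and dark greens used as the two fixed endpoints.
three = ramp("#E5F5E0", "#31A354", 3)
six = ramp("#E5F5E0", "#31A354", 6)
print(three)
print(six)
```

Both lists start and end on the same two colors; only the intermediate shades differ, and the steps get smaller as the number of classes goes up.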

Of course, as the number of data points increases, the perceptual difference between them decreases i.e. it becomes harder to see a difference. This is one of the limitations of any color-coding system; the more data differentials you want to show, the less useful colors become. You then have to introduce another way of differentiating – such as shapes. So if you had 20 shades of gray, it’s hard to see a difference, but with 20 data points spread across squares, triangles, rectangles and circles, you now need only 5 shades of gray for each shape.

One of the areas where color coding is used in Speech and Language Pathology is AAC and symbols. In the system of which I am an author [2] color coding is used to mark parts of speech. But suppose you were going to invent a new AAC system and wanted to work out a color coding scheme, how might you utilize the ColorBrewer website?

If you’re going to design your system using a syntactic approach (and I highly recommend you do that because that’s how language works!) you could first identify a color set for the traditional parts of speech; VERB, NOUN, ADJECTIVE, ADVERB, PRONOUN, CONJUNCTION, PREPOSITION, and INTERJECTION [3]. This looks suspiciously like a nominal data set, which corresponds to the Qualitative coding method mentioned at (c) above. So you go to the ColorBrewer site and take a look at the panel in the top left:

ColorBrewer Panel

You can set the Number of data classes to 8, the Nature of your data to qualitative, and then pick yourself a color scheme. If you chose the one in the graphic above, you see the following set recommended:

Eight color data set

For the sake of completeness, here are all the other options:

ColorDataQualSet2

You can now choose one of these sets knowing that the individual colors have been generated to optimize chromatic differences.

So let’s assume we go for that very first one that starts with the green with the HTML color code #7FC97F [4]. I’m going to suggest that we then use this for the VERB group and that any graphics related to verbs will be green. Now I can move to step 2 in the process.
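In code, step 1 amounts to nothing more than a lookup table. The Python sketch below pairs the eight parts of speech with the eight colors of that first set (ColorBrewer labels it “Accent”). Only the VERB-to-#7FC97F pairing comes from this article; the other seven pairings are arbitrary, and the remaining hex codes are copied as I understand the set, so do check them against the site before building anything on them.

```python
# Eight-class qualitative set that starts with #7FC97F.
# Only the first code is quoted in the article; verify the rest on the site.
PALETTE = [
    "#7FC97F", "#BEAED4", "#FDC086", "#FFFF99",
    "#386CB0", "#F0027F", "#BF5B17", "#666666",
]

PARTS_OF_SPEECH = [
    "VERB", "NOUN", "ADJECTIVE", "ADVERB",
    "PRONOUN", "CONJUNCTION", "PREPOSITION", "INTERJECTION",
]

# Pair each part of speech with one color, in list order.
# Apart from VERB, these assignments are purely illustrative.
POS_COLORS = dict(zip(PARTS_OF_SPEECH, PALETTE))

for pos, color in POS_COLORS.items():
    print(f"{pos:<13}{color}")
```

Which color goes with which category is a design decision for the system’s author; the palette only guarantees that the eight are easy to tell apart.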

Verbs can actually be graded in relation to morphological inflection. There are a limited number of endings: -s, -ing, -ed, and -en. Knowing this, I can go back to the ColorBrewer site and use the sequential setting to get a selection of possible greens. This time I changed the Number of data classes to 5 (one for the base form plus one for each ending) and the Nature of your data to Sequential. Here’s what I then see as a suggested set of equally chromatically spaced greens:


ColorDataQualPanel2

This now gives me the option to code not just verbs but verb inflections, while chromatically signaling “verbiness” by green. Here’s a symbol set for walk and write that uses the sequential – or graded – color coding:

Color-coded symbols

If you want an exercise in AAC system design, knowing that ADJECTIVES also inflect like verbs using two inflectional suffixes, -er and -est, you can try using the ColorBrewer to create color codes [5].

There are probably many other ways to utilize the site for generating color codes. For example, you might want to create colors for Place of Articulation when using pictures for artic/phonology work, and seeing as there are a discrete number of places, it should be easy enough. Why not grab yourself a coffee and hop on over to ColorBrewer now and play? But only use it if you’re creating a map. Please!

Notes
[1] A doryphore is defined by the OED as “A person who draws attention to the minor errors made by others, esp. in a pestering manner; a pedantic gadfly.” It comes from the Greek δορυϕόρος, which means “spear carrier,” and it was originally used in the US as a name for the Colorado beetle – a notable pest. This beetle was known as “the ten-striped spearman,” hence the allusion to a spear carrier.  To then take the noun and turn it into an adjective by adding the -ic suffix meaning “to have the nature of” was a piece of cake – and a great example of using affixation to change a word’s part of speech. As always, you leave a Speech Dudes’ post far smarter than you entered it!

[2] Way back in 1993 I was invited to join the Prentke Romich Company’s R&D department as one of a team of six who were tasked with developing what became the Unity language program. The same basic program is still used in PRC devices and the language structure has been maintained such that anyone who used it in 1996 could still use it in 2014 on the latest, greatest hardware. The vocabulary also uses color coding to mark out Parts-of-Speech but not exactly like I have suggested in this article. Maybe next time…?

[3] The notion of 8 Parts-of-Speech (POS) is common in language teaching but as with many aspects of English, it’s not 100% perfect. For example, words like the, a, and an can be categorized under Adjectives or added to a class of their own called Articles or – by adding a few more – Determiners. So you might see some sources talking about 9 Parts-of-Speech, and I like to treat these as separate from adjectives if only because they seem to behave significantly differently from a “typical” adjective. Another confounding factor is that some words can skip happily between the POS and create minor havoc; light is a great example of this. The take-away from this is that sometimes, words don’t always fit into neat little slots and you need to think about where best to put them and how best to teach them.

[4] In the world of web sites, colors are handled in code by giving them a value in hexadecimal numbers – that’s numbers using base 16 rather than the familiar base 10 of regular numbers. Black is #000000 and white is #FFFFFF. When you’re working on designing web pages, it’s sometimes useful to be able to tell a programmer that you want a specific color, and if you can give them the precise hex code – such as #FF0000 for red and #0000FF for blue – then it makes their job easier and you get exactly what you need. You can also use something called RGB codes to describe colors, based on the way in which the colors (R)ed, (G)reen and (B)lue are mixed on a screen. Purple, for example, is (128,0,128) and yellow is (255, 255, 0). Take a look at this Color Codes page for more details and the chance to play with a color picker.

[5] I suppose I should toss in a disclaimer here that I’m not suggesting that creating an AAC system is “simply” a matter of collecting a lot of pictures with colored outlines and then dropping them into a piece of technology. There is much more to it than that (ask me about navigation next time you see me at a conference) so consider this article just one slice of a huge pie.
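A short postscript to note [4] for anyone who wants to see how the hex and RGB codes relate: each pair of hex digits is just one of the three R, G, B numbers written in base 16. Here is a minimal Python sketch of the conversion in both directions; the function names are my own, and the colors are the ones mentioned in the note.

```python
def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Convert '#RRGGBB' to an (R, G, B) tuple of 0-255 integers."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError("expected six hex digits")
    # Each two-digit slice is one channel written in base 16.
    return int(code[0:2], 16), int(code[2:4], 16), int(code[4:6], 16)


def rgb_to_hex(r: int, g: int, b: int) -> str:
    """Convert three 0-255 integers to a '#RRGGBB' code."""
    return f"#{r:02X}{g:02X}{b:02X}"


print(hex_to_rgb("#FF0000"))    # red
print(hex_to_rgb("#0000FF"))    # blue
print(rgb_to_hex(128, 0, 128))  # purple
print(rgb_to_hex(255, 255, 0))  # yellow
```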

Countdown to Christmas – Question 24: Christmas Eve!

OK folks, that’s it – there is no more! Our virtual advent calendar ends today, leaving you all to open that magical 25th door tomorrow, where – when I was a kid – you’d find a piece of Cadbury chocolate and a picture of the baby Jesus in a straw-lined trough.

So as we come to the end of our super-fabulous coffee-giveaway extravaganza, our last question is also about last things. Coming up right after this video of Steely Dan’s “The Last Mall” from their Everything Must Go album.

A syllable is usually defined as having three distinct segments; the ONSET, the NUCLEUS, and the… what?

ANSWER: Coda!

A few folks offered RIME (or RHYME) as the solution, and in fairness, we should acknowledge that this might be OK. However, when one talks about the three segments that have ONSET and NUCLEUS as the first two, the third is CODA. In the two-part description, one does indeed see ONSET and RIME, but the rime is defined as consisting of the NUCLEUS + CODA or, in an open syllable where the CODA is absent, just the NUCLEUS. So, coda is what we wanted, which also fits in with the idea that this is the “end” of the contest – and coda means “end.”
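For the programmers among you, the two descriptions fit together like this. The Python sketch below is a toy and makes a big assumption: it splits on written vowel letters rather than on sounds, so it only behaves for simple spellings like cat or go and will get words such as make wrong. The class and function names are my own.

```python
from dataclasses import dataclass

VOWEL_LETTERS = set("aeiou")  # letters, not sounds: a deliberate simplification


@dataclass
class Syllable:
    onset: str
    nucleus: str
    coda: str

    @property
    def rime(self) -> str:
        """The rime is the nucleus plus the coda."""
        return self.nucleus + self.coda

    @property
    def is_open(self) -> bool:
        """An open syllable has no coda."""
        return self.coda == ""


def split_syllable(text: str) -> Syllable:
    """Split one written syllable at its first and last vowel letters."""
    text = text.lower()
    positions = [i for i, ch in enumerate(text) if ch in VOWEL_LETTERS]
    if not positions:
        raise ValueError("no vowel letter found")
    first, last = positions[0], positions[-1]
    return Syllable(text[:first], text[first:last + 1], text[last + 1:])


for word in ("cat", "go", "strength"):
    s = split_syllable(word)
    print(word, "->", s, "rime:", s.rime, "open:", s.is_open)
```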

Syllable structure

Links

The Syllable and the Foot from Macquarie University: nice overview.

Explore syllable structures across languages at the World Atlas of Language Structures online.

 

Countdown to Christmas – Question 22: Sunday 22nd December

What was the name of the Speech Therapist who worked with the Aflac duck during 2013, as part of the rehab team nursing him back to health after a tragic accident?

(a) Angela Webster

(b) Allison Weber

(c) Andrea Westinghouse

(d) Amanda West

ANSWER: Allison Weber.

Played by Atlanta actress Jammie Patton, Allison Weber is part of a multi-disciplinary rehabilitation team dedicated to bringing the Aflac duck back to full health.

Links

Actress Jammie Patton talks about working with “duck royalty.”

Countdown to Christmas – Question 21: Saturday 21st December

You are asked to evaluate a client who has had a stroke. Which one of the following tests is most appropriate?

(a) BDAE-3

(b) BDI-2

(c) BLT-2

(d) BTAIS-2

Therapy interview

ANSWER: BDAE-3: The Boston Diagnostic Aphasia Examination – Third Edition.

The BDAE has been around for some time now – one of the Dudes was using it in the 1980s! – and it’s now in its third edition. It’s designed to determine and distinguish disorders of language function and neurologically recognized aphasic syndromes. The test contains a short form for rapid access to diagnostic classification and quantitative assessment.

The BDI-2 is the Battelle Developmental Inventory and screens, diagnoses, and evaluates children from infancy to age 8. Domains include personal-social, adaptive, motor, communication, and cognitive.

The BLT is the Bankson Language Test for kids aged 3:00 to 7:00. It aims to measure children’s psycholinguistic skills in the three general categories of semantic knowledge, morphological/syntactical rules, and pragmatics. Not to be confused with the sandwich of the same name!

The BTAIS-2 is the Birth to Three Assessment and Intervention System, which screens language comprehension and expression, nonverbal thinking, and motor development.

Links

The Directory of Speech-Language Pathology Assessments collated by ASHA.

The BDAE from PsychCorp, a part of Pearson Education, Inc.