How can corpora help in language pedagogy?


Richard Xiao

Abstract

Corpus linguistics as a methodology of linguistic research has gained such prominence over time that corpora have been used extensively in nearly all branches of linguistics. This chapter explores the potential uses of corpus data in one of these areas: language teaching and learning. We will first discuss a wide range of issues related to using corpora in language pedagogy, including reference publishing, syllabus design and materials development, language testing, teacher development, data-driven learning (DDL), teaching language for specific purposes, and learner corpus and interlanguage analysis. We will then demonstrate, via a case study of passive constructions in […] debate over the relevance of authenticity and frequency of corpora in language education as well as the future of corpus-based language pedagogy.

Key words: corpora, language pedagogy, data-driven learning, learner corpus, contrastive corpus linguistics, interlanguage, second language acquisition

1. Introduction

The corpus-based approach to linguistics and language education has gained prominence over the past four decades, particularly since the mid-1980s. This is because corpus analysis can be illuminating 'in virtually all branches of linguistics or language learning' ([…]
intuitions of a great number of speakers and makes linguistic analysis more objective (McEnery and Wilson 2001: 103). Unsurprisingly, corpora have been used extensively in nearly all branches of linguistics, including, for example, lexicographic and lexical studies, grammatical studies, language variation studies, contrastive and translation studies, diachronic studies, semantics, pragmatics, stylistics, sociolinguistics, discourse analysis, forensic linguistics, and language pedagogy. Corpora have won widespread popularity over time in spite of the fact that they still occasionally attract ho[…]
is and language education. In our view, such a debate is over a non-issue. Readers interested in the pros and cons of using corpus data should refer to Sinclair (1991), Widdowson (1991, 2000), de Beaugrande (2001) and Stubbs (2001). Robert de Beaugrande's unpublished paper 'Large corpora and applied linguistics: H. G. Widdowson versus J. McH. Sinclair' (available online at […]) provides an excellent summary of the debate between Sinclair and Widdowson at the Georgetown University Round Table on Languages and Linguistics in 1991 over the use of corpora i[…]
r negative) reactions to corpus data between the two extremes. Readers can refer to Nelson (2000: section 5.3.3) for a good review. Nor will we discuss the use of corpora in a wide range of language studies. Readers can refer to Hunston (2002) and McEnery, Xiao and Tono (2006) for further discussion of using corpora in applied linguistics. Instead, this chapter focuses only on using corpora in language pedagogy.

The early 1990s saw increasing interest in applying the findings of corpus-based research to language pedagogy. This upsurge of interest is evidenced by the […]
000), Bertinoro (2002), Granada (2004), Paris (2006), and Lisbon (2008). It is also apparent in the published literature. In addition to a large number of journal articles, well over twenty authored or edited volumes have recently been produced on the topic of teaching and language corpora: Wichmann et al (1997), Partington (1998), Bernardini (2000), Burnard and McEnery (2000), Kettemann and Marko (2002, 2006), Aston (2001), Ghadessy, Henry, and Roseberry (2001), Hunston (2002), Granger et al (2002), Connor and Upton (2002), Tan (2002), Sinclair (2003, 2004), Aston et al (2004[…]
ntana (2007), O'Keeffe, McCarthy and Carter (2007), Aijmer (2009), and Campoy, Gea-Valor and Belles-Fortuno (2010). These works cover a wide range of issues related to using corpora in language pedagogy, e.g. corpus-based language description, corpus analysis in the classroom, and learner corpora (cf. Keck 2004).

In the opening chapter of Teaching and Language Corpora (Wichmann et al 1997), Geoffrey Leech observed that a convergence between teaching and language corpora was apparent. That convergence has three focuses, as noted by Leech (1997): the direct use of corpora in teaching (teaching about […]
sting), and further teaching-oriented corpus development (LSP corpora, L1 developmental corpora and L2 learner corpora).

In the remainder of this chapter, we will explore the potential uses of corpora in language pedagogy in line with Leech's three focuses of convergence (sections 2-4), followed by a case study demonstrating how contrastive corpus linguistics can inform second language acquisition research (section 5). The chapter concludes by discussing the debate over the relevance of authenticity and frequency of corpora in language education, as well as the future of corpus-based la[…]
ause direct use of corpora in language pedagogy is restricted by a number of factors, including, for example, the level and experience of learners, time constraints, curricular requirements, the knowledge and skills teachers need for corpus analysis and the interpretation of results, and access to resources such as computers, appropriate software tools and corpora, or a combination of these (see section 6 for further discussion). This section explores how corpora have impacted language pedagogy indirectly.

2.1. Reference publishing

Corpora have revolutionized reference publishing (at least […]
onwards not to be based on corpus data, and 'even people who have never heard of a corpus are using the product of corpus-based investigation' (Hunston 2002: 96).

Corpora are useful in several ways for lexicographers. The greatest advantage of using corpora in lexicography lies in their machine-readable nature, which allows dictionary makers to extract all authentic, typical examples of the usage of a lexical item from a large body of text in a few seconds. The second advantage of the corpus-based approach, which is not available when us[…]
1995 and Longman 1995, include such frequency information. Frequency data plays an even more important role in so-called frequency dictionaries, which define core vocabulary to help learners of different modern languages, e.g. Davies (2005) for Spanish, Jones and Tschirner (2005) for German, Davies and de Oliveira Preto-Bay (2007) for Portuguese, Lonsdale and Bras (2009) for French, and Xiao, Rayson and McEnery (2009) for Chinese. Information of this sort is particularly useful for materials writers and language learners alike. A further benefit of using corpora is related to corpus markup […]
r and age) metadata, which allows lexicographers to give a more accurate description of the usage of a lexical item. Corpus annotations such as part-of-speech tagging and word sense disambiguation also enable a more sensible grouping of polysemous words and homographs. Furthermore, a monitor corpus allows lexicographers to track subtle changes in the meaning and usage of a lexical item so as to keep their dictionaries up to date. Last but not least, corpus evidence can complement or refute the intuitions of individual lexicographers, which are not always reliable (cf. Sinclair 1991a: 1

