Do you use corpora?

If you think you don’t use corpora in your planning and teaching, think again! We teachers actually use corpora all the time, perhaps unknowingly: for example, whenever we google a word or an expression looking for examples of its use, or when we search through online texts for one that contains the linguistic elements we need for a class.

What is a corpus?

Corpus (plural corpora), the Latin word for body, has come to mean, in corpus linguistics, a “collection of electronic, naturally-occurring texts (written or spoken) which are selected to be representative of a particular language or language variety” (McEnery et al., 2006, p. 5). Corpora often contain millions, if not billions, of words, produced not only by native speakers but also by L2 speakers. They can provide evidence of language use across different genres and registers and reveal patterns in language.

Exploring this amount of data becomes feasible with corpus software tools. For example, have a look at the three figures below, which illustrate how I used Sketch Engine to produce frequency lists for the word ‘teacher’ (Figures 1 and 2) and to create a concordance for the same word, based on data from the British National Corpus (BNC).

Figure 1. Visualisation of a frequency list for ‘teacher’ generated by Sketch Engine.
Figure 2. Different view of the same frequency list search for ‘teacher’ generated by Sketch Engine.
Figure 3. List of concordance lines for the lemma ‘teacher’ generated by Sketch Engine. KWIC stands for keyword in context.

The frequency of linguistic items can inform curricular decisions about the order in which these items should be taught and the materials to be used, as well as test development. Concordances, i.e. lists of all the instances of a linguistic item, each shown in its immediate context, are not to be confused with collocations, i.e. the way individual lexical items regularly occur together. Collocations can, however, be traced in concordance lines, along with other information on language use in context that can be exploited in teaching.
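For readers curious about what a corpus tool does with the word ‘teacher’, here is a minimal Python sketch of the three ideas above: a frequency list, a KWIC concordance and a rough count of collocates. It is only an illustration under simple assumptions, not how Sketch Engine works internally: the sample text `SAMPLE` stands in for a real corpus such as the BNC, the tokenizer just lowercases and splits on letters, and the function names (`tokenize`, `frequency_list`, `kwic`, `collocates`) are hypothetical. Unlike the lemma search in Figure 3, it matches exact word forms only, so ‘teachers’ would not be found.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for a real corpus such as the BNC.
SAMPLE = (
    "The teacher asked the class a question. A good teacher listens. "
    "Every language teacher needs time to plan, and the teacher training "
    "course helped each new teacher."
)


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase word forms, punctuation dropped."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(tokens, top=5):
    """Return the `top` most frequent word forms with their counts."""
    return Counter(tokens).most_common(top)


def kwic(tokens, node, width=3):
    """Return keyword-in-context lines as (left context, node, right context)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def collocates(tokens, node, window=2):
    """Count words within `window` tokens either side of each occurrence of `node`.

    These are raw co-occurrence counts only; real corpus tools weight them
    with statistical association measures.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


if __name__ == "__main__":
    toks = tokenize(SAMPLE)
    print(frequency_list(toks))
    for left, node, right in kwic(toks, "teacher"):
        # Right-align the left context so the node words line up, as in a KWIC view.
        print(f"{left:>30} [{node}] {right}")
    print(collocates(toks, "teacher").most_common(3))
```

In this toy example the most frequent neighbour of ‘teacher’ is simply ‘the’, which shows why raw co-occurrence counts are not enough on their own and why corpus tools apply association measures before presenting collocations.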

Why is it a good idea to use corpora?

So, corpora can provide a wealth of information about authentic language use across different registers and genres, and corpus software tools can help us explore it. But why should teachers use corpora? And, more importantly, why should learners use them?

Teachers can use corpora to enhance their research skills and develop their language awareness to inform their teaching. Quite often teachers need to adapt or supplement published materials that cover desirable linguistic features only to a limited extent, that do not present instances of natural speech, or that present language in a way that calls for a different approach (Walsh, 2010). Teachers can source instances of natural language from corpora to create tailor-made activities that cater to their learners’ specific needs in terms of level, context, system and skill. They can use corpus examples as they are or adapt them to demonstrate a language pattern in context, ask learners to notice features of a function, or have them compare data from native-speaker and L2 corpora for error correction. The only limit is imagination.

The use of language corpora as a resource also holds significant benefits for learners on many levels. Cheng (2010) states that it “has been shown to contribute to the acquisition of both implicit and explicit knowledge” (p. 320). It can support exploratory, discovery learning, which motivates learners, and because they engage more actively, they probably retain knowledge for longer. It also encourages autonomous learning, and since learners acquire or practise essential learning skills such as noticing, making inferences and reflecting, they can transfer these skills to other fields of study (Gilquin & Granger, 2010).

The cognitive, pedagogical and motivational benefits of using corpora in language learning in the form of Data-Driven Learning (DDL) have been discussed repeatedly over the years in presentations at conferences of local and national TESOL associations and at the annual IATEFL conferences. ELT blogs often feature corpora, and there is even a dedicated Facebook group promoting the use of corpora in language teaching. Why is it, then, that teachers are reluctant to apply DDL and mostly view corpora as a tool limited to lexicographers, academics, researchers, materials developers and test designers?

Challenges associated with using corpora

McCarthy and O’Keeffe (2010) point out that teachers’ attitudes towards DDL are not favourable, as “it turns the traditional order within the classroom upon its head. The corpus becomes the centre of knowledge, the students take on the role of questioner and the teacher is challenged to hand over control and facilitate learning” (p. 7). Attitudes aside, teachers’ limited time, heavy workload, and lack of computer skills and of training in how to access and use corpora are major issues. There is also the cost of accessing corpora, since many of them are not freely available. On top of that, schools need to invest in computers, software tools, and training for teachers and students. And with empirical evidence on the effectiveness of DDL for language learning still limited, it is not even a guaranteed investment for them to make (Gilquin & Granger, 2010)!

Apart from teachers’ attitudes and logistics, learners’ attitudes towards DDL are another reason why interest in DDL is still limited. Learners face challenges, especially when they receive little training or lack the appropriate proficiency level and technological knowledge (Soruç & Tekin, 2017).

To address the challenges learners experience, time and effort need to be invested in training them to use corpus tools, to read and interpret corpus output and, where needed, to interpret quantitative results (Sripicharn, 2010). The step before this, though, should be training the teachers. Corpus linguistics is more common in initial language teacher education nowadays, but beyond basic knowledge, teachers need to develop skills in a way that promotes active engagement with corpora.


Should we then abandon the use of corpora in non-academic teaching contexts? In my humble opinion, no. Corpora could contribute very positively not only to language learning but also to teacher development and, most importantly, to learner autonomy, in a way that fits today’s online teaching and learning challenges during the COVID-19 pandemic. Being forced into a world that requires learners to be technologically adept and more independent provides the perfect opportunity to explore DDL, especially now that more corpora are freely accessible and a quick online search turns up corpus software tools that are not only free but also user-friendly.

A few corpus resources

For those interested in exploring corpora and their use further, some indicative sources follow:

General reads in corpus linguistics

Corpora

  • British National Corpus (BNC)
  • Corpus of Contemporary American English (COCA)
  • International Corpus of Learner English (ICLE) (via CQPWeb)
  • Longman Learner Corpus (via CQPWeb)
  • Cambridge Learner Corpus (not freely available)

Classroom resources based on corpora 

Corpus software tools


