
Alex Boulton

Researching data-driven learning: Past, present, future

Alex Boulton is Professor of English and Applied Linguistics at the University of Lorraine and director of the ATILF (UMR 7118: CNRS & UL). His research interests focus on corpus linguistics and its potential uses for ‘ordinary’ teachers and learners (data-driven learning). He has published and edited books and papers in these fields, and serves on various boards and committees, including AFLA (vice-president), EUROCALL and TaLC, as well as on journals such as ReCALL (editor), Alsic, ASp, CALL-EJ, Eurocall Review, IJCALLT, JALT-CALL Journal, Language Learning & Technology, and Al-Lisaniyyat.



Corpus tools and techniques have been used for pedagogical purposes for around 50 years (McEnery & Wilson, 1997). The first academic publications in the area appeared in the 1980s (e.g. McKay, 1980), though the concept is largely associated with the work of Tim Johns, who published a string of papers on the topic before and after he coined the term ‘data-driven learning’ (DDL) in 1990. His work coincided with a new generation of corpus building and analysis, not least the COBUILD project in Birmingham, where Johns was based (e.g. Sinclair, 1991). The TaLC (Teaching and Language Corpora) conferences were inaugurated at Lancaster University in 1994 and have been held every two years since then, each accompanied by the publication of selected papers. There have now been several hundred papers, books, chapters, proceedings papers and PhD theses in this area. Given that, it is legitimate to wonder how to go about making sense of results in the field, and indeed what DDL actually looks like today. These are the two issues addressed in this presentation.

For the first, Chambers provided the first genuine attempt to summarise some of the empirical research, published in 2007; most such narrative syntheses since then are open to charges of partial coverage of the field and subjective interpretation of the evidence. Systematic trawls find that the body of empirical research in DDL now amounts to more than 300 individual publications; their characteristics can be derived by coding them in categories such as publication type and date, design and analysis, countries and contexts, learner levels and needs, tools and corpora, target language and uses, and so on. The abstracts can also be analysed using corpus tools for further insights. A second type of synthesis is the meta-analysis, such as that conducted by Boulton and Cobb (2017). While again providing only partial coverage (i.e. quantitative results only), this is at least systematic both in collecting research that meets stated inclusion criteria and in the analysis itself. The results briefly sketched out are highly encouraging, but how then are we to justify the “seeming mismatch between utility and uptake” (Ballance, 2007)? We therefore focus on less successful instantiations of DDL in an attempt to see if they have anything in common, with possible suggestions for future good practice.

For the second, one crucial development is clearly the existence of many fast, efficient, user-friendly corpora and tools, some designed specifically with language learners in mind, as well as the ease of creating one’s own resources. Further, users today are far more familiar with computer tools, and ‘digital natives’ regularly use search engines for language queries on the web via computers or mobile devices. Nonetheless, a comparison of examples given in the literature from the early days with those in recent publications shows that many DDL practices have remained broadly similar over the decades. The question then is: has DDL reached a stable state, or is there room for substantial evolution in the future? This is a highly personal question (see e.g. Tribble, 2015) and a genuinely open one; audience suggestions are warmly welcomed.