TALC: Pre-conference Workshops

Error annotation in learner corpora: tools and applications in English and Italian

Workshop leaders

Olga Vinogradova, Associate Professor at the School of Linguistics, National Research University Higher School of Economics (Moscow).

She works at the intersection of Teaching English as a Foreign Language and Corpus Linguistics. Her main research interests are Learner Corpus Research (with a focus on creating computer tools for learner corpora) and the acquisition of English by Russian learners.

Stefania Spina, Associate Professor in Linguistics at the University for Foreigners of Perugia, Italy.

She works in the area of Applied Linguistics, and her main research interests are Learner Corpus Research (the acquisition of Italian as a second language, with a focus on phraseology) and Corpus Linguistics (mainly in the analysis of register variation and of the language used in Italian media).

Luciana Forti, PhD candidate in Applied Linguistics at the University for Foreigners of Perugia, Italy.

She holds a BA and an MA in Linguistics and Applied Linguistics, both earned at Sapienza University of Rome, Italy. Her doctoral project deals with data-driven learning and the uses of corpora in learning and teaching Italian as a second language, with a focus on verb + noun collocations.

Ivan Torubarov, 3rd-year Bachelor student at the School of Linguistics, National Research University Higher School of Economics (Moscow).

His project is devoted to automated feedback on student essays uploaded to the learner corpus.

Nikita Login, 3rd-year Bachelor student at the School of Linguistics, National Research University Higher School of Economics (Moscow).

His project is devoted to creating a test-making tool that automatically generates test questions from error annotations in the learner corpus.

Abstract

In the first part of the workshop, participants will be introduced to REALEC, a collection of English essays written by Russian university students, along with its error classification scheme and the main principles for annotating errors in BRAT. A short video of text annotation will be shown, followed by interactive annotation exercises and a short competition for the best annotation. Next, participants will use a test-maker, a computer tool that generates test questions from the errors annotated in the corpus: they will receive a bank of automatically generated questions, edit them, take a test, and analyse its results. Finally, they will get automated feedback on an uploaded text from REALECInspector, a tool that compares some features of the text with the corresponding features of similar texts in the corpus. Participants will compare the inspection results for texts of two genres and for texts written at different levels of writing proficiency.
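
To make the idea of generating test questions from error annotations more concrete, here is a minimal Python sketch. It is not the workshop's test-maker. It assumes annotations in the brat standoff format, where a `T` line holds an error span and a `#` line holds an annotator note, and it further assumes that the note contains the suggested correction. The tag name `Noun_number`, the sample sentence, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ErrorAnnotation:
    tag: str         # error category (label names here are hypothetical)
    start: int       # offset of the first character of the error span
    end: int         # offset just past the last character of the span
    wrong: str       # the learner's original wording
    correction: str  # assumed to be stored in the annotator note


def parse_standoff(ann_text: str) -> list[ErrorAnnotation]:
    """Pair text-bound annotations (T lines) with their notes (# lines)."""
    spans: dict[str, tuple[str, int, int, str]] = {}
    notes: dict[str, str] = {}
    for line in ann_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        ident, info, text = fields[0], fields[1], fields[2]
        if ident.startswith("T") and ";" not in info:  # skip discontinuous spans
            tag, start, end = info.split(" ")
            spans[ident] = (tag, int(start), int(end), text)
        elif ident.startswith("#"):
            # info has the form "AnnotatorNotes T1"
            notes[info.split(" ")[1]] = text
    # Keep only the spans for which a correction is available.
    return [
        ErrorAnnotation(tag, start, end, wrong, notes[ident])
        for ident, (tag, start, end, wrong) in spans.items()
        if ident in notes
    ]


def make_question(essay: str, ann: ErrorAnnotation) -> dict[str, str]:
    """Blank out the error span and keep the correction as the answer key."""
    return {
        "prompt": essay[:ann.start] + "_____" + essay[ann.end:],
        "hint": ann.wrong,
        "answer": ann.correction,
        "category": ann.tag,
    }


# Hypothetical learner sentence and its annotation file contents.
ESSAY = "She has a good knowledges of English."
ANN = (
    "T1\tNoun_number 15 25\tknowledges\n"
    "#1\tAnnotatorNotes T1\tknowledge\n"
)

questions = [make_question(ESSAY, a) for a in parse_standoff(ANN)]
for q in questions:
    print(q["prompt"], "| answer:", q["answer"])
```

Each annotated error yields one gap-fill item: the learner's wording can be shown as a hint to correct, and the annotator's correction serves as the answer key. A real question bank would still need the manual editing step described above.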


In the second part of the workshop, a system for annotating Italian collocation errors in learner texts will be presented and discussed. The error annotation is performed on the Longitudinal Corpus of Chinese Learners of Italian (LoCCLI). This part of the workshop will focus on the following key aspects of error annotation, which are particularly challenging in the case of collocations:


- choosing target hypotheses;
- coherently assigning categories to collocation errors;
- interpreting recurring collocation errors.

The interest of the workshop lies mostly in the different perspectives from which the practice of error annotation is approached. In the first part, annotation is aimed at integrating and improving CALL systems, and the presentation focuses on the specific uses of a learner corpus for EFL/ESL instructors. In the second part, annotation is intended to detect and analyse a specific area of difficulty for learners, with the aim of providing data that improve our knowledge of learner behaviour. A further benefit of the workshop is its focus on two different second languages, which can raise interesting problems and pose new challenges for those interested in error annotation.

Will participants need to bring their own devices?

Yes, the use of laptops is encouraged. There are no requirements regarding the operating system (Windows or Mac), and there is no need to install special software, but everyone will need a modern web browser, preferably Google Chrome.

References

Abel, A., Konecny, C. & Autelli, E. (2015). Annotation and Error Analysis of Formulaic Sequences in an L2 Learner Corpus of Italian. Proceedings of LCR2015, Cuijk/Nijmegen (NL), 11-13 September 2015.


Granger, Sylviane (2003). The International Corpus of Learner English: a new resource for foreign language learning and teaching and second language acquisition research. In: TESOL Quarterly, 2003, pp. 538-546.


Hovy, Eduard (2015). Corpus Annotation. In Ruslan Mitkov (Ed.) The Oxford Handbook of Computational Linguistics, 2nd edition, 2015.


Leech, Geoffrey (2015). Adding linguistic annotation. In: Developing linguistic corpora: a guide to good practice. Oxbow Books, Oxford, 2015, pp. 17-29.


Lüdeling, A. & Hirschmann, H. (2015). Error annotation systems. In S. Granger, G. Gilquin & F. Meunier (Eds.), The Cambridge Handbook of Learner Corpus Research, Cambridge: Cambridge University Press, pp. 135-158.


Lyashevskaya, O., Vinogradova, O. & Panteleeva, I. (2017). Multi-Level Student Essay Feedback in a Learner Corpus. In: Computational Linguistics and Intellectual Technologies 16, v.1, 2017, pp. 382-396.


Vinogradova, Olga (2016). The Role and Applications of Expert Error Annotation in a Corpus of English Learner Texts. In: Computational Linguistics and Intellectual Technologies, Issue 15 (22), International Conference “Dialogue 2016” Proceedings, pp. 740-751.