Corpora for University Language Teachers

September 2009 — Volume 13, Number 2

Editors: Carol Taylor Torsello, Katherine Ackerley & Erik Castello (2008)
Publisher: Bern: Peter Lang
Pages: 309
ISBN: 978-3-03911-639-3 (paper)
Price: $80.95 U.S.

This collection of articles by researchers from nine Italian universities provides a useful overview of current approaches to using English corpora, that is, searchable electronic collections of texts. The volume became, in effect, a festschrift for Birmingham University linguist John Sinclair, who, before his death in 2007, was to have been a keynote speaker at the conference in Padua from which these papers were drawn. An introductory piece by Guy Aston not only recalls Sinclair and his influence (“John changed our view of the lexical item,” p. 17) but also provides a good brief history of British English corpora. In particular, it details the relationship between the COBUILD reference books, the corpus they were based on, and the subsequent evolution of that corpus into the Bank of English, and it traces the later emergence of a competitor, the British National Corpus (BNC).

Several recent collections have informatively discussed corpus work and language teaching (e.g., Sinclair, 2004), but the current volume stands apart from conference paper collections that simply report on a themed set of individual research projects. Because several of these papers grew out of workshops, the book contains chapters that show readers how to apply existing tools to their own corpus projects and language lessons. A later essay by Aston, for example, reviews the new edition of the BNC, comparing its current texts to earlier versions and introducing XAIRA, software for searching the XML tags used to encode the texts, which lets users sort material by text variables such as genre, author, and date of composition. The first half of this article is an accessible introduction to the components of the BNC, whereas the second half assumes some experience with query formulas. “The BNC,” Aston observes, “is a prolific resource… learners [and, I would add, teachers] need to be trained to use it—to recognize and formulate problems, pose queries and interpret solutions” (p. 235).
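To illustrate the general idea of sorting material by text variables before searching it, here is a minimal Python sketch that filters texts by header attributes and then looks for a word. It assumes a hypothetical, much-simplified markup (a `<text>` element carrying `genre` and `date` attributes); it does not reproduce the actual BNC XML scheme or XAIRA’s query language, and the sample sentences are invented.

```python
import re
import xml.etree.ElementTree as ET

# Toy corpus in a hypothetical, simplified markup (NOT the real BNC scheme):
# each <text> carries its text variables as attributes.
SAMPLE = """
<corpus>
  <text id="t1" genre="news" date="1991">The survey was published today.</text>
  <text id="t2" genre="fiction" date="1985">She paid for the survey herself.</text>
  <text id="t3" genre="news" date="1993">Critics questioned the survey results.</text>
</corpus>
"""


def texts_matching(xml_string, **criteria):
    """Return (id, content) pairs for <text> elements whose attributes match all criteria."""
    root = ET.fromstring(xml_string.strip())
    return [
        (text.get("id"), text.text or "")
        for text in root.iter("text")
        if all(text.get(key) == value for key, value in criteria.items())
    ]


def search(xml_string, word, **criteria):
    """Return ids of the matching texts that contain `word` as a token (case-insensitive)."""
    hits = []
    for text_id, content in texts_matching(xml_string, **criteria):
        tokens = re.findall(r"[a-z']+", content.lower())
        if word.lower() in tokens:
            hits.append(text_id)
    return hits


if __name__ == "__main__":
    print(search(SAMPLE, "survey", genre="news"))  # ['t1', 't3']
```

Restricting by `genre="news"` first and searching second mirrors, in miniature, the workflow of narrowing a large corpus to a subcorpus before posing a query.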

While several chapters rely on results from large general-language corpora like the BNC and the Bank of English, the book most often discusses smaller, custom-made corpora. Since much current ESL writing and vocabulary instruction emphasizes exposing students to specialized text types so that they can master the genres of their disciplines, creating such Language for Special Purposes (LSP) corpora is well motivated. The specialized corpora discussed in the book include the Padova Learner Debate Corpus (PLDC), which comprises online forum posts by language learners engaged in debates (Dalziel & Helm). A set of four other corpora (Ulrych & Murphy) was compiled following the framework of mediated discourse analysis (Scollon, 2001) to show how monolingual as well as translated texts reveal editorial and social influences: (1) EuroParl, formal oral discourse from European parliamentary debates; (2) AbCoR, annual reports from multinational companies; (3) AMC, American movie transcripts and their dubbed Italian versions; and (4) EuroCom, essays, half of which were written by non-native English speakers working at the European Commission, the other half being versions of the same texts edited by native English speakers working as translators.

Focusing on another LSP corpus, Tognini Bonelli analyzes terms specific to economics writing in a dataset drawn from The Economist. Taylor compares speech features of the artificial exchanges found in film and television transcripts with the use and distribution of the same features in exchanges within the Bank of English. Pushing the definition of textual corpora beyond written and spoken forms, Baldry explores how concordancing can make use of multimodal material, which can be indexed in ways that help students reinforce their text-based language learning. For example, such corpora can be sorted by images or themes, aligning film clips with the metatext that explicates them or linking web videos with thematically connected vocabulary items.

Focusing on the writing of language learners themselves, Castello created a corpus of 25 essays from both American and British ESL proficiency exams, gathered to measure features of textual complexity. In other work examining writing in a non-native language, D’Angelo created CADIS, the Corpus of Academic Discourse, to capture and compare the English of academic journal articles; the corpus allows the articles to be sorted both by discipline and by the native language of the authors (English, Italian, or other first languages). Other chapters discuss not just compiling texts into a corpus but using tags to annotate more specialized corpora: Prat Zagrebelsky discusses projects that use tags to code common errors in language learners’ college essays. In another tagging endeavor, not student-based, Brunetti describes creating XML tags that show the inflectional and syntactic relations of each lexical item in a corpus of Old English poems, as well as in its Italian gloss.
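The error-tagging projects mentioned above can be pictured with a small Python sketch. It assumes a hypothetical inline scheme in which each error is wrapped in an `<err>` element carrying a `type` label and a suggested correction in a `corr` attribute; this is not the annotation scheme described in the chapter, and the learner sentence is invented. The sketch counts errors by type and rebuilds a corrected version of the text.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented learner sentence in a hypothetical inline error scheme:
# <err type="..." corr="...">learner's form</err>
ESSAY = (
    '<essay>Students <err type="agreement" corr="depend">depends</err> '
    '<err type="preposition" corr="on">of</err> their teachers, and '
    '<err type="preposition" corr="in">at</err> the end they '
    '<err type="agreement" corr="learn">learns</err> a lot.</essay>'
)


def error_profile(xml_string):
    """Count tagged errors by their type attribute."""
    root = ET.fromstring(xml_string)
    return Counter(err.get("type") for err in root.iter("err"))


def corrected_text(xml_string):
    """Rebuild the text with each <err> replaced by its correction.

    Handles only flat markup (no tags nested inside <err>)."""
    root = ET.fromstring(xml_string)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "err":
            parts.append(child.get("corr", child.text or ""))
        else:
            parts.append(child.text or "")
        parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    print(error_profile(ESSAY))
    print(corrected_text(ESSAY))
```

Keeping the learner’s own wording inside the tag and the correction in an attribute means that both the original and the corrected versions can be recovered from a single file, while the type labels support counts of common error categories.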

As with Brunetti’s chapter, some of the essays cover projects relevant to language-related curricula for native-speaker students as well as for English language learners, though most papers focus specifically on foreign language teaching and learning. Teachers who plan to mine the results of this volume to model individual English lexical items, or to help their students acquire them (to see, for example, how learners’ choices of modals compare to the edited usage of native speakers, which verbs most typically appear adjacent to the noun “survey,” or how the distributions of “fork out” and “pay” differ), should keep in mind that the book’s contributors work mainly with British rather than North American varieties of English. American language practitioners who have created their own specialized corpus but seek a larger reference corpus of American phraseology should consult the Corpus of Contemporary American English (COCA; Davies, 2008), available on the web.

However, as models of techniques for compiling a corpus from specialized texts, and for tagging, concordancing, and searching for words that typically appear together in particular genres, these papers provide helpful guidelines for language teachers in any locale. The contributors successfully show how to bring to students’ attention patterns of usage found in domains ranging from movie transcripts and criticism to economics and news reporting, as well as in more traditional classroom text types such as poetry and academic essays. While several pieces are geared towards the comparison tasks of translators, all the chapters should prove especially relevant for L2 classroom projects and assignments that value capturing real-life constructions over grammar-book examples.
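A question such as which verbs typically appear near the noun “survey” is usually answered by counting collocates in a part-of-speech-tagged corpus. The Python sketch below shows the idea on a few invented, hand-tagged sentences using simplified, hypothetical tag labels; a real study would draw on a large corpus such as the BNC or COCA and its own tagset. For each occurrence of the node word, the function looks up to `window` tokens to the left and counts the nearest verb.

```python
from collections import Counter

# Invented, hand-tagged sentences with simplified, hypothetical POS labels.
TAGGED = [
    [("we", "PRON"), ("conducted", "VERB"), ("a", "DET"), ("survey", "NOUN")],
    [("they", "PRON"), ("carried", "VERB"), ("out", "ADP"), ("a", "DET"), ("survey", "NOUN")],
    [("the", "DET"), ("team", "NOUN"), ("conducted", "VERB"), ("the", "DET"),
     ("national", "ADJ"), ("survey", "NOUN")],
    [("she", "PRON"), ("designed", "VERB"), ("a", "DET"), ("survey", "NOUN")],
]


def verb_collocates(sentences, node, window=3):
    """For each occurrence of `node`, count the nearest verb within `window` tokens to its left."""
    counts = Counter()
    for sentence in sentences:
        for i, (word, _tag) in enumerate(sentence):
            if word.lower() != node.lower():
                continue
            # Scan leftwards from the token just before the node, stopping at the window edge.
            for j in range(i - 1, max(i - window, 0) - 1, -1):
                left_word, left_tag = sentence[j]
                if left_tag == "VERB":
                    counts[left_word.lower()] += 1
                    break
    return counts


if __name__ == "__main__":
    print(verb_collocates(TAGGED, "survey").most_common(1))  # [('conducted', 2)]
```

The left-hand window matters because a determiner or adjective usually separates the verb from the noun, as in “carried out a survey”: on this toy sample, a window of 1 finds no verbs at all.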

References

Davies, M. (2008–). The Corpus of Contemporary American English (COCA): 385 million words, 1990–present. Available online at http://www.americancorpus.org.

Scollon, R. (2001). Mediated discourse: The nexus of practice. London: Routledge.

Sinclair, J. M. (Ed.). (2004). How to use corpora in language teaching. Amsterdam: John Benjamins.

Laurel Smith Stvan
The University of Texas at Arlington
<stvan@uta.edu>

© Copyright rests with authors. Please cite TESL-EJ appropriately.

Editor’s Note: The HTML version contains no page numbers. Please use the PDF version of this article for citations.

© 1994–2023 TESL-EJ, ISSN 1072-4303