The Electronic Journal for English as a Second Language

Criterion-referenced Language Testing

September 2003 — Volume 7, Number 2

Criterion-referenced Language Testing

James Dean Brown and Thom Hudson (2002)
Cambridge: Cambridge University Press
Pp. xvi + 320
ISBN 0521000831 (paper)
$29.95

Criterion-referenced Language Testing, by James Dean Brown and Thom Hudson, shows how criterion-referenced testing (CRT) can provide realistic and useful test development tools to assist language teachers and language curriculum developers in their respective jobs. Over the past decades, CRT, which provides information about an individual’s mastery of a given criterion domain or ability level, has become an increasingly prominent issue in language assessment, especially in language achievement testing. The book addresses the wide variety of CRT and decision-making needs that more and more language-teaching professionals must consider in real-life testing situations. Each of its seven chapters discusses the theoretical and practical parameters involved in language testing situations. The book treats CRT at a simple statistical level; readers who have taken an introductory statistics course will readily follow the concepts it presents. Because it assumes no previous technical knowledge of CRT as a mode of language testing, it offers laypersons a good introduction to the issues surrounding language testing in general and CRT in particular.

To show the different phases of CRT, the authors take a focused approach to the issues involved in developing, implementing, and improving language tests under the criterion-referenced approach. In so doing, they explore what alternate paradigms are possible in language testing, what curriculum-related language testing is, what CRT items are, how basic descriptive and item statistics for CRT can be computed and interpreted, how reliability, dependability, and unidimensionality should be addressed in CRT, how the validity of CRT can be viewed, and how CRTs can be administered, given feedback on, and reported. [-1-]

In Chapter 1, entitled ‘Alternate paradigms,’ the authors identify the place of CRTs in language testing theory and research by examining what CRT is, what it can do, and how it relates to theoretical issues in language testing. Norm-referenced testing (NRT) has been researched for many decades, but the last few decades have seen a surge of interest in CRT, and the authors discuss the competing paradigms that NRT and CRT represent. In exploring the main question of what language tests are measuring, they consider the following four questions: “What makes language testing special? What is language proficiency? What is communicative language ability? What problems do CRT developers face?” (p. 15). These four sub-questions are linked to practical implications for CRT development and implementation. The chapter ends with the following four practical questions that CRT developers must face in serving the goals and objectives of CRT:

“1. How can item analysis be performed when: (a) no comparison group is designated as instructed or uninstructed group; (b) no externally identified masters and non-masters are defined; or (c) when mastery groups are defined and available?

2. How dependable are the decisions made on the basis of the test? How generalizable are the scores and analyses to those of other examinees on other forms of the test?

3. How can a standard, or cut-point, be rationally set?

4. What advantages and disadvantages accrue from application of the statistical approaches provided by NRT or CRT analyses?” (p. 27)

The subsequent chapters of this volume provide answers to these four questions as they arise in putting CRT into practice.

Chapter 2, entitled ‘Curriculum-related testing,’ first discusses the interrelationship between CRT and curriculum. The chapter addresses how language testing figures in needs analysis, goals and objectives, testing, materials, teaching, and evaluation, all of which function as components of language curriculum development. Then, using practical examples of both instructional and performance objectives, the chapter enumerates the relationships between the two types of objectives and CRT that language specialists may face in their testing situations. The authors also emphasize the washback effect in CRT, which they link to the importance of drawing on multiple sources of information in language-related decision making. The remainder of the chapter offers a comprehensive overview of how to adjust modes of assessment when there is a discrepancy between the language curriculum and testing practice.

Chapter 3, ‘Criterion-referenced test items,’ offers caveats for constructing test item specifications, together with descriptions of what a test specification is and how it should be created. This is followed by a practical exploration of item quality and content analysis in relation to problems that arise in everyday test use in language programs. The chapter helps readers establish streamlined test specifications when implementing CRT. However, the discussion of test specifications is relatively unsophisticated in that the authors do not address situations in which “reverse engineering” (Davidson & Lynch, 2002, p. 41) is needed to create test specifications from existing test items. As Davidson and Lynch argue, not all language testing is specification-driven, so further discussion of reverse engineering might have opened a valuable channel to some significant related topics: critical language testing, and philosophical stances on the use of tests and on test change in relation to the language curriculum.

Chapter 4, entitled ‘Basic descriptive and item statistics for criterion-referenced tests,’ offers a detailed illustration of both NRT item statistics and criterion-referenced item analysis for describing and revising CRTs in light of their intended goals and objectives. The authors’ writing style is so straightforward that motivated readers can learn how to interpret the results of item statistics for both NRT and CRT from a close reading of this chapter alone. It is generally, though not always, accepted that item response theory (IRT) has a statistical advantage over classical test theory for calibrating new questions when constructing equivalent forms and for item banking. The authors do not miss this point; they briefly discuss the practical applications of IRT in CRT construction and present a basic treatment of the multi-faceted Rasch model, which “locates an examinee’s ability and an item’s difficulty estimates on a common scale” (p. 145). However, their coverage of IRT and the multi-faceted Rasch model is so limited that readers may not grasp the overall features of these two topics.
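To give a concrete sense of the criterion-referenced item analyses the chapter describes, the following Python sketch computes two standard CRT item indices: the difference index (posttest minus pretest item facility) and the B-index (item facility among passers minus item facility among failers at a cut score). The response data, cut score, and function names are invented for illustration, not taken from the book.

```python
# Minimal sketch of two CRT item statistics (invented data).
# Rows are examinees, columns are items; 1 = correct, 0 = incorrect.

def item_facility(responses):
    """Proportion of examinees answering each item correctly."""
    n = len(responses)
    return [sum(col) / n for col in zip(*responses)]

def difference_index(pretest, posttest):
    """Posttest IF minus pretest IF: an item's sensitivity to instruction."""
    return [post - pre
            for pre, post in zip(item_facility(pretest), item_facility(posttest))]

def b_index(responses, total_scores, cut):
    """IF among passers minus IF among failers at a given cut score."""
    passers = [r for r, s in zip(responses, total_scores) if s >= cut]
    failers = [r for r, s in zip(responses, total_scores) if s < cut]
    return [p - f for p, f in zip(item_facility(passers), item_facility(failers))]

pre  = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
post = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1]]
print(difference_index(pre, post))   # larger values = items sensitive to learning
scores = [sum(r) for r in post]
print(b_index(post, scores, cut=2))  # larger values = items separating passers/failers
```

Items with a high difference index reflect instruction; items with a high B-index discriminate well around the cut point — the two decisions a CRT item reviser cares about.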

This volume may arguably be viewed as a slightly revised and expanded edition of Testing in Language Programs (1996), published by Prentice Hall, in which James Dean Brown illustrated how to address proficiency, placement, diagnostic, and achievement tests, and how to design them for both program-level and classroom-level decisions. Readers familiar with the earlier book will find little difference between it and the first three and a half chapters of Criterion-referenced Language Testing. [-2-]

In Chapter 5, ‘Reliability, dependability, and unidimensionality,’ the authors address the three central issues involved in test consistency: reliability in NRT, dependability in CRT, and fit in IRT. Starting with a review of the traditional concepts of test reliability in NRT (test-retest reliability, equivalent-forms reliability, and internal consistency reliability), the chapter carries the discussion through threshold-loss methods and generalizability approaches to CRT dependability, highlighting their importance for decisions based on CRT scores. The chapter also sets the stage for the treatment of validity that follows. Because reliability measures whether an instrument measures a construct consistently from context to context, any valid measure must first be reliable; unreliable measures obscure the construct they measure and hence undermine validity. In that sense, the combined discussion of reliability and validity serves as a synthesis of these two issues for any research, including language testing.
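The threshold-loss idea can likewise be sketched in a few lines: classify examinees as masters or non-masters on two administrations and compare the classifications. The sketch below computes the observed agreement coefficient and Cohen's kappa for such classifications; the scores, cut score, and helper names are invented for illustration.

```python
# Sketch of threshold-loss dependability (invented data): agreement (p_o)
# and kappa for master/non-master classifications across two test forms.

def classify(scores, cut):
    """True = master (score at or above the cut), False = non-master."""
    return [s >= cut for s in scores]

def agreement_and_kappa(form_a, form_b, cut):
    a = classify(form_a, cut)
    b = classify(form_b, cut)
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    p_master_a = sum(a) / n
    p_master_b = sum(b) / n
    # Chance agreement: both classified master, or both non-master,
    # by the marginal mastery rates alone.
    p_c = p_master_a * p_master_b + (1 - p_master_a) * (1 - p_master_b)
    kappa = (p_o - p_c) / (1 - p_c)                    # agreement beyond chance
    return p_o, kappa

form_a = [12, 18, 9, 15, 20, 7, 16, 11]
form_b = [13, 17, 10, 14, 19, 9, 11, 10]
p_o, kappa = agreement_and_kappa(form_a, form_b, cut=12)
print(p_o, kappa)
```

The point of the threshold-loss approach is visible here: dependability is judged by the consistency of the pass/fail decision itself, not by the correlation of raw scores.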

Chapter 6 introduces ‘Validity of criterion-referenced tests’ from two perspectives: content validity and construct validity. The first involves two approaches: theoretical arguments and expert judgments. The second involves three kinds of studies: intervention studies, differential-group studies, and hierarchical-structural studies. With the examples provided in the chapter, these five validation strategies are shown in practical application to running and maintaining language programs. Next, Messick’s (1988) and Cronbach’s (1988) expanded views of validity, which have led to a paradigm shift in validity research, are explored for the evidential and consequential bases of test interpretation and use, and for the functional, political, economic, and explanatory perspectives on test validity. The chapter thus functions both as an introduction to a unified view of validity and as an illustration of how these expanded concepts can be applied in CRT practice.

In covering standard setting, the authors also highlight some existing methods from the field of educational measurement and illuminate their applicability and utility for CRT in language program administration. This coverage differentiates the volume from other language testing books on the market, which often do not even mention standard setting. Given its importance for program or school admissions, certification, personnel selection, and program evaluation, the discussion of this topic is an appropriate vehicle for leading readers through the concepts and procedures of standard setting.
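As one concrete illustration of standard setting, here is a minimal sketch of the Angoff method, a widely used procedure in educational measurement (chosen for illustration; the book's own worked examples may differ). Each judge estimates, per item, the probability that a minimally competent examinee would answer correctly; the cut score is the mean across judges of each judge's summed ratings. The ratings below are invented.

```python
# Illustrative sketch of Angoff standard setting (invented ratings).
# judge -> per-item probability that a minimally competent examinee
# answers the item correctly (5-item test).
ratings = {
    "judge_1": [0.8, 0.6, 0.4, 0.7, 0.5],
    "judge_2": [0.7, 0.5, 0.5, 0.8, 0.6],
    "judge_3": [0.9, 0.6, 0.3, 0.7, 0.5],
}

# Each judge's implied passing score is the sum of that judge's ratings;
# the recommended cut score averages those sums across judges.
judge_sums = [sum(r) for r in ratings.values()]
cut_score = sum(judge_sums) / len(judge_sums)
print(round(cut_score, 2))   # recommended raw cut score out of 5
```

In practice such judgments are iterated with discussion and impact data, but the arithmetic core of the method is no more than this.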

After the previous six chapters, however, the enduring challenge remains: how should CRT be conducted, how should the results be interpreted, and how should they be reported? The answers can be found in Chapter 7, entitled ‘Administering, giving feedback, and reporting on criterion-referenced tests.’ The authors provide practical suggestions based on their own experience with real criterion-referenced assessment projects. The underlying logic of the CRT approach is to assess how much of the content in a course or program students have learned, comparing performance to well-defined criteria rather than to the performances of other students in a norm group. This connection of CRT to goals and objectives as particular standards or criteria ties it closely to a curriculum. Hence, this book should also be considered a contribution to valuing both individual and contextual differences in pedagogical decision making in language-related curriculum development.

Overall, to this reviewer, Criterion-referenced Language Testing is a well-written book that will appeal to upper-level undergraduate and graduate students preparing to become second language (L2) teaching professionals, as well as to L2 testing practitioners. The volume is also well suited to classroom teachers, language testing researchers, and curriculum developers who wish to develop new perspectives, maintain language programs, or conduct research in language testing, in both theory and practice. Providing a readable introduction to the issues surrounding CRT, the book guides readers in using the criterion-referenced approach to analyze language testing data and to construct systematic curriculum-related testing. Symbols and equations are well explained both graphically and verbally, and detailed examples and illustrations appear throughout almost every chapter. With its clear examples, Criterion-referenced Language Testing not only provides an applied introduction suitable for any language testing course but also serves as a valuable reference for readers ranging from undergraduate students to language testing professionals.

References

Brown, J. D. (1996). Testing in language programs. Upper Saddle River, NJ: Prentice-Hall.

Cronbach, L. J. (1988). Five perspectives on validity argument. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 3-17). Hillsdale, NJ: Lawrence Erlbaum Associates.

Davidson, F. & Lynch, B. K. (2002). Testcraft: A teacher’s guide to writing and using language test specifications. New Haven, CT: Yale University Press.

Messick, S. (1988). The once and future issues of validity: Assessing the meaning and consequences of measurement. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 33-45). Hillsdale, NJ: Lawrence Erlbaum Associates.

Hyeong-Jong Lee
University of Illinois at Urbana-Champaign
<hlee26@uiuc.edu>

© Copyright rests with authors. Please cite TESL-EJ appropriately.

Editor’s Note: Dashed numbers in square brackets indicate the end of each page for purposes of citation.

[-3-]

© 1994–2026 TESL-EJ, ISSN 1072-4303
Copyright of articles rests with the authors.