Vol. 7, No. 1, A-1, June 2003

Differential Effects of Reading and Memorization of Paired Associates on Vocabulary Acquisition in Adult Learners of English as a Second Language

Frank Hermann

Indiana University of Pennsylvania


This study investigates the differential effects of reading and paired-associate learning on vocabulary acquisition in adult ESL learners. The sample (N = 34) comprised two intact groups of university students. Subjects in one group were asked to read the novel Animal Farm, while subjects in the comparison group memorized a list of words preselected from the novel. After a pretest, subjects took two post-tests: one to assess initial lexical acquisition and another three weeks later to assess lexical retention. Only subjects in the paired-associates group were told in advance about the vocabulary test. The first hypothesis, that the reading condition would initially acquire more vocabulary than the word list condition, was not supported; the word list group made the larger initial gains. Substantial support emerged, however, for the second hypothesis, that the reading condition would exhibit superior retention. These findings suggest that for the purpose of encouraging long-term lexical retention, reading literature is preferable to paired-associate learning.


Experience teaches us, and research abundantly confirms, the indispensable role that words play in human communication. Without words, language would be reduced to a mere discourse of iconic gestures and symbols. This dependency on the lexicon requires that even a novice communicator in a language amass a repertoire of thousands of words. Estimates indicate that the average adult native speaker of English knows from 17,000 (Goulden, Nation, & Read, 1990) to more than 40,000 (Nagy & Herman, 1987) base words. It is further estimated that an L2 learner of English must acquire a minimum of 3,000 base words to read unsimplified text with some degree of comprehension (Nation, 1990).

Since building and maintaining a large word stock is such an essential part of achieving proficiency in a language, especially for those pursuing academic goals, many researchers in recent years have turned their attention to vocabulary studies, a domain that seems to have become something of a cottage industry within the field of SLA. Incidental vocabulary acquisition has become an especially popular subject, as demonstrated by the recent publication of an entire issue of the journal Studies in Second Language Acquisition (vol. 21, 1999) devoted to the topic.

Although incidental vocabulary acquisition was once invoked primarily as a so-called "default argument" (Jenkins & Dixon, 1983), an ample body of evidence now exists confirming the reality of incidental language learning and demonstrating its utility as a lexical acquisition strategy (Elley, 1989; Joe, 1998; Paribakht & Wesche, 1999; Wode, 1999). The pertinent question for theoretical researchers to ask now is not whether incidental vocabulary learning occurs, but the extent to which it occurs and the variables that promote its occurrence.

Of greater relevance to language educators is the practical question of how methods designed to stimulate incidental vocabulary learning fare in comparison to more direct approaches such as semantic mapping (Hague, 1987), the keyword method (Hall, 1988), and paired-associate lists (Anderson & Jordan, 1928; Tinkham, 1989). Although research over the past few decades has validated incidental learning as a legitimate phenomenon, little evidence has been adduced demonstrating its merits relative to other forms of instruction. Data of this nature should prove especially useful in illuminating the debate between those on one end of the spectrum who encourage a greater, sometimes almost exclusive, reliance on holistic pathways to promote vocabulary acquisition (e.g., Krashen, 1989, 1993) and others (e.g., Horst, Cobb, & Meara, 1998) who call for more direct, explicit instruction of high-frequency vocabulary words. Also of interest would be additional studies in the vein of Joe (1998) that investigate hybrid approaches wherein vocabulary is taught through reading in conjunction with pre- or post-reading activities such as summary or root recognition.

The present study was undertaken to furnish data for this debate via a quasi-experimental design and to measure the differential effects of two methods of lexical acquisition: (1) reading of literature and (2) rote memorization of paired associates. More specifically, the objective of this study was to determine the extent to which reading literature, as compared to the rote memorization of paired associates, facilitates lexical acquisition and retention in adult nonnative speakers of English. In accordance with prior research findings, the following hypotheses were posed:

Hypothesis 1: Subjects who participate in the reading of literature with a focus on meaning will acquire more vocabulary words than subjects who intentionally attempt to acquire the same words via memorization of paired associates.

Hypothesis 2: Subjects who acquire vocabulary incidentally through reading literature will exhibit higher retention rates [1] than subjects who learn the same material through rote memorization.

The inspiration for this study and the fundamentals of its design are rooted in two previous studies, one being the renowned Clockwork Orange experiment (Saragi, Nation, & Meister, 1978).

In their pioneering experiment, Saragi and colleagues had 20 native speakers of English read the novel A Clockwork Orange. This particular novel was chosen because of the numerous nadsat (Russian slang) words it contains. The subjects were informed they would be given a test of comprehension and literary criticism after completing the novel, but were not told they would be tested on vocabulary. Several days after completing the novel, subjects took a multiple-choice vocabulary test containing 90 nadsat words. Test results revealed that subjects had made significant gains in vocabulary, even though they had not focused explicitly on learning vocabulary. In addition, the researchers found a significant correlation between the frequency of a word's occurrence in the novel and the likelihood of subjects correctly identifying its meaning.

Inspired by the Clockwork Orange study, Ferris, Kiyochi, and Kowal (1988) attempted to replicate the experiment with a different text and to extend its findings to L2 speakers of English. In their study, one group read the novel Animal Farm while an equivalent group from the same population served as a control. A pretest and a post-test were administered to measure the effects of the treatment, and a between-groups t test revealed that the subjects who had read the novel acquired significantly more words than the control subjects. The researchers also performed a multiple regression analysis using questionnaire data and found that subjects in the experimental group who consulted their dictionaries more frequently tended to score lower on the vocabulary test. Moreover, their results corroborate those of Saragi et al. (1978) by demonstrating a significant positive correlation between the frequency of a word's occurrence and its probability of being acquired.

Though building on the basic design of the Ferris et al. (1988) experiment, the present study differs from it in several principal ways. First, the instrument used in Ferris et al. was exclusively multiple choice and consequently could test only one facet of the subjects' vocabulary knowledge: their receptive skills. The present study, however, uses an instrument that, in addition to testing subjects' receptive skills with discrete-point items, measures their productive skills with a fill-in-the-blank task designed to elicit a paraphrase of each target item.

Perhaps the most salient way in which this study differs from its predecessor is in the independent variable it manipulates. In Ferris et al., the control group received no treatment or instructions, so the dependent variable essentially reflected the value of reading as a means of lexical acquisition relative to no method at all. In the present study, by contrast, subjects in the comparison group did use a method: the traditional method of paired-associate vocabulary learning.

Input for the experimental group consisted of the entire text of Animal Farm. For the comparison group, input consisted of a vocabulary list of paired associates preselected from the novel. Two dependent variables were defined in this study: lexical acquisition and lexical retention. Lexical acquisition is defined operationally as the median number of words gained, by a group or an individual subject, from pretest to post-test 1. Similarly, lexical retention is defined operationally as the median number of words gained from pretest to post-test 2.
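The two operational definitions reduce to a simple computation. The sketch below is illustrative only: it assumes each subject's scores are stored as a (pretest, post-test 1, post-test 2) tuple and takes the median of the per-subject gains; the function names and the three-subject sample are hypothetical, not data from this study.

```python
from statistics import median

def median_gain(scores, start, end):
    """Median of per-subject gains from test index `start` to test index `end`.

    `scores` is a list of (pretest, post1, post2) tuples, one per subject.
    """
    return median(s[end] - s[start] for s in scores)

def acquisition(scores):
    # Lexical acquisition: median gain from pretest (index 0) to post-test 1 (index 1).
    return median_gain(scores, 0, 1)

def retention(scores):
    # Lexical retention: median gain from pretest (index 0) to post-test 2 (index 2).
    return median_gain(scores, 0, 2)

# Hypothetical scores for three subjects (not data from this study).
group = [(10, 30, 28), (12, 25, 27), (8, 40, 20)]
print(acquisition(group), retention(group))  # prints: 20 15
```

Taking the median of individual gains, rather than the difference of group medians, is one reading of "median number of words gained"; the article does not spell out which was used.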



Subjects in this study (N = 34) were enrolled in a freshman-level ESL composition course at a North American university. All subjects were fully matriculated students pursuing academic degrees. By virtue of their placement scores, they were judged to have writing abilities almost comparable to those of native English speakers enrolled at the freshman level.

The language background of subjects in this study was quite heterogeneous, with a total of 13 native languages represented (see Table 1). Because of this diversity, an attempt was made to control statistically for the possible effects of L1 influence, as transfer at the lexical level can be influenced by the degree of semantic relatedness between words in the L1 and L2 (Ijaz, 1986). Since all the target words in this study were derived from Indo-European languages, it was assumed that if positive transfer were to occur, it would most likely occur among subjects whose mother tongue was Indo-European. A multiple regression analysis, however, revealed no significant relationship between subjects' native language family and their total vocabulary gains from pretest to post-test 2: F(2, 31) = .73, p ≤ .49. This lack of relationship in no way rules out the possibility of lexical transfer, but it does suggest that native language family was not a contributing factor in any transfer that might have occurred.

Table 1
Native Language Distribution
(Indo-European and Non-Indo-European languages, by group: Reading, Word List)

A pretest was administered to both groups to serve as a baseline and to check for intergroup differences in vocabulary proficiency before the treatment. A median test revealed no significant difference between the groups, χ² = .47, df = 1, p ≤ .49.
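The group comparisons in this study are reported as χ² values with df = 1, which is consistent with a two-group median test. The sketch below shows one way to compute such a test; it assumes that scores equal to the grand median are counted as "at or below" and that no continuity correction is applied, since the article does not specify either choice. SciPy's `scipy.stats.median_test` offers a library implementation, which by default applies Yates' correction when there are two samples.

```python
import math
from statistics import median

def median_test(group_a, group_b):
    """Two-group median test (Mood's test), no continuity correction.

    Returns (chi_square, p_value) with df = 1.
    """
    grand = median(list(group_a) + list(group_b))
    # 2 x 2 table: rows are groups; columns are (above grand median, at or below).
    table = [
        [sum(x > grand for x in g), sum(x <= grand for x in g)]
        for g in (group_a, group_b)
    ]
    n = len(group_a) + len(group_b)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in col_totals:
        # Every score equals the grand median: no evidence of a difference.
        return 0.0, 1.0
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Chi-square survival function for 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

chi2, p = median_test([1, 2, 3, 4], [5, 6, 7, 8])
print(chi2, round(p, 4))
```

As a check on the one-degree-of-freedom formula, `math.erfc(math.sqrt(0.47 / 2))` is approximately .49, matching the p value reported above for χ² = .47.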


This study used a two-group, pretest/post-test quasi-experimental design. Since random sampling and assignment were not possible, four intact classes served as samples. All classes met three hours per week. To minimize the chances of communication between groups, the Tuesday/Thursday sections were assigned to the experimental group (n = 17) and the Monday/Wednesday/Friday sections (n = 17) to the comparison group. Treatment in both groups was incorporated into the regular course curriculum for the respective sections; participation was therefore mandatory. However, because of low attendance on test days, several subjects were dropped from the study. Since repeated measures were performed, only data from subjects who took all three administrations of the vocabulary test were used. The group sizes reported above reflect the final samples after all exclusions.


The instrument used to assess lexical acquisition and retention was a 35-item vocabulary test consisting of target words selected by the researcher from the text of Animal Farm (Appendix A). Only terms that the researcher believed would be unfamiliar to most subjects were chosen. The process of word selection was subjective in that the researcher relied on previous experience teaching the target population rather than a frequency list to determine the appropriateness of target words. Although a few of the words selected were colloquial or anachronistic, the majority of the terms were relatively formal and contemporary.

The test consisted of two components: a 15-item fill-in-the-blank section (worth 90 points) and a 20-item multiple-choice section (worth 20 points). These components were included to measure possible differences between subjects' receptive and productive vocabularies, as well as to produce a well-rounded and versatile instrument that would accommodate subjects' different cognitive styles. The fill-in-the-blank task (essentially a modified cloze test) was designed specifically to elicit periphrastic definitions of selected target words, yet in a manner that severely restricted the range of acceptable definitions subjects could provide. This was accomplished mainly by embedding the target word of each item within a collocational context. For instance, the question

To feel morose is to feel ____________________ (item 2)

eliminated the possibility of a subject correctly responding with an answer such as "an emotion"--a vague and general response, but one that would not be inappropriate were the target word presented in isolation.

In addition to restricting the number of possible definitions, the sentential prompt also had the advantage of providing subjects with clues about the part of speech and the semantic domain of each target word. For example, the prompt

An eminent citizen is one who is ________________ (item 14)

reveals not only that the target word is an adjective, but also that it can modify nouns that are [+ animate]. Such contextual and grammatical clues were designed to reduce the occurrence of lexical ambiguity and to minimize the possibility of subjects responding to questions incorrectly due to a misunderstanding of the cue rather than to a lack of knowledge.

Each prompt in the multiple-choice component was accompanied by five choices labeled A through E, with choice E always reading "none of the above." [2] Each item had one correct answer and four distracters, and the answer options were ordered randomly so that no predictable pattern would emerge.

The instrument used to prepare subjects in the comparison group for the vocabulary test was a 50-item word list (Appendix D). The list contained all 35 items that appeared on the vocabulary test, plus an additional 15 terms that the researcher--who was also the instructor of one of the sections--felt would benefit the subjects in their academic careers. Patterned after the traditional paired-associates list commonly found in many foreign language textbooks and standardized test preparation guides, the study list consisted of the target words paired with their respective definitions, most of which were taken from either the Random House Dictionary (1980) or Webster's New World Dictionary of the American Language (2nd college edition, 1974). For polysemic words, the researcher was careful to choose definitions that best reflected the meanings of the words as they appeared in the context of Animal Farm.

An additional instrument used in this study was a questionnaire (Appendix E) designed to identify possible intervening variables among subjects in the experimental group. Since the items on the questionnaire were intended as predictor variables, the options were presented in the form of a Likert-type scale to allow for their interpretation as ordinal values. The responses to each question were therefore patterned in such a way as to render them easily translatable into a 4-point scale ranging from zero to three. The instrument was designed specifically to measure the following reported criteria: (1) amount of the novel read; (2) enjoyment of the novel; (3) understanding of the novel; (4) time spent reading the novel; and (5) frequency of dictionary consultation. Question 1 was not included in the analysis but was used to screen out subjects who had not read sufficient quantities of the novel to participate legitimately in the study. Only data from participants who reported having read all or most of the novel were included in the final analyses.


After completing the pretest, subjects were instructed to begin their designated assignments. For the comparison group, the assignment was to memorize a list of paired associates taken from Animal Farm; subjects were unaware of the origin of the target words. When the list was distributed, subjects were informed that they would be tested on the words the following week. Subjects were not instructed to use any particular technique (e.g., the keyword method) but were merely told to commit the words to memory.

Concurrently, the experimental group was instructed to begin reading Animal Farm. Subjects in this group were told to focus on the novel's literary and rhetorical meaning and were informed they would be tested on the novel's major themes and their rhetorical development the following week. No instructions were given concerning vocabulary in the novel, and at no time were subjects told to expect a vocabulary test.

Approximately one week later, the first post-test was administered to both groups to assess initial gains in vocabulary. To avoid biasing subjects, information concerning the nature and purpose of the study was not supplied. Subjects in both groups were told merely that they would receive the results of the test within several weeks. At this time subjects in the experimental group completed the questionnaire and, as part of the course requirement, attended a one-hour lecture on rhetorical modes in Animal Farm. To preserve validity, the lecturer was careful to avoid using terminology that appeared on the test. Subjects in the comparison group did not attend the Animal Farm lecture since they had not been required to read the novel.

Three weeks after the initial post-test (more precisely, 23 days for the reading condition and 21 days for the paired-associates condition), both groups were tested again, this time to assess lexical retention over a longer period. The interval was chosen on practical rather than theoretical grounds. After subjects completed the second post-test, all data pertaining to the experiment were collected, scored, and analyzed. In accordance with the university's IRB protocol, subjects were then informed of the purpose of the study and the function of the tests and questionnaire, and were assured that their identities would remain anonymous.

The two components of the vocabulary test required different scoring criteria. Because of the wide range of possible responses, fill-in-the-blank items were scored on a multipoint scale by several raters, all of whom were native English speakers trained in teaching ESL. To enhance interrater reliability (Table 2), all raters took part in a brief norming session before grading the tests. Two raters independently assigned each paraphrase a score from 0 to 3, depending on the accuracy of the response (see Appendix C for the scoring rubric). For items on which the two raters' scores differed by less than two points, the two scores were added to produce a total from 0 to 6 per item. The possible range of scores for the entire fill-in-the-blank section was therefore 0 to 90.

Table 2
Interrater Reliability*
(Test 1 and Test 2, by group: Reading, Word List)

* Coefficients do not reflect correlations for items left blank. A total correlation for scores of all items would be noticeably higher, because a substantial number of items on all three tests were left blank and thus assigned zeros by both raters.


A third rater served as arbitrator for items with larger discrepancies. In such instances, the total score was obtained by adding the arbitrator's score to whichever of the two initial scores was nearer to it. When the arbitrator's score was equidistant from the two initial scores, it was doubled. Because of the high degree of agreement between the two initial raters, the arbitrator was needed for only 4% of the items.
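The two-stage scoring rule for fill-in-the-blank items can be written as a small function. This is a sketch of the rule as described above, not the raters' actual procedure or tooling; `combine_scores` and `section_score` are hypothetical names.

```python
def combine_scores(r1, r2, arbitrator=None):
    """Return a 0-6 item score from two raters' 0-3 scores, using an
    arbitrator's 0-3 score when the initial scores differ by 2 or more."""
    scores = [r1, r2] if arbitrator is None else [r1, r2, arbitrator]
    for s in scores:
        if not 0 <= s <= 3:
            raise ValueError(f"score {s} is outside the 0-3 rubric")
    if abs(r1 - r2) < 2:
        # Close agreement: add the two initial scores.
        return r1 + r2
    if arbitrator is None:
        raise ValueError("scores differ by 2 or more; an arbitrator score is needed")
    d1, d2 = abs(arbitrator - r1), abs(arbitrator - r2)
    if d1 == d2:
        # Arbitrator equidistant from both raters: double the arbitrator's score.
        return 2 * arbitrator
    # Otherwise add the arbitrator's score to the nearer initial score.
    return arbitrator + (r1 if d1 < d2 else r2)

def section_score(items):
    """Sum the item scores for the section (15 items, maximum 15 * 6 = 90).

    `items` is a list of (r1, r2, arbitrator_or_None) tuples.
    """
    return sum(combine_scores(r1, r2, a) for r1, r2, a in items)
```

For instance, initial scores of 0 and 3 with an arbitrator score of 1 yield 1 + 0 = 1, while initial scores of 0 and 2 with an arbitrator score of 1 (equidistant from both) yield 2.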


To test Hypotheses 1 and 2, descriptive statistics were computed for each group across all three administrations of the vocabulary test (Table 3). Nonparametric tests [3] were then performed, both within and between groups, to determine (a) the extent to which group performance changed over time and (b) the degree to which either group differed relative to the other at each administration of the test.

Table 3
Descriptive Statistics*
(Test 1 and Test 2 scores and gains from Pretest to Test 1, Test 1 to Test 2, and Pretest to Test 2, by group: Reading, Word List)

*Possible range of test scores = 0 to 110

A post hoc Spearman rank-order correlation (rho) analysis was performed on the questionnaire data obtained from the experimental group. Responses to questions 2-5 were converted to a four-point ordinal scale (0 to 3) and correlated with scores on the second post-test. Because of the small sample (N = 17) and the low coefficient values, no higher-order interactions were tested.
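Because the questionnaire codes take only four values, many responses are tied, so the rank correlation must handle ties. The sketch below computes Spearman's rho in the standard way, as the Pearson correlation of tie-averaged ranks; the function names are illustrative. On Python 3.12 or later, `statistics.correlation(x, y, method='ranked')` computes the same coefficient.

```python
import math

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)
```

The p values reported for such coefficients are commonly obtained from t = rho·sqrt((n - 2) / (1 - rho²)) with n - 2 degrees of freedom; the article does not state which method was used.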


A glance at the statistics in Table 3 reveals a marked difference in performance between the two groups, which is shown more clearly in Figure 1. Contrary to Hypothesis 1, the vocabulary gain from the pretest to the first post-test was significantly greater for the comparison group, χ² = 7.52, df = 1, p ≤ .05. In support of Hypothesis 2, however, the experimental group exhibited superior retention between post-tests, χ² = 17.18, df = 1, p ≤ .05; the comparison group actually experienced a 21% decline in mean performance over that interval. Although the two groups' performance diverged markedly across the post-tests, there was no significant difference between their net vocabulary gains from pretest to post-test 2: χ² = .47, df = 1, p ≤ .49.

Figure 1
Grand Means

In contrast, responses to questions 2-5 of the questionnaire, the hypothesized predictor variables, did not correlate significantly with scores on the second post-test. Rho values ranged from .28 (N = 17, p ≤ .28) for question 4 to -.03 (N = 17, p ≤ .92) for question 5.


Corroborating Ferris et al. (1988), the present study shows that L2 learners can indeed acquire a significant amount of lexis through a holistic, meaning-focused reading of literature. Moreover, if one assumes that multiple-choice tasks elicit receptive knowledge and that gap-filling tasks elicit productive responses (Hughes, 1989), Figures 2 and 3 suggest that the benefits of reading literature extend to both active and passive vocabulary, though admittedly to a lesser extent for the latter.

Figure 2
Group Means (Fill-in-the-Blank)

Figure 3
Group Means (Multiple Choice)

Contrary to the findings of Ferris et al., however, this study shows no apparent relationship between reported frequency of dictionary consultation and the amount of vocabulary acquired, nor do these results corroborate their finding that the degree to which subjects report understanding and enjoying the novel influences the amount of lexical acquisition. The discrepancy between these findings may be due in part to a lack of sensitivity in the instrument used in this study. Each questionnaire item analyzed used an ordinal scale with only four points. The combination of a non-continuous scale, a narrow range of values, and a small sample (N = 17) may have made the instrument insensitive to more subtle differences between subject responses. Further research is needed to resolve this question.

Although the results of this experiment fail to demonstrate any superiority of reading over rote memorization (for either short-term or longer-term purposes), they nonetheless reveal that reading literature is at least as effective as rote memorization for long-term vocabulary development. The group trends presented in Figure 1 suggest that reading may even be more effective than rote memorization in encouraging long-term retention. This possibility becomes apparent when one juxtaposes the almost perfectly linear progression of the reading group with the comparison group's decline between post-tests. Although only further research can establish the outcome of a third post-test, it is not unreasonable to expect, given the observed trends, that the reading condition would surpass the paired-associates group.

Several theories may help to explain the behavior of the subjects in this study. One theory that offers a possible rationale for the comparison group's irregular performance is that of "restructuring." The theory of cognitive restructuring, in contrast to the more traditional behaviorist view of learning, holds that learners assimilating new material restructure, rather than merely add to, previously existing knowledge (or cognitive structures). This theory provides an appealing explanation for the U-shaped learning curve that frequently appears in language studies (e.g., Kellerman, 1985). According to this explanation, learners at the bottom of the curve have not necessarily forgotten the material they once possessed (when at the peak), but have merely reorganized that material into temporarily inaccessible structures. Such reorganization, the theory maintains, sometimes "causes performance . . . [to decline] as more complex internal representations replace less complex ones, and . . . [then increase] again as skill becomes expertise" (McLaughlin, 1990, p. 113). The result is a more comprehensive, unified, and durable mental structure that allows the learner to attain an even higher peak.

The main question here is whether the comparison group's poorer performance on the second post-test results from simple lexical attrition or from restructuring. If the latter were confirmed, the lower group performance at Time 2 would simply mark the trough of the U-shaped curve. In this scenario, the comparison group would be expected to peak again once restructuring was complete, quite possibly outperforming the reading group on a third post-test.

As mentioned, restructuring seems to occur as previously acquired material is reorganized through interaction with new input. The problem with applying restructuring to these data, however, is that subjects in the paired-associates group received no controlled input after the first post-test. Since subjects in the word list condition may have encountered small amounts of relevant input outside the experiment, restructuring cannot be ruled out entirely as a factor in the comparison group's decline. Nevertheless, it seems unlikely that so little extraneous input could trigger such extensive restructuring.

Although the theory of restructuring leaves questions unresolved regarding the comparison group's fluctuating performance, restructuring, in combination with the levels-of-analysis model, provides a compelling explanation for the superior retention measured in the reading condition. This becomes clear as one considers the way restructuring works on the lexical level.

In the lexical domain, restructuring, in addition to sometimes altering morphosyntactic parameters, often involves the reorganization of semantic boundaries, "[t]he instances to which a word's meaning can be applied in linguistic usage" (Ijaz, 1986, p. 405). A useful way to understand semantic boundaries is to view them as bundles of "semantic features." Semantic features (Aitchison, 1987) are subunits of lexical meaning which, when combined, constitute the sum of what we consider to be a word's meaning. For example, the word bachelor consists of at least the features [+ male] and [- married]. These two features help determine the semantic boundaries of the term bachelor, for the term can be applied only in instances where both features hold.

Lexical restructuring is often necessary when learning new words since few words, if any, are exactly alike in the semantic features they contain (Palmer, 1981). A common tendency of L2 learners is to transfer meaning from words in their native language to corresponding words in the target language, thus creating an intermediate language or "interlanguage" that reflects lexical characteristics of both the L1 and the L2. More specifically, according to Ijaz (1986), this process occurs as "[c]oncepts underlying words in the L1 are transferred to the L2 and mapped onto new linguistic labels, regardless of differences in the semantic boundaries of corresponding words" (p. 405).

Take, for example, an L2 learner trying to acquire the semantic subtleties of the English word kill. In English, the word kill consists of at least three semantic features: {+ die, + cause, - natural}. The concept [- natural] is not optional but is a necessary feature, as evidenced by the awkwardness of the sentence Bill was killed of natural causes. Usage of a corresponding term in the learner's native language, however, may not demand the element of unnaturalness so that only the features of death and causality are obligatory. Upon first encounter with the English word kill, then, the learner would probably transfer only the set of features {+ die, + cause}.
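The feature-bundle account can be illustrated with a toy model. The sketch below is purely illustrative and makes no claim about how the mental lexicon is actually represented: each feature is a (name, value) pair, so that [- natural] becomes ("natural", False). The names `TARGET_KILL`, `INTERLANGUAGE_KILL`, `missing_features`, `applies`, and the situation dictionary are hypothetical. The set difference identifies the feature the learner still lacks, and the `applies` check shows why the transferred entry would wrongly accept a sentence like Bill was killed of natural causes.

```python
# Features as (name, value) pairs: ("natural", False) stands for [- natural].
TARGET_KILL = {("die", True), ("cause", True), ("natural", False)}

# Hypothetical interlanguage entry after transfer from an L1 word lacking [- natural].
INTERLANGUAGE_KILL = {("die", True), ("cause", True)}

def missing_features(target, interlanguage):
    """Features the learner must still add for the entry to match the target word."""
    return target - interlanguage

def applies(word_features, situation):
    """A word applies to a situation only if every one of its features holds there."""
    return all(situation.get(name) == value for name, value in word_features)

# A death brought about by natural causes (the "killed of natural causes" case).
natural_death = {"die": True, "cause": True, "natural": True}

print(missing_features(TARGET_KILL, INTERLANGUAGE_KILL))  # prints: {('natural', False)}
print(applies(TARGET_KILL, natural_death))                # prints: False
print(applies(INTERLANGUAGE_KILL, natural_death))         # prints: True
```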

To acquire the target word in its entirety, the learner must restructure the boundaries of the interlanguage state to encompass the feature [- natural]. Because of the subtlety and elusiveness of such features, however, restructuring of this kind cannot occur readily when the input consists of simple paraphrases or dictionary definitions such as one would find in a vocabulary list of paired associates.

When words are learned collocationally in a holistic context, on the other hand, there is a much greater possibility that their meanings will be acquired more fully and that any needed restructuring will occur with greater ease and stability. A single encounter with a new word is unlikely to produce much lexical acquisition (Nagy & Herman, 1987). However, when a word is encountered repeatedly, especially in varied environments, learners are more likely to be exposed to a wider range of its features, enabling them to adjust semantic boundaries more easily.

This is quite possibly what happened with the experimental group. Several of the words on the test appear repeatedly in Animal Farm (see Appendix D for frequency counts). Although a correlational analysis of word frequency and test performance is beyond the scope of this investigation, the subjects' repeated contact with these words in an authentic context should, in theory, have provided conditions conducive to restructuring.

As a global theory, restructuring is invoked to explain a variety of cognitive phenomena (McLaughlin, 1990). As applied specifically to lexical acquisition, however, the theory predicts that, given sufficient input, learners can reorganize incomplete semantic boundaries so as to acquire target words more fully. This process, interpreted together with the "levels of analysis" view, offers an even more compelling explanation for the experimental group's performance. Relying heavily on trace theory, the levels-of-analysis paradigm (Craik, 1973; Craik & Tulving, 1975) suggests that learners perceive or "analyze" stimuli at various semantic levels and that the durability of a given memory trace is a positive function of the depth to which the encoding stimulus is analyzed. What is most important to note here is that, according to this model, a "deep" analysis is achieved not by any quantitative encoding operation, such as mental rehearsal and repetition, but by a qualitative one, which involves focusing on meaning.

In support of their hypothesis, Craik and Tulving (1975) demonstrated in a series of ten meticulous experiments that subjects who focus on meaning (even without any intention to remember) process stimuli more deeply and retain them longer than do subjects who pay a great deal of attention to structure and who purposely attempt to remember. These experiments involved displaying L1 vocabulary through a tachistoscope for varying intervals, usually within the millisecond range. Before viewing each word, subjects were asked a question concerning one of three characteristics of the word: its typography (e.g., whether it was printed in capital letters), its phonemics (e.g., whether it rhymed with "train"), or its meaning (e.g., whether it named an animal). The researchers found repeatedly that subjects who were asked questions about meaning performed better on retention tests, whether expected or unexpected, than subjects who were asked questions about typography or phonemics. [-11-]

The practical implications of this discovery seem straightforward: the most effective way to learn a new word is not by repeating the word by rote, but by focusing on the word's meaning and all that it entails. To illustrate with a concrete example, we might envision an English speaker trying to acquire the meaning of a certain Spanish word, say, gato. Rather than having the learner mechanically repeat the paired associates gato-cat over and over, the levels of analysis view would predict greater success if the learner contemplated all the semantic features denoted by the word: that a gato is a carnivorous domestic feline that purrs, has excellent night vision, and so on.

It seems plausible, then, to assume that the levels of analysis model can, under the right conditions, complement the process of lexical restructuring. Since lexical restructuring predicts that learners who encounter words in holistic and varied contexts are in a better position to perceive more comprehensively the full range of semantic features denoted in the target words, it follows that such learners are more likely to establish deeper memory traces and thus experience enhanced lexical retention.

Examined through this combined framework, the difference in behavior between the two groups in this study can be seen as the result of differing degrees of boundary restructuring and semantic analysis, the reading group exhibiting greater retention because of the greater meaningfulness of the input. Although the notion of "meaning" is much debated by philosophers and linguists, there can be little doubt that reading a novel is a more "meaningful" experience (in the conventional sense of the term) than memorizing a decontextualized word list. Unlike glosses or dictionary definitions, authentic context allows one to experience words in their "natural" environment with all the cognitive and affective associations that make discourse meaningful.


The results of this study indicate that reading literature can indeed be an effective means of helping adult L2 learners acquire new vocabulary. Moreover, this study suggests that, although rote memorization is more effective for developing a short-term word stock (as might be needed for an upcoming test), reading literature is at least as effective as--and perhaps is more effective than--rote memorization for the purpose of promoting longer-term lexical retention.

The average vocabulary gain measured in the experimental group was 7.57 words. (Note that this figure reflects only the "measured" gain. The actual gain may have been higher, since subjects may have acquired other words in the novel that were not tested.) This number is smaller than the 11.5 average word gain measured in the Ferris et al. (1988) study but is nonetheless a substantial 22% of the total number of words tested. This gain is particularly notable in that it reflects subjects' ability to use productively the words they had acquired, a skill estimated to be 50% to 100% harder than learning words receptively (Nation, 1990). The edition of Animal Farm used in this study contains 113 pages and approximately 46,000 words of text, which means that subjects on average acquired at least one new word for every 6,077 words encountered. Accordingly, if subjects encounter 1.7 million words of text annually (a conservative estimate for many adults; Kirsch & Guthrie, 1984), they would acquire approximately 280 new words per year. After four years of college, this amounts to over 1,100 words acquired holistically through reading alone. This figure does not include words acquired incidentally through listening to lectures and engaging in personal communication. Over the span of a lifetime, according to these estimates, an avid reader can acquire a phenomenal vocabulary (cf. Krashen, 1993; Nagy & Herman, 1987). [-12-]
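
The extrapolation in the preceding paragraph is simple proportional arithmetic. The short Python sketch below reproduces it from the figures given in the text, under the same simplifying assumption the paragraph makes: that the per-word acquisition rate observed with Animal Farm holds constant for other texts and over several years. The constant and function names are illustrative only.

```python
# Proportional extrapolation of vocabulary gain from reading, using the
# figures reported in the text. Assumes (as the text does) that the rate
# observed with Animal Farm stays constant for other texts and other years.

MEAN_WORDS_GAINED = 7.57          # mean measured gain, experimental group
NOVEL_LENGTH_WORDS = 46_000       # approximate running words in the edition used
WORDS_READ_PER_YEAR = 1_700_000   # conservative annual estimate (Kirsch & Guthrie, 1984)
YEARS_OF_COLLEGE = 4


def words_per_new_word(gain: float, text_length: int) -> float:
    """Running words read, on average, for each new word acquired."""
    return text_length / gain


def projected_gain(words_read: int, rate: float) -> float:
    """Expected new words after reading `words_read` words at one new word per `rate` words."""
    return words_read / rate


rate = words_per_new_word(MEAN_WORDS_GAINED, NOVEL_LENGTH_WORDS)
per_year = projected_gain(WORDS_READ_PER_YEAR, rate)
college_total = per_year * YEARS_OF_COLLEGE

print(f"One new word per {rate:,.0f} words read")                          # 6,077
print(f"About {per_year:,.0f} new words per year")                          # 280
print(f"About {college_total:,.0f} new words in {YEARS_OF_COLLEGE} years")  # 1,119
```

Carrying the unrounded rate through gives about 1,119 words over four years; multiplying the rounded 280 by four gives 1,120. Either way, the result matches the "over 1,100" figure in the text.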

Of course, lexical acquisition is only one of the pragmatic benefits of reading literature. As comprehensible input (or "intake"), the written word can also help improve grammatical skills and writing ability (Chomsky, 1972; Tsang, 1996), as well as spelling (Krashen, 1989). Perhaps the greatest benefit to be obtained from reading literature, however, is the unlimited knowledge and pleasure it can bestow.

Because of the enormous influence that reading can have on linguistic and cognitive development, a comprehensive reading agenda for language students of all grade levels would seem to be in order. Under this agenda students in ESL or foreign-language classes could participate in "free voluntary reading" (Krashen, 1993) on a weekly or perhaps even daily basis. The reading material and time allotted could be determined by instructors or departments. Material could be selected for groups of students based on their reading level, interests, and academic needs. In many cases students might have to begin with simplified readings and progress to more advanced materials such as essays, short stories, and novels. Reading could be done during class time (see Henry, 1995, for practical suggestions) or during special reading periods.

Unfortunately, such agendas seem to be rare in foreign- and second-language classrooms in higher education. Many language teachers at this level feel apprehensive about incorporating reading time into the course syllabus. As a result, students often read only the bare minimum to "get by" in a course. In the author's experience as both a student and a teacher, very little reading occurs in college language classrooms (whether in community colleges, universities, or intensive English programs) until the sophomore year, when some students are introduced to literature. Until then, the emphasis is almost exclusively on grammar, writing, and, in foreign-language and ESL classrooms, conversational ability. These are, no doubt, worthy pursuits and should remain integral parts of the language classroom. However, given the efficacy of reading in cultivating vocabulary and other language skills, it does not make good pedagogical sense to subordinate reading to other language modalities.

In short, the direct teaching of vocabulary, whether by paired associates or another method, has a legitimate role in lexical development and should not be dismissed. It may be of use in helping one learn certain marked or low-frequency words that are not easily acquired incidentally. Nevertheless, given the extreme utility and versatility of reading, it appears that the most beneficial way for teachers to help their students develop a stable, comprehensive vocabulary is not to give them stale, decontextualized word lists, but to introduce them to the pleasures of reading literature.

Notes

[1] Although the term "acquisition" is sometimes used as a blanket term to encompass the notion of retention, for purposes of analysis and operationalization, the two constructs acquisition and retention are considered separate in this study and are measured independently. [-13-]

[2] Despite rigorous proofreading, two mechanical errors appeared in the final version of the test. One error (item 25) was merely a duplication of the letter D, such that the fifth choice for this item was labeled "D" rather than "E." The other error (item 34) consisted of a complete omission of choice E. Since neither D nor E was the correct answer for either item in question, the influence of the errors on subjects' responses was probably negligible.

[3] t tests were initially performed to compare the means of all dependent and independent measures, and a Bonferroni adjustment was applied to control the Type I error rate across the multiple comparisons. Because of high variance and skewed distributions, however, the decision was made to use the median test as the primary tool for determining statistical significance. Not unexpectedly, the results of both procedures were virtually identical.
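
Note 3 does not spell out how the median test was computed. The Python sketch below shows one common form of the two-sample median test (Mood's median test) together with a Bonferroni decision rule, using only the standard library. It assumes the chi-square approximation with one degree of freedom and no continuity correction; other variants (Yates's correction, or Fisher's exact test for small samples) are also in use, and the article does not say which was applied. The function names, the scores, and the number of comparisons in the example are hypothetical and are not the study's data.

```python
import math
from statistics import median


def median_test(group_a, group_b):
    """Two-sample median test, chi-square approximation (1 df, no continuity correction).

    Each score is classified as above or not above the pooled median, giving a
    2x2 table of group by position; the table's chi-square statistic is then
    converted to a p-value. Returns (chi_square, p_value).
    """
    pooled_median = median(list(group_a) + list(group_b))
    above_a = sum(1 for x in group_a if x > pooled_median)
    above_b = sum(1 for x in group_b if x > pooled_median)
    table = [
        [above_a, len(group_a) - above_a],
        [above_b, len(group_b) - above_b],
    ]
    n = len(group_a) + len(group_b)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in col_totals:
        # No score lies above the pooled median, so the groups cannot differ here.
        return 0.0, 1.0
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


def bonferroni_significant(p_value, alpha, num_comparisons):
    """Bonferroni rule: judge each comparison against alpha / num_comparisons."""
    return p_value < alpha / num_comparisons


# Hypothetical retention scores for two groups (illustration only).
reading_scores = [6, 7, 8, 8, 9, 10, 11, 12]
word_list_scores = [2, 3, 3, 4, 5, 5, 6, 7]

chi_square, p_value = median_test(reading_scores, word_list_scores)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
# Hypothetical family of 4 comparisons at an overall alpha of .05.
print("significant after Bonferroni:", bonferroni_significant(p_value, 0.05, 4))
```

With these made-up scores every expected cell count is 4, below the usual rule-of-thumb minimum of 5 for the chi-square approximation, which is one reason an exact test is often preferred for samples this small. SciPy's `scipy.stats.median_test` implements the same test and applies Yates's continuity correction by default.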

References

Aitchison, J. (1987). Words in the mind. New York: Basil Blackwell.

Anderson, J. P., & Jordan, A. M. (1928). Learning and retention of Latin words and phrases. Journal of Educational Psychology, 19, 485-496.

Chomsky, C. (1972). Stages in language development and reading exposure. Harvard Educational Review, 42, 1-33.

Craik, F. I. M. (1973). A "levels of analysis" view of memory. In P. Pliner, L. Krames, & T. Alloway (Eds.), Communication and affect: Language and thought (pp. 45-65). New York: Academic Press.

Craik, F. I. M., & Tulving, E. (1975). Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology: General, 104, 268-294.

Elley, W. B. (1989). Vocabulary acquisition from listening to stories. Reading Research Quarterly, 24, 174-187.

Ferris, D., Kiyochi, E., & Kowal, K. (1988). Second language vocabulary acquisition from extensive reading. Paper presented at the TESOL National Convention, Chicago, IL.

Goulden, R., Nation, P., & Read, J. (1990). How large can a receptive vocabulary be? Applied Linguistics, 11, 341-363.

Hague, S. A. (1987). Vocabulary instruction: What L2 can learn from L1. Foreign Language Annals, 20, 217-225.

Hall, J. (1988). On the utility of the keyword mnemonic for vocabulary learning. Journal of Educational Psychology, 80, 554-562.

Henry, J. (1995). If not now: Developmental readers in the college classroom. Portsmouth, NH: Boynton/Cook.

Horst, M., Cobb, T., & Meara, P. (1998). Beyond A Clockwork Orange: Acquiring second language vocabulary through reading. Reading in a Foreign Language, 11, 207-223.

Hughes, A. (1989). Testing for language teachers. New York: Cambridge University Press.

Ijaz, I. H. (1986). Linguistic and cognitive determinants of lexical acquisition in a second language. Language Learning, 36, 401-451.

Jenkins, J., & Dixon, R. (1983). Vocabulary learning. Contemporary Educational Psychology, 8, 237-260.

Joe, A. (1998). What effects do text-based tasks promoting generation have on incidental vocabulary acquisition? Applied Linguistics, 19, 357-377.

Kellerman, E. (1985). If at first you don't succeed . . . . In S. Gass & C. Madden (Eds.), Input in second language acquisition (pp. 345-353). Rowley, MA: Newbury House.

Kirsch, I. S., & Guthrie, J. T. (1984). Prose comprehension and text search as a function of reading volume. Reading Research Quarterly, 19, 331-342. [-14-]

Krashen, S. (1989). We acquire vocabulary and spelling by reading: Additional evidence for the Input Hypothesis. The Modern Language Journal, 73, 440-463.

Krashen, S. D. (1993). The power of reading: Insights from the research. Englewood, CO: Libraries Unlimited.

McLaughlin, B. (1990). Restructuring. Applied Linguistics, 11, 113-128.

Nagy, W. E., & Herman, P. A. (1987). Breadth and depth of vocabulary knowledge: Implications for acquisition and instruction. In M. G. McKeown & M. E. Curtis (Eds.), The nature of vocabulary acquisition (pp. 19-35). Hillsdale, NJ: Lawrence Erlbaum.

Nation, I. S. P. (1990). Teaching and learning vocabulary. Boston: Heinle & Heinle.

Palmer, F. R. (1981). Semantics (2nd ed.). Cambridge: Cambridge University Press.

Paribakht, T. S., & Wesche, M. (1999). Reading and "incidental" L2 vocabulary acquisition: An introspective study of lexical inferencing. Studies in Second Language Acquisition, 21, 195-224.

Saragi, T., Nation, I. S. P., & Meister, G. F. (1978). Vocabulary learning and reading. System, 6, 72-78.

Tinkham, T. (1989). Rote learning, attitudes, and abilities: A comparison of Japanese and American students. TESOL Quarterly, 23, 695-698.

Tsang, W.-K. (1996). Comparing the effects of reading and writing on writing performance. Applied Linguistics, 17, 210-233.

Wode, H. (1999). Incidental vocabulary acquisition in the foreign language classroom. Studies in Second Language Acquisition, 21, 243-258.

About the Author

Frank Hermann holds an M.A. in applied English linguistics from the University of Houston and is currently a Ph.D. candidate in the rhetoric and linguistics program at Indiana University of Pennsylvania. He is writing his dissertation on the use of expert systems in writing assessment and pedagogy.


Appendix A
Appendix B
Appendix C
Appendix D
Appendix E

© Copyright rests with authors. Please cite TESL-EJ appropriately.

Editor's Note: Dashed numbers in square brackets indicate the end of each page for purposes of citation.
