August 2024 – Volume 28, Number 2
https://doi.org/10.55593/ej.28110a10
Nam Nhat Lien
University of Economics Ho Chi Minh City, Vietnam
<namlien.1202@gmail.com>
Nhi Hoa Mai
University of Economics Ho Chi Minh City, Vietnam
<maihoanhi2000@gmail.com>
Nguyen Huynh Trang
University of Economics Ho Chi Minh City, Vietnam
<trangnh@ueh.edu.vn>
Abstract
In EFL countries where English is rarely practiced outside the classroom, textbooks have become the major input source for learners. Particularly in Vietnam, multiple textbook series are available simultaneously for the same grade. Thus, it is important to examine if their vocabulary is appropriate and of similar difficulty. This study aims to investigate and compare the lexical demands, sophistication, diversity, and lengths of reading passages in the eight latest series for Vietnamese 10th graders with 53,360 tokens in total. The results revealed that the most frequent 1,000, 2,000-3,000, and roughly 4,000-word families in the BNC/COCA wordlist, plus proper nouns, marginal words, transparent compounds, and acronyms, were respectively needed for 85%, 95%, and 98% coverage. Additionally, pairwise comparisons uncovered that the passages differ significantly in length yet insignificantly in lexical sophistication and diversity. Therefore, the series appear to be well-suited to co-implementation and facilitative to vocabulary development despite not being optimized for independent learning. The study still calls for simplifying the eight textbook series to promote meaning-focused output. Finally, implications for exploiting and revising these textbook series are discussed.
Keywords: Lexical coverage, Reading comprehension, Lexical sophistication, Lexical diversity, EFL textbooks
Given the growing demand for globalization, English has become the compulsory foreign language for 3rd-12th graders in Vietnam, pursuant to the National Foreign Languages Project 2020 (NFLP 2020; Vu & Peters, 2021). In 2018, the Vietnam Ministry of Education and Training (MoET) designed the General Education English Curriculum (GEEC) to guide the compilation of English textbooks serving the Project (Ministry of Education and Training, 2018). With an ambition to diversify options for textbooks and teaching approaches, GEEC implements the “one curriculum, multiple textbooks policy” (Hoang, 2022, p. 1), which allows domestic publishers to collaborate with international ones to introduce numerous textbook series for students in the same grade.
Four years after the implementation of GEEC, however, the new policy has been applied only to grades 3, 6, 7, and 10. Though the textbooks are accredited by MoET, teachers, parents, and students appear to be confused and doubtful about whether the actual outcomes of English courses using various textbooks are equivalent. Since grade 10 is the foundation year for upper-secondary education, any difference in its outcome may result in knowledge gaps among students across the whole level, thereby undermining the standards of the NFLP 2020. Moreover, at the end of the level, students’ English proficiency will be uniformly assessed. Therefore, the evaluation of these textbooks draws special attention from stakeholders. Because this study focused on the upper-secondary level, in which grade 10 has been the only grade to exercise the policy, the textbook series for this grade were chosen for the evaluation.
In order for several textbook series to be applicable to the same student population, their content must be of similar complexity to each other and to MoET’s general standards. Given that textbooks expose students to input content by means of reading comprehension which is positively correlated with vocabulary knowledge (Rahmat & Coxhead, 2021), we can say that the vocabulary of the aforementioned textbooks plays a decisive role in evaluating their equivalence. Specifically, these textbooks can be evaluated in four aspects. First, their lexical demands must be matched so as to equalize students’ comprehension levels. Second, the lengths of their reading passages should not differ significantly since textbooks with longer readings will be harder to absorb (Mesmer & Hiebert, 2015). Third, they must offer a range of vocabulary comparable both in sophistication and diversity based on the assumption that these are the two predominant predictors of word difficulty (Hashimoto & Egbert, 2019; Vitta, Nicklin, & Albright, 2023). If there is any textbook whose vocabulary is of higher sophistication and diversity, it will be more difficult than the others. Finally, the lexical profiles of all textbooks must align with the proficiency level prescribed by MoET to ensure that their vocabulary input is standardized. These requirements are challenging for publishers since it can be difficult to meet them all.
Notwithstanding its crucial role, lexical analysis of textbooks seems to be scarce in Vietnam. The literature showed that most textbook evaluations in Vietnam paid attention to sociolinguistic and cultural aspects (Dang & Seals, 2016), communicative language teaching (Nguyen, 2015), pragmatics and intercultural pragmatic competence (Nguyen, 2007, 2011; Nu, 2018; Ton Nu & Murray, 2020), and emphasized grammar or tasks over vocabulary (Ngo & Luu, 2018). To date, no studies have investigated and compared the vocabulary of the latest textbook series in Vietnam. In recognition of this gap, the present study sets out to discover the amount of vocabulary needed for grade 10 students to understand the reading passages in each of these textbook series. Additionally, we measure the passage lengths, along with their lexical sophistication and diversity, the two constructs of lexical richness (Read, 2000), to provide grounds for in-depth comparisons between the series. Specifically, the study seeks to answer the following research questions:
- How much vocabulary is needed to comprehend 85%, 95%, and 98% of the reading in the eight English textbook series for Vietnamese 10th graders?
- To what extent do the readings of the eight textbook series differ concerning length, lexical sophistication, and diversity?
Literature Review
Textbooks in the ELT Classroom
In English Language Teaching (ELT), textbooks are at the core of any course (Sheldon, 1988). According to Richards (2005), they are material for language practice either in skills or content, and for interactions between students and the language. Furthermore, their accompanying components like workbooks and teacher’s books provide teachers with an outline of the course and reduce lesson preparation time to give room for more classroom activities (Nguyen, 2015). Particularly in English as a Foreign Language (EFL) contexts, where English is not practiced in society, textbooks become the primary source of vocabulary and language input in the form of reading (Ahmed, 2021; Häcker, 2008; Rahmat & Coxhead, 2021; Sun & Dang, 2020). This highlights the significance of matching the vocabulary level in the reading of textbooks with learners’ lexical knowledge so that they can understand and make the most of it. It is pivotal to select textbooks that contain proper vocabulary so that learners are exposed sufficiently to the target language.
Vocabulary Requirements for Upper-Secondary English Education in Vietnam
In 2022, Hoang, the chief developer of GEEC, published a paper to elaborate on the standards on which Vietnamese English textbooks must be based. At the beginning of the upper-secondary level, students’ proficiency is likely equivalent to CEFR Level A2 (Council of Europe, 2001) with a vocabulary knowledge of 1,400-1,700 lexical items since this is the requirement for completion of the preceding lower-secondary level. Upon completion of the level, the students are required to qualify for CEFR B1 (Council of Europe, 2001) with a vocabulary including around 2,000-2,500 common lexical items. This outcome enables the students to comprehend the main points of familiar topics and communicate independently in English. Moreover, though GEEC has not explicitly specified a wordlist to be taught to students, it is based on CEFR, so the vocabulary size for CEFR can be adopted as the common benchmark. Accordingly, in an attempt to relate the CEFR levels to vocabulary sizes and word family sizes, Nation (n.d.) suggested that learners at the B1 level would possess a vocabulary size of the 2,000-3,000 most frequent words and the word family size at Level 3 in the British National Corpus/Corpus of Contemporary American English (BNC/COCA; Nation, 2017). As such, this is the target vocabulary input for upper-secondary English textbooks in Vietnam.
Lexical Coverage, Lexical Demand, and Reading Comprehension
Central to the entire discipline of vocabulary studies is the concept of lexical coverage which refers to the percentage of comprehensible words in a text to readers (Nation, 2022). A large body of research has consistently confirmed that higher coverage facilitates greater comprehension (Hu & Nation, 2000; Laufer, 1989; Schmitt, Jiang, & Grabe, 2011). Studies on lexical coverage often seek to determine the lexical demands of texts, meaning the amount of vocabulary needed to obtain the two coverage thresholds of 95% and 98% (Trang, Nguyen, & Ha, 2023).
These two thresholds were set thanks to endeavors to relate lexical coverage and reading comprehension. The earliest study to discover this relationship was conducted by Laufer (1989). The author employed a reading comprehension test that required participants to obtain at least a score of 55 and a lexical coverage test asking them to translate words extracted from the reading texts and to underline unknown ones. A major finding was that a significant number of participants who attained 55 or higher could score 95% on the coverage test, leading to the conclusion that 95% was the threshold of reasonable comprehension.
Unsatisfied with this conclusion, Hu and Nation (2000) extended the inquiry to examine four thresholds of 80%, 90%, 95%, and 100% with two reading comprehension tests. To prepare texts with the desired coverages, the vocabulary in a fictitious narrative was simplified and a percentage of the words were replaced by nonsense items. For example, to create the text with 80% coverage, 20% of its running words were replaced by meaningless words while the rest were replaced, if required, by synonyms from among the most frequent 2,000 words of English. The research subjects were randomly assigned to each version of the text and their comprehension was gauged by a multiple-choice test requiring them to get at least 12 out of 14 correct answers, followed by a written recall test of main ideas and supporting detail. The results of the two tests revealed that at 80% and 90%, very few participants could reach a score of 12, with 95% providing adequate comprehension, but that 98% was necessary for unassisted comprehension. As a result, scholars generally accept these numbers: 95% is the minimal threshold for adequate comprehension, while 98% is required for unsupported reading with optimal comprehension (Nation, 2022; Webb & Nation, 2013). Hu and Nation further explained that at 95% coverage, there was 1 unknown word in every 20 words, and at 98%, there was 1 unknown word in every 50 words. It is apparent that in order to switch from 95% to 98% coverage, the quantity of known words has to double or even triple, representing a seemingly small yet largely meaningful gap between the two thresholds.
Coverage for reading materials should vary according to learners’ purposes (McLean, 2021). In EFL learning, a well-balanced course conforms to the principles of the four strands, including language-focused learning, meaning-focused input and output, and fluency development activities (Nation, 2022). For language-focused learning, the reading should be 85% comprehensible to learners (McLean, 2021; Stoeckel, McLean, & Nation, 2020). This proportion rises to 95% for reading comprehension (Laufer, 1989; Schmitt et al., 2011) and to 98% for meaning-focused learning (Webb & Nation, 2017). For reading fluency, texts should be 100% understandable (Nation, 2007). To optimize learning, reading in textbooks should satisfy half of the four strands with the provision of language-focused and meaning-focused input (Nation, 2022). Though their accuracy is still controversial, McLean (2021) and Ha (2022b) have advocated for 85%, 95%, and 98% as mastery thresholds to match learners with reading materials correspondingly for language-focused instruction, reading comprehension, and meaning-focused instruction.
Word-Frequency Lists
Lexical demands of texts are determined by reference to word-frequency lists. Frequency lists enable learners, instructors, and materials developers to identify which words are worth noticing and mastering (Hashimoto & Egbert, 2019; Youngblood & Folse, 2017) given that words occurring more frequently are more useful (Laufer & Ravenhorst-Kalovski, 2010; Nation, 2022). These lists operationalize relying on word-counting units subjective to their creators (Trang et al., 2023). In the field of vocabulary studies, there has been a long-standing debate about what should be counted as a word and best reflects vocabulary size. A word-counting unit can be either token, type, lemma, flemma, or word family (Nation, 2022). Token counts every single word regardless of its repetition in a text and is also referred to as running words. Types can be understood as unique tokens since they count instances of the same token just once. Lemma and flemma differ from each other in that the former includes a headword and its inflectional forms in the same part of speech while the latter ignores the variation in part of speech. Finally, a word family regards a headword and all of its inflectional and derivational forms as a unit.
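The difference between the counting units can be made concrete with a small sketch. The sentence, the tokenizer, and the tiny `FAMILY` mapping below are hypothetical illustrations only; an actual analysis would rely on the family groupings shipped with a published wordlist rather than a hand-made dictionary.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its running words (tokens)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

# Hypothetical mapping from word forms to family headwords (illustration only).
FAMILY = {"learners": "learn", "teacher": "teach", "reads": "read", "texts": "text"}

text = "The learners read the text, and the teacher reads the texts."
tokens = tokenize(text)                      # every running word, repeats included
types = Counter(tokens)                      # each distinct word form counted once
families = {FAMILY.get(form, form) for form in types}  # forms collapsed to headwords

print(len(tokens), len(types), len(families))  # 11 8 6
```

The same sentence therefore yields three different "vocabulary sizes" depending on the unit, which is why a study needs to state its counting unit before reporting lexical demands.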
Among a variety of word-frequency lists, this study utilized Nation’s (2012, 2017) British National Corpus/Corpus of Contemporary American English (BNC/COCA) based on word family, which has been the most comprehensive and employed by lexical profiling research (Dang & Webb, 2016; Dang, Webb, & Coxhead, 2020; Ha, Le, & Phung, 2022; Schmitt, Cobb, Horst, & Schmitt, 2017). The list was updated from the British National Corpus (BNC; Nation, 2006) compiled from a 100-million-token corpus of diverse real-life genres (Le & Ha, 2023). BNC is made up of 14 1,000-word-family levels arranged by range, dispersion, and frequency, and is accompanied by two supplementary levels of proper nouns and marginal words. Owing to a strong bias towards British English and dated words from the 1980s to early 1990s, it was revised and complemented by American English words to make up the BNC/COCA with 25 1,000-word levels plus 4 supplementary levels of proper nouns (PNs), marginal words (MWs), transparent compounds (TCs), and acronyms. BNC/COCA ranks words according to high-, mid-, and low-frequency of occurrence. High-frequency words extend to Level 3 (Schmitt & Schmitt, 2014), comprising function words and many content words, and can cover most of the running words in general texts. Mid-frequency words spread across Level 4 to Level 9 while the rest, those beyond Level 9, are classified into low-frequency words.
Previous research has offered insights into the number of word families needed to learn in the BNC or BNC/COCA wordlists to understand different types of texts. Webb and Rodgers (2009a, 2009b) suggested that 3,000- and 6,000-7,000-word families plus PNs and MWs in the BNC list would be needed for 95% and 98% coverage of television programs. Regarding podcasts, the profile established by Nurmukhamedov and Sharakhimov (2021) based on the BNC/COCA list unveiled that 3,000 and 5,000-word families plus the supplementary lists of PNs, MWs, TCs, and acronyms covered 96.75% and 98.26% of the corpus. In a more recent attempt to revisit the lexical demands of movies, TV programs, and soap operas, Ha (2022a) concluded that audiences should be familiar with 3,000 and 5,000-word families in the BNC/COCA list to obtain 95% and 98% coverage. While examining English-medium Instruction (EMI) textbooks in multiple disciplines, Hsu (2011) found that 3,500 and 5,000-word families plus PNs respectively granted 95% and 98% coverage of business textbooks while 5,000 and 8,000 levels were demanded for the two similar thresholds of business research articles. Later on, in 2018, Hsu discovered that the 10,000 level of the BNC/COCA list would cover 98% of Medicine textbooks. Lately, her study on Civil Engineering textbooks in 2022 figured out that the coverage of 95% and 98% would be reached as long as learners possessed the knowledge of 5,000 and 10,000-word families in the BNC/COCA list, respectively.
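The way such lexical demands are read off a profile can be sketched as follows: coverage from the supplementary lists is counted first, and the percentages of successive 1,000-word-family levels are then accumulated until a threshold is reached. The function name and the profile figures are hypothetical and are not taken from any of the studies cited above.

```python
from itertools import accumulate

def lexical_demand(level_pct, supplementary_pct, thresholds=(95.0, 98.0)):
    """Return, for each threshold, the first 1,000-word-family level at which
    cumulative coverage (supplementary lists included) reaches it, or None."""
    # Running total that starts from the supplementary-list coverage.
    cumulative = list(accumulate(level_pct, initial=supplementary_pct))[1:]
    return {
        t: next((i + 1 for i, c in enumerate(cumulative) if c >= t), None)
        for t in thresholds
    }

# Hypothetical profile: % of tokens at levels 1-6, plus 3% from supplementary lists.
profile = [80.0, 8.0, 4.0, 2.0, 1.0, 0.5]
print(lexical_demand(profile, supplementary_pct=3.0))  # {95.0: 3, 98.0: 5}
```

In this made-up profile, the 3,000 level plus supplementary lists would be needed for 95% coverage and the 5,000 level for 98%.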
Lexical Richness: Lexical Sophistication and Lexical Diversity
As construed by Read (2000), lexical sophistication is the proportion of rare and advanced words in a text, which include technical terms, jargon, and uncommon words rather than general and everyday vocabulary. In assessment, it is identified as a predictor of word difficulty, which is a method to appraise vocabulary knowledge (Hashimoto & Egbert, 2019). Among a variety of variables measuring lexical sophistication, frequency receives special attention from both psycholinguists and corpus linguists. In psycholinguistics, words that occur more frequently in discourse tend to be recognized and memorized more quickly (Ellis, 2002; Milton, 2009). From the corpus linguistics perspective, words in higher frequency levels are more familiar to learners (Milton, 2009). Nation (2022) holds that it is the percentage of words in lower-frequency levels in a text that counts as the measure of lexical sophistication. Both viewpoints initially agreed that words “beyond 2,000” in frequency lists were sophisticated words (Alsaif & Milton, 2012; Arnaud, 1984; Laufer, 1995; Linnarud, 1986). It was not until the work of Schmitt and Schmitt (2014) that this boundary was stretched to 3,000-word families.
Lexical diversity is used to “measure the variety of lexical items in a text” (Nation, 2022, p. 228) and is also referred to as lexical variation (Daller, van Hout, & Treffers-Daller, 2003). This characteristic is an indication of lexical knowledge since the wider range of vocabulary used in a text signifies a higher proficiency level (Zenker & Kyle, 2021). In the measurement of lexical richness, Daller et al. (2003) argued that it is essential to evaluate both the depth and breadth of vocabulary knowledge. Hence, lexical diversity is associated with sophistication as an important construct. They underlined that a text can reach a higher lexical sophistication score as its vocabulary gets more diverse.
Text Length and Lexical Diversity Indices
Mesmer and Hiebert (2015) have proven that in reading comprehension, a text can be more challenging as its length increases, especially for low-proficiency students. Assuming that two texts are lexically equivalent, the longer one will be more distracting, memory-demanding, time-pressing, and demotivating for readers (Andreassen & Bråten, 2010; Forrin et al., 2021). In the classroom, lengthy texts could cause students to perform more poorly in comprehension tests, compared to shorter texts of similar complexity (Mesmer & Hiebert, 2015). Due to the fact that texts differ in length, there have long been concerns about text length effects on lexical diversity indices.
For decades, the type-token ratio (TTR or simple TTR; Johnson, 1944) has been universally applied as the simplest index, calculated by dividing the number of unique lexical items in a text (types) by the total number of running words (tokens): $\mathrm{TTR} = \frac{\text{types}}{\text{tokens}}$ (Zenker & Kyle, 2021). Nonetheless, later studies unveiled its instability and serious sensitivity to text length in that longer texts would score lower (Arnaud, 1984; McCarthy & Jarvis, 2007; Richards, 1987; van Hout & Vermeer, 1988). Consequently, variants of TTR were put forward with some mathematical transformations to lessen such effects. Particularly, Root TTR (G-index; Guiraud, 1960) and Logarithmic TTR (Log TTR; Chotlos, 1944; Herdan, 1960) are the two best-known indices to be transformed simply from TTR. To stabilize the TTR for longer texts, the Root TTR divides the number of types by the square root of the number of tokens ($\text{Root TTR} = \frac{\text{types}}{\sqrt{\text{tokens}}}$) while the Log TTR divides the logarithm of the number of types by the logarithm of the number of tokens ($\text{Log TTR} = \frac{\log(\text{types})}{\log(\text{tokens})}$). Unfortunately, the two studies of Hess, Sefton, and Landry (1986) and Hess, Haug, and Landry (1989), which cut texts into smaller segments with different lengths for parallel sampling, showed that these lexical diversity indices remained susceptible to text length.
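The three ratio-based indices described above can be sketched in a few lines, which also makes the length effect easy to see. This is a minimal illustration with a made-up token list, not the procedure of any particular tool; the function names are assumptions.

```python
import math

def ttr(tokens):
    """Simple type-token ratio: types / tokens."""
    return len(set(tokens)) / len(tokens)

def root_ttr(tokens):
    """Root TTR (Guiraud's index): types / sqrt(tokens)."""
    return len(set(tokens)) / math.sqrt(len(tokens))

def log_ttr(tokens):
    """Log TTR: log(types) / log(tokens). Needs at least two tokens."""
    return math.log(len(set(tokens))) / math.log(len(tokens))

short = "a b c a b d".split()   # 6 tokens, 4 types
longer = short * 4              # 24 tokens, still 4 types

for name, fn in [("TTR", ttr), ("Root TTR", root_ttr), ("Log TTR", log_ttr)]:
    print(name, round(fn(short), 3), round(fn(longer), 3))
```

Repeating the same passage adds tokens without adding types, so all three scores fall for the longer version even though the vocabulary is unchanged, which is the sensitivity the studies above point to.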
Thanks to computational advancements, complicated transformations have aided in creating more accurate and stable indices that could control text length. McCarthy and Jarvis (2007) improved the calculation method of Malvern and Richards’ (1997) vocd-D to invent HD-D. Vocd-D calculates the average value of three best-fitting coefficients, symbolized as D, between a theoretical and empirical curve created from the mean TTRs of 100 random samples of 35-50 tokens within a text. Meanwhile, HD-D utilizes hypergeometric distribution to compute the total probabilities of the occurrences of all types in a random 42-token sample within a text. Despite being strongly criticized, TTR has been incorporated into modern indices. One of them is Covington and McFall’s (2010) Moving-average TTR (MATTR) which calculates the mean TTR of 50-token samples moving from the beginning to the end of a text (i.e., tokens 1-50, 2-51, 3-52, etc.). Another index developed from TTR is the Measure of Textual Lexical Diversity (MTLD Original) by McCarthy (2005). This measure is processed by calculating the running TTR token by token from the beginning of a text until it falls to .720 or lower. Then, it restarts from the next token and proceeds likewise until the end. Such strings of tokens are referred to as “factors” and must be longer than 10 tokens. The final string, which often includes fewer than 10 tokens, is counted as a “partial factor”. The MTLD value is obtained by dividing the total number of tokens by the number of factors, including the partial factor. Noticeably, Zenker and Kyle’s (2021) analysis of samples with different lengths evidenced that MATTR, HD-D, and MTLD Original outperformed the traditional indices concerning stability and resistance to text length.
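A rough sketch of MATTR and of a single forward MTLD pass, following the verbal descriptions above, may help fix the ideas. It is not the TAALED implementation and is not guaranteed to reproduce its values: the fallback for texts shorter than one window, the `min_len` parameter, and the value returned when no factor can be computed are assumptions of this sketch. Published MTLD implementations also commonly average a forward and a backward pass, which is omitted here.

```python
import math

def mattr(tokens, window=50):
    """Moving-average TTR: mean TTR over every window of `window` tokens."""
    if len(tokens) < window:
        # Assumption: fall back to plain TTR when the text is shorter than one window.
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)

def mtld_forward(tokens, threshold=0.72, min_len=10):
    """One forward MTLD pass: total tokens / (full factors + partial factor)."""
    factors = 0.0
    seen, length = set(), 0
    for tok in tokens:
        seen.add(tok)
        length += 1
        # A factor closes once the running TTR is at or below the threshold
        # and the string is longer than `min_len` tokens (as described above).
        if len(seen) / length <= threshold and length > min_len:
            factors += 1.0
            seen.clear()
            length = 0
    if length:
        # The leftover string counts as a partial factor, scaled by how far
        # its TTR has travelled from 1.0 towards the threshold.
        factors += (1.0 - len(seen) / length) / (1.0 - threshold)
    # Assumption: return infinity when no factor, full or partial, accrues.
    return len(tokens) / factors if factors else math.inf

sample = ("the cat sat on the mat and the dog sat on the rug " * 5).split()
print(round(mattr(sample, window=10), 3), round(mtld_forward(sample), 3))
```

Because MATTR always evaluates windows of the same size and MTLD measures how long the text sustains a TTR above the threshold, neither score is tied directly to the total number of tokens.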
Relevant Research on the Vocabulary of English Textbooks in EFL Countries
In the Indonesian context, Aziez and Aziez (2018) analyzed six junior- and senior-high-school English textbooks by private publishers and found that 4,000-5,000-word families in the BNC list were needed for 95% comprehension, and their mean TTR score was equally 0.23. As to their sophisticated words beyond 2,000, they accounted for 7.89% and 11.87% of the junior and senior textbooks. On the other hand, Rahmat and Coxhead (2021) questioned the lexical coverage of textbooks for senior high schools published by the Indonesian Government. Their analysis revealed that 95% and 98% coverage of the textbooks could be gained with 3,000-4,000- and 5,000-6,000-word families in the BNC/COCA list only if the supplementary lists were added. It was also indicated that high-frequency words in the first two lists covered more than 80% of these textbooks. Collectively, the two studies concluded that the vocabulary in English textbooks by both Indonesian private and government publishers would pose challenges to students’ reading comprehension.
In China, Sun and Dang (2020) compared the lexical coverage of the 273,094-token Yilin textbooks with the vocabulary knowledge of 265 upper-secondary students. To this end, the participants were administered Webb, Sasao, & Ballance’s (2017) Updated Vocabulary Levels Test (UVLT). The profiles suggested that the textbooks demanded 3,000 and 9,000-word families in the BNC/COCA list plus the supplementary lists for 95% and 98% coverage. Compared to the results of the UVLT which reported that the students’ vocabulary knowledge revolved around the 1,000 level, Yilin was likely too demanding and impeded their vocabulary uptake. Most recently, Yang and Coxhead (2022) carried out research on Book 3 and Book 4 in the New Concept English (NCE) textbook series with 40,895 tokens in total. As observed from the results, Book 3 required the students to have 3,000 and 5,000-word families in the BNC/COCA list for 95% and 98% coverage supposing that they had extra knowledge of the supplementary lists. By contrast, 5,000 and 25,000-word families would be required for the comprehension of Book 4, each threshold would need 1,000 more word families. Additionally, this study demonstrated that more than 85% of the textbooks contained high-frequency words in the first two lists alongside less than 10% of mid-frequency words, which would be beneficial to the vocabulary development of the students.
In the EFL context of Vietnam, one of the first lexical analyses of MoET’s upper-secondary English textbooks was reported by Nguyen (2020). The study tested the prior vocabulary knowledge of 422 upper-secondary students with Webb et al.’s (2017) Vocabulary Levels Test (VLT) and profiled reading passages in the textbooks based on the BNC/COCA list. The results disclosed that with vocabulary knowledge at the 2,000 level, the students could cover 87.1% of the textbooks. To achieve 95% and 98%, they had to add in 1,000 and 3,000-word families for each corresponding threshold. In a similar vein, Le and Dinh (2022) analyzed the 41,137-token Grade 10 textbook by MoET and found that 10th graders needed the most frequent 3,000 and 5,000-word families in the BNC/COCA list for 95% and 98% coverage. Drawing on the VLT’s results of Nguyen (2020), they substantiated the conclusion that this textbook offered too many new words and hindered students’ reading comprehension. As for the coverage for high-frequency words, they found that the textbook covered just more than half of the second level. It became clear that the Grade 10 textbook could not guarantee learners’ vocabulary development and incidental learning and needed considerable adaptation from the teachers in the classroom.
The two studies of Nakayama (2021, 2022) examined high-frequency words in reading passages of a set of Japanese government-approved EFL textbooks to illuminate if they were useful for real-life English and sufficient for comprehension of authentic texts. The author profiled the textbooks with the New General Service List (NGSL; Browne, Culligan, & Phillip, 2013) composed of 2,801 high-frequency general English words, divided into five 560-word levels, plus 52 supplemental words including days of the week, months of the year, and numbers. This wordlist counts words by modified lexeme, which corresponds to flemma. The studies analyzed both the coverage of textbooks by the list and that of the list by the textbooks. The results revealed that though 92%-96% of the textbooks’ word tokens were high-frequency words, these words accounted for only 38% of the NGSL. Further exploring the distribution of high-frequency words in these textbook series, Nakayama (2022) found that the textbooks had not fully covered the first level of NGSL and half of the second to fifth levels. Taken together, the two studies concluded that these Japanese English textbooks were insufficient for comprehension of authentic texts, calling teachers to compensate through homework and classroom activities.
Research Gaps and the Present Study
Although the studies above have contributed to the literature on the vocabulary of EFL textbooks, they leave some gaps that should be addressed in further research. First, the Vietnamese context lacks comparative analyses of different upper-secondary English textbook series. Although the studies in the other EFL countries were independent works, they could jointly serve as references for stakeholders. By contrast, the two studies in Vietnam focused solely on MoET’s textbook, which was the series preceding one of those analyzed in this study. No studies, to the best of the researcher’s knowledge, have been conducted on all newly published series. Furthermore, Yang and Coxhead (2022) pointed out that the series in China they researched had different lexical demands from those reported by Sun and Dang (2020). This raises a question about whether the textbook series currently available in Vietnam encounter the same issue.
Second, there are methodological issues in the previous studies. Except for Aziez and Aziez (2018), none of the remaining studies evaluated textbooks using both lexical sophistication and lexical diversity. Most of them adopted the percentage of words beyond 2,000 in the BNC/COCA wordlist as a gauge of lexical sophistication, and yet none regarded lexical diversity. In an influential paper, Schmitt and Schmitt (2014) advocated for expanding the number of high-frequency words to the 3rd level. Additionally, even though Aziez and Aziez (2018) took into account lexical diversity, they employed the TTR, which has long been criticized for its susceptibility to text length. Findings on the vocabulary of textbooks would be more precise if the most up-to-date level and cutting-edge metrics were utilized.
The literature review signifies the necessity of performing a corpus-based comparative analysis of the latest grade-10 English textbook series in Vietnam. To address this need, the present study attempts to (1) establish lexical profiles of the reading in these textbooks, (2) determine their lexical sophistication using the updated level suggested by Schmitt and Schmitt (2014), and (3) adopt more stable and modern indices to measure their lexical diversity. The objective of the comparison is to examine the extent to which these textbook series are similar as to lexical demands and lexical richness.
Methodology
Research Design
This study comprised two phases. Phase 1 profiled the textbooks with AntWordProfiler version 2.1.0 (Anthony, 2023) to discover their coverage by the BNC/COCA wordlist (Nation, 2017). In particular, this study followed Ha (2022b) in treating 85% as a coverage threshold for language-focused instruction alongside the two conventional thresholds of 95% and 98%. Then, Phase 2 calculated their lexical richness with the support of TAALED 1.4.1 (Kyle, Crossley, & Jarvis, 2021) and made pairwise comparisons employing Jamovi 2.3.28 (The Jamovi Project, 2023) to examine whether there was any significant difference among the textbooks.
Data Collection
The corpus consisted of 53,360 tokens collected from reading passages of the eight English textbook series for Vietnamese 10th graders by various publishers. Since the academic year 2022-2023, there have been nine series approved by MoET. However, because Macmillan Move On by Ho Chi Minh City University of Education Publishing House was inaccessible and was found to have been selected by very few provinces, it was excluded from this study. Table 1 provides an overview of the eight series to be analyzed.
Table 1. General Information about the Corpus
| Publishers | Series | Year of Publication | Target Learners | Number of Units | Number of Passages |
| Hue University Publishing House, Express Publishing & DTP Education Solutions | Bright | 2022 | Grade 10 | 8 Units, 4 CC/CLIL/PCs* & 2 Reviews | 38 |
| VNUHCM Publishing House & Garnet Publishing | C21- Smart | 2020 | Grade 10 | 10 Units & 5 PCs | 28 |
| University of Education Publishers & Pearson | English Discovery | 2022 | Grade 10 | 9 Units & CLIL | 41 |
| Ho Chi Minh City University of Education Publishing House & National Geographic Learning | Explore New Worlds | 2022 | Grade 10 | 12 | 22 |
| Vietnam Education Publishing House & Oxford University Press | Friends Global | 2020 | Grade 10 | Introduction & 8 Units | 56 |
| Vietnam Education Publishing House & Pearson | Global Success | 2022 | Grade 10 | 10 Units & 4 Reviews | 39 |
| Hue University Publishing House & DTP Education Solutions | i-Learn Smart World | 2022 | Grade 10 | 10 Units & 4 Reviews | 23 |
| University of Education Publishers & Cambridge University Press | THiNK | 2021 | Grade 10 | Welcome, 8 Units & 4 Reviews | 40 |
*CC: Culture Corner; CLIL: Content and Language Integrated Learning; PC: Progress Check/ Source: author
Data Preparation
Firstly, the textbooks were downloaded as .pdf files from the publishers’ websites. Since this study centered on reading, only passages in reading sections were extracted and then converted into .txt files with the assistance of Capture2Text. This software scanned the .pdf files so that the researcher did not need to retype the text. Spelling errors that resulted from the .pdf to .txt file conversion were corrected after a manual check of the whole corpus. Secondly, the .txt files were input into AntWordProfiler for preliminary analysis. The adopted reference list was the 25-level BNC/COCA in addition to the four supplementary lists of PNs, MWs, TCs, and acronyms. AntWordProfiler referred to BNC/COCA to classify words into their frequency levels and counted how many times they occurred in the texts. Words not included in the first 25 levels or the supplementary lists were classified as “Not in the list”. Thirdly, based on the preliminary analysis, the researcher rechecked words that could not be read by the profiling program and replaced them with “Not in the list” to reflect their true frequency. After the adjustments, the corpus was processed in two ways corresponding to the two phases.
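The profiling step described above can be illustrated with a minimal sketch. It is not AntWordProfiler's implementation: the word lists, the sample sentence, and the function names `profile` and `coverage_percent` are hypothetical, and the toy lists match surface forms, whereas the BNC/COCA lists group many forms under one word family.

```python
from collections import Counter

# Hypothetical toy lists (assumption): word form -> frequency level.
TOY_LEVELS = {
    "the": 1, "students": 1, "read": 1, "about": 1, "a": 1, "in": 1,
    "festival": 2,
    "heritage": 3,
}
# Hypothetical supplementary entries (assumption): PN = proper noun.
TOY_SUPPLEMENTARY = {"hanoi": "PN"}


def profile(tokens):
    """Count tokens per level; unmatched tokens go to 'Not in the list'."""
    counts = Counter()
    for tok in tokens:
        word = tok.lower()
        if word in TOY_LEVELS:
            counts[TOY_LEVELS[word]] += 1
        elif word in TOY_SUPPLEMENTARY:
            counts[TOY_SUPPLEMENTARY[word]] += 1
        else:
            counts["Not in the list"] += 1
    return counts


def coverage_percent(counts, labels):
    """Percentage of all tokens that fall under the given labels."""
    total = sum(counts.values())
    return 100 * sum(counts[label] for label in labels) / total


text = "The students read about a quaint heritage festival in Hanoi".split()
counts = profile(text)
print(coverage_percent(counts, [1]))              # 60.0
print(coverage_percent(counts, [1, 2, 3]))        # 80.0
print(coverage_percent(counts, [1, 2, 3, "PN"]))  # 90.0
```

In this toy sentence, "quaint" matches no list and is counted under "Not in the list", which is why coverage stops at 90% even with the proper noun included.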
Data Analysis
Accordingly, Phase 1 profiled each full textbook to report its cumulative coverage, while Phase 2 analyzed each unit in isolation to calculate the two constructs of lexical richness, namely lexical sophistication and lexical diversity. Because the sophistication of texts and the amount of mid- and low-frequency vocabulary they contained (words beyond 3,000) were directly proportional (Schmitt & Schmitt, 2014), the percentage of words in the 4,000-25,000 levels functioned as the indicator of lexical sophistication:
$$\text{Lexical sophistication (\%)} = \frac{\text{number of words in the 4,000-25,000 levels}}{\text{total number of words}} \times 100$$
For the calculation of lexical diversity, the three most reliable indices supported by Zenker & Kyle (2021) were calculated by TAALED 1.4.1, including MATTR50, HD-D, and MTLD Original. The correlations of text length with the three indices and with lexical sophistication were also measured to identify whether they were sensitive to length. Finally, Jamovi 2.3.28 was employed to compare the eight series in the three aspects of length, sophistication, and diversity.
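The two richness measures can be sketched as follows. This is an illustration under stated assumptions rather than the computation performed by TAALED or AntWordProfiler: the function names are hypothetical, the sophistication denominator is assumed to be all tokens in a unit, and the MATTR sketch lowercases tokens and falls back to a plain type-token ratio for texts no longer than the window.

```python
def sophistication_percent(token_levels, first_level=4, last_level=25):
    """Share (%) of tokens whose frequency level lies in first_level..last_level.

    token_levels holds one entry per token: an int level (1-25), or None
    for tokens outside the main lists (supplementary or unlisted).
    Assumption: every token counts toward the denominator.
    """
    if not token_levels:
        return 0.0
    rare = sum(1 for lvl in token_levels
               if lvl is not None and first_level <= lvl <= last_level)
    return 100 * rare / len(token_levels)


def mattr(tokens, window=50):
    """Moving-average type-token ratio over sliding windows of fixed size."""
    words = [t.lower() for t in tokens]
    if not words:
        return 0.0
    if len(words) <= window:
        # Text too short for a full window: plain type-token ratio.
        return len(set(words)) / len(words)
    ratios = [len(set(words[i:i + window])) / window
              for i in range(len(words) - window + 1)]
    return sum(ratios) / len(ratios)


# Hypothetical unit of ten tokens: three sit in levels 4-25.
levels = [1, 1, 2, 4, None, 7, 1, 1, 3, 25]
print(sophistication_percent(levels))  # 30.0
```

Because every window has the same size, the MATTR average depends less on overall text length than a single type-token ratio does.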
Results and Discussion
Research Question 1
The results of Phase 1 serve to answer Research Question 1. Table 2 shows the cumulative coverage at each frequency level for the eight textbook series. The coverage of each textbook is presented in two columns to display the results under two assumptions: (1) that 10th graders were aware of the supplementary lists and (2) that they were not. If the students were unaware of the lists, they had to know at least 5,000 word families in the BNC/COCA list to gain 95% coverage because these lists accounted for roughly 4% of each textbook. For English Discovery, Explore New Worlds, and Friends Global, 95% coverage was unattainable without the lists, even at the 25,000 level. If the students could recognize PNs, MWs, TCs, and acronyms, the amount of vocabulary needed decreased by half to 2,000-3,000 word families. For optimal comprehension (98%), the students had to know 4,000-5,000 word families and the supplementary lists. Without knowledge of the lists, they would never reach this threshold.
Table 2. Cumulative Coverage Without and With Proper Nouns, Marginal Words, Transparent Compounds, and Acronyms (%)
| Word List | Bright (W/out) | Bright (With) | C21 Smart (W/out) | C21 Smart (With) | English Discovery (W/out) | English Discovery (With) | Explore New Worlds (W/out) | Explore New Worlds (With) | Friends Global (W/out) | Friends Global (With) | Global Success (W/out) | Global Success (With) | i-Learn Smart World (W/out) | i-Learn Smart World (With) | THiNK (W/out) | THiNK (With) |
| 1,000 | 80.46 | 85.24 | 79.01 | 82.46 | 80.16 | 85.64 | 83.79 | 88.80 | 82.41 | 87.45 | 80.09 | 84.78 | 83.35 | 87.43 | 85.13 | 89.97 |
| 2,000 | 88.89 | 93.67 | 88.71 | 92.16 | 88.00 | 93.48 | 90.57 | 95.58 | 89.60 | 94.64 | 89.52 | 94.21 | 90.77 | 94.85 | 90.96 | 95.80 |
| 3,000 | 92.47 | 97.25 | 94.06 | 97.51 | 90.93 | 96.41 | 92.68 | 97.69 | 92.48 | 97.52 | 93.12 | 97.81 | 93.22 | 97.30 | 93.14 | 97.98 |
| 4,000 | 93.57 | 98.35 | 94.85 | 98.30 | 92.10 | 97.58 | 93.50 | 98.51 | 93.24 | 98.28 | 93.88 | 98.57 | 93.83 | 97.91 | 93.92 | 98.76 |
| 5,000 | 93.99 | 98.77 | 95.66 | 99.11 | 92.77 | 98.25 | 93.92 | 98.93 | 93.72 | 98.76 | 94.34 | 99.03 | 94.58 | 98.66 | 94.22 | 99.06 |
| 6,000 | 94.54 | 99.32 | 95.95 | 99.40 | 93.33 | 98.81 | 94.10 | 99.11 | 94.07 | 99.11 | 94.71 | 99.40 | 95.03 | 99.11 | 94.52 | 99.36 |
| 7,000 | 94.64 | 99.42 | 96.17 | 99.62 | 93.58 | 99.06 | 94.10 | 99.11 | 94.18 | 99.22 | 94.72 | 99.41 | 95.22 | 99.30 | 94.76 | 99.60 |
| 8,000 | 94.76 | 99.54 | 96.22 | 99.67 | 93.74 | 99.22 | 94.15 | 99.16 | 94.34 | 99.38 | 94.92 | 99.61 | 95.44 | 99.52 | 94.93 | 99.77 |
| 9,000 | 94.87 | 99.65 | 96.28 | 99.73 | 93.86 | 99.34 | 94.17 | 99.18 | 94.44 | 99.48 | 95.03 | 99.72 | 95.55 | 99.63 | 94.95 | 99.79 |
| 10,000 | 94.89 | 99.67 | 96.32 | 99.77 | 93.91 | 99.39 | 94.17 | 99.18 | 94.52 | 99.56 | 95.05 | 99.74 | 95.59 | 99.67 | 94.97 | 99.81 |
| 11,000 | 94.95 | 99.73 | 96.36 | 99.81 | 93.94 | 99.42 | 94.20 | 99.21 | 94.56 | 99.60 | 95.10 | 99.79 | 95.63 | 99.71 | 95.00 | 99.84 |
| 12,000 | 94.99 | 99.77 | 96.40 | 99.85 | 93.95 | 99.43 | 94.22 | 99.23 | 94.65 | 99.69 | 95.13 | 99.82 | 95.63 | 99.71 | 95.00 | 99.84 |
| 13,000 | 95.05 | 99.83 | 96.42 | 99.87 | 94.22 | 99.70 | 94.24 | 99.25 | 94.75 | 99.79 | 95.19 | 99.88 | 95.63 | 99.71 | 95.01 | 99.85 |
| 14,000 | 95.07 | 99.85 | 96.44 | 99.89 | 94.22 | 99.70 | 94.36 | 99.37 | 94.79 | 99.83 | 95.19 | 99.88 | 95.63 | 99.71 | 95.09 | 99.93 |
| 15,000 | 95.09 | 99.87 | 96.44 | 99.89 | 94.26 | 99.74 | 94.36 | 99.37 | 94.81 | 99.85 | 95.19 | 99.88 | 95.63 | 99.71 | 95.10 | 99.94 |
| 16,000 | 95.09 | 99.87 | 96.48 | 99.93 | 94.35 | 99.83 | 94.36 | 99.37 | 94.84 | 99.88 | 95.23 | 99.92 | 95.63 | 99.71 | 95.12 | 99.96 |
| 17,000 | 95.10 | 99.88 | 96.48 | 99.93 | 94.42 | 99.90 | 94.36 | 99.37 | 94.84 | 99.88 | 95.26 | 99.95 | 95.63 | 99.71 | 95.12 | 99.96 |
| 18,000 | 95.12 | 99.90 | 96.48 | 99.93 | 94.44 | 99.92 | 94.38 | 99.39 | 94.84 | 99.88 | 95.26 | 99.95 | 95.68 | 99.76 | 95.12 | 99.96 |
| 19,000 | 95.12 | 99.90 | 96.48 | 99.93 | 94.44 | 99.92 | 94.45 | 99.46 | 94.85 | 99.89 | 95.29 | 99.98 | 95.68 | 99.76 | 95.12 | 99.96 |
| 20,000 | 95.12 | 99.90 | 96.48 | 99.93 | 94.44 | 99.92 | 94.45 | 99.46 | 94.85 | 99.89 | 95.29 | 99.98 | 95.68 | 99.76 | 95.12 | 99.96 |
| 21,000 | 95.12 | 99.90 | 96.48 | 99.93 | 94.44 | 99.92 | 94.45 | 99.46 | 94.85 | 99.89 | 95.29 | 99.98 | 95.68 | 99.76 | 95.12 | 99.96 |
| 22,000 | 95.12 | 99.90 | 96.48 | 99.93 | 94.44 | 99.92 | 94.45 | 99.46 | 94.85 | 99.89 | 95.30 | 99.99 | 95.68 | 99.76 | 95.14 | 99.98 |
| 23,000 | 95.14 | 99.92 | 96.48 | 99.93 | 94.44 | 99.92 | 94.45 | 99.46 | 94.85 | 99.89 | 95.30 | 99.99 | 95.68 | 99.76 | 95.14 | 99.98 |
| 24,000 | 95.14 | 99.92 | 96.48 | 99.93 | 94.44 | 99.92 | 94.45 | 99.46 | 94.85 | 99.89 | 95.30 | 99.99 | 95.68 | 99.76 | 95.14 | 99.98 |
| 25,000 | 95.17 | 99.95 | 96.48 | 99.93 | 94.44 | 99.92 | 94.60 | 99.61 | 94.85 | 99.89 | 95.30 | 99.99 | 95.68 | 99.76 | 95.14 | 99.98 |
| Tokens | 6085 | | 5165 | | 6794 | | 4274 | | 10447 | | 7107 | | 4649 | | 8839 | |
Note. The figures in bold indicate coverages reaching 95% and 98%
Figure 1 illustrates the three thresholds of 85%, 95%, and 98% in a graph for easier visualization. The coverage displayed in the figure took into account PNs, MWs, TCs, and acronyms.

Figure 1. The Amount of Vocabulary Needed to Achieve 85%, 95%, and 98% Coverage for the Readings of the Eight Textbook Series
Among the eight textbook series, C21 Smart and Global Success demanded more than 1,000 word families for 85% coverage, while the rest were within the 1,000-word level. Regarding 95% coverage, the lexical demands of the eight textbook series were relatively similar in that Explore New Worlds and THiNK needed 2,000 word families and the rest needed between 2,000 and 3,000. At the 98% threshold, i-Learn Smart World and English Discovery required nearly 5,000 word families in the BNC/COCA list, whereas 4,000 word families were sufficient for the remaining series.
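The way a vocabulary demand is read off a cumulative coverage column can be shown with a short sketch. The function name `min_level_for` is hypothetical; the figures are copied from Table 2 (Bright, "With" column, first five levels; English Discovery, "W/out" column, all 25 levels).

```python
def min_level_for(cumulative, threshold):
    """Return the first frequency level (in word families) whose cumulative
    coverage meets the threshold, or None if no listed level does."""
    for index, coverage in enumerate(cumulative, start=1):
        if coverage >= threshold:
            return index * 1000
    return None


# Table 2, Bright, with supplementary lists, levels 1,000-5,000 only.
bright_with = [85.24, 93.67, 97.25, 98.35, 98.77]

# Table 2, English Discovery, without supplementary lists, all 25 levels.
english_discovery_without = [
    80.16, 88.00, 90.93, 92.10, 92.77, 93.33, 93.58, 93.74, 93.86,
    93.91, 93.94, 93.95, 94.22, 94.22, 94.26, 94.35, 94.42,
] + [94.44] * 8

for threshold in (85, 95, 98):
    print(threshold, min_level_for(bright_with, threshold))
# 85 1000
# 95 3000
# 98 4000

print(min_level_for(english_discovery_without, 95))  # None
```

The `None` result for English Discovery reflects the point made for Table 2: without the supplementary lists, its coverage stays below 95% at every level.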
Research Question 2
Table 3 shows the Pearson correlations of length to lexical sophistication and to lexical diversity indices. The correlation between length and sophistication was insignificant. Meanwhile, MTLD Original was observed to have the lowest correlation with text length among the three (r = 0.096, p = 0.353), suggesting that this would be the most reliable and precise index to measure the lexical diversity of the series.
Table 3. Correlations between Length, Lexical Sophistication, and Lexical Diversity Indices
| | | Sophistication | Diversity: MATTR | Diversity: HD-D | Diversity: MTLD Original |
| Length | Pearson’s r | 0.121 | 0.127 | 0.391 | 0.096 |
| | df | 93 | 93 | 93 | 93 |
| | p-value | 0.245 | 0.22 | <.001 | 0.353 |
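For readers who wish to check a length-sensitivity correlation of this kind, Pearson's r can be computed as in the sketch below. The data are hypothetical, and the function names are illustrative rather than taken from Jamovi. Under the usual df = n − 2, the df of 93 in Table 3 would correspond to 95 paired observations.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two sequences of equal length, at least 3 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_df(n):
    """Degrees of freedom for testing a Pearson correlation."""
    return n - 2


# Hypothetical unit lengths (tokens) and diversity scores.
lengths = [200, 400, 600, 800]
scores = [62.0, 58.5, 71.0, 66.5]
print(round(pearson_r(lengths, scores), 3))
print(correlation_df(95))  # 93
```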
Table 4 presents the descriptive statistics of the eight series regarding length, sophistication, and diversity. The Skewness and Kurtosis values indicated that the data were not consistently normally distributed. Although the three variables were within ±2 for most of the textbooks (Curran, West, & Finch, 1996), the Kurtosis values of Length for English Discovery and Friends Global were 5.35 and 4.183, respectively, which showed departures from normality.
Table 4. Descriptive Statistics of the Readings in the Eight Textbook Series on Length, Sophistication, and Diversity
| Series | Variables | Mean | Median | Min | Max | SD | Skewness | Kurtosis |
| Bright | Length | 434.64 | 413.5 | 219 | 674 | 141.36 | 0.242 | -0.572 |
| | Sophistication | 2.5 | 2.65 | 0.253 | 6.11 | 1.7 | 0.392 | -0.105 |
| | Diversity | 73.5 | 74.97 | 57.459 | 91.73 | 11.73 | -0.116 | -1.501 |
| C21 Smart | Length | 516.5 | 552.5 | 176 | 775 | 209.12 | -0.31 | -1.15 |
| | Sophistication | 2.18 | 2.41 | 0 | 4.11 | 1.46 | -0.215 | -1.49 |
| | Diversity | 71.2 | 68.18 | 44.3 | 114.72 | 20 | 0.968 | 1.57 |
| English Discovery | Length | 679.4 | 658 | 475 | 1111 | 169.67 | 1.962 | 5.35 |
| | Sophistication | 3.51 | 3.19 | 1.27 | 6.36 | 1.63 | 0.464 | -0.677 |
| | Diversity | 63.24 | 59.73 | 39.6 | 85.72 | 14.06 | 0.18 | -0.347 |
| Explore New Worlds | Length | 356.17 | 256 | 213 | 802 | 189.87 | 1.546 | 1.559 |
| | Sophistication | 1.99 | 1.7 | 0.382 | 4.08 | 1.15 | 0.676 | -0.311 |
| | Diversity | 59.52 | 63.31 | 34.179 | 69.3 | 10.16 | -1.552 | 2.592 |
| Friends Global | Length | 1160.78 | 1263 | 246 | 1558 | 387.31 | -1.8498 | 4.183 |
| | Sophistication | 2.28 | 2.03 | 0.755 | 3.79 | 0.854 | 0.0343 | 0.947 |
| | Diversity | 68 | 67.95 | 45.235 | 81.69 | 11.162 | -0.8944 | 1.087 |
| Global Success | Length | 507.64 | 571 | 128 | 810 | 235.25 | -0.689 | -0.772 |
| | Sophistication | 2.01 | 1.74 | 0 | 5.13 | 1.45 | 0.822 | 0.15 |
| | Diversity | 69.23 | 70.39 | 44.8 | 85.54 | 13 | -0.582 | -0.629 |
| i-Learn Smart World | Length | 332.07 | 374.5 | 148 | 486 | 122.74 | -0.52 | -1.4578 |
| | Sophistication | 2.55 | 2.94 | 0 | 5.3 | 1.45 | -0.189 | 0.052 |
| | Diversity | 65.56 | 66.07 | 39 | 99.8 | 16.39 | 0.264 | 0.1965 |
| THiNK | Length | 736.58 | 763 | 188 | 1033 | 272.85 | -1.247 | 0.858 |
| | Sophistication | 1.96 | 1.74 | 0.433 | 4.03 | 1.08 | 0.764 | -0.238 |
| | Diversity | 79.77 | 76.11 | 56.926 | 112.64 | 16.17 | 1.179 | 1.212 |
In line with the assumption that the data were not normally distributed, the Shapiro-Wilk test in Table 5 indicated a violation of normality for Length (W = 0.946, p < .001), although not for Sophistication or Diversity. Because Mean values therefore could not reliably represent the central tendency of the data, the non-parametric Kruskal-Wallis test was utilized, and Median values are reported instead of Means. These Median values serve as the lexical richness scores of the readings in the textbooks.
Table 5. Shapiro-Wilk Test of Normality
| W | p | |
| Length | 0.946 | <.001 |
| Sophistication | 0.985 | 0.354 |
| Diversity | 0.979 | 0.125 |
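The Kruskal-Wallis test chosen above ranks all observations together and compares rank sums across groups. The sketch below computes the tie-corrected H statistic and its degrees of freedom (number of groups minus one, hence 7 for eight series). It is an illustration with hypothetical data and function names, not Jamovi's implementation, and it omits the p-value, which comes from a chi-square distribution with those degrees of freedom.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (tie-corrected H, degrees of freedom) for a list of groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_counts = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_counts) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction, len(groups) - 1


# Hypothetical passage lengths for three series.
h, df = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 2), df)  # 7.2 2
```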
Although the textbook series differed significantly in length, no significant difference was found in their sophistication or diversity. The Kruskal-Wallis test was significant for Length, χ²(7) = 39.35, p < .001, but not for Sophistication, χ²(7) = 8.37, p = 0.301, or Diversity, χ²(7) = 13.05, p = 0.071. The results of the Dwass-Steel-Critchlow-Fligner post-hoc test are presented in Table 6.
Table 6. Pairwise Comparisons between the Readings of the Eight Textbook Series
| Series 1 | Series 2 | Length W | Length p | Sophistication W | Sophistication p | Diversity W | Diversity p |
| A | B | 1.4909 | 0.966 | -0.331 | 1.000 | -0.994 | 0.997 |
| A | C | 4.5554 | 0.028 | 1.905 | 0.881 | -2.567 | 0.610 |
| A | D | -2.037 | 0.839 | -1.091 | 0.995 | -3.564 | 0.187 |
| A | E | 4.4555 | 0.035 | -0.445 | 1.000 | -1.158 | 0.992 |
| A | F | 1.7549 | 0.920 | -1.105 | 0.994 | -0.845 | 0.999 |
| A | G | -2.1124 | 0.812 | -0.260 | 1.000 | -1.884 | 0.887 |
| A | H | 4.0013 | 0.088 | -1.164 | 0.992 | 0.364 | 1.000 |
| B | C | 2.0312 | 0.841 | 2.352 | 0.711 | -0.855 | 0.999 |
| B | D | -2.3786 | 0.699 | -0.233 | 1.000 | -2.052 | 0.833 |
| B | E | 4.1569 | 0.065 | 0.000 | 1.000 | 0.000 | 1.000 |
| B | F | -0.3313 | 1.000 | -0.456 | 1.000 | 0.166 | 1.000 |
| B | G | -3.1467 | 0.337 | 0.456 | 1.000 | -0.662 | 1.000 |
| B | H | 2.8908 | 0.452 | -0.280 | 1.000 | 1.212 | 0.990 |
| C | D | -4.3828 | 0.041 | -3.077 | 0.367 | -0.466 | 1.000 |
| C | E | 3.8105 | 0.124 | -2.425 | 0.678 | 1.155 | 0.992 |
| C | F | -2.1954 | 0.779 | -3.147 | 0.337 | 1.408 | 0.975 |
| C | G | -5.7137 | 0.001 | -1.905 | 0.881 | 0.497 | 1.000 |
| C | H | 2.3313 | 0.721 | -3.357 | 0.254 | 3.171 | 0.327 |
| D | E | 4.7237 | 0.019 | 1.106 | 0.994 | 2.412 | 0.684 |
| D | F | 2.1098 | 0.812 | -0.218 | 1.000 | 2.691 | 0.549 |
| D | G | -0.0727 | 1.000 | 1.637 | 0.944 | 1.237 | 0.988 |
| D | H | 3.6742 | 0.157 | -0.245 | 1.000 | 4.899 | 0.012 |
| E | F | -4.6337 | 0.023 | -1.604 | 0.949 | 0.178 | 1.000 |
| E | G | -4.7216 | 0.019 | 1.158 | 0.992 | -0.624 | 1.000 |
| E | H | -4.3217 | 0.046 | -1.508 | 0.964 | 2.412 | 0.684 |
| F | G | -3.0545 | 0.377 | 1.657 | 0.940 | -1.105 | 0.994 |
| F | H | 3.6376 | 0.166 | 0.218 | 1.000 | 1.455 | 0.970 |
| G | H | 4.5826 | 0.026 | -1.673 | 0.937 | 2.691 | 0.549 |
Note. A = Bright; B = C21 Smart; C = English Discovery; D = Explore New Worlds; E = Friends Global; F = Global Success; G = i-Learn Smart World; H = THiNK
As shown in Table 6, except for B (C21 Smart), whose reading passages were similar to those of the others in all three aspects, the remaining textbooks differed significantly in passage length from at least one of their counterparts despite insignificantly different sophistication and diversity. Most notably, the reading passages of E (Friends Global) differed significantly in length from those of five other textbooks, namely A (Bright), D (Explore New Worlds), F (Global Success), G (i-Learn Smart World), and H (THiNK). This matches Table 4 in that E contains the longest readings (Mdn = 1263, SD = 387.31). Another noteworthy pair was D (Explore New Worlds) and H (THiNK), whose difference was insignificant in length (W = 3.6742, p = 0.157) and sophistication (W = -0.245, p = 1.000), yet significant in diversity (W = 4.899, p = 0.012).
Discussion and Implications for EFL Teaching and Textbook Design
Generally, the lexical demands of the readings of the eight textbook series were equivalent at all three thresholds of 85%, 95%, and 98%. On that account, all the textbook series share the same attributes. According to Nguyen (2020), the receptive vocabulary knowledge of Vietnamese upper-secondary students was at the 2,000 level. Therefore, with a vocabulary demand of around 1,000-2,000 word families plus the supplementary lists for 85% coverage, the eight textbook series can provide reading materials for language-focused instruction. In this sort of instruction, students will benefit from intensive reading and deliberate vocabulary learning (Nation, 2022). More specifically, intensive reading exposes students to language items and features to develop their linguistic knowledge, while deliberate learning allows for “initial learning of large amounts of vocabulary in a very short time” (Nation, 2022, p. 201). This deliberately acquired vocabulary can be retained for longer and preserved for meaning-focused language reception and production. However, in Dang’s (2020) investigation into the receptive vocabulary knowledge of Vietnamese first-year non-English majors using the new VLT, half of the participants had not mastered 1,000 English words and only 10% reached 2,000 words. It can be inferred that these textbook series challenge even university students. For grade 10 students, whose proficiency level is commonly far lower than that of undergraduates, the series are likely to be even less practical.
At the 95% threshold, Explore New Worlds and THiNK demanded 2,000 word families plus the supplementary lists, while the others required more, but no more than 3,000. A vocabulary of 2,000-3,000 word families in the BNC/COCA has also been the MoET benchmark for Vietnamese upper-secondary English textbooks. Furthermore, Krashen (1981) suggested that language input should be slightly beyond students’ levels to inspire their learning. Thus, the eight textbook series are technically qualified for reading comprehension instruction. Nonetheless, the students’ vocabulary knowledge of 2,000 words and the MoET’s benchmark apply upon completion of all grades in upper-secondary education. Given that grade 10 is the first grade at this level, the vocabulary input for these students should likely be much lower (Le & Dinh, 2022). As discussed in the literature review, the actual level of 10th graders should be counted upon their completion of the preceding lower-secondary level, which is around 1,400-1,700 vocabulary items (Hoang, 2022). A possible explanation for this is that the upper-secondary level sets out to increase the vocabulary knowledge of students each year from grade 10 until they reach the expected outcome of 2,000-2,500 or 3,000 vocabulary items at the end of the level in grade 12. In consequence, these textbooks potentially require the students to learn an additional 300-600 words or more to gain 95% coverage, making them more lexically demanding in reality.
Even though Nakayama (2021, 2022) drew on the NGSL, his findings are meaningfully comparable to those of the current study because both share a focus on high-frequency words. In the eight Vietnamese English textbooks, the high-frequency words up to Level 2-3 in the BNC/COCA cover 92%-97% of the text. This result coincides with Nakayama’s finding that 92%-96% of the words in Japanese government-approved English textbooks were high-frequency words, indicating that both the eight Vietnamese EFL textbooks and the Japanese EFL textbooks contain predominantly high-frequency words.
At the 98% threshold, the students must know 4,000 word families plus the supplementary lists. That means they have to learn more than 2,000 additional word families for meaning-focused learning to happen. Acquiring such a large number of new words is too challenging since learners in EFL contexts can only absorb a maximum of 430 new words a year (Webb & Chang, 2012). This vocabulary level also exceeds MoET’s required standard of 2,000-3,000 word families and seems unnecessarily high. It is encouraging to compare these results with those of Nguyen (2020) and Le and Dinh (2022), who found that the preceding Grade 10 textbook necessitated 3,000 and 5,000 word families for 95% and 98% coverage, respectively. The eight new textbook series appear to contain fewer novel words, be more vocabulary-selective, and hinder reading comprehension less, although they have not been optimized for unsupported reading.
In comparison with the English textbooks in Indonesia and China, the lexical demands of the textbook series in Vietnam were maintained at a lower level. To reach 95% and 98% coverage of the textbooks in Indonesia, students needed to know 3,000-4,000 and 5,000-6,000 word families plus the supplementary lists (Rahmat & Coxhead, 2021). These numbers are 1,000-2,000 word families higher than those in Vietnam at each threshold. Compared to the vocabulary in the Chinese English textbooks analyzed by Sun and Dang (2020) and Yang and Coxhead (2022), the vocabulary in the eight textbook series in Vietnam tends not to be as significantly different at 98% as that in NCE Book 3 and Yilin. Both textbooks were 95% covered by 3,000 word families, but Yilin required students to know up to 9,000 word families plus the supplementary lists for 98%, which was nearly two times higher than its counterpart (Sun & Dang, 2020; Yang & Coxhead, 2022). It can be seen that the textbook series in Vietnam are much less lexically demanding and better controlled than those from the other EFL countries.
Alongside the 25 main lists, the four supplementary lists are inseparable components that strongly impact the coverage of these textbooks. The profiles revealed that PNs made up the largest part of the supplementary lists of each textbook, including Vietnamese names (e.g., Thang, Long, Cheo, Tuoi, Tre), original names in other languages (e.g., Merida, Wezel, Downie, Curitiba), and brand names (e.g., Facebook, YouTube, Instagram). Following them were TCs (e.g., smartphone, homestay, webmail), acronyms (e.g., Covid, QR, UN), and MWs. Most of them were Vietnamese culture- and context-based words, words commonly used on social media, and compounds composed of easily recognizable single words. Despite covering a considerable proportion (more or less 4%), these words hardly obstruct comprehension as they are familiar to most Vietnamese students. Klassen (2022) has expressed concern that unfamiliar proper nouns could potentially lead to difficulties in reading comprehension. As can be seen, this concern is inapplicable to the current textbook series. This finding coincides with that of Le and Dinh (2022) on the preceding Grade 10 textbook and Rahmat and Coxhead (2021) on the textbooks in Indonesia. It is also consistent with Nation’s (2006) assumption that supplementary lists cause little vocabulary burden to readers. Moreover, the incorporation of words familiar in the culture and society of the target learners into foreign language textbooks has been highly valued because it helps simplify and localize the textbooks (Nguyen, Marlina, & Cao, 2021). Thus, students are encouraged to recall their background knowledge to read more effectively (Nation, 2022).
Furthermore, Phase 2 uncovered some lexical features of the reading passages in these eight textbook series. As its findings show, the textbooks contained reading passages of significantly different lengths: some were more than 1,200 words long (i.e., Friends Global: Mdn = 1263, SD = 387.31), while others were about 250 words long (e.g., Explore New Worlds: Mdn = 256, SD = 189.87). As such, the impacts of lengthy texts on students’ reading comprehension also hold true for the present textbook series, which means those who encounter longer readings will probably face performance deficiencies.
In regard to lexical sophistication, the scores of the eight textbooks ranged from Mdn = 1.7 to Mdn = 3.19 and were not statistically significantly different. Since lexical sophistication indicates the amount of advanced vocabulary (Read, 2000), textbooks with insignificantly different scores will provide students with equal opportunities to sharpen their vocabulary and ensure equivalent proficiency outcomes. Though the previous studies did not clearly identify this construct, the percentage of words in the 4,000-25,000 levels in the textbook profiles they established could serve as a reference point for the present research. Nguyen (2020) and Le and Dinh (2022) consistently found that these words accounted for 3.4% of the preceding Grade 10 textbook. The sophistication of the new series has thus decreased by 0.21-1.7 percentage points compared to their predecessor. It can be said that the eight new textbook series to some extent respond to the call for substituting lower-frequency words with those within the first 3,000 levels (Le & Dinh, 2022; Nguyen, 2020).
Interestingly, the sophistication scores of the textbooks in Vietnam were, on median, equivalent to the 2.74% of the Yilin textbooks in China (Sun & Dang, 2020). This figure aligns well with Nation’s (2022) statement that exposing students to a proper proportion of unknown words, around 2%, would prompt them to cultivate their vocabulary. The scores of the textbooks in Vietnam were also lower than those of the textbooks in Indonesia (Rahmat & Coxhead, 2021) and the NCE textbook in China (Yang & Coxhead, 2022), which were more than 4%. It seems that the sophistication of the textbooks in Vietnam is more sensible because too many rare words, by contrast, will place a heavy learning burden on students (Nation, 2013).
Since vocabulary gets more sophisticated as it varies in a text (Daller et al., 2003), the equivalence in lexical sophistication signifies a great control over lexical diversity. The results unveiled that the diversity scores of the eight textbooks also had no statistically significant difference. It can be interpreted that these textbooks are able to expand the range of vocabulary of the students to the same degree. Taken together, the findings on the lexical sophistication and diversity of the eight textbooks further consolidate their facilitative role in the development of students’ vocabulary knowledge in both breadth and depth.
Based on these findings, this study offers some pedagogical implications for EFL instructors and textbook writers, who play a key role in simplifying the textbooks to make them more accessible. First, they ought to identify and filter out words that are seemingly new to students or those in lower-frequency levels by reference to the BNC/COCA list and profiling programs. After that, teachers can apply the pre-teaching approach, which means that they may teach these words in isolation prior to reading lessons (Le & Dinh, 2022; Nguyen, 2020; Webb & Nation, 2008; Webb, 2009). Textbook writers should replace these words with synonyms in high-frequency levels and design extra pre-teaching activities. Second, teachers should encourage their students to use dictionaries, preferably glossaries accompanying the textbooks, to promptly comprehend the meaning of unknown words for a higher coverage. To make this possible, textbook writers must provide insightful explanations that are intelligible to all textbook users.
Third, teachers can introduce students to graded readers, whose vocabulary is always controlled at high-frequency levels, to reinforce the students’ understanding of previously met words and promote meaning-focused output (Nation, 2022; Sun & Dang, 2020). Instructors are also advised to search for supplementary reading materials whose richness scores equate to those of the textbooks in this study so that students are exposed to a wider range of vocabulary for incidental learning. More importantly, teachers should shorten lengthy texts by summarizing their main content to reduce learning burdens on students. Last but not least, in future textbook compilation, textbook writers should balance word counts among reading passages to minimize their impacts on students’ vocabulary development.
Conclusion
This study serves as an independent validation of multiple grade 10 English textbooks in Vietnam through the lens of lexical demands and richness in reading passages. The profiles of the textbooks indicated that for language-focused learning (85% coverage), students needed around 1,000-2,000 word families in the BNC/COCA list plus the supplementary lists; for reading comprehension (95% coverage), the number increased to 2,000-3,000 word families; and for meaning-focused learning (98% coverage), roughly 4,000 word families were necessary. This lexical demand applied to all the tested textbooks. Regarding lexical richness, the analysis revealed that even though the reading passages in these textbooks were significantly different in length, their scores for lexical sophistication and diversity were not significantly different. In conclusion, these eight textbook series are lexically equivalent and valid for random selection, although they remain arduous and have yet to support students’ independent learning.
Besides the contributions discussed above, the current study has limitations. Firstly, the research did not test the vocabulary knowledge of grade 10 students to compare it with the lexical demands of the textbooks. As a consequence, it could not point out the exact number of word families that students need to add to their actual level to comprehend the textbooks. Secondly, this study quantitatively analyzed reading passages as its sole research subject and excluded the Macmillan Move On series because the researchers were unable to obtain a copy. Thirdly, although the textbooks were aimed at CEFR B1, adopting Nation’s suggested vocabulary level for CEFR based on BNC/COCA may lead to variation in results as the textbooks might have been developed from another wordlist. Last but not least, the word-counting unit of word family might have classified vocabulary into higher levels because counting both inflections and derivations as one unit would equate words with different parts of speech (Skoufaki & Petrić, 2021). This is also a subject of debate in lexical profiling studies.
Future studies can employ Webb et al.’s (2017) Updated Vocabulary Levels Test to address the first limitation. Considering the second limitation, Macmillan Move On and other qualitative and textbook components, such as listening audio, instructions, glossaries, exercises, and student or teacher feedback, should be taken into account for a more comprehensive view of grade 10 English textbooks in Vietnam. In response to the third limitation, this study should be revisited as soon as MoET issues a guideline on the specific wordlist to be taught to the students. To cope with the overestimation of words pointed out in the last limitation, future research can employ a lemma-based wordlist that distinguishes parts of speech. In addition, repetition of words in textbooks is conducive to incidental vocabulary learning (Matsuoka & Hirsh, 2010). Thus, investigating how many times and across how many chapters a word is re-encountered in the textbook for students to consolidate their vocabulary will be a noteworthy research line. Also of research interest is whether the vocabulary of grade 11 and 12 textbooks increases incrementally compared to that of grade 10.
Acknowledgements
The authors would like to express their deepest gratitude to Mr. Hung Tan Ha, Victoria University of Wellington, New Zealand, for his guidance and tireless support during the research process. This research is funded by University of Economics Ho Chi Minh City, Vietnam.
About the Authors
Nam Nhat Lien obtained his bachelor’s degree in English Language at the University of Economics Ho Chi Minh City (UEH), Vietnam. He is currently working at the UEH International Languages and Country Studies Institute. His research interests include vocabulary studies and EFL learning and teaching. ORCID ID: 0009-0008-8179-715X
Nhi Hoa Mai is an English Language graduate of the University of Economics Ho Chi Minh City (UEH), Vietnam. She is now affiliated with the UEH International Languages and Country Studies Institute. Her research focuses on lexical analysis and language skills development. ORCID ID: 0009-0008-8868-4422
Nguyen Huynh Trang (Corresponding author) is an Associate Professor at the School of Foreign Languages, University of Economics Ho Chi Minh City, Vietnam. She completed her PhD in Linguistics at The English and Foreign Languages University, Hyderabad, India. Her research concerns loanwords, second language acquisition, language skills, and educational issues. ORCID ID: 0000-0002-6683-7028
To Cite this Article
Lien, N. N., Mai, N. H., & Trang, N. H. (2024). Vocabulary in English textbooks for Vietnamese upper-secondary students: A comparative analysis of reading passages. Teaching English as a Second Language Electronic Journal (TESL-EJ), 28(2). https://doi.org/10.55593/ej.28110a10
References
Ahmed, I. (2021). Lexical coverage in Bangladeshi EFL textbooks: A corpus-based study [Master’s thesis, Carleton University]. Carleton Institutional Repository. https://doi.org/10.22215/etd/2021-14594
Alsaif, A., & Milton, J. (2012). Vocabulary input from school textbooks as a potential contributor to the small vocabulary uptake gained by English as a Foreign Language learner in Saudi Arabia. Language Learning Journal, 40(1), 21-33. https://doi.org/10.1080/09571736.2012.658221
Andreassen, R., & Bråten, I. (2010). Examining the prediction of reading comprehension on different multiple-choice tests. Journal of Research in Reading, 33(3), 263-283. https://doi.org/10.1111/j.1467-9817.2009.01413.x
Anthony, L. (2023). AntWordProfiler (Version 2.1.0) [Computer Software]. Waseda University. https://www.laurenceanthony.net/software
Arnaud, P. J. L. (1984). The lexical richness of L2 written productions and the validity of vocabulary tests. In T. Culhane, C. Klein Braley & D. K. Stevenson (Eds.), Practice and Problems in Language Testing (pp. 14-28). University of Essex.
Aziez, F., & Aziez, F. (2018). The vocabulary input of Indonesia’s English textbooks and national examination texts for junior and senior high schools. TESOL International Journal, 13(3), 66-67.
Browne, C., Culligan, B., & Phillips, J. (2013). The new general service list. http://www.newgeneralservicelist.org
Chotlos, J. W. (1944). Studies in language behavior IV: A statistical and comparative analysis of individual written language samples. Psychological Monographs, 56(2), 77–111. https://doi.org/10.1037/h0093511
Council of Europe. (2001). Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge University Press. http://www.coe.int/t/dg4/education/elp/elpreg/Source/Key_reference/CEFR_EN.pdf
Covington, M. A., & McFall, J. D. (2010). Cutting the Gordian knot: The Moving-Average Type-Token Ratio (MATTR). Journal of Quantitative Linguistics, 17(2), 94–100. https://doi.org/10.1080/09296171003643098
Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1(1), 16–29. https://doi.org/10.1037/1082-989X.1.1.16
Daller, H., van Hout, R., & Treffers‐Daller, J. (2003). Lexical richness in the spontaneous speech of bilinguals. Applied Linguistics, 24(2), 197–222. https://doi.org/10.1093/applin/24.2.197
Dang, T. N. Y. (2020). Vietnamese non-English major EFL university students’ receptive knowledge of the most frequent English words. VNU Journal of Foreign Studies, 36(3). https://doi.org/10.25073/2525-2445/vnufs.4553
Dang, T. C. T., & Seals, C. (2016). An evaluation of primary English textbooks in Vietnam: A sociolinguistic perspective. TESOL Journal, 9(1), 93-113. https://doi.org/10.1002/tesj.309
Dang, T. N. Y., & Webb, S. (2016). Evaluating lists of high-frequency words. ITL International Journal of Applied Linguistics, 167(2), 132–158. https://doi.org/10.1075/itl.167.2.02dan
Dang, T. N. Y., Webb, S., & Coxhead, A. (2020). Evaluating lists of high-frequency words: teachers’ and learners’ perspectives. Language Teaching Research, 26(4), 617-641. https://doi.org/10.1177/1362168820911189
Ellis, N. C. (2002). Reflections on frequency effects in language processing. Studies in Second Language Acquisition, 24(2), 297-339. https://doi.org/10.1017/S0272263102002140
Forrin, N. D., Mills, C., D’Mello, S. K., Risko, E. F., Smilek, D., & Seli, P. (2021). TL;DR: Longer sections of text increase rates of unintentional mind-wandering. The Journal of Experimental Education, 89(2), 278–290. https://doi.org/10.1080/00220973.2020.1751578
Guiraud, P. (1960). Problèmes et méthodes de la statistique linguistique [Problems and methods of linguistic statistics]. Presses universitaires de France.
Ha, H. T. (2022a). Vocabulary demands of informal spoken English revisited: what does it take to understand movies, TV programs, and soap operas? Frontiers in Psychology, 13(1), 1-7. https://doi.org/10.3389/fpsyg.2022.831684
Ha, H. T. (2022b). Lexical profile of newspapers revisited: A corpus-based analysis. Frontiers in Psychology, 13(1), 1-10. https://doi.org/10.3389/fpsyg.2022.800983
Ha, H. T., Le, H. T., & Phung, D. H. (2022). Is “general” easier than “academic”? A corpus-based investigation into the two modules of IELTS reading test. SN Social Sciences, 2, Article 159. https://doi.org/10.1007/s43545-022-00461-1
Häcker, M. (2008). Eleven pets and 20 ways to express one’s opinion: the vocabulary learners of German acquire at English secondary schools. The Language Learning Journal, 36(2), 215-266. https://doi.org/10.1080/09571730802393183
Hashimoto, B. J., & Egbert, J. (2019). More than frequency? Exploring predictors of word difficulty for second language learners. Language Learning, 69, 839-872. https://doi.org/10.1111/lang.12353
Herdan, G. (1960). Type-token mathematics: A textbook of mathematical linguistics. De Gruyter Mouton.
Hess, C. W., Haug, H., & Landry, R. G. (1989). The reliability of type-token ratios for the oral language of school age children. Journal of Speech and Hearing Research, 32(3), 536–540. https://doi.org/10.1044/jshr.3203.536
Hess, C. W., Sefton, K. M., & Landry, R. G. (1986). Sample size and type-token ratios for oral language of preschool children. Journal of Speech and Hearing Research, 29(1), 129–134. https://doi.org/10.1044/jshr.2901.129
Hoang, V. V. (2022). Interpreting MOET’S 2018 General Education English Curriculum. VNU Journal of Foreign Studies, 38(5). https://doi.org/10.25073/2525-2445/vnufs.4866
Hsu, W. (2011). The vocabulary thresholds of business textbooks and business research articles for EFL learners. English for Specific Purposes, 30(14), 247–257. https://doi.org/10.1016/j.esp.2011.04.005
Hsu, W. (2018). The most frequent BNC/COCA mid- and low-frequency word families in English-medium traditional Chinese medicine (TCM) textbooks. English for Specific Purposes, 51, 98–110. https://doi.org/10.1016/j.esp.2018.04.001
Hsu, W. (2022). To what extent may EFL undergraduates with EMI develop English vocabulary? The case of civil engineering. LEARN Journal, 15(1), 469–494. https://so04.tci-thaijo.org/index.php/LEARN/article/view/256732
Hu, M., & Nation, I. S. P. (2000). Unknown vocabulary density and reading comprehension. Reading in a Foreign Language, 13(1), 403–430. https://doi.org/10.26686/wgtn.12560354.v1
Johnson, W. (1944). Studies in language behavior I: A program of research. Psychological Monographs, 56(2), 1–15. https://doi.org/10.1037/h0093508
Klassen, K. (2022). Proper name theory and implications for second language reading. Language Teaching, 55(2), 149-155. https://doi.org/10.1017/S026144482100015X
Krashen, S. D. (1981). Second language acquisition and second language learning. Pergamon Press.
Kyle, K., Crossley, S. A., & Jarvis, S. (2021). Assessing the validity of lexical diversity using direct judgements. Language Assessment Quarterly, 18(2), 154-170. https://doi.org/10.1080/15434303.2020.1844205
Laufer, B. (1989). What percentage of text lexis is essential for comprehension? In C. Lauren & M. Nordman (Eds.), Special Language: From Humans Thinking to Thinking Machines (pp. 316–323). Multilingual Matters.
Laufer, B. (1995). Beyond 2000: A measure of productive lexicon in a second language. In L. Eubank, L. Selinker & E. Sharwood Smith (Eds.), The Current State of Interlanguage. Studies in Honor of William E. Rutherford (pp. 265-272). John Benjamins.
Laufer, B., & Ravenhorst-Kalovski, G. C. (2010). Lexical threshold revisited: Lexical text coverage, learners’ vocabulary size and reading comprehension. Reading in a Foreign Language, 22(1), 15–30. https://doi.org/10125/66648
Le, N. H., & Ha, H. T. (2023). Lexical demands of academic written English: From students’ assignments to scholarly publications. Sage Open, 13(4), 1-16. https://doi.org/10.1177/21582440231216292
Le, N. T. M., & Dinh, H. T. (2022). Vocabulary coverage in a high school Vietnamese EFL textbook: a corpus-based preliminary investigation. Vietnam Journal of Education, 6(2), 102–113. https://doi.org/10.52296/vje.2022.187
Linnarud, M. (1986). Lexis in composition: a performance analysis of Swedish learners’ written English (Lund studies in English). CWK Gleerup.
Malvern, D. D., & Richards, B. J. (1997). A new measure of lexical diversity. In A. Ryan & A. Wray (Eds.), Evolving models of language (pp. 58–71). Multilingual Matters.
Matsuoka, W., & Hirsh, D. (2010). Vocabulary learning through reading: Does an ELT course book provide good opportunities? Reading in a Foreign Language, 22(1), 56-70. http://nflrc.hawaii.edu/rfl
McCarthy, P. M. (2005). An assessment of the range and usefulness of lexical diversity measures and the potential of the measure of textual lexical diversity (MTLD) [Doctoral dissertation, University of Memphis]. The University of Memphis ProQuest Dissertations Publishing. https://www.aaai.org/ocs/index.php/FLAIRS/2010/paper/view/1283
McCarthy, P. M., & Jarvis, S. (2007). Vocd: A theoretical and empirical evaluation. Language Testing, 24(4), 459–488. https://doi.org/10.1177/0265532207080767
McLean, S. (2021). The coverage comprehension model, its importance to pedagogy and research, and threats to the validity with which it is operationalized. Reading in a Foreign Language, 33(1), 126–140. https://doi.org/10125/67396
Mesmer, H. A., & Hiebert, E. H. (2015). Third graders’ reading proficiency reading texts varying in complexity and length: responses of students in an urban, high-needs school. Journal of Literacy Research, 47(4), 473–504. https://doi.org/10.1177/1086296X16631923
Milton, J. (2009). Measuring second language vocabulary acquisition (Vol. 45). Multilingual Matters.
Ministry of Education and Training. (2018, December 26). Chương trình giáo dục phổ thông: Chương trình môn tiếng Anh [General Education English Curriculum] (Circular No. 32/2018/TT-BGDĐT).
Nakayama, S. (2021). A quantitative analysis of vocabulary taught in Japanese EFL textbooks. Research Square. https://doi.org/10.21203/rs.3.rs-688772/v1
Nakayama, S. (2022). A close examination of vocabulary in Japanese EFL textbooks. JALT Post-conference Publication, 2021(1), 209-216. https://doi.org/10.37546/jaltpcp2021-24
Nation, I. S. P. (n.d.). Vocabulary, the CEFR levels, and word family size. https://www.wgtn.ac.nz/lals/resources/paul-nations-resources/vocabulary-lists
Nation, I. S. P. (2006). How large a vocabulary is needed for reading and listening? The Canadian Modern Language Review, 63(1), 59–82. https://doi.org/10.3138/cmlr.63.1.59
Nation, P. (2007). The four strands. International Journal of Innovation in Language Learning and Teaching, 1(1), 2–13. https://doi.org/10.2167/illt039.0
Nation, I. S. P. (2012). The BNC/COCA word family lists. http://www.victoria.ac.nz/lals/about/staff/paul-nation
Nation, I. S. P. (2017). The BNC/COCA Level 6 word family lists (Version 1.0.0) [Data file]. http://www.victoria.ac.nz/lals/staff/paul-nation.aspx
Nation, I. S. P. (2022). Learning vocabulary in another language (3rd ed., Cambridge Applied Linguistics). Cambridge University Press. https://doi.org/10.1017/9781009093873
Ngo, T. H. T., & Luu, Q. K. (2018). A review on grammar teaching of English textbook entitled “Skillful: Listening and speaking, student’s book pack 2”. Social Science and Humanities Journal, 2(12), 764-775.
Nguyen, T. T. M. (2007). Textbook evaluation: The case of English textbooks currently in use in Vietnam’s upper-secondary schools College of Foreign. Unpublished research report. RELC SEAMEO.
Nguyen, T. T. M. (2011). Learning to communicate in a globalized world: to what extent do school textbooks facilitate the development of intercultural pragmatic competence? RELC Journal, 42(1), 17–30. https://doi.org/10.1177/0033688210390265
Nguyen, T. C. (2015). An evaluation of the textbook English 6: a case study from secondary schools in the Mekong Delta provinces of Vietnam [Unpublished doctoral thesis, The University of Sheffield]. White Rose eTheses Online. https://etheses.whiterose.ac.uk/10033/
Nguyen, C.-D. (2020). Lexical features of reading passages in English-language textbooks for Vietnamese high-school students: do they foster both content and vocabulary gain? RELC Journal, 52(3), 509–522. https://doi.org/10.1177/0033688219895045
Nguyen, T. T. M., Marlina, R., & Cao, T. H. P. (2021). How well do ELT textbooks prepare students to use English in global contexts? An evaluation of the Vietnamese English textbooks from an English as an International Language (EIL) perspective. Asian Englishes, 23(2), 184-200. https://doi.org/10.1080/13488678.2020.1717794
Nu, T. A. T. (2018). Pragmatic input in newly-published national English textbooks for Vietnamese students [Master’s thesis, Macquarie University]. Macquarie University Research Data Repository. https://doi.org/10.25949/19433474.v1
Nurmukhamedov, U., & Sharakhimov, S. (2023). Corpus-based vocabulary analysis of English podcasts. RELC Journal, 54(1), 7-21. https://doi.org/10.1177/0033688220979315
Rahmat, Y. N., & Coxhead, A. (2021). Investigating vocabulary coverage and load in an Indonesian EFL textbook series. Indonesian Journal of Applied Linguistics, 10(3), 804-814. https://doi.org/10.17509/ijal.v10i3.31768
Read, J. (2000). Assessing vocabulary. Cambridge University Press.
Richards, B. J. (1987). Type/token ratios: What do they really tell us? Journal of Child Language, 14(2), 201-209. https://doi.org/10.1017/S0305000900012885
Richards, J. (2005). The role of textbooks in a language program. https://www.professorjackrichards.com/wp-content/uploads/role-of-textbooks.pdf
Schmitt, N., Cobb, T., Horst, M., & Schmitt, D. (2017). How much vocabulary is needed to use English? Replication of van Zeeland & Schmitt (2012), Nation (2006) and Cobb (2007). Language Teaching, 50(2), 212-226. https://doi.org/10.1017/S0261444815000075
Schmitt, N., Jiang, X., & Grabe, W. (2011). The percentage of words known in a text and reading comprehension. The Modern Language Journal, 95(1), 26–43. https://doi.org/10.1111/j.1540-4781.2011.01146.x
Schmitt, N., & Schmitt, D. (2014). A reassessment of frequency and vocabulary size in L2 vocabulary teaching. Language Teaching, 47(4), 484-503. https://doi.org/10.1017/S0261444812000018
Sheldon, L. (1988). Evaluating ELT textbooks and materials. ELT Journal, 42(4), 237-246.
Skoufaki, S., & Petrić, B. (2021). Academic vocabulary in an EAP course: Opportunities for incidental learning from printed teaching materials developed in-house. English for Specific Purposes, 63, 71-85. https://doi.org/10.1016/j.esp.2021.03.002
Stoeckel, T., McLean, S., & Nation, P. (2020). Limitations of size and levels tests of written receptive vocabulary knowledge. Studies in Second Language Acquisition, 43(1), 181–203. https://doi.org/10.1017/S027226312000025X
Sun, Y., & Dang, T. N. Y. (2020). Vocabulary in high-school EFL textbooks: Texts and learner knowledge. System, 93, Article 102279. https://doi.org/10.1016/j.system.2020.102279
The jamovi project. (2023). jamovi (Version 2.4) [Computer Software]. https://www.jamovi.org
Ton Nu, A. T., & Murray, J. (2020). Pragmatic content in EFL textbooks: An investigation into Vietnamese national teaching materials. TESL-EJ, 24(3), 1-28. https://tesl-ej.org/pdf/ej95/a8.pdf
Trang, N. H., Nguyen, D. T. B., & Ha, H. T. (2023). Vocabulary demands of academic spoken English revisited: A case of university lectures and TED presentations. SAGE Open, 13(1). https://doi.org/10.1177/21582440231155334
van Hout, R., & Vermeer, A. (1988). Spontane taaldata en het meten van lexicale rijkdom in tweede-taalverwerving [Spontaneous language data and the measurement of lexical richness in second language acquisition]. Toegepaste Taalwetenschap in artikelen, 32(1), 108-122. https://doi.org/10.1075/ttwia.32.07hou
Vitta, J. P., Nicklin, C., & Albright, S. W. (2023). Academic word difficulty and multidimensional lexical sophistication: An English-for-academic-purposes-focused conceptual replication of Hashimoto and Egbert (2019). The Modern Language Journal, 107(1), 373–397. https://doi.org/10.1111/modl.12835
Vu, D. V., & Peters, E. (2021). Vocabulary in English language learning, teaching, and testing in Vietnam: A review. Education Sciences, 11(9), 563. https://doi.org/10.3390/educsci11090563
Webb, S. (2009). The effects of pre-learning vocabulary on reading comprehension and writing. The Canadian Modern Language Review, 65(3), 441-470. https://doi.org/10.3138/cmlr.65.3.441
Webb, S. A., & Chang, A. C. S. (2012). Second language vocabulary growth. RELC Journal, 43(1), 113-126. https://doi.org/10.1177/0033688212439367
Webb, S., & Nation, P. (2008). Evaluating the vocabulary load of written text. TESOLANZ Journal, 16, 1-9. https://doi.org/10.26686/wgtn.12552152.v1
Webb, S., & Nation, I. S. P. (2013). Teaching vocabulary. In C. Chapelle (Ed.), Encyclopedia of applied linguistics (pp. 5670–5677). Wiley-Blackwell.
Webb, S., & Nation, P. (2017). How vocabulary is learned. Oxford University Press.
Webb, S., & Rodgers, M. P. H. (2009a). The lexical coverage of movies. Applied Linguistics, 30(3), 407–427. https://doi.org/10.1093/applin/amp010
Webb, S., & Rodgers, M. P. H. (2009b). Vocabulary demands of television programs. Language Learning, 59(2), 235–366. https://doi.org/10.1111/j.1467-9922.2009.00509.x
Webb, S., Sasao, Y., & Ballance, O. (2017). The updated vocabulary levels test: Developing and validating the VLT. ITL – International Journal of Applied Linguistics, 168(1), 33–69. https://doi.org/10.1075/itl.168.1.02web
Yang, L., & Coxhead, A. (2022). A corpus-based study of vocabulary in the New Concept English textbook series. RELC Journal, 53(3), 597–611. https://doi.org/10.1177/0033688220964162
Youngblood, A. M., & Folse, K. S. (2017). Survey of corpus-based vocabulary lists for TESOL classes. MEXTESOL Journal, 41(1), 1–15.
Zenker, F., & Kyle, K. (2021). Investigating minimum text lengths for lexical diversity indices. Assessing Writing, 47, Article 100505. https://doi.org/10.1016/j.asw.2020.100505
Copyright of articles rests with the authors. Please cite TESL-EJ appropriately. Editor’s Note: The HTML version contains no page numbers. Please use the PDF version of this article for citations.

