May 2026 – Volume 30, Number 1
https://doi.org/10.55593/ej.30117a3
Maria Arjie T. Domingo
Bulacan Agricultural State College, Philippines
<mariaarjie_domingo@basc.edu.ph>
Abstract
Plagiarism remains a persistent ethical and academic issue despite the widespread use of automated detection tools such as Turnitin. Students continue to develop ways to hide plagiarism through linguistic changes that circumvent the software’s detection mechanisms, making it difficult for educators to identify these acts of dishonesty. This study employed a linguistic approach to examine instances of undetected plagiarism in postgraduate academic papers from two universities in the Philippines. Anchored in Celce-Murcia and Larsen-Freeman’s (1999) grammatical metalanguage and Sousa-Silva’s (2014) framework of plagiarism deception strategies, the study analyzed thirty papers that scored at least ten percent similarity in Turnitin reports. All unflagged sections of each paper were manually cross-referenced with identified online sources and examined at the subsentential, sentential, and suprasentential levels to determine how linguistic modifications disguise copied material. Results revealed that the most frequent alteration type was subsentential manipulation (86.94%), particularly through word insertion and substitution, followed by word reordering and paraphrasing. These linguistic modifications successfully disrupted Turnitin’s string-matching algorithms, leaving the potentially plagiarized content undetected. The findings demonstrate the limits of extrinsic plagiarism detection systems and highlight the necessity of integrating linguistic analysis into plagiarism review procedures. The study recommends developing hybrid detection frameworks that combine computational and forensic linguistic methods to enhance academic integrity monitoring in higher education institutions.
Keywords: linguistic analysis, plagiarism detection, forensic linguistics, academic integrity, Turnitin
Plagiarism, Technology, and Academic Integrity
Maintaining academic integrity is one of the most persistent ethical challenges facing higher education today. Despite greater awareness of academic integrity, many students continue to appropriate other people’s work through paraphrasing and word-level alterations. The growing availability of digital resources makes scholarly works easier to locate, but it also creates more opportunities to plagiarize. As Coulthard and Johnson (2007) noted, the digital era has fostered an environment where information can be replicated with minimal effort, increasing the incidence of what has been termed digital plagiarism.
Moreover, plagiarism has evolved beyond verbatim copying. Ahmed and Anirvan (2020) argue that plagiarism includes any unacknowledged use of another’s intellectual product, whether text, ideas, graphics, or methods, and should be viewed as both a linguistic and an ethical issue. Because plagiarism encompasses such a wide range of unacknowledged uses of others’ ideas, many detection programs are unable to identify plagiarism that occurs through linguistic disguise, which changes the surface form of the material while preserving the author’s ideas and intellectual property.
Automated plagiarism detection programs, including Turnitin and Grammarly, are increasingly being used by educators and researchers to help ensure the academic integrity of their students’ assignments. Many of these detection programs use algorithms that measure the similarity of a student’s assignment to a database of previously written and/or publicly available materials. However, these detection systems are generally less successful in detecting plagiarism that occurs through linguistic manipulation, such as using synonyms for words, reorganizing phrases, or paraphrasing material without properly citing the source (Ratna et al., 2017; Sousa-Silva, 2014a). Zimba and Gasparyan (2021) further caution that similarity scores alone do not prove plagiarism, since detection software measures lexical overlap rather than semantic or conceptual equivalence. Consequently, many forms of linguistic manipulation remain undetected, particularly when writers skillfully disguise borrowed ideas through variation in vocabulary and syntax.
The competitive publication environment in Philippine higher education places increasing emphasis on the quantity of research output required by institutions and mandated by the government. At times, this emphasis comes at the expense of research quality. Such an environment can lead to unethical writing practices, and the consequences of plagiarism go beyond academic dishonesty, as plagiarism is considered a crime under the Philippine Intellectual Property Code (R.A. 8293) and violations can carry legal penalties. The ethical and legal implications of plagiarism therefore extend beyond academic misconduct and into intellectual property rights (Bailey, 2012).
Recent studies have identified a global surge in academic dishonesty during and after the COVID-19 pandemic, attributed to increased online access and reduced supervision (Eshet, 2023; Sozon et al., 2024). Eshet (2023) described this surge in plagiarism as a “plagiarism pandemic,” and stated that the number of plagiarism cases increased significantly during emergency remote teaching. Likewise, Sozon et al. (2024) found that, despite the development of multiple technologies to detect plagiarism, plagiarism continues to develop in complexity and scope, thus emphasizing that technology is insufficient to address the issue of plagiarism.
Despite numerous institutional initiatives, a significant gap persists in understanding how linguistic strategies enable plagiarism to evade detection. Much of the current literature has focused on developing detection technology, whereas little has addressed how linguistic analysis can assist in detecting plagiarism. As Sousa-Silva (2015) noted, plagiarism is fundamentally a linguistic issue before it becomes a technological one, because it involves choices of vocabulary, structure, and discourse that shape meaning. Improving the ability of existing detection systems to identify plagiarism therefore requires greater attention to how writers change the form of a text while preserving its semantic meaning.
This study therefore investigates undetected plagiarism in postgraduate academic papers through a linguistic framework. It focuses on how language-level manipulations—at the word, phrase, and sentence levels—conceal lifted ideas from digital detection systems. This study aims to illustrate how a linguistic approach can complement computational tools in identifying concealed plagiarism and promoting more comprehensive academic integrity practices in Philippine universities.
Conceptual Background
Plagiarism is often defined as the appropriation of another person’s ideas or expressions without proper acknowledgment (Swales & Feak, 2012). While this definition captures the ethical aspect of plagiarism, it fails to account for the linguistic and cognitive processes involved in how texts are reproduced. Pennycook (1996) argued that plagiarism is an extremely complex process that is affected by many factors including culture, language and pedagogy. In fact, in some educational settings, reproducing authoritative texts is seen as a way of showing respect to an intellectual tradition (Adam et al., 2016).
In second-language academic writing, plagiarism can also emerge from limited linguistic competence or developmental “patchwriting” (Howard, 1999; Pecorari, 2010). When writers attempt to paraphrase, they sometimes unwittingly replicate the original sentence structure because of limited English vocabulary or grammar. Patchwriting thus blurs the line between deliberate deception and innocent ignorance, which makes linguistic analysis crucial in differentiating purposeful manipulation of text from novice attempts to rewrite it.
Perkins et al. (2020) have shown that providing second language writers with targeted education about academic misconduct can significantly reduce instances of plagiarism. They conducted a longitudinal study that indicated that plagiarism was reduced by over one third after students completed formal training in academic integrity. Similarly, Robles et al. (2020) found that students often plagiarize due to time constraints, lack of motivation, or insufficient understanding of citation and paraphrasing conventions. These studies indicate that linguistic misuse often results from both cognitive limitations and institutional pressures, reinforcing the importance of instruction that integrates language awareness with ethical reasoning.
The primary approaches to plagiarism detection systems include extrinsic and intrinsic detection (Potthast et al., 2009). Extrinsic plagiarism detection involves comparing a target document to a large collection of documents to determine if there are similarities; whereas intrinsic plagiarism detection looks for inconsistencies within the writing itself that would suggest the possibility of multiple authors (Meyer zu Eissen & Stein, 2006). While the former is computationally efficient, it fails to capture paraphrasing or idea theft; the latter requires linguistic interpretation and stylistic analysis. Together, these methods illustrate why human linguistic knowledge remains indispensable for plagiarism detection.
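To make the intrinsic approach concrete, the following is a minimal Python sketch that assumes a single stylistic feature (mean word length per segment) and a hypothetical z-score threshold of 1.5. Intrinsic systems described in the literature use much richer stylometric features, so this is only a toy illustration of the core idea: flagging internal style shifts without consulting any external corpus.

```python
import re
from statistics import mean, pstdev


def mean_word_length(segment: str) -> float:
    """Average length of the alphabetic words in a text segment."""
    words = re.findall(r"[A-Za-z]+", segment)
    return mean(len(w) for w in words) if words else 0.0


def flag_style_outliers(segments: list[str], threshold: float = 1.5) -> list[int]:
    """Return indices of segments whose mean word length lies more than
    `threshold` population standard deviations from the document average.

    The document is compared only with itself (no reference collection),
    which is what distinguishes intrinsic from extrinsic detection.
    """
    scores = [mean_word_length(s) for s in segments]
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return []  # perfectly uniform style: nothing to flag
    return [i for i, s in enumerate(scores) if abs(s - mu) / sigma > threshold]


# Hypothetical document: three stylistically similar segments and one that differs.
segments = ["the cat sat on a mat"] * 3 + [
    "notwithstanding institutional epistemological considerations"
]
print(flag_style_outliers(segments))  # [3]
```

A flagged segment is only a prompt for human review: a shift in style may signal a second author, but it may equally reflect a change of topic or a quoted passage.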
Sousa-Silva (2014a) and Alzahrani and Salim (2010) have also pointed out that current extrinsic systems are ill-equipped to deal with translingual plagiarism, that is, instances in which ideas are translated into another language or reworded. This limitation illustrates the need for a linguistically based approach to plagiarism detection that incorporates semantic, syntactic, and discourse-level analysis of the text.
Operational Definition of Plagiarism in This Study
In this study, plagiarism is defined as the unacknowledged appropriation of another author’s intellectual content (i.e., the author’s ideas, argumentation structure, or propositional content) without giving credit to the original author, regardless of whether the original wording was retained or modified through paraphrasing or other linguistic changes.
A passage was considered plagiarism when it demonstrated three attributes: (a) substantive semantic equivalence to its source, (b) failure to provide appropriate citation(s) to the source material, and (c) evidence of linguistic manipulation that transformed the surface form of the original while preserving its meaning. This model of plagiarism emphasizes the meaning conveyed by a passage rather than the similarity of the words used to convey it.
The definition also makes no assumption about the author’s intention. Because the study focuses on the textual evidence of intertextual appropriation, that is, on how writers used others’ work rather than on why they appropriated it, authorial intent falls outside the scope of the analysis (Howard, 1999; Pecorari, 2008).
This definition provides the theoretical basis for the linguistic coding of plagiarism and thus enables a consistent analysis of passages in terms of intellectual property rights, semantic consistency, and linguistic modification rather than surface similarity alone.
Review of Related Studies
The development of plagiarism detection technology has been shaped by advances in both Natural Language Processing (NLP) and Computational Linguistics. Early systems relied on substring matching, keyword similarity, and fingerprint analysis (Stein & Meyer zu Eissen, 2006). More recent systems (Bensalem et al., 2014; Naseem & Kurian, 2013) have employed machine learning, semantic-based techniques (AlSallal et al., 2016), and Latent Semantic Analysis (Ratna et al., 2017). Although these methods improved accuracy, they remained primarily dependent on string similarity rather than linguistic meaning, and they were largely algorithmic rather than pedagogically oriented.
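The weakness of string matching described above can be illustrated with a minimal Python sketch of word n-gram fingerprinting. This is a generic technique, not Turnitin’s proprietary algorithm; the use of word trigrams and of Jaccard overlap as the similarity measure are assumptions made for illustration. The sentence pair is taken from the Postgraduate Paper #14 example discussed in the Results section.

```python
import re


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Lowercase the text, split it into word tokens, and return the set of
    contiguous word n-grams that serve as its fingerprints."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_overlap(a: str, b: str, n: int = 3) -> float:
    """Proportion of n-gram fingerprints shared by two texts
    (size of the intersection divided by size of the union)."""
    fa, fb = word_ngrams(a, n), word_ngrams(b, n)
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


source = ("Results revealed that both direct and indirect feedback lead to "
          "improved accuracy in revised and new writing.")
disguised = ("The results showed that both direct and indirect feedback led to "
             "improved accuracy in students' revised drafts and new essays.")

print(jaccard_overlap(source, source))     # 1.0
print(jaccard_overlap(source, disguised))  # 0.2222222222222222 (6 shared trigrams out of 27)
```

Three insertions, two lexical substitutions, and one tense change leave the proposition intact, yet they break 9 of the source sentence’s 15 trigrams. This is the kind of surface disruption that lets reworded passages fall below a string-matching threshold.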
In the field of TESL and English for Academic Purposes (EAP), researchers have underscored that plagiarism must also be treated as a pedagogical concern linked to academic literacy and language learning. Howard (1999), Pecorari (2010), and Perkins et al. (2020) highlighted the importance of teaching, feedback, and awareness-raising in preventing unintentional plagiarism and improving students’ paraphrasing abilities. Furthermore, Robles, Rivas, and Campos (2020) demonstrated that linguistic competence and moral education can help reduce misconduct by illustrating how the structure of language is connected to authorship responsibilities. In contrast to purely forensic views, these TESL-oriented perspectives treat plagiarism not just as a technical violation but also as an educational opportunity to develop linguistic awareness and academic honesty.
In addition to the above viewpoints, Sousa-Silva (2015) illustrated that even very small grammatical and/or lexical changes are sufficient to hide plagiarism, while Mulla (2014) found that changes in modality, tense and coherence could be used as indicators of “idea plagiarism.” Further, Bretag and Mahmud (2009) proposed a hybrid approach that would combine automated plagiarism detection with manual evaluation of content and intent because machines alone cannot evaluate the intent behind a piece of written work.
Zimba and Gasparyan (2021) provided a practical guide for researchers on preventing plagiarism in their own work, focusing on preventive measures such as awareness training, mentoring, and support for institutions in establishing structures and policies for plagiarism-free research. Their recommendations reinforce those of Perkins et al. (2020) and Robles et al. (2020) regarding the need to educate researchers and students about plagiarism and to promote linguistic awareness, with education serving as the best defense against plagiarism.
In the Philippines, relatively few studies have examined plagiarism from a linguistic standpoint. Roman (2018) studied plagiarism among students enrolled in a Bachelor of Elementary Education program at a state university and showed that plagiarism rates in students’ research writing decreased dramatically after the use of Turnitin. Although Roman’s study provided empirical data on the efficacy of plagiarism detection software in reducing plagiarism and highlighted the influence of institutional contexts on students’ engagement in plagiarism, it did not examine the linguistic characteristics of potentially plagiarized texts or identify the linguistic techniques students used to plagiarize.
Most of the literature in the Philippines (e.g., Debuque et al., 2023; Macatangay, 2015; Resurreccion, 2012) focuses on students’ attitudes toward plagiarism, the extent of plagiarism among students, and the institutional issues surrounding it rather than on the linguistic features of plagiarism itself. This gap highlights the need for linguistically informed studies that illustrate how plagiarism functions in actual academic writing and how linguistic manipulation may interact with the limitations of plagiarism detection technologies.
Theoretical Framework
The study is grounded in two complementary frameworks: Celce-Murcia and Larsen-Freeman’s (1999) grammatical metalanguage, and Sousa-Silva’s (2014) framework of plagiarism deception strategies.
Celce-Murcia and Larsen-Freeman’s (1999) Grammatical Metalanguage
This framework defines grammar as a system of form, meaning, and use, focusing on how the components of a language function within discourse. It provides the analytical lens through which textual alterations are categorized at three levels:
- Subsentential (word and phrase level) – alterations such as insertion, substitution, or morphological changes;
- Sentential (sentence structure level) – reordering, clause combination, or syntactic restructuring;
- Suprasentential (discourse level) – paragraph or thematic reformulation preserving overall semantic intent.
This hierarchy enables systematic identification of textual manipulations that affect grammatical structure and meaning without changing propositional content.
Sousa-Silva’s (2014) Plagiarism Deception Strategies
Sousa-Silva proposed several strategies that plagiarists employ to mislead detection systems, including word substitution, reordering, translation, and paraphrasing. These strategies are linguistic rather than purely technical, relying on the writer’s ability to manipulate language to disguise authorship. Integrating this model with grammatical metalanguage allows the current study to classify the linguistic operations that underlie undetected plagiarism and understand how these operations interact with plagiarism detection algorithms.
Taken together, the two frameworks create a common base of knowledge from which to understand how plagiarism is achieved at different levels of the organization of a text and why plagiarism may be able to evade detection by algorithms. The integration of computational and linguistic approaches represents the study’s goal of approaching plagiarism as a linguistic phenomenon rather than solely as a technological one.
Research Gap and Purpose of the Study
Despite numerous advancements in automated plagiarism detection, studies integrating linguistic and forensic perspectives remain limited. While there has been extensive focus placed on developing improved algorithms for detecting plagiarism, there has been much less emphasis on studying the linguistic mechanisms that allow plagiarism to remain undetected. In the Philippine academic context, this gap is especially notable, as institutional reliance on Turnitin reports often substitutes for qualitative evaluation of student writing.
Existing literature indicates that although extrinsic detection methods are effective for identifying literal copying, they often fail to identify disguised plagiarism involving structural and semantic transformations. A linguistic perspective may therefore help explain how writers, consciously or unconsciously, modify texts to preserve the underlying meaning while eliminating any textual traces of the source material.
This study therefore seeks to:
- Identify the forms of linguistic alteration present in potentially plagiarized passages within postgraduate academic papers; and
- Describe the linguistic strategies through which such alterations reduce the detectability of intertextual borrowing by automated plagiarism detection systems.
Through these objectives, the study contributes to a more comprehensive understanding of academic plagiarism by demonstrating how linguistic manipulation, rather than surface copying alone, shapes the limits of digital detection. It further advocates integrating forensic linguistic analysis into the quality assurance and assessment processes of higher education institutions, emphasizing the role of ethics and pedagogy in developing students’ research skills.
Methodology
Research Design
This study employed a descriptive linguistic content analysis to examine instances of plagiarism that were undetected by extrinsic plagiarism-detection systems. The design integrates both qualitative and quantitative descriptive methods: qualitative analysis focused on the linguistic manipulation of potentially plagiarized passages, and quantitative analysis summarized the frequency and distribution of alteration types.
The methodology for the study was based upon the premise that plagiarism is a linguistic act; i.e., it represents a deliberate reorganization of a particular set of language structures that are used to conceal the borrowing of texts from other sources. Building on Sousa-Silva’s (2014b) claim that a linguistic examination of plagiarism complements computational detection, the study systematically identified, classified, and analyzed the different linguistic changes made by postgraduate students to conceal plagiarism.
Research Context and Data Source
The study was conducted within the context of Philippine higher education, where postgraduate students are required to produce academic papers as part of their coursework and capstone requirements. Two universities in Metro Manila—one public and one private—served as the study sites to represent institutional diversity.
A total of thirty (30) academic papers were purposively selected based on the following criteria:
- Written by postgraduate students majoring in English or Applied Linguistics;
- Submitted between Academic Years 2015–2020;
- Contained a Turnitin similarity score of 10% or higher; and
- Comprised between 3,000 and 8,000 words.
This time frame establishes a baseline of academic integrity practices before the COVID-19 pandemic disrupted the academic community and before automated detection technologies matured further. This corpus size was deemed sufficient to allow for qualitative analysis of language patterns while maintaining manageability for detailed manual comparison.
To ensure ethical compliance, permission was obtained from the respective deans and professors to access anonymized copies of student papers. Each paper was used with the informed consent of the original author, who acknowledged that their work would be analyzed confidentially for research purposes only.
Analytical Framework
The analytical process followed a two-tiered framework based on:
- Celce-Murcia and Larsen-Freeman’s (1999) model of grammatical metalanguage, which provides a structure for analyzing linguistic forms and meanings at subsentential, sentential, and suprasentential levels; and
- Sousa-Silva’s (2014) model of strategies for misleading detection systems, which identifies linguistic manipulations such as word substitution, insertion, reordering, and paraphrasing.
Data Collection Procedure
To identify and analyze plagiarism cases systematically, data collection proceeded in four distinct phases.
Phase 1: Screening of Documents
To maintain data integrity, the researcher confirmed that none of the selected papers had been previously uploaded to detection systems. Consequently, any detected similarities reflect original submission history rather than research-related exposure. All thirty papers were uploaded to Turnitin, an extrinsic plagiarism detection system. The generated originality reports were reviewed to identify unhighlighted but suspicious passages, that is, sections not detected by the software yet potentially plagiarized. Only those sections that (a) lacked proper citation and (b) showed conceptual resemblance to existing online texts were extracted for manual analysis.
Phase 2: Identification of Source Texts
Each suspicious passage was compared against the top two matched sources identified by Turnitin. These matched sources were verified and retrieved from publicly available academic databases or open-access journal websites. All extracted data were stored in labeled folders (e.g., Postgraduate Paper 01 – Source 1/Source 2) for systematic tracking.
Phase 3: Manual Comparison and Annotation
The extracted passages were subjected to manual side-by-side comparison with their probable sources. The analysis focused on structural and lexical modifications, including:
- Word-level changes (synonymy, antonymy, insertion, deletion, affixation);
- Sentence-level restructuring (reordering, nominalization, clause reconfiguration);
- Discourse-level reformulation (idea sequencing, thematic rephrasing).
Each modification was visually highlighted and annotated with the linguistic level at which it occurred (subsentential, sentential, or suprasentential), together with a note on how it affected the surface similarity and conceptual equivalence between the source material and the suspected plagiarized passage.
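Part of this side-by-side comparison could be pre-annotated automatically. The sketch below is a hypothetical aid, not the procedure used in the study (which was manual): it aligns two passages word by word with Python’s standard `difflib.SequenceMatcher` and lists each non-matching stretch as an insertion, deletion, or replacement. Deciding whether a replacement is synonymy, affixation, or a tense change, and assigning its linguistic level, still requires a human annotator. The sample sentences come from the Postgraduate Paper #14 example in the Results section.

```python
import difflib
import re


def tokenize(text: str) -> list[str]:
    """Lowercase a passage and split it into word tokens (punctuation dropped)."""
    return re.findall(r"\w+", text.lower())


def word_level_changes(source: str, suspect: str) -> list[tuple[str, str, str]]:
    """Align two passages word by word and return one
    (operation, source words, suspect words) tuple per non-matching stretch.
    Operations are difflib's 'replace', 'insert', and 'delete'."""
    a, b = tokenize(source), tokenize(suspect)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (op, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


source = ("Results revealed that both direct and indirect feedback lead to "
          "improved accuracy in revised and new writing.")
suspect = ("The results showed that both direct and indirect feedback led to "
           "improved accuracy in students' revised drafts and new essays.")

for change in word_level_changes(source, suspect):
    print(change)
# ('insert', '', 'the')
# ('replace', 'revealed', 'showed')
# ('replace', 'lead', 'led')
# ('insert', '', 'students')
# ('insert', '', 'drafts')
# ('replace', 'writing', 'essays')
```

Every change in this output is subsentential, which matches the study’s observation that small lexical edits dominate the disguised passages.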
Phase 4: Quantitative Coding
Following the qualitative annotation, instances of alteration were coded and tallied according to type and frequency. Frequencies were then converted into percentages to provide an overview of the distribution of alteration forms across the corpus. The purpose of this relatively basic descriptive statistical analysis was to determine which alteration types occurred most frequently among the studied texts.
Units and Levels of Analysis
The analysis operated across three linguistic levels, consistent with Celce-Murcia and Larsen-Freeman’s (1999) framework:
- Subsentential level – involves individual words, morphemes, and short phrases (e.g., synonym substitution, insertion of modifiers, or tense changes).
- Sentential level – focuses on grammatical and syntactic structure (e.g., clause transposition, voice alternation, and reordering of constituents).
- Suprasentential level – addresses paragraph- or discourse-level reformulations (e.g., restructuring ideas or integrating copied material within an original context).
These levels facilitated a systematic classification of textual manipulations that modify the linguistic surface but preserve semantic equivalence—precisely the type of alteration most likely to escape detection by extrinsic systems.
Coding and Reliability Procedures
Two independent coders with postgraduate backgrounds in applied linguistics were recruited to verify the researcher’s analysis. They were trained to identify potential plagiarism using the coding categories described above. Each coder independently analyzed forty percent of the data set to evaluate the consistency of classification.
Interrater reliability was calculated using Krippendorff’s Alpha (α), a robust coefficient suitable for nominal data (Krippendorff, 2011). Agreement was computed based on the coders’ classifications of alteration types (subsentential, sentential, suprasentential) and strategy categories (word insertion, substitution, reordering, paraphrasing). The computed reliability coefficient was α = 0.83, indicating high consistency and reliability in the coding process. Discrepancies were resolved through discussion until full consensus was reached.
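For readers unfamiliar with the coefficient, the sketch below shows one way Krippendorff’s alpha for nominal data can be computed from a coincidence matrix, using the standard definition α = 1 − D_o/D_e. It is a minimal illustration with toy labels, not the study’s data or software; the input layout (one list of coder labels per coded unit, with `None` for a missing code) is an assumption.

```python
from collections import Counter
from itertools import permutations
from typing import Optional


def krippendorff_alpha_nominal(units: list[list[Optional[str]]]) -> float:
    """Krippendorff's alpha for nominal data.

    Each inner list holds the categories that the coders assigned to one unit
    (for example, one alteration instance); None marks a missing code.
    """
    # Coincidence matrix o[c, k]: every ordered pair of values within a unit
    # contributes 1 / (m - 1), where m is the number of values in that unit.
    o = Counter()
    for unit in units:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit coded only once cannot be paired
        for c, k in permutations(values, 2):
            o[c, k] += 1 / (m - 1)

    # Marginal totals n_c and the total number of pairable values n.
    n_c = Counter()
    for (c, _), weight in o.items():
        n_c[c] += weight
    n = sum(n_c.values())

    # For nominal data, alpha = 1 - (n - 1) * sum(o_ck, c != k) / sum(n_c * n_k, c != k).
    observed = sum(w for (c, k), w in o.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category occurs")
    return 1 - (n - 1) * observed / expected


# Toy data: two coders, two units each.
print(krippendorff_alpha_nominal([["insertion", "insertion"], ["reordering", "reordering"]]))  # 1.0
print(krippendorff_alpha_nominal([["insertion", "insertion"], ["insertion", "reordering"]]))   # 0.0
```

Alpha corrects for the agreement expected by chance given how often each category is used, which is why one disagreement in the second toy example already drives the coefficient down to zero.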
Ethical Considerations
Ethical clearance for the study was granted by the university ethics board. All procedures followed ethical research practices for confidentiality, consent, and academic integrity. No personally identifying information about participants was retained, and all students provided written informed consent electronically. Students were notified that their participation in the study would have no bearing on their academic status.
Additionally, all source materials were restricted to publicly available academic literature so that no unauthorized data would be used. The study was undertaken solely for scholarly purposes and did not disseminate any student writing outside of the confines of this research.
Data Analysis Techniques
The data analysis combined qualitative linguistic description and quantitative summarization as follows:
- Linguistic Analysis: Each passage was analyzed for forms of textual alteration. Descriptions emphasized grammatical, lexical, and semantic shifts that altered expression without significantly changing meaning. Illustrative examples were extracted to demonstrate how these manipulations operate at each linguistic level.
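- Frequency Analysis: The frequency and percentage of each alteration type were computed using the formula:
$$P = \frac{f}{N} \times 100$$
where $P$ is the percentage of a given alteration type, $f$ is its frequency, and $N$ is the total number of alteration instances in the corpus.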
This computation provided quantitative support for identifying dominant manipulation types.
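The percentage computation amounts to dividing each tally by the total number of coded instances and multiplying by 100. The sketch below applies it to a hypothetical list of coded alteration labels and also aggregates the labels by linguistic level. The label strings and the level mapping are illustrative assumptions modeled on the study’s categories, and the counts are invented, not the study’s data.

```python
from collections import Counter

# Hypothetical label-to-level mapping modeled on the study's coding categories.
LEVEL_OF = {
    "word insertion and substitution": "subsentential",
    "word reordering": "subsentential",
    "morphological alteration": "subsentential",
    "lexical paraphrasing": "subsentential",
    "clause reordering/restructuring": "sentential",
    "nominalization/voice alternation": "sentential",
    "discourse reformulation": "suprasentential",  # hypothetical label
}


def percentage_table(codes: list[str]) -> dict[str, float]:
    """Tally coded instances and express each tally as a percentage of the
    total number of instances (frequency / total * 100), to two decimals."""
    counts = Counter(codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(count / total * 100, 2) for label, count in counts.items()}


def percentage_by_level(codes: list[str]) -> dict[str, float]:
    """Map each alteration type to its linguistic level, then compute percentages."""
    return percentage_table([LEVEL_OF[code] for code in codes])


# Invented codes for illustration only (not the study's data).
codes = ["word reordering"] * 3 + ["clause reordering/restructuring"]
print(percentage_table(codes))     # {'word reordering': 75.0, 'clause reordering/restructuring': 25.0}
print(percentage_by_level(codes))  # {'subsentential': 75.0, 'sentential': 25.0}
```

Because each percentage is rounded separately, the values for a corpus may not sum to exactly 100.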
- Interpretive Analysis: The findings were then interpreted in relation to Sousa-Silva’s (2014b) list of deceptive strategies and existing literature on Turnitin’s algorithmic limitations. This stage provided the basis for discussing how and why linguistic modifications allow potentially plagiarized passages to remain undetected.
Trustworthiness of the Study
To ensure the trustworthiness of findings, the study employed several qualitative validation strategies:
- Triangulation: Analytical results were validated through multiple data sources (Turnitin reports, original sources, and manual textual comparisons).
- Peer Debriefing: Two colleagues specializing in forensic linguistics reviewed the preliminary coding results and offered feedback on category refinement.
- Audit Trail: All coded materials, annotations, and frequency tables were archived systematically to enable replication or secondary review.
- Transparency: Tables and extracts in the Results and Discussion section explicitly present original and modified passages for verification of linguistic claims.
Results and Discussion
Overview of Findings
The analysis of thirty postgraduate academic papers revealed extensive use of linguistic manipulation as a means of concealing plagiarism. Across the corpus, a total of 1,076 alteration instances were identified and classified according to their linguistic level and strategy type.
The data show that most linguistic manipulations occurred at the subsentential level (86.94%), followed by the sentential level (10.50%) and the suprasentential level (2.56%). The predominance of subsentential manipulation indicates that most writers relied on micro-level changes (i.e., small but strategically important modifications to surface form) to alter borrowed texts while maintaining semantic equivalence.
These findings are consistent with previous research describing plagiarism as more than verbatim copying: the intellectual content of another writer can be borrowed even when the surface form has been modified (Ahmed & Anirvan, 2020). Similarly, Sousa-Silva (2014b) states that plagiarized texts are disguised largely through localized linguistic manipulations that “break” string similarity while maintaining propositional meaning, thereby evading detection by algorithm-based comparison systems such as Turnitin.
Distribution of Alteration Types
The table below summarizes the distribution of linguistic manipulations observed in the data.
Table 1. Distribution of Linguistic Alterations by Level and Strategy
| Linguistic Level | Type of Alteration | Frequency | Percentage (%) |
|---|---|---|---|
| Subsentential | Word insertion and substitution | 462 | 42.93 |
| | Word reordering | 189 | 17.56 |
| | Morphological alteration (affixation, tense change) | 153 | 14.21 |
| | Lexical paraphrasing | 130 | 12.24 |
| Sentential | Clause reordering / restructuring | 81 | 7.53 |
| | Nominalization / voice alternation | 32 | 2.97 |
These data show that word insertion, word substitution, and word reordering were the primary means by which writers manipulated borrowed text to avoid detection. The comparatively rare manipulation at the sentential and suprasentential levels suggests that most writers sought to keep rewriting to a minimum and to avoid disrupting the flow of the borrowed text.
This pattern reflects a pragmatic intent—to preserve the integrity of the borrowed content while evading detection—aligning with Sousa-Silva’s (2014a) description of plagiarism as a “linguistically economical” act. More recent studies, such as those by Ahmed and Anirvan (2020) and Zimba and Gasparyan (2021), have further reinforced this view by noting that plagiarism today is characterized by small-scale linguistic disguises rather than outright copying.
Subsentential-Level Alterations
Subsentential manipulation was the most common and successful method of evading plagiarism detection systems such as Turnitin, which primarily rely on string-matching algorithms (Zimba & Gasparyan, 2021). At this level, writers inserted, substituted, or reordered lexical items to alter surface expressions while leaving the semantic meaning of the original intact.
Word Insertion and Substitution
Word insertion and substitution involved adding lexical items to, or replacing lexical items in, the original text; these changes altered its wording, and at times its syntax, while leaving the overall semantic meaning intact.
Source Text (excerpt)
Another example of blended research design is a study by Van Beuningen, De Jong, and Kuiken (2012) which examined the effects of comprehensive, direct, and indirect teacher feedback on students’ accurate use of lexical, grammatical and orthographic errors in revised and new writing (in 4 weeks). Results revealed that both direct and indirect feedback lead to improved accuracy in revised and new writing. Also, error type played a role. While direct feedback improved grammatical accuracy in new essays, indirect feedback improved non-grammatical accuracy.
Student Version (Postgraduate Paper #14)
Another example of a blended research design is the study by Van Beuningen, De Jong, and Kuiken (2012), which looked at the effects of comprehensive, direct, and indirect teacher feedback on students’ accurate use of lexical, grammatical, and orthographic features in both revised and new writing over four weeks. The results showed that both direct and indirect feedback led to improved accuracy in students’ revised drafts and new essays. In addition, the type of error mattered. Direct feedback was more effective for improving grammatical accuracy in new essays, while indirect feedback was more helpful for non-grammatical accuracy.
In this example, the student closely paraphrased a short paragraph from a journal article, substituting lexical items (examined → looked at; revealed → showed; error type played a role → the type of error mattered) and inserting words and phrases (both; students’; was more effective for) to mask the original text. These changes disrupted Turnitin’s search for contiguous strings of characters but left the original ideas virtually unchanged. As Ahmed and Anirvan (2020) observe, such surface-level changes can conceal possible intellectual theft by creating a false sense of originality through linguistic variety.
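Turnitin’s matching procedure is proprietary, so the sketch below does not reproduce it. Instead, it assumes a generic word-trigram comparison, together with a measure of the longest run of identical consecutive words, to illustrate why insertions and substitutions weaken string matching. It is applied to one sentence of the source excerpt and its counterpart in Postgraduate Paper #14 (with the curly apostrophe replaced by a plain one); the function names are hypothetical.

```python
import re


def tokens(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(words, n=3):
    """Return the set of contiguous word n-grams in a token list."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_containment(source, suspect, n=3):
    """Share of the source's word n-grams that also occur in the suspect text."""
    src, sus = ngrams(tokens(source), n), ngrams(tokens(suspect), n)
    return len(src & sus) / len(src) if src else 0.0


def longest_shared_run(source, suspect):
    """Length, in words, of the longest contiguous word sequence common to both texts."""
    a, b = tokens(source), tokens(suspect)
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous word pair.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


source = ("Results revealed that both direct and indirect feedback lead to "
          "improved accuracy in revised and new writing.")
student = ("The results showed that both direct and indirect feedback led to "
           "improved accuracy in students' revised drafts and new essays.")

print(f"Shared trigrams: {ngram_containment(source, student):.2f}")
print(f"Longest shared run: {longest_shared_run(source, student)} words")
```

For this pair, 6 of the source’s 15 word trigrams (0.40) survive, and the longest shared run is six words (“that both direct and indirect feedback”), whereas comparing the source sentence with itself gives 1.00 and 17 words. The propositional content, by contrast, is essentially unchanged.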
Morphological Alteration
Morphological manipulations, such as changing verb tense or number or adding affixes, further disguise copying while maintaining propositional content.
Source Text
Making errors is one of the most unavoidable things in the world. Students in the process of learning language profit from errors that they make by obtaining feedback to make new attempts that successively approximate their desired object.
Student Version (Postgraduate Paper #7)
They must have committed errors in the process of developing competence in ESL. Specifically, they often make mistakes when writing essays in English. Thus, learner errors have become overwhelming and are even considered an indispensable phenomenon in language learning.
Here, “making errors” becomes “committed errors,” and “unavoidable” becomes “indispensable.” These morphological and lexical changes produce no overlapping strings, yet the meaning is largely preserved. Such small changes exemplify what Sousa-Silva (2014b) calls “lexically fractured equivalence,” which undermines an algorithm’s ability to match words.
Word and Clause Reordering
Source Text (excerpt)
Peer assessment also promotes collaborative interaction and deep knowledge construction among students through the dialogue process of assessment and feedback. Interactive behaviors such as arguing and questioning during mutual assessment can also promote the development of students’ reflection and critical thinking… Peer assessment based on assessment scaffolding can also improve assessment consistency and rubric quality and increase learning effectiveness and learner recognition.
Student Version (Postgraduate Paper #18)
When learners engage in structured peer assessment, they become more involved in reflective dialogue and collaborative meaning-making. Through continuous exchange of comments, students question one another’s ideas and gradually refine their critical thinking skills. To make this process effective, teachers need to provide clear rubrics and assessment scaffolds so that students can align their judgments and give more consistent feedback. As a result, peer assessment not only supports shared understanding but also strengthens learning quality and learner confidence.
This example illustrates multi-sentence copying in which each sentence has been reordered or rewritten with synonyms, while the original conceptual sequence and paragraph organization remain largely intact. The transformation does more than substitute one sentence for another: it forms a paragraph-level mosaic that redistributes the original information structure while breaking the chain of detectable lexical connections.
Nominalization and Voice Alternation
A smaller portion of manipulations involved converting verbs to nouns (examine → examination) or shifting between active and passive constructions.
Source Text
Researchers examined how teacher feedback improves student writing.
Student Version
The study conducted an examination of the ways teacher feedback contributes to improvement in student writing.
These transformations add syntactic padding and shift grammatical categories without changing the meaning. According to Sousa-Silva (2014b), this type of semantic paraphrase preserves propositional content while misleading detection tools that rely on surface similarity.
Suprasentential-Level Alterations
At the discourse level, some writers reorganized entire paragraphs, blending copied and original sentences.
Source Text
Studies at the K–12 level have shown that teacher feedback plays a significant role in improving students’ writing skills… teacher feedback plays a crucial role in enhancing students’ writing skills, offering revision opportunities that help them grow into proficient writers.
Student Version
Teacher feedback functions as a central driver in the development of students’ writing competence throughout their schooling. Rather than simply correcting errors, feedback creates spaces for learners to reconsider their texts, negotiate meaning, and refine their language choices over time. As students repeatedly engage with comments and revisions, they gradually move toward greater precision, clarity, and confidence in their writing.
Here, thematic focus and logical progression are retained, but expansions and insertions obscure traceability. Such patchwriting (Howard, 1999) merges copied and original material, demonstrating how plagiarism can function at a structural, multi-sentence level.
From a pedagogical perspective, recognizing these paragraph-scale manipulations is vital. They show that postgraduate writers possess enough linguistic awareness to reconstruct surface form while keeping borrowed logic and argument flow intact, a reminder that plagiarism detection must integrate linguistic judgment rather than rely solely on similarity percentages.
Collectively, these examples demonstrate that plagiarism in academic writing operates not only at the lexical or sentential level but across extended discourse units. By examining paragraph-level mosaics and showing how meaning is preserved beneath reworded surfaces, the analysis clarifies why automated tools often miss the most ethically problematic cases. For TESL and EAP instructors, these patterns provide authentic materials for classroom discussion of paraphrasing, authorship, and responsible textual reuse.
Mechanisms of Evasion in Plagiarism Detection Systems
The patterns observed confirm the limitations of software-based extrinsic detection systems. As Zimba and Gasparyan (2021) note, many systems capture lexical overlap but cannot detect plagiarism in the sense of idea appropriation and transformation.
Linguistic manipulations such as substituting words or phrases, reordering words and phrases, or changing parts of speech break contiguous strings of words into lexically dislocated equivalents, which Sousa-Silva (2014b) describes in terms of both “equivalence” and “equivocality.” Consequently, the software treats the manipulated text as original even though it was derived from another source.
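The contrast between order-sensitive and order-insensitive comparison can be sketched with two generic measures, neither of which is claimed to be Turnitin’s: Jaccard similarity over word trigrams, which rewards contiguous strings, and Jaccard similarity over the set of words used, which ignores order. The reordered sentence is a hypothetical construction based on the opening of the peer-assessment source excerpt, not a passage from the corpus.

```python
import re


def tokens(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a or b else 0.0


def word_set_similarity(x, y):
    """Order-insensitive: compares which words occur, ignoring their sequence."""
    return jaccard(set(tokens(x)), set(tokens(y)))


def trigram_similarity(x, y, n=3):
    """Order-sensitive: compares contiguous word n-grams, as string matching does."""
    def grams(words):
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    return jaccard(grams(tokens(x)), grams(tokens(y)))


# Opening clause of the source excerpt, and a HYPOTHETICAL reordered version
# constructed for this illustration (not taken from the corpus).
source = ("Peer assessment also promotes collaborative interaction and deep "
          "knowledge construction among students.")
reordered = ("Among students, deep knowledge construction and collaborative "
             "interaction are also promoted by peer assessment.")

print(f"Word-set similarity: {word_set_similarity(source, reordered):.2f}")
print(f"Trigram similarity:  {trigram_similarity(source, reordered):.2f}")
```

Here the two versions share 11 of 15 distinct words (0.73) but only 1 of 21 distinct trigrams (0.05), mirroring how reordering and a change of voice leave vocabulary largely intact while breaking contiguous strings. An order-insensitive measure is still not a semantic one: it would not register synonym substitution at all, so it does not capture semantic equivalence either.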
Figure 1 conceptualizes the relationship between linguistic manipulation and detection failure.

Figure 1. Linguistic Manipulation and Detection Evasion
This process illustrates that plagiarism detection failure is not a purely computational issue but a linguistic phenomenon rooted in how textual form can diverge from meaning. Hence, machine tools alone cannot ensure academic integrity without human interpretation.
This conceptualization resonates with Ahmed and Anirvan (2020), who emphasize that plagiarism is not limited to word-for-word copying but also includes paraphrasing, mosaicking, and idea appropriation.
Additionally, the shift to remote learning during the COVID-19 pandemic greatly increased the potential for academic dishonesty. Eshet’s (2023) large-scale study of plagiarism across institutions found substantially higher plagiarism rates during the pandemic. With reduced monitoring and greater availability of online resources, students may perceive both greater opportunity and weaker deterrence, which increases the incentive for effective disguises such as those observed in this study.
Interpretive Discussion
The findings collectively indicate that postgraduate writers demonstrate a functional awareness of how plagiarism detection systems operate. The observed linguistic manipulations reflect a form of strategic textual behavior, which may arise from a range of factors, including limited linguistic resources, instructional gaps, attempts to improve textual originality, or developmental writing practices, rather than from deliberate intent. This interpretation is consistent with Bretag and Mahmud’s (2009) claim that greater familiarity with plagiarism detection technology may lead to more sophisticated forms of plagiarism.
From a linguistic standpoint, the manipulation patterns reveal an interplay of form, meaning, and use as conceptualized by Celce-Murcia and Larsen-Freeman (1999). Writers modify form (syntax, morphology) in ways that reduce textual traceability while preserving meaning (semantic content) and use (communicative function within academic discourse). This triadic relationship demonstrates that plagiarism cannot be fully understood or addressed without understanding language as a system of communicative choices.
At a broader level, the results support Sousa-Silva’s (2015) characterization of plagiarism as a linguistic practice of textual disguise, realized through grammatical and semantic transformation. Such transformation reflects linguistic awareness and textual competence, reinforcing the view that plagiarism is not solely an ethical issue, but also a linguistic issue that requires both pedagogical approaches and institutional regulation to address.
Therefore, relying solely on similarity scores is insufficient. The findings support Zimba and Gasparyan’s (2021) recommendation to combine human linguistic judgment with the algorithmic screening performed by plagiarism detection systems. Hybrid frameworks, which combine automated similarity reports with linguistic analysis, offer a more balanced and accurate approach to evaluating textual originality.
Limitations and Directions for Future Research
While this study offers significant linguistic insights, several limitations constrain its generalizability. First, the small corpus of 30 postgraduate papers from two universities limits the representativeness of the findings. Second, because the study examined only English-language documents, it could not evaluate translingual plagiarism. Third, although descriptive coding identified the manipulations, it cannot reliably reveal writers’ intentions or the cognitive processes involved in producing the potentially plagiarized material. Finally, this study prioritized qualitative interpretation over computational implementation of the proposed detection framework.
Despite these constraints, this exploratory investigation highlights the value of linguistic insight into undetected plagiarism. It thus provides a basis for future studies exploring ways to combine linguistic and computational approaches to plagiarism detection. Future studies should also expand the dataset across diverse disciplines, incorporate multilingual contexts, and use student interviews to capture the motives behind plagiarism (cf. Robles et al., 2020). Ultimately, new hybrid plagiarism-detection systems are needed that draw on both linguistically based feature analysis and computational approaches.
Pedagogical and Institutional Implications
The findings have significant implications for academic writing instruction and institutional policy:
Linguistically Informed Plagiarism Training. Academic writing pedagogy should go beyond teaching citation rules to build awareness of language-based textual manipulation. Perkins et al. (2020) indicate that targeted academic-integrity education, including education on plagiarism detection, can significantly reduce the occurrence of plagiarism. Explicit instruction on paraphrasing, summarizing, and authorial voice can help reduce unintended plagiarism and dissuade students from using strategic textual disguise.
Human–Machine Complementarity. Educators should treat Turnitin and similar tools as supportive aids, not definitive arbiters of what constitutes plagiarism. By incorporating linguistic analysis into plagiarism review, educators can help ensure fair and contextually informed judgments.
Forensic Linguistics Integration. Institutions can adopt a forensic linguistic protocol for complex cases in which Turnitin results are inconclusive. Such an analysis would examine the lexical, grammatical, and discourse patterns of a text to assess whether the student authentically produced the work.
Policy Development. As Robles et al. (2020) suggest, academic integrity frameworks must address the social and linguistic causes of plagiarism—time pressure, lack of motivation, and linguistic insecurity—by promoting formative feedback and awareness. Academic integrity policies must acknowledge that plagiarism can manifest linguistically, not solely ethically. These policies should incorporate options for linguistic review and remediation, emphasizing student learning over punishment.
Synthesis of Contribution
Overall, the study confirms that linguistic manipulation is the primary mechanism through which plagiarism escapes detection in postgraduate academic writing. Subsentential-level alterations, particularly word insertion, substitution, and reordering, are the most prevalent and effective forms of strategic textual disguise.
These manipulations exploit the limitations of algorithmic systems that evaluate lexical similarity rather than semantic equivalence. Therefore, computer-based detection must be supplemented with linguistic scrutiny in order to establish a comprehensive assessment of textual originality.
These findings suggest that effective plagiarism prevention in higher education requires not only enhanced detection technologies but also sustained linguistic instruction that fosters students’ awareness of authorship, paraphrasing, and textual accountability.
This study contributes to the literature by (a) providing empirical evidence of the forms and frequencies of linguistic manipulation in postgraduate texts within the Philippine context; (b) demonstrating the ways in which these manipulations challenge detection systems; and (c) linking these practices to the larger integrity and pedagogical issues identified in the literature (e.g., Ahmed & Anirvan, 2020; Eshet, 2023; Zimba & Gasparyan, 2021). The illustrations above show concretely how plagiarism is realized at each linguistic level.
Summary
This study investigated how postgraduate students in selected Philippine universities used linguistic manipulation to disguise potentially plagiarized text and avoid detection by automated systems such as Turnitin. Through a descriptive linguistic content analysis of thirty academic papers, the study examined the linguistic levels at which, and the methods by which, undetected plagiarized passages had been modified. Anchored in the grammatical metalanguage of Celce-Murcia and Larsen-Freeman (1999) and the deception strategy framework of Sousa-Silva (2014), the study identified and explained the linguistic operations that obscured copied material while maintaining its semantic content.
Results revealed that the overwhelming majority of manipulations occurred at the subsentential level (86.94%), primarily through word insertion, substitution, reordering, and paraphrasing. Sentential-level modifications (10.50%), such as clause restructuring and nominalization, were less frequent but reflected more systematic and cognitively complex attempts at concealment. Suprasentential-level manipulations (2.56%) appeared least often and typically involved integrating borrowed ideas into original paragraphs.
The findings support the conclusion that linguistic transformation, rather than conceptual synthesis, is the main method of disguise in postgraduate writing. Detection tools that depend on string-matching algorithms cannot detect these subtle manipulations because the manipulations transform surface forms without transforming the underlying propositions. Thus, plagiarism persists not simply because of technological inadequacies but also because of students’ lack of linguistic awareness when assessing the originality of their texts.
Conclusion
This study concludes that linguistic analysis provides crucial insight into the mechanisms through which plagiarism operates at both the microstructural and macrostructural levels of academic writing. The findings demonstrate that subtle forms of linguistic manipulation, such as lexical substitution, reordering, and paraphrasing, can significantly limit the effectiveness of algorithm-based detection tools, highlighting the importance of a multifaceted approach that combines linguistic, technical, and instructional methods.
By framing plagiarism as both an ethical and a linguistic issue, this study contributes to the international discussion of academic integrity, particularly within TESL and English for Academic Purposes (EAP) contexts. Although the data came from higher education in the Philippines, the linguistic mechanisms identified reflect problems faced by educators, writing specialists, and institutions worldwide. The results therefore underscore the importance of adopting linguistically informed detection practices and instructional models that foster genuine authorship, strengthen ethical scholarship, and support the development of responsible academic authors in global higher education.
Implications for TESL and EAP Practice
The study’s findings offer actionable insights for TESL and EAP educators in terms of developing plagiarism-awareness activities that combine linguistic analysis with training in ethical authorship. By explicitly teaching students how form interacts with meaning and use when paraphrasing and citing sources, instructors can provide students with a metalinguistic understanding of textual ownership. Such pedagogical integration not only mitigates plagiarism but also strengthens learners’ authorial identity, critical literacy, and communicative competence within academic discourse communities.
Recommendations
Universities should consider incorporating linguistic awareness into their academic writing courses and university-wide policies to address disguised plagiarism more effectively. ESL/EFL writing teachers may benefit from providing explicit training on paraphrasing, patchwriting, and ethical source use, using detection software as a formative rather than punitive tool (Perkins et al., 2020). Institutions are also encouraged to explore hybrid detection frameworks that combine algorithmic screening with human linguistic review, since surface-level similarity scores may not fully capture conceptual borrowing (Sozon et al., 2024; Zimba & Gasparyan, 2021). In addition, institutional academic integrity policies could be revised to recognize linguistic manipulation, such as word substitution, reordering, or morphological change, as a potential form of plagiarism (Ahmed & Anirvan, 2020). Finally, enhancing mentorship and supervision practices may help strengthen ethical authorship, minimize unintentional plagiarism, and foster a culture of integrity in Philippine higher education.
About the Authors
Maria Arjie T. Domingo is a faculty member at Bulacan Agricultural State College in San Ildefonso, Bulacan, and a lecturer at the University of Santo Tomas, Manila. She holds a PhD in English. Her research interests include AI-assisted writing, linguistic approaches to plagiarism detection, academic integrity, discourse analysis, corpus linguistics, and ESL academic writing. ORCID ID: 0009-0008-9568-472X
To Cite this Article
Domingo, M. A. T. (2026). A linguistic analysis of undetected plagiarism in postgraduate academic papers. Teaching English as a Second Language Electronic (TESL-EJ), 30(1). https://doi.org/10.55593/ej.30117a3
References
Adam, L., Anderson, V., & Spronken-Smith, R. (2016). “It’s not fair”: Policy discourses and students’ understandings of plagiarism in a New Zealand university. Higher Education, 72(1), 17–32. https://doi.org/10.1007/s10734-016-0025-9
Ahmed, S., & Anirvan, P. (2020). The true meaning of plagiarism. Indian Journal of Rheumatology, 15(3), 155–158. https://doi.org/10.4103/injr.injr_178_20
AlSallal, M., Iqbal, R., Amin, S., James, A., & Palade, V. (2016). An integrated machine learning approach for extrinsic plagiarism detection. In 2016 9th International Conference on Developments in eSystems Engineering (DeSE) (pp. 203–208). https://doi.org/10.1109/dese.2016.1
Alzahrani, S., & Salim, N. (2010). Fuzzy semantic-based string similarity for extrinsic plagiarism detection. Lab Report for PAN at CLEF. https://ceur-ws.org/Vol-1176/CLEF2010wn-PAN-AlzahraniEt2010.pdf
Bailey, J. (2012, November 19). Criminalizing plagiarism in the Philippines. iThenticate Blog. https://www.scribd.com/document/491675777/Article-Criminalizing-Plagiarism-in-the-Philippines
Bensalem, I., Rosso, P., & Chikhi, S. (2014). Intrinsic plagiarism detection using n-gram classes. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1459–1464). Association for Computational Linguistics. https://doi.org/10.3115/v1/D14-1153
Bretag, T., & Mahmud, S. (2009). A model for determining student plagiarism: Electronic detection and academic judgment. Journal of University Teaching & Learning Practice, 6(1), 49–60. https://www.scirp.org/reference/referencespapers?referenceid=1035330
Celce-Murcia, M., & Larsen-Freeman, D. (1999). The grammar book (2nd ed.). Heinle & Heinle.
Coulthard, M., & Johnson, A. (2007). An introduction to forensic linguistics: Language in evidence. Routledge.
Debuque, M. B. G., Dofitas, J. B. A., Espia, D. A. P. P., Ferrariz, T. C. R., Gargarita, F. J. P., & Oducado, R. M. F. (2023). Factors influencing intention to plagiarize among nursing students in the Philippines. Belitung Nursing Journal, 9(2), 118–123. https://doi.org/10.33546/bnj.2555
Eshet, Y. (2023). The plagiarism pandemic: Inspection of academic dishonesty during the COVID-19 outbreak using originality software. Education and Information Technologies, 29(3), 3279–3299. https://doi.org/10.1007/s10639-023-11967-3
Howard, R. M. (1999). Standing in the shadow of giants: Plagiarists, authors, collaborators. Ablex.
Intellectual Property Code of the Philippines, R.A. 8293 (1998).
Macatangay, J. (2015). Understanding, perception and prevalence of plagiarism among college freshman students of De La Salle Lipa, Philippines. International Journal of Social Science and Humanity, 5(8), 672.
Meyer zu Eissen, S., & Stein, B. (2006). Intrinsic plagiarism detection. In M. Lalmas et al. (Eds.), European Conference on Information Retrieval (ECIR) (pp. 565–569). Springer. https://downloads.webis.de/publications/papers/meyerzueissen_2006b.pdf
Mulla, S. (2014). A study of “idea” plagiarism in two undergraduate students’ emails using forensic authorship analysis and plagiarism detection methods. Diffusion: The UCLan Journal of Undergraduate Research, 7(2). https://scispace.com/papers/a-study-of-idea-plagiarism-in-two-undergraduate-students-47dpy4sc33
Naseem, R., & Kurian, S. (2013). Extrinsic plagiarism detection in text combining vector space model and fuzzy semantic similarity scheme. https://www.semanticscholar.org/paper/Extrinsic-Plagiarism-Detection-in-Text-Combining-Naseem-Kurian/5ffc831b509e4d4b20c165be7facce98b3e25b19
Pecorari, D. (2010). Academic writing and plagiarism: A linguistic analysis [Bloomsbury Academic eBook]. Continuum. (Original work published 2008) https://doi.org/10.5040/9781474211727
Pennycook, A. (1996). Borrowing others’ words: Text, ownership, memory, and plagiarism. TESOL Quarterly, 30(2), 201–230. https://doi.org/10.2307/3588141
Perkins, M., Gezgin, U. B., & Roe, J. (2020). Reducing plagiarism through academic misconduct education. International Journal for Educational Integrity, 16(1), 1–11. https://doi.org/10.1007/s40979-020-00052-8
Ratna, A., Purnamasari, P., Adhi, B., Ekadiyanto, F., Salman, M., Mardiyah, M., & Winata, D. (2017). Cross-language plagiarism detection system using latent semantic analysis and learning vector quantization. Algorithms, 10(2), 69. https://doi.org/10.3390/a10020069
Resurreccion, P. F. (2012). The impact of faculty, peers and integrity culture in the academe on academic misconduct among Filipino students: An empirical study based on social cognitive theory. https://scholar.google.com/citations?view_op=view_citation&hl=en&user=fBn4makAAAAJ&citation_for_view=fBn4makAAAAJ:u-x6o8ySG0sC
Robles, V., Rivas, M., & Campos, J. (2020). Study of the reasons for and measures to avoid plagiarism in young students of education. Profesorado, Revista de Currículum y Formación del Profesorado, 24(1), 50–74. https://doi.org/10.30827/profesorado.v24i1.8572
Roman, A. G. (2018). Minimizing plagiarism incidence in research writing in one state university in the Philippines. Asian Journal of Multidisciplinary Studies, 1(1), 27–33. https://asianjournal.org/index.php/ajms/article/download/12/7
Sousa-Silva, R. (2014a). Detecting translingual plagiarism and the backlash against translation plagiarists. Zenodo (CERN European Organization for Nuclear Research). https://doi.org/10.5281/zenodo.47258
Sousa-Silva, R. (2014b). Investigating academic plagiarism: A forensic linguistics approach to plagiarism detection. International Journal for Educational Integrity, 10(1), 31–41. https://doi.org/10.21913/IJEI.v10i1.932
Sousa-Silva, R. (2015). “Reporter fired for plagiarism”: A forensic linguistic analysis of news plagiarism. Oslo Studies in Language, 7(1). https://doi.org/10.5617/osla.1450
Sozon, M., Pok, W. F., Sia, B. C., & Alkharabsheh, O. H. M. (2024). Cheating and plagiarism in higher education: A systematic literature review from a global perspective, 2016–2024. Journal of Applied Research in Higher Education, 17(5), 1728–1742. https://doi.org/10.1108/jarhe-12-2023-0558
Swales, J., & Feak, C. (2012). Academic writing for graduate students (3rd ed.). University of Michigan Press.
Zimba, O., & Gasparyan, A. (2021). Plagiarism detection and prevention: A primer for researchers. Reumatologia, 59(3), 132–137. https://doi.org/10.5114/reum.2021.105974
Copyright of articles rests with the authors. Please cite TESL-EJ appropriately. Editor’s Note: The HTML version contains no page numbers. Please use the PDF version of this article for citations.

