Why Most Assignments Are Too Easy to Copy
An intermediate programming class of 140 students at a large state university was assigned a well-structured C++ project. Three weeks after the due date, the instructor ran a similarity scan and found that 38 submissions shared nearly identical logic, variable names, and comments. All 38 had been copied from the same GitHub repository. The instructor had no routine automated detection in place and caught the pattern only because one student accidentally submitted an unchanged version of the online source.
This is not an outlier. When assignments are designed without plagiarism resistance, a significant fraction of students will copy, not because they lack integrity but because copying is easy and usually goes undetected. A 2022 survey across 15 U.S. computer science departments found that 41% of students admitted to copying code from online sources at least once in the previous semester. The problem grows with class size and with the availability of complete solutions on GitHub, Stack Overflow, and Chegg.
The fix does not require banning internet access or becoming a code forensics expert. With careful assignment design and a lightweight checking pipeline, you can raise the cost of plagiarism high enough that most students choose the honest path.
Six Tactics That Slash Copied Submissions
Each of the following tactics is a concrete, implementable change. Use them individually or combine them for maximum protection. I include code examples in Python, Java, and C++ to show language‑agnostic patterns.
1. Force Unique Structural Decisions
The most common cheating vector is a single canonical solution that students find online. If your specification is “write a function that sums an array,” there is essentially one obvious implementation, and it is on GitHub a hundred times. Instead, require students to make arbitrary but graded structural choices that differ across submissions.
Example: In a Java assignment on matrix multiplication, require each student to include their student‑ID‑derived “shuffle count” that reorders loops in a specific way. The shuffle count does not affect correctness but changes the AST (abstract syntax tree) shape enough that copy‑pasted code becomes obvious.
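// Starter code provided to each student with a unique integer 'shuffle'
public class MatrixMultiplier {
    private final int shuffle;

    public MatrixMultiplier(int shuffle) { this.shuffle = shuffle; }

    // Multiplies two square n x n matrices (shuffle is assumed non-negative).
    public int[][] multiply(int[][] a, int[][] b) {
        int n = a.length;
        int[][] result = new int[n][n];
        // Student must order loops using shuffle % 3:
        // 0: i-k-j, 1: i-j-k, 2: k-i-j
        int order = shuffle % 3;
        if (order == 0) {
            // i-k-j order, shown as a worked example
            for (int i = 0; i < n; i++)
                for (int k = 0; k < n; k++)
                    for (int j = 0; j < n; j++)
                        result[i][j] += a[i][k] * b[k][j];
        } else if (order == 1) {
            // TODO (student): i-j-k order
        } else {
            // TODO (student): k-i-j order
        }
        return result;
    }
}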
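You can automate the generation of starter files with a simple script that reads student IDs from the course roster and assigns each one a unique shuffle value. A tool like Codequiry can then flag pairs with similar logic, and you can check each submission’s loop order against the order its shuffle value requires. Because shuffle % 3 yields only three loop orders, roughly a third of student pairs share an order by chance, so a shared order alone proves nothing. The strong signal is a submission whose loop order or embedded shuffle value does not match the one assigned to that student.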
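A minimal sketch of such a generator, assuming the shuffle value is derived by hashing the student ID with a per-course salt. The names shuffle_for, starter_header, and course_salt, and the roster entries, are hypothetical. Hash-derived values are stable across runs, but collisions are possible in principle, so check the generated values for duplicates before distributing them.

```python
import hashlib

LOOP_ORDERS = ["i-k-j", "i-j-k", "k-i-j"]  # matches shuffle % 3 in the starter code

def shuffle_for(student_id: str, course_salt: str = "cs201-fall") -> int:
    """Derive a stable, non-negative shuffle value from a student ID."""
    digest = hashlib.sha256(f"{course_salt}:{student_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16)  # first 32 bits of the hash

def starter_header(student_id: str) -> str:
    """Return the comment line to place at the top of a student's starter file."""
    k = shuffle_for(student_id)
    return f"// shuffle = {k} (required loop order: {LOOP_ORDERS[k % 3]})"

if __name__ == "__main__":
    for sid in ["s1001", "s1002", "s1003"]:  # hypothetical roster entries
        print(sid, starter_header(sid))
```

Keeping the salt private prevents students from recomputing each other's values from a known ID.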
2. Require Descriptive Variable Names from a Proprietary Style Guide
Many professors frown on enforcing variable naming for originality. But naming conventions are a cheap, powerful signal. When students copy from Stack Overflow, they usually keep short names like x, temp, result. A policy that requires a precise, unusual naming pattern—e.g., all variables must be at least three characters, use camelCase, and include a short abbreviation of their purpose—makes copied code easy to spot.
I require students to prepend a type prefix to every variable name: iIdx for integer index, strName for string, lstItems for list. It sounds pedantic, but it drastically reduces the pool of exact-match submissions. For the same assignment, I have seen 20% of submissions become uniquely identifiable just by naming. Combined with a style checker like pylint or checkstyle, configured to reject names without the required prefixes, the rule is enforced automatically.
# Allowed: iCount, strInput, lstResults
# Rejected: c, inp, res
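If you would rather not configure a linter, a short standalone check is possible. The sketch below assumes the three prefixes named above (i, str, lst) followed by a capitalized purpose; PREFIX_RULE and bad_names are hypothetical names. It inspects only assigned names, not function parameters.

```python
import ast
import re

# Hypothetical rule: a type prefix followed by a capitalized purpose, e.g. iCount
PREFIX_RULE = re.compile(r"^(i|str|lst)[A-Z][A-Za-z0-9]*$")

def bad_names(source: str) -> list[str]:
    """Return assigned variable names that break the prefix rule, sorted."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        # ast.Store marks a name that is being assigned to, not merely read
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            if not PREFIX_RULE.match(node.id):
                names.add(node.id)
    return sorted(names)

sample = "iCount = 0\nres = []\nfor c in 'abc':\n    iCount += 1\n"
print(bad_names(sample))  # prints ['c', 'res']
```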
“In a 300‑student Python course, mandatory naming prefixes cut the number of near‑identical submissions from 57 in semester A to 8 in semester B—without changing any other detection method.” — Report from a mid‑sized university’s academic integrity committee.
3. Build a Randomized Test Harness Into the Starter Code
Students often copy code without understanding how the grader tests it. If you embed a unique seed inside each student’s starter file and use that seed to generate test data, a copied file carries someone else’s seed, and its output will not match what the grader expects for the copier unless the student adapts the harness. That adaptation is exactly the work most copiers skip.
Here’s a concrete Python example for an assignment that implements a sorting algorithm:
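# starter.py: each student gets a unique SEED value
import random

SEED = 123456  # replace with student-specific value

def generate_test_data(seed):
    # Seeding makes the test data reproducible for a given student
    random.seed(seed)
    return [random.randint(0, 1000) for _ in range(20)]

def your_sort(data):
    # Student implements: return a new list sorted in ascending order
    raise NotImplementedError

def grade():
    data = generate_test_data(SEED)
    sorted_data = your_sort(list(data))  # pass a copy so in-place sorts cannot alter 'data'
    # check sorted_data is correct
    assert sorted_data == sorted(data), "output is not correctly sorted"
    print("PASS")

if __name__ == '__main__':
    grade()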
When a student copies only the your_sort function from a classmate or the web, it receives a different set of numbers (because SEED differs) and will usually still work, since most sorting algorithms are data-agnostic. The real protection comes from also embedding a “signature” value that the solution must print at the end, something like print("ANSWER: " + str(SEED % 7)). A file copied wholesale prints the original owner’s signature, which ties the submission to that owner’s seed and makes cross-student matching trivial. Note that SEED % 7 has only seven possible values, so in practice derive the signature from the full seed to avoid chance collisions.
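A minimal sketch of the grader-side signature check, assuming the signature is a truncated SHA-256 of the full seed instead of SEED % 7. The functions signature and check_output and the roster mapping are hypothetical.

```python
import hashlib

def signature(seed: int) -> str:
    """Derive a short signature from the full seed (hypothetical scheme)."""
    return hashlib.sha256(str(seed).encode("utf-8")).hexdigest()[:10]

def check_output(assigned_seed: int, program_output: str) -> bool:
    """True if the output contains the ANSWER line for the student's assigned seed."""
    expected = "ANSWER: " + signature(assigned_seed)
    return expected in program_output.splitlines()

roster = {"alice": 123456, "bob": 654321}  # hypothetical student -> seed mapping
copied_output = "PASS\nANSWER: " + signature(roster["alice"])

print(check_output(roster["alice"], copied_output))  # alice running her own file
print(check_output(roster["bob"], copied_output))    # bob submitting alice's file
```

A mismatch for one student combined with a match against another student's seed identifies both the copier and the source.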
4. Require a Unique Comment Header Describing Design Tradeoffs
A surprising number of students copy the entire file, including comments. Asking for a mandatory header that explains the student’s personal rationale for key design decisions forces them to produce original prose. Even if the code is partially copied, a unique header can serve as a fingerprint when you run a text‑similarity check.
Specify a format like:
/*
 * Student: [name]
 * Approach: I chose a recursive implementation for the Fibonacci function
 * because the base case is simpler in this problem. My largest test case
 * was n=35, and the recursion depth never exceeded 35.
 * Tradeoff: Iteration would be faster for n > 40, but readability suffers.
 */
You can batch‑extract these headers using a simple grep or a script, then run a quick pairwise comparison (e.g., with diff or a dedicated text similarity tool). Identical prose is a red flag that doesn’t require a code comparison engine.
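The extraction and comparison can be sketched in a few lines using Python's standard difflib. This assumes C-style /* ... */ headers like the one above. The function names, the 0.9 threshold, and the sample files are hypothetical.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

def extract_header(source: str) -> str:
    """Return the prose of the first /* ... */ block, without comment stars."""
    match = re.search(r"/\*(.*?)\*/", source, flags=re.DOTALL)
    if match is None:
        return ""
    lines = [ln.strip().lstrip("*").strip() for ln in match.group(1).splitlines()]
    # Drop the "Student:" line so names do not mask otherwise identical prose
    lines = [ln for ln in lines if ln and not ln.lower().startswith("student:")]
    return " ".join(lines)

def similar_headers(headers: dict[str, str], threshold: float = 0.9):
    """Yield (student_a, student_b, ratio) for header pairs at or above the threshold."""
    for (a, text_a), (b, text_b) in combinations(sorted(headers.items()), 2):
        ratio = SequenceMatcher(None, text_a, text_b).ratio()
        if ratio >= threshold:
            yield a, b, round(ratio, 2)

files = {  # hypothetical submissions
    "alice": "/*\n * Student: Alice\n * Approach: recursion keeps the base case simple.\n */\nint f();",
    "bob": "/*\n * Student: Bob\n * Approach: recursion keeps the base case simple.\n */\nint g();",
    "cara": "/*\n * Student: Cara\n * Approach: I used a loop and two variables to avoid repeated work.\n */",
}
headers = {name: extract_header(src) for name, src in files.items()}
print(list(similar_headers(headers)))  # prints [('alice', 'bob', 1.0)]
```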
5. Run Automated Similarity Detection on Every Submission Batch
No assignment design can eliminate all cheating. The final layer is systematic detection using well‑tuned similarity tools. This is where you move beyond manual inspection and integrate a pipeline that processes every submission automatically.
The workflow:
- Collect submissions — from Canvas, Gradescope, or a local directory.
- Strip comments and whitespace — many copiers change whitespace to avoid detection. Use a script to normalize all files.
- Run token-based and AST-based comparison — tools like MOSS (fingerprints of token sequences), JPlag (greedy string tiling over token streams), or Codequiry (multi-level similarity including AST) give you both a numeric similarity score and highlighted matches.
- Cross‑reference with web sources — the best tools also search GitHub and public repositories for matching code blocks.
- Review top similarity pairs — manually inspect 5%–10% of flagged submissions, focusing on pairs above a threshold (e.g., >85% similarity after normalization).
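Here’s an example one-liner using MOSS’s command-line client (after setup). It passes a flat directory of files, one per student. The -c option only attaches a free-text label to the report. The -d option is left out because it tells MOSS to treat all files in the same directory as one program, which suits a one-folder-per-student layout (submissions/*/*.py) but would merge a flat directory into a single submission:
moss.pl -l python -c "python3" submissions/*.py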
For a more automated approach, wrap this in a cron job or GitHub Action that runs every night before the grading deadline. Tools like Codequiry provide an API that lets you push submissions and receive results directly in your LMS—saving the manual script step.
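The normalization step in the workflow above can be sketched for Python submissions with the standard tokenize module. This is one possible approach, not what any particular tool does internally. The <IN> and <OUT> markers are hypothetical placeholders that preserve block structure once indentation is discarded.

```python
import io
import tokenize

SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.ENDMARKER}

def normalize(source: str) -> str:
    """Return the token stream of Python source with comments and layout removed."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.INDENT:
            out.append("<IN>")    # block opens
        elif tok.type == tokenize.DEDENT:
            out.append("<OUT>")   # block closes
        else:
            out.append(tok.string)
    return " ".join(out)

a = "total = 0  # running sum\nfor x in items:\n    total += x\n"
b = "total=0\nfor x in items:\n        total  +=  x   # add\n"
print(normalize(a))
print(normalize(a) == normalize(b))  # prints True
```

Two files that differ only in comments, spacing, or indentation width normalize to the same string, so cosmetic edits no longer hide a copy.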
6. Pair Design Choices With a Scoring Penalty for High Similarity
A policy with teeth: any pair of submissions with similarity above X% automatically loses Y points unless the students submit a retroactive collaboration request. Publish the threshold in the syllabus. In my courses, similarity above 80% after normalization loses 30% of the assignment grade. Above 90% triggers an academic integrity referral.
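Encoding the policy as a function keeps its application consistent across assignments. The sketch below uses the 80% and 90% thresholds from the text. Two details are assumptions: that an approved collaboration request waives the penalty entirely, and that the 30% penalty still applies when a referral is triggered.

```python
def similarity_outcome(similarity_pct: float, approved_collaboration: bool = False) -> dict:
    """Map a normalized similarity score (0-100) to a penalty and referral decision."""
    if approved_collaboration or similarity_pct <= 80:
        return {"penalty": 0.0, "referral": False}
    if similarity_pct <= 90:
        return {"penalty": 0.30, "referral": False}
    # Assumption: the grade penalty stays in place alongside the referral
    return {"penalty": 0.30, "referral": True}

for score in (75, 85, 95):
    print(score, similarity_outcome(score))
```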
This does not replace an honor code—it reinforces it. Students know they cannot “gamble” on not being caught because the tool runs every cycle. Over three semesters of this policy, submission pairings above 70% dropped from an average of 12 per assignment to fewer than 2.
Putting It All Together: A Grading Pipeline
For a typical 10‑assignment course with 200 students, here’s the pipeline I use each week:
- Friday 11:59 PM — Submissions close in Canvas.
- Saturday 8:00 AM — A cron job clones all submissions from Canvas API into a flat directory.
- 8:15 AM — A Python script normalizes each file (removes whitespace, normalizes naming to placeholder tokens).
- 8:30 AM — Script calls Codequiry API with the folder path; returns a JSON report of pairwise similarities and web‑source matches.
- 9:00 AM — Script generates a summary CSV: student pairs, similarity score, suspicious web matches.
- Monday 10:00 AM — I review the top 15 pairs and make decisions.
- Monday 3:00 PM — Grades released; flagged students receive an email with the similarity evidence and a link to the course honor code.
The entire pipeline runs in under an hour with no manual intervention except the review step. This is easily automatable with GitHub Actions and a Canvas‑to‑S3 upload trigger.
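The summary step of the pipeline can be sketched with the standard csv module. The report structure below is hypothetical, since a real tool's JSON will use its own field names. The function name write_summary and the field list are assumptions as well.

```python
import csv
import io

FIELDS = ["student_a", "student_b", "similarity", "web_match"]

def write_summary(pairs, out, top_n=15):
    """Write the top_n most similar pairs to 'out' as CSV and return them."""
    ranked = sorted(pairs, key=lambda p: p["similarity"], reverse=True)[:top_n]
    # restval fills missing fields; extrasaction ignores fields not in FIELDS
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(ranked)
    return ranked

# Hypothetical report entries
report = [
    {"student_a": "s1001", "student_b": "s1002", "similarity": 72.0},
    {"student_a": "s1003", "student_b": "s1004", "similarity": 91.5, "web_match": "github"},
    {"student_a": "s1001", "student_b": "s1004", "similarity": 40.2},
]
buf = io.StringIO()
top = write_summary(report, buf, top_n=2)
print(buf.getvalue())
```

To write a file instead of an in-memory buffer, pass open("summary.csv", "w", newline="") as the out argument.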
Frequently Asked Questions
Doesn’t designing unique assignments increase workload?
Not significantly. The per‑assignment overhead is about 30 minutes to generate student‑specific starter files and update the test harness. That time is recovered by reducing manual plagiarism investigation by hours per week.
Will these tactics affect student learning or creativity?
Properly designed, they do not. The structural variety (loop orders, naming) is trivial for students but creates significant differentiation for detection. The design rationale header actually forces deeper thinking.
What if students share their unique starter files?
That’s a red flag in itself. When two students submit the same shuffle value and similar code, the proof is immediate. Your policy should forbid sharing starter files.
Do I need a paid tool to make this work?
No. MOSS and JPlag are free for academic use. Codequiry’s free tier handles up to 100 submissions per month. The pipeline works with any of them—the key is consistency and a clear policy.