6

I have a couple of simple C++ homework assignments, and I know the students shared code. These are smart students, and they know how to cheat Moss. I'm looking for a tool that can rename variables based on their types (the first variable of type int becomes int1, the first int array becomes intptr1, ...), or one that does something similar that I haven't thought of. Do you know a quick way to do this?

Edit: I'm required to use Moss and report a 90% match.

Thanks

skaffman
perreal
  • I'm not sure I understand how changing variable names will detect plagiarism, perhaps I'm slow. – GManNickG May 03 '11 at 22:47
  • @GMan: I guess the premise is that two students will have identical code, except with different variable names. If you rename all the variables to defaults, they will become identical. – Oliver Charlesworth May 03 '11 at 22:49
  • Because they copy the code and change the variable names. Moss does not perform well when the change is significant. – perreal May 03 '11 at 22:49
  • 2
    Seriously? Moss doesn't detect changing variable names? I don't know about kids today, but that was the first thing they tried when I was in school. – Robᵩ May 03 '11 at 22:50
  • @Oli, not identical but it looks undeniable once variable names are identical – perreal May 03 '11 at 22:50
  • I have to hit 90% to claim cheating – perreal May 03 '11 at 22:53
  • @Oli: Oh, right. Seems like you'd need to also chop out whitespace as well, then. – GManNickG May 03 '11 at 22:54
  • 4
    Moss does its comparisons on the IR, so variable names don't matter. If Moss doesn't catch them then renaming won't either. Heck, your method is fooled by just changing the order of declarations. – Adam May 03 '11 at 22:58
  • @Adam yes, reordering functions based on types and renaming them as well? I guess I need to write a parser :D – perreal May 03 '11 at 23:01
  • 4
    Wait, so you're not looking for an alternative to Moss, but instead you want to change the submitted code to increase the similarity scores returned by Moss? I'd say that's very shaky ethical grounds. Your students have a good defense in the fact that YOU made their code more similar. – Adam May 03 '11 at 23:26
  • Using a public tool of course assures that all submissions are prescreened. What else should we expect? – Bo Persson May 04 '11 at 16:06

4 Answers

4

Yep, the tool you're looking for is called a compiler. :)

Seriously, if the submitted programs are exactly the same except for the identifier names, compiling them (without debugging info) should result in exactly the same output.

If you do this with debugging turned on, the compiler may leave metadata in the executable that is different for each executable, hence the comment about ensuring it is off. This is also why this won't work for Java programs: that kind of info is present whether in debug mode or not (for the purposes of dynamic introspection).

EDIT: I see from the comments added to the question that you're observing some submissions that are different in more than just identifier names. If the programs are still structurally equivalent, this should still work.

EDIT: Given that the use of Moss is a requirement, this probably isn't the way to go. It does seem, though, that Moss has some support for comparing assembly. Perhaps compiling to assembler and submitting that to Moss is an option (depending on what compiler you're using).

Mac
3

You can download and try our C CloneDR duplicate code detector. It finds duplicated code even when the variable names have been changed. Multiple changes in the same chunk are treated as just one; if they rename the variables consistently everywhere, you'll get back a report of "one clone" with the precise variable substitution.

Ira Baxter
3

You can try Copy Paste Detector with ignoreIdentifiers turned on. You can at least use it for a first pass before going to the effort of normalizing names for moss. Or, since the source is available, maybe you can get it to spit out its internal normalization of the code.
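To make the idea of normalizing names concrete, here is a minimal C++ sketch. It is not what Copy Paste Detector or Moss does internally; the function name `normalize_identifiers` and the partial keyword list are hypothetical, chosen only for illustration. It renames identifiers in order of first appearance rather than by type, and it knows nothing about scopes, comments, string literals, or the preprocessor, so a real tool would need a proper parser.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Naive sketch: replace every non-keyword identifier with id1, id2, ...
// in order of first appearance. Types, scopes, comments, and string
// literals are NOT handled; this is an illustration, not a parser.
std::string normalize_identifiers(const std::string& src) {
    // Partial keyword list (an assumption for this sketch; extend as needed).
    static const std::unordered_set<std::string> keywords = {
        "int", "char", "double", "float", "void", "bool", "long", "short",
        "unsigned", "if", "else", "for", "while", "do", "return", "break",
        "continue", "const", "struct", "class", "new", "delete", "true", "false"};

    std::unordered_map<std::string, std::string> renamed;
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isalpha(c) || c == '_') {
            // Collect one whole identifier or keyword.
            std::size_t j = i;
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '_')) {
                ++j;
            }
            const std::string word = src.substr(i, j - i);
            if (keywords.count(word) != 0) {
                out += word;  // keywords are kept as they are
            } else {
                auto it = renamed.find(word);
                if (it == renamed.end()) {
                    const std::string fresh = "id" + std::to_string(renamed.size() + 1);
                    it = renamed.emplace(word, fresh).first;
                }
                out += it->second;
            }
            i = j;
        } else if (std::isdigit(c)) {
            // Copy numeric literals verbatim so suffixes such as 10u are not renamed.
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '.')) {
                out += src[i++];
            }
        } else {
            out += src[i++];  // punctuation and whitespace pass through unchanged
        }
    }
    return out;
}
```

With this sketch, `int total = n + 1;` and `int x = y + 1;` both normalize to `int id1 = id2 + 1;`. As a comment on the question points out, simply reordering declarations changes the numbering, so this is at best a first-pass filter.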

xan
2

Another way of doing this would be to compile the applications and compare their binaries, so your examination is not limited to variable/function name changing.

A hex editor can help you with that. I just tried ExamDiff (not free) and I was happy with the result.
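If you would rather get a number than inspect the files by eye, a byte-by-byte comparison can be sketched in C++ as follows. This is only a rough illustration under stated assumptions, not how ExamDiff or any hex editor works: `read_bytes` and `byte_similarity` are made-up names, and the comparison is purely positional, so a single inserted byte shifts everything after it and lowers the score sharply.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read a whole file as raw bytes. A file that cannot be opened yields an
// empty vector (a simplification for this sketch).
std::vector<char> read_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Fraction of byte positions at which the two files agree, measured
// against the length of the longer file. Two empty files count as identical.
double byte_similarity(const std::string& path_a, const std::string& path_b) {
    const std::vector<char> a = read_bytes(path_a);
    const std::vector<char> b = read_bytes(path_b);
    const std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) {
        return 1.0;
    }
    const std::size_t shared = std::min(a.size(), b.size());
    std::size_t same = 0;
    for (std::size_t i = 0; i < shared; ++i) {
        if (a[i] == b[i]) {
            ++same;
        }
    }
    return static_cast<double>(same) / static_cast<double>(longest);
}
```

Both binaries would need to be built with the same compiler and flags, and even then embedded timestamps or similar build-specific data can introduce differences unrelated to the source.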

karlphillip
  • Just remember to use the same compiler. **;D** – karlphillip May 03 '11 at 23:07
  • That might not work. The compiler might timestamp them. It might allocate memory differently on different runs, and produce code based on the address of some internal structure. – Ira Baxter May 03 '11 at 23:08
  • I would compile to asm to get more readable `diff`s and reduce noise; the `-S` option of `gcc` could be helpful. – Matteo Italia May 03 '11 at 23:10
  • There are other tools that can show you visually what the differences between the files are. This lets you see how similar the binaries are. – karlphillip May 03 '11 at 23:14