49

I am on a project where previous programmers copy-pasted code all over the place. These pieces of code are identical (or very similar) and could have been refactored into one.

I have spent countless hours refactoring this code manually, but I think there must be a better way. Some of it consists of trivial static methods that could have been moved into an ancestor class (but were instead copy-pasted all over by previous junior programmers).

Is there a code analysis tool that can detect this and provide reports/recommendations? I would prefer a free/open source tool if possible.

Rosdi Kasim
  • 12
    Quite unfortunate that some of the most useful discussions are closed as "off-topic". Did discussion below contain "opinionated answers and spam"? Why as soon as people get a bit of power they experience this constant urge to police something that doesn't require any policing? – user1433852 Jul 24 '15 at 21:50

6 Answers

26

I use the following tools:

Both tools can detect duplicated code, but neither advises you on how to refactor it.

JetBrains IntelliJ IDEA Ultimate has good static code analysis with code duplication support, but it is not free.

Matthias Braun
uthark
5

SonarQube can detect duplicated code but does not give recommendations on eliminating it. It is free, although with the default setup it can only detect lexically identical clones.
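To make "lexically identical clones" concrete, here is a minimal sketch of that kind of detection. It is not SonarQube's implementation; the class `ExactCloneFinder`, the method `findClones`, and the window size are hypothetical names chosen for illustration. It compares fixed-size windows of consecutive lines after trimming surrounding whitespace and reports every window that occurs more than once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of SonarQube or any other tool.
class ExactCloneFinder {

    /**
     * Maps each repeated window of windowSize consecutive lines to the
     * 0-based start indices at which it occurs. Lines are compared after
     * trimming leading and trailing whitespace, nothing else is normalized.
     */
    static Map<String, List<Integer>> findClones(List<String> lines, int windowSize) {
        Map<String, List<Integer>> seen = new LinkedHashMap<>();
        for (int start = 0; start + windowSize <= lines.size(); start++) {
            StringBuilder key = new StringBuilder();
            for (int i = start; i < start + windowSize; i++) {
                key.append(lines.get(i).trim()).append('\n');
            }
            seen.computeIfAbsent(key.toString(), k -> new ArrayList<>()).add(start);
        }
        // Keep only windows that occur at least twice.
        seen.values().removeIf(starts -> starts.size() < 2);
        return seen;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "int a = x + 1;",
            "log(a);",
            "return a;",
            "// unrelated",
            "  int a = x + 1;",
            "  log(a);",
            "  return a;");
        findClones(lines, 3).values()
            .forEach(starts -> System.out.println("clone starting at lines " + starts));
    }
}
```

Because only whitespace is trimmed, renaming a single variable in one copy is enough to hide the clone from this approach, which is the limitation described above.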

Bhargav Rao
pepe
5

Most of the tools listed on the Wikipedia article on Duplicate Code Tools will detect duplicates in many different languages, including Java.

Marcus Adams
wsanville
  • 3
    Since someone removed the unencyclopedic links from wikipedia, here's the link to the old version of the page: http://en.wikipedia.org/w/index.php?title=Duplicate_code&oldid=522795578 – Nickolay Dec 04 '12 at 22:53
2

Either Simian or PMD's CPD. The former supports a wider set of languages but is not free for commercial projects.

Pascal Thivent
  • 1
    One feature of Simian that's quite good is its ability to find code that was not copied, but developed independently. So it may do the same thing, but have completely different variable names and even subtypes. In Simian's setup you can specify to ignore variable names and regard subtypes as the same parent type, etc. – drekka Jun 22 '10 at 05:04
  • It is extremely rare for clone detectors to find code that "was not copied but developed independently" unless the code fragments are microscopic (a*b is a clone of x*y and is developed independently but nobody cares). Having built a strong clone detector, my experience is what they find is code that has been cloned; better ones can find cloned code with changed variable names and different constants. Simian is one of these. Strong ones (mine is one of these) can detect when arbitrary subexpressions and statements have been replaced. – Ira Baxter Jun 22 '10 at 14:25
  • Simian doesn't seem to be around any more in its original form. In any case, the link is dead. Here is a link to a Simian tool but it is not clear to me if it is the same product: http://www.harukizaemon.com/simian/ – pjv Jan 06 '13 at 13:32
0

Checkstyle (http://checkstyle.sourceforge.net/) has support for finding duplicates.

Nikolaus Gradwohl
0

See our SD Java CloneDR, a tool for detecting exact and near-miss duplicate code in large Java systems.

The CloneDR will find code clones in spite of whitespace changes, line breaks, comment insertions or deletions, modification of constants or identifiers, and in a number of cases, even replacement of one statement by another or by a block of statements.

It shows where each set of clones is found, each individual clone, an abstraction capturing what the clones have in common, and a parameterization of that abstraction showing how each clone instance can be derived from it.

In most Java systems, it finds that 10-20% of the code is cloned.
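As a rough illustration of why such differences need not hide a clone, the sketch below normalizes a fragment before comparison: comments and whitespace are dropped, and identifiers and literals are replaced by placeholders. This is not how CloneDR works internally; the class `CloneNormalizer`, its keyword list, and its regular expression are assumptions made up for this example, and it covers only a small part of Java's lexical syntax.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not taken from any clone detection tool.
class CloneNormalizer {

    // Deliberately incomplete keyword set, enough for the small example below.
    private static final Set<String> KEYWORDS = Set.of(
        "int", "long", "double", "boolean", "void", "static",
        "return", "if", "else", "for", "while", "new");

    // Alternatives, in order: line comment, block comment, string literal,
    // number, word, any other single non-whitespace character.
    private static final Pattern TOKEN = Pattern.compile(
        "//[^\\n]*|/\\*.*?\\*/|\"(?:\\\\.|[^\"\\\\])*\"|\\d+(?:\\.\\d+)?|[A-Za-z_]\\w*|\\S",
        Pattern.DOTALL);

    /** Returns the fragment as a space-separated token string with names and constants abstracted away. */
    static String normalize(String source) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String token = m.group();
            if (token.startsWith("//") || token.startsWith("/*")) {
                continue; // comments never affect the comparison
            }
            char first = token.charAt(0);
            if (first == '"' || Character.isDigit(first)) {
                out.append("LIT"); // any constant
            } else if (Character.isJavaIdentifierStart(first) && !KEYWORDS.contains(token)) {
                out.append("ID"); // any identifier
            } else {
                out.append(token); // keywords and punctuation are kept as written
            }
            out.append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        String original = "int total = price * 2; // compute total";
        String renamed = "int   sum=cost*10;\n/* same shape, other names */";
        System.out.println(normalize(original).equals(normalize(renamed)));
    }
}
```

Two fragments whose normalized forms are equal differ only in layout, comments, names, or constants. Detecting a replaced statement or block, as described above, needs more than this token-level comparison.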

Ira Baxter