Why does Scrutinizer say "duplicate code" when these two methods are totally different? Is this a false-positive or does Scrutinizer indeed want to see this in a more abstract kind of way?
-
2It's a false positive. I highly doubt scrutinizer looks at the *semantics* when performing this heuristic check. – user2864740 Jan 01 '15 at 20:47
-
1@Sliq please add a description to the scrutinizer tag – Jasen Jan 01 '15 at 20:48
-
the conly common thing I see is the indent depth, and first token on each line. – Jasen Jan 01 '15 at 20:48
-
@Jasen The lines all start the same and have a relatively low (per line) difference delta. – user2864740 Jan 01 '15 at 20:49
-
https://scrutinizer-ci.com/blog/introducing-new-duplicated-code-detection-for-php - "it is *robust against code modification* and also finds smaller code fragments which make very good targets for refactoring", although it doesn't provide an algorithm review. However, if the algorithm replaced constants with the same stub value then the two pieces of code would be mostly equivalent in what remained. Understanding "mass" might be useful.. – user2864740 Jan 01 '15 at 20:53
-
1I've had this false positive behaviour in my own projects. That said, you can save three lines per method by having one return line, and dropping the `if` block, thus for the first method: `return $query->getRowCount() > 0` (and `return $query->getRowCount() == 1` for the second method). – halfer Jan 01 '15 at 20:56
2 Answers
My guess is that they do what is called "normalization", i.e. the text is split into smaller parts (called tokens) and then some of these tokens are replaced with different text to make them all the same. For example, all numbers and strings are normalized to be the same number/string.
This makes sure that you can find clones that only differ in literals, which is helpful, because it usually means that you can extract a utility method that takes these differing literals as a parameter and thus reduce the redundancy in your code.
So to the clone detector, your code would then look something like this (all uppercase text is normalized):
public function IDENTIFIER($VARIABLE1) {
$VARIABLE2 = $this->database->prepare(STRING);
$VARIABLE2->execute(ARRAY_EXPRESSION);
if ($VARIABLE2->rowCount() == INTEGER) {
return BOOLEAN;
}
return BOOLEAN;
}
Both functions would be normalized to this exact same representation and the clone detector will then pick this up as duplicated code.
The only sensible refactoring I can see for your code would be to extract a helper function that handles preparing and executing the query and returning the row count:
public function executeRowCountQuery($query_string, $query_variables) {
$query = this->database->prepare($query_string);
$query->execute($query_variables);
return $query->rowCount();
}
This might make sense if you have many different queries that are only interested in the row count.

- 2,702
- 26
- 36
It's better to disable the validation at all in .scrutinizer.yml
checks:
php:
duplication: false

- 3,415
- 1
- 34
- 43