20

I'm currently reviewing a very old C++ project and see lots of code duplication there.

For example, there is a class with 5 MFC message handlers, each holding 10 identical lines of code. Or there is a 5-line snippet for a very specific string transformation scattered here and there. Reducing code duplication is not a problem in these cases at all.

But I have a strange feeling that I might be misunderstanding something and that there was originally a reason for this duplication.

What could be a valid reason for duplicating code?

sharptooth

20 Answers

26

A good read about this is Large-Scale C++ Software Design by John Lakos.

He has many good points about code duplication, where it might help or hinder a project.

The most important question to ask when deciding whether to remove duplication or to duplicate code is:

If this method changes in the future, do I want the behaviour of the duplicated method to change with it, or does it need to stay the way it is?

After all, methods contain (business) logic, and sometimes you'll want to change the logic for every caller, sometimes not. Depends on the circumstances.
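A minimal sketch of that question in C++. All names here (`withSurcharge`, `invoiceTotal`, `quoteTotal`, `legacyExportTotal`) and the 20% figure are hypothetical and only serve to illustrate the two outcomes:

```cpp
// Hypothetical shared rule: callers of this function are meant to change together.
long withSurcharge(long netCents) {
    return netCents + netCents / 5;  // assumed 20% surcharge, purely illustrative
}

// Invoices and quotes must always agree, so they share withSurcharge().
long invoiceTotal(long netCents) { return withSurcharge(netCents); }
long quoteTotal(long netCents)   { return withSurcharge(netCents); }

// This export happens to use the same arithmetic today, but it is assumed to
// need its current behaviour even if withSurcharge() changes, so it keeps its
// own copy of the logic.
long legacyExportTotal(long netCents) {
    return netCents + netCents / 5;
}
```

If the answer to the question is "change together", the shared function is the better fit; if it is "stay as it is", the separate copy documents that independence.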

In the end, it's all about maintenance, not about pretty source.

Sam
  • Agreed. People get paranoid about structural changes, especially if it already works and releases are looming. – Mike Lewis Sep 01 '09 at 13:30
  • 7
    "In the end, it's all about maintenance, not about pretty source." <-- Mucho importante!! :) – cwap Sep 01 '09 at 13:31
  • The right answer is to put the common logic into a separate function, `common()`, and then have `businesslogic1()` and `businesslogic2()` both call that. If (just) `businesslogic1()` needs to change in future, *then* copy & paste from `common()` into it and make changes. (But only if you can't easily parameterise `common()` to handle both cases.) – j_random_hacker Jan 27 '11 at 12:35
16

Laziness, that's the only reason I can think of.

On a more serious note, the only valid reason I can think of is changes at the very end of the product cycle. These tend to undergo a lot more scrutiny, and the smallest change tends to have the highest rate of success. In that limited circumstance it is easier to get a change that duplicates code approved than a refactoring made for the sake of a small change.

Still leaves a bad taste in my mouth.

JaredPar
  • It's kind of reverse laziness though, isn't it? I mean, it would be far lazier to make a function and call it everywhere... – Justicle Sep 01 '09 at 05:34
  • 4
    @Justicle True, but when you just want to finish that one function and try your code, it's easier to add the 5 lines of code there than to think about passing parameters, return types, and all the other stuff that comes with a function. – DeusAduro Sep 01 '09 at 06:14
  • 3
    Also, the copy&paste method means you don't have to modify the original code at all, which means there's less chance of breaking something -- an important concern if you're in the period immediately before a release. – Jeremy Friesner Sep 01 '09 at 06:29
14

Besides inexperience, here are some reasons why duplicated code might show up:

No time to properly refactor

Most of us work in a real world where real constraints force us to move quickly on to real problems instead of thinking about the niceness of the code. So we copy and paste and move on. For me, if I later see that the code has been duplicated several more times, that is a sign that I have to spend more time on it and merge all instances into one.

Generalization of the code not possible/not 'pretty' due to language constraints

Let's say that deep inside a function you have several statements that differ greatly from one instance of the duplicated code to the next. For example: I have a function that draws a 2D array of thumbnails for a video, and the calculation of each thumbnail's position is embedded in it. To do the hit test (calculating the thumbnail index from the click position) I use the same code, but without the painting.
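One possible way to share the position calculation between painting and hit-testing, where the language allows it, is sketched below. The `Grid` and `Rect` types, the grid parameters, and the `draw` callback are assumptions; the answer does not show its actual code, and whether this reads as "pretty" is a judgment call.

```cpp
#include <functional>

struct Rect { int x, y, w, h; };

// Hypothetical grid parameters; the answer does not give them.
struct Grid { int cols, rows, cellW, cellH, margin; };

// The single place that knows where thumbnail `index` lives.
Rect thumbnailRect(const Grid& g, int index) {
    int col = index % g.cols;
    int row = index / g.cols;
    return { g.margin + col * (g.cellW + g.margin),
             g.margin + row * (g.cellH + g.margin),
             g.cellW, g.cellH };
}

// Painting walks all cells and hands each rectangle to a caller-supplied callback.
void paintThumbnails(const Grid& g,
                     const std::function<void(int, const Rect&)>& draw) {
    for (int i = 0; i < g.cols * g.rows; ++i) draw(i, thumbnailRect(g, i));
}

// Hit-testing reuses the same layout without painting. Returns -1 on a miss.
int hitTest(const Grid& g, int px, int py) {
    for (int i = 0; i < g.cols * g.rows; ++i) {
        Rect r = thumbnailRect(g, i);
        if (px >= r.x && px < r.x + r.w && py >= r.y && py < r.y + r.h) return i;
    }
    return -1;
}
```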

You are not sure that there will be generalization at all

Duplicate code at first, and later observe how it will evolve. Since we are writing software, we can allow 'as late as possible' modifications to the software, since everything is 'soft' and changeable.

I'll add more if I remember something else.


Added later...

Loop unrolling

Back before compilers were as smart as Einstein and Hawking combined, you had to unroll loops or inline code by hand to make it faster. Loop unrolling duplicates your code and probably makes it faster by a few percent, if the compiler didn't do it for you anyway.
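For illustration, here is roughly what manual unrolling looks like next to the plain loop. The function names are hypothetical, and no speed-up is claimed: a modern optimizer may well generate similar code for both.

```cpp
#include <cstddef>

// Straightforward loop: one addition per iteration.
long sumRolled(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// Manually unrolled by four: the loop body is duplicated to reduce the number
// of loop-condition checks and increments per element.
long sumUnrolled(const int* data, std::size_t n) {
    long total = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }
    for (; i < n; ++i) total += data[i];  // leftover elements when n is not a multiple of 4
    return total;
}
```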

Daniel Mošmondor
14

When I first started programming, I wrote an app where I had a bunch of similar functionality which I wrapped up in a neat little 20-30 line function ... I was very proud of myself for writing such an elegant piece of code.

Shortly after, the client changed the process in very specific cases, then again, then again, and again, and again... (many, many more times). My elegant code turned into a very difficult, hackish, buggy, and high-maintenance mess.

A year later, when I was asked to do something very similar, I deliberately decided to ignore DRY. I put together the basic process and generated all the duplicate code. The duplicate code was documented, and I saved the template used to generate it. When the client asked for a specific conditional change (like, if x == y^z + b then 1+2 == 3.42), it was a piece of cake. It was unbelievably easy to maintain and change.

In retrospect, I probably could have solved many of these problems with function pointers and predicates, but using the knowledge I had at the time, I still believe in this specific case, this was the best decision.
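A rough sketch of what the "function pointers and predicates" alternative could look like in C++, using `std::function`. The `Order` record, the `SpecialCase` rule type, and `priceOrder` are all hypothetical; the answer does not describe the actual process.

```cpp
#include <functional>
#include <vector>

// Hypothetical record; the fields are assumptions for illustration.
struct Order { int customerId; double amount; };

// One client-specific exception: when it applies and what it does.
struct SpecialCase {
    std::function<bool(const Order&)> applies;
    std::function<double(const Order&)> price;
};

// The basic process stays in one place; client quirks live in the rule list.
double priceOrder(const Order& o, const std::vector<SpecialCase>& rules) {
    for (const auto& r : rules)
        if (r.applies(o)) return r.price(o);  // first matching rule wins
    return o.amount;                          // default behaviour
}
```

Each new client request then becomes one more entry in the rule list rather than another branch inside the shared function. Whether that stays readable after dozens of rules is exactly the trade-off the answer describes.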

John MacIntyre
12

You might want to do so to make sure that future changes in one part will not unintentionally change the other part. For example, consider:

#include <cstdio>

void Do_A_Policy()
{
  std::printf("%d",1);
  std::printf("%d",2);
}

void Do_B_Policy()
{
  std::printf("%d",1);
  std::printf("%d",2);
}

Now you can prevent "code duplication" with a function like this:

// Shared by both policies.
void first_policy()
{
  std::printf("%d",1);
  std::printf("%d",2);
}

void Do_A_Policy()
{
  first_policy();
}

void Do_B_Policy()
{
  first_policy();
}

However, there is a risk that some other programmer will want to change Do_A_Policy(), will do so by changing first_policy(), and will thereby change Do_B_Policy() as a side effect, one the programmer may not be aware of. So this kind of "code duplication" can serve as a safety mechanism against this kind of future change in the program.
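An alternative safety mechanism is a test that pins down each policy's behaviour separately. The sketch below is an assumption-laden variant of the example: the policies return their output as a string instead of printing it, so that it can be checked, and `test_policies` is a hypothetical test function.

```cpp
#include <cassert>
#include <string>

// Shared helper, returning text instead of printing so it can be checked.
std::string first_policy() { return "12"; }

std::string Do_A_Policy() { return first_policy(); }
std::string Do_B_Policy() { return first_policy(); }

// Each policy's expected output is pinned separately. If someone edits
// first_policy() to suit A, the assertion for B fails and exposes the
// side effect.
void test_policies() {
    assert(Do_A_Policy() == "12");
    assert(Do_B_Policy() == "12");
}
```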

Liran Orevi
  • 1
    Well, sounds to me like `first_policy` would need to take a parameter of sorts. – GManNickG Sep 01 '09 at 06:34
  • 2
    This example screams for a unit test. – John MacIntyre Sep 01 '09 at 12:29
  • 1
    I see where you're going, but I think it's much more maintainable to factor the logic into `first_policy()` as you've done. If you need to find all uses of this logic later, it's much easier to find all calls to `first_policy()` than it is to find "all pairs of `printf()` statements that look like this". A coder who changes the semantics of a function without checking all call sites needs to be... *persuaded* not to do that. :) – j_random_hacker Jan 27 '11 at 12:41
6

Sometimes methods and classes have nothing in common domain-wise but look a lot alike implementation-wise. In these cases it's often better to duplicate the code, as future changes will more often than not make these implementations diverge.

cwap
  • 1
    Can you give a real-life example of such a situation? – flybywire Sep 01 '09 at 08:51
  • @flybywire There are plenty, and not doing as cwap suggests is a rather common (and often overlooked) design error. It often leads to a lot of branching/switching on state whenever you need to execute logic, largely because refactoring is usually done locally and not across boundaries. – Rune FS Apr 10 '12 at 08:04
4

The valid reason I can think of: the code would get a lot more complex if you avoided the duplication. Basically, that's the case when you do almost the same thing in several methods, but not quite the same. Of course, you can then refactor and add special parameters, including pointers to the different members that have to be modified. But the new, refactored method may get too complicated.

Example (pseudocode):

procedure setPropertyStart(address, mode, value)
begin
  d:=getObject(address)
  case mode do
  begin
    single:
       d.setStart(SingleMode, value);
    delta:
       //do some calculations
       d.setStart(DeltaSingle, calculatedValue);
   ...
  end;
end;

procedure setPropertyStop(address, mode, value)
begin
  d:=getObject(address)
  case mode do
  begin
    single:
       d.setStop(SingleMode, value);
    delta:
       //do some calculations
       d.setStop(DeltaSingle, calculatedValue);
   ...
  end;
end;

You could refactor out the method call (setXXX) somehow - but depending on the language it could be difficult (especially with inheritance). It is code duplication since most of the body is the same for each property, but it can be hard to refactor out the common parts.
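In C++, one way to factor out the `setXXX` call is a pointer to member function. The sketch below is only an illustration: `Device`, the two enums, and the placeholder "calculation" are assumptions standing in for the parts the pseudocode leaves out, and the elided cases are not reconstructed.

```cpp
enum class Mode { Single, Delta };
enum class SetMode { SingleMode, DeltaSingle };

// Hypothetical stand-in for the object returned by getObject().
struct Device {
    int start = 0, stop = 0;
    void setStart(SetMode, int v) { start = v; }
    void setStop(SetMode, int v)  { stop = v; }
};

using Setter = void (Device::*)(SetMode, int);

// Shared body; the setter to call is passed in as a pointer to member function.
void setProperty(Device& d, Setter set, Mode mode, int value) {
    switch (mode) {
    case Mode::Single:
        (d.*set)(SetMode::SingleMode, value);
        break;
    case Mode::Delta: {
        int calculated = value * 2;  // placeholder for "do some calculations"
        (d.*set)(SetMode::DeltaSingle, calculated);
        break;
    }
    }
}

void setPropertyStart(Device& d, Mode m, int v) { setProperty(d, &Device::setStart, m, v); }
void setPropertyStop(Device& d, Mode m, int v)  { setProperty(d, &Device::setStop, m, v); }
```

The `(d.*set)(...)` syntax is the price of this approach. With more properties, differing argument lists, or virtual setters in a class hierarchy, the shared function can quickly become harder to follow than the duplicated originals.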

In short: if the refactored method is several times more complicated, I'd go with code duplication, although it is "evil" (and will stay evil).

Tobias Langner
  • +1 - the important part here is "depending on the language", but I agree that it can happen even in simple cases like this. – orip Sep 01 '09 at 16:03
3

The only "valid" thing I can see this arising from is when those lines of code were different, then converged to the same thing through subsequent edits. I've had this happen to me before, but none too frequently.

This is, of course, when it's time to factor out this common segment of code into new functionality.

That said, I can't think of any reasonable way to justify duplicate code. Look at why it's bad.

It's bad because a change in one place requires a change in multiple places. This is increased time, with a chance of bugs. By factoring it out, you maintain the code in a single, working location. After all, when you write a program you don't write it twice, why would a function be any different?

GManNickG
3

For that kind of code duplication (lots of lines duplicated lots of times), I'd say:

  • either laziness (you just paste some code here and there, without having to worry about any impact it could have on other parts of the application -- while writing a new function and using it in two places could, I suppose, have some impact)
  • or not knowing any good practice (re-using code, separating different tasks in different functions/methods)

Probably the first reason, though, from what I've generally seen :-(

The best remedy I've seen for that: have your developers start by maintaining some old application when they are hired. That'll teach them that this kind of thing is not good... And they will understand why, which is the most important part.

Splitting code into several functions, re-using code the right way, and all that often come with experience -- or you have not hired the right people ;-)

Pascal MARTIN
  • In my experience, junior developers who started maintaining ugly code (and didn't get to see any good code) have only acquired a "patch it 'till it works" mentality. – Wim Coenen Sep 01 '09 at 11:50
  • @wcoenen Sounds like you need to recruit better junior developers. A small touch of OCD can be a good thing. – Mike Lewis Sep 01 '09 at 13:29
2

A long time ago when I used to do graphics programming you would, in some special cases, use duplicate code this way to avoid the low level JMP statements generated in the code (it would improve performance by avoiding the jump to the label/function). It was a way to optimize and do a pseudo "inlining".

However, in this case, I don't think that's why they were doing it, heh.

jmq
2

If different tasks are similar by accident, repeating the same actions in two places is not necessarily duplication. If the actions in one place change, is it probable they should change in other places as well? Then this is duplication you should avoid or refactor away.

Also, sometimes - even when logic is duplicated - the cost of reducing duplication is too high. This can happen especially when it's not just code duplication: for example, if you have a record of data with certain fields that repeats itself in different places (DB table definition, C++ class, text-based input), the usual way to reduce this duplication is with code generation. This adds complexity to your solution. Almost always, this complexity pays off, but sometimes it doesn't - it's your tradeoff to make.
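As a small-scale illustration of that trade-off, the sketch below uses the X-macro preprocessor idiom, which is a stand-in for a real code generator rather than what the answer necessarily had in mind. The `Person` record and its fields are hypothetical. The field list is written once and expanded into a struct, a column list, and a text serialiser.

```cpp
#include <sstream>
#include <string>

// Single list of fields; every use below is generated from it.
#define PERSON_FIELDS(X) \
    X(int,         id)   \
    X(std::string, name) \
    X(int,         age)

// Generated struct definition.
struct Person {
#define X(type, field) type field;
    PERSON_FIELDS(X)
#undef X
};

// Generated column list, e.g. for a table definition.
std::string columnNames() {
    std::string out;
#define X(type, field) out += (out.empty() ? "" : ",") + std::string(#field);
    PERSON_FIELDS(X)
#undef X
    return out;
}

// Generated text serialisation.
std::string toText(const Person& p) {
    std::ostringstream os;
#define X(type, field) os << #field << "=" << p.field << ";";
    PERSON_FIELDS(X)
#undef X
    return os.str();
}
```

The duplication is gone, but the code is noticeably harder to read and debug than three hand-written definitions, which is the added complexity the answer refers to.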

orip
2

I don't know of many good reasons for code duplication, but rather than jumping in feet first to refactoring, it's probably better to only refactor those bits of the code that you actually change, rather than altering a large codebase that you don't yet fully understand.

Paddy
1

Sounds like the original author was inexperienced and/or pressed for time. Most experienced programmers bundle together things that are reused, because there will be less maintenance later (a form of laziness).

The only thing you should check is whether there are any side effects. If the copied code accesses some global data, a bit of refactoring may be needed.

Edit: back in the day, when compilers were crappy and optimizers even crappier, one may have had to use such a trick to work around a compiler bug. Maybe it's something like that? How old is old?

AndersK
1

On large projects (those with a code base as large as a GB) it's quite possible to lose track of an existing API. This is typically due to insufficient documentation, or the programmer's inability to locate the original code; hence duplicate code.

Boils down to laziness, or poor review practice.

EDIT:

One additional possibility is that there may have been additional code in those methods which was removed along the way.

Have you looked at the revision history on the file?

Everyone
1

All the answers look right, but I think there is another possibility. Maybe there were performance considerations, as what you describe reminds me of "inlining code". Inlining a function is often faster than calling it. Maybe the code you are looking at has been preprocessed first?
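In C++ the same effect can usually be had without pasting code, by writing the function once and letting the compiler inline it. A minimal sketch, with hypothetical function names:

```cpp
// Written once. `inline` is only a hint: the compiler is free to ignore it,
// and it may also inline functions that are not marked.
inline int clampToByte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

// Both callers share the clamping logic instead of carrying their own copy.
int brighten(int channel) { return clampToByte(channel + 40); }
int darken(int channel)   { return clampToByte(channel - 40); }
```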

Ignacio Soler Garcia
  • Modern languages let you write functions, and then hint loudly they should be inlined. This gives you the best of both worlds: inlining and avoidance of redundant code. – Ira Baxter Sep 03 '09 at 04:30
1

I have no problems with duplicated code when it is produced by a source code generator.

Stephen C
1

Something that we found that forced us to duplicate code was our pixel manipulation code. We work with VERY large images and the function call overhead was eating up on the order of 30% of our per-pixel time.

Duplicating the pixel manipulation code gave us 20% faster image traversal at the cost of code complexity.

This is obviously a very rare case, and in the end it bloated our source significantly (a 300 line function is now 1200 lines).
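For comparison, one technique that sometimes avoids both the call overhead and the duplication is to pass the per-pixel operation as a template parameter. This is a sketch with hypothetical names, not the code from this answer, and whether it would have recovered the same speed there is unknown; only measurement can tell.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// The per-pixel operation is a template parameter, so the compiler sees its
// body at each instantiation and has the opportunity to inline it into the loop.
template <typename Op>
void forEachPixel(std::vector<std::uint8_t>& pixels, Op op) {
    for (std::size_t i = 0; i < pixels.size(); ++i)
        pixels[i] = op(pixels[i]);
}

// Hypothetical operations; each is written exactly once.
struct Invert {
    std::uint8_t operator()(std::uint8_t p) const {
        return static_cast<std::uint8_t>(255 - p);
    }
};
struct Halve {
    std::uint8_t operator()(std::uint8_t p) const {
        return static_cast<std::uint8_t>(p / 2);
    }
};
```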

Ron Warholic
0

There is no good reason for code duplication.

See the Refactor Mercilessly practice from Extreme Programming.

The original programmer was either in a hurry to meet a deadline or lazy. Feel free to refactor and improve the code.

Asaph
  • 1
    -1, While this asker's case is not a good reason, and almost every other case of duplication isn't either, of course there are valid reasons (see the responses). – orip Sep 01 '09 at 06:18
0

In my humble opinion there's no place for code duplication. Have a look, for example, at this Wikipedia article.

Or, let's refer to Larry Wall's quote:

"We will encourage you to develop the three great virtues of a programmer: laziness, impatience, and hubris."

It is pretty clear that code duplication has nothing to do with "laziness". Haha ;)

varnie
  • Really? Due to the magic of copy/paste, spamming "HAHAHAHAHAHAHAHAHAHAHAHAHAHAHA" in the comment box is much easier than writing something thoughtful. Duplicate code is easy to write. It is often the lazy solution. – jalf Dec 08 '09 at 11:02
  • 1
    The lazy solution would be to go get a coffee and think about how to DRY instead of copy and paste (which would be the stupid way). There's a difference between smart people being lazy and people being lazy. Programmers (said to be smart) reduce work while other people sit on the couch and get fat. – atamanroman Jul 19 '10 at 06:52
0

Since there is the "Strategy Pattern", there is no valid reason for duplicate code. Not a single line of code must be duplicated, everything else is epic fail.
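For readers unfamiliar with the pattern, here is a minimal Strategy sketch in C++. The `Formatter` interface and its two implementations are hypothetical; the point is only that the shared algorithm is written once and the varying step is supplied from outside.

```cpp
#include <string>

// Strategy interface: the part that varies between otherwise identical code paths.
struct Formatter {
    virtual ~Formatter() = default;
    virtual std::string format(const std::string& s) const = 0;
};

struct Quoted : Formatter {
    std::string format(const std::string& s) const override {
        return "\"" + s + "\"";
    }
};

struct Bracketed : Formatter {
    std::string format(const std::string& s) const override {
        return "[" + s + "]";
    }
};

// The shared algorithm is written once and delegates the varying step.
std::string render(const std::string& label, const Formatter& f) {
    return label + ": " + f.format("value");
}
```

Note the cost: an interface, two classes, and a virtual call to replace what might have been two short functions, which is the kind of added complexity other answers in this thread weigh against the duplication.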

Turing Complete