simplifying code via refactoring

Question

Is there a refactoring tool, for either C or Java, that can simplify this type of redundant code? I believe this is called data propagation.

This is essentially what an optimizing compiler would do.

public int foo() {
    int a = 3;
    int b = 4;
    int c = a + b;
    int d = c;
    System.out.println(c);
    return c;
}

into

public int foo() {
    int c = 7;
    System.out.println(c);
    return c;
}

Why would you want to do that if the compiler does it anyway? – undur_gongor, Aug 26 '11 at 20:49
There are two transformations being done here: "constant propagation" and "constant folding". But it isn't "redundant" code you are getting rid of; it is unnecessary code. Some would argue that the "unnecessary" code was put there to separate out the issues. (Some of it might be just stupid, granted.) – Ira Baxter, Aug 26 '11 at 20:52
@undur_gongor: The reason would be to have less room for error. Generally, the less code you have, the fewer errors you have. – Jack, Aug 26 '11 at 21:35
@Jack: Yes, but for the refactoring the code has to be correct already. And I guess in most cases the elaborate code will be better in terms of maintainability/readability (see Alfredo O's answer). – undur_gongor, Aug 26 '11 at 21:43
@Jack -- presumably the "legacy" code has been tested and runs productively so the "room for error" argument is spurious. – James Anderson, Sep 07 '11 at 01:40
I'm really worried about the skills of the developer who wrote that bit of code that you so desperately want to refactor. I think the appropriate tool in this case would be extensive brainwashing. ;-) – Wivani, Sep 07 '11 at 10:53
There are a couple of reasons to refactor. One is to change the design/structure to be better suited to some new extension, and the other is to improve clarity. I think you might get better answers if your function were not named foo() and your variables had meaningful names. In the abstract, it is impossible to tell why c = 7 is clearer than c = a + b, or what foo and its variables are for, so people cannot understand the motivation for this refactoring. – jhericks, Sep 07 '11 at 18:15
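
To make the terminology from the comments above concrete, here is a hand-worked sketch of the steps a tool would have to perform on the question's method. The class and method names are hypothetical, the println call is left out for brevity, and each method is simply one step applied by hand, not the output of any particular tool:

```java
// Hypothetical names; each method is the question's foo() after one more step.
class PropagationSteps {
    // Original form from the question (println omitted).
    static int original() {
        int a = 3;
        int b = 4;
        int c = a + b;
        int d = c; // d is never read
        return c;
    }

    // Constant propagation: uses of a and b are replaced by their known values.
    static int afterPropagation() {
        int a = 3;
        int b = 4;
        int c = 3 + 4;
        int d = c;
        return c;
    }

    // Constant folding: the expression 3 + 4 is evaluated ahead of time.
    static int afterFolding() {
        int a = 3;
        int b = 4;
        int c = 7;
        int d = 7;
        return c;
    }

    // Removing unused locals: a, b and d are no longer read anywhere.
    static int afterCleanup() {
        int c = 7;
        return c;
    }

    public static void main(String[] args) {
        System.out.println(original() + " " + afterPropagation() + " "
                + afterFolding() + " " + afterCleanup());
    }
}
```

All four methods return the same value; only the amount of source text differs, which is why the thread keeps coming back to readability rather than correctness.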

score 20 · Answer 1 · answered Aug 26 '11 at 20:46

I think it's not a good idea.

Take the following code, for example:

long hours = 5;
long timeInMillis = hours * 60 * 60 * 1000;

That is much cleaner and easier to understand than just:

long timeInMillis = 18000000;
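
The same point can be pushed a little further by letting the standard library carry the unit conversion. This is a minimal sketch, assuming the intent is a five-hour interval expressed in milliseconds (5 × 60 × 60 × 1000 = 18,000,000); the class and method names are made up for illustration:

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Hypothetical class: three ways to express "five hours in milliseconds".
class FiveHours {
    static final long HOURS = 5;

    // Spelled-out arithmetic: hours -> minutes -> seconds -> milliseconds.
    static long byHand() {
        return HOURS * 60 * 60 * 1000;
    }

    // TimeUnit performs the conversion and names the source unit.
    static long viaTimeUnit() {
        return TimeUnit.HOURS.toMillis(HOURS);
    }

    // Duration keeps the unit attached until the value is actually needed.
    static long viaDuration() {
        return Duration.ofHours(HOURS).toMillis();
    }

    public static void main(String[] args) {
        System.out.println(byHand() + " " + viaTimeUnit() + " " + viaDuration());
    }
}
```

In all three variants the reader can see that the value is five hours without checking a bare literal digit by digit.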

Alfredo Osorio

Not really relevant to what the OP was asking; the machine code underneath has nothing to do with the high-level language that is being displayed. – Woot4Moo Aug 26 '11 at 20:49
@Woot4Moo -- machine code? Nobody mentioned machine code; he is writing about code clarity and human understanding. He is correctly pointing out that anyone reading the first version can clearly understand that we are talking about a five-hour interval and can validate that the value is correct. In the second case, even if the variable were called "fivehours", a code reviewer would probably miss a wrong constant value such as 3000000. – James Anderson Sep 07 '11 at 01:44
@James where did he ask anything about readability? I think he might be talking about code optimisation. – Manish Singh Sep 07 '11 at 02:42
@Woot4Moo -- any modern compiler will optimize this automatically. So there can be no other motivation for doing this than to clean up the source code. However making the code less readable is not a good idea. – James Anderson Sep 07 '11 at 03:31

score 6 · Answer 2 · edited May 23 '17 at 12:24

I can offer a solution for C. My solution uses the two tools that I described in another answer here (in reverse order).

Here is your program, translated to C:

int foo() {
    int a = 3;
    int b = 4;
    int c = a + b;
    int d = c;
    printf("%d", c);
    return c;
}

Step 1: Constant propagation

$ frama-c -semantic-const-folding t.c -lib-entry -main foo
...
/* Generated by Frama-C */
/*@ behavior generated:
      assigns \at(\result,Post) \from \nothing;  */
extern int ( /* missing proto */  printf)() ;
int foo(void) 
{
  int a ;
  int b ;
  int c ;
  int d ;
  a = 3;
  b = 4;
  c = 7;
  d = 7;
  printf("%d",7);
  return (c);
}

Step 2: Slicing

$ frama-c -slice-calls printf -slice-return foo -slice-print tt.c -lib-entry -main foo
...
/* Generated by Frama-C */
extern int printf() ;
int foo(void) 
{
  int c ;
  c = 7;
  printf("%d",7);
  return (c);
}

score 3 · Answer 3 · answered Sep 01 '11 at 08:28

Yes, the best refactoring tool I've seen people using is their brain.

The brain seems a remarkably good tool for logically organising code for consumption by other brains. It can also be used to enhance the code with comments, where appropriate, and impart additional meaning with layout and naming.

Compilers are good for optimising the code for consumption by an underlying layer closer to the transistors that make up the processor. One of the benefits of a higher-generation programming language is that it doesn't read like something a machine made.

Apologies if this seems a little glib and unhelpful. I have certainly used various tools, but I don't recall any tool that handled "data propagation."

score 2 · Answer 4 · answered Sep 05 '11 at 21:21

Eclipse (and I'm sure NetBeans and IntelliJ) has almost all these refactorings available. I'll give the specifics with Eclipse. Start with:

public int foo() {
    int a = 3;
    int b = 4;
    int c = a + b;
    int d = c;
    System.out.println(c);
    return c;
}

First, d will show as a warning that you have an unread local variable. <CTRL>+1 on that line and select "Remove d and all assignments". Then you have:

public int foo() {
    int a = 3;
    int b = 4;
    int c = a + b;
    System.out.println(c);
    return c;
}

Next, highlight the a in int c = a + b; and type <CTRL>+<ALT>+I to inline a. Repeat with b and you will have:

public int foo() {
    int c = 3 + 4;
    System.out.println(c);
    return c;
}

Now you're almost there. I don't know of a refactoring to convert 3+4 into 7. It seems like it would be easy for someone to implement, but is probably not a common use-case as others have pointed out that, depending on the domain, 3+4 can be more expressive than 7. You could go further and inline c, giving you:

public int foo() {
    System.out.println(3 + 4);
    return 3 + 4;
}

But it is impossible to know if this is an improvement or a step backwards without knowing the 'real' problem with the original code.
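
On the missing "3 + 4 into 7" refactoring: in Java the compiler already treats 3 + 4 as a compile-time constant expression, so leaving it spelled out costs nothing at run time. One way to see this without reading bytecode is that such an expression is accepted as a switch case label, which only constant expressions are. A small sketch with hypothetical names:

```java
// Hypothetical class showing that 3 + 4 is a compile-time constant in Java.
class FoldedByCompiler {
    // A constant variable initialised from a constant expression.
    static final int C = 3 + 4;

    static String describe(int value) {
        switch (value) {
            case 3 + 4: // compiles only because 3 + 4 is a constant expression
                return "seven";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(C));
        System.out.println(describe(8));
    }
}
```

So whether to write 3 + 4 or 7 in the source is purely a readability decision.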

score 1 · Answer 5 · answered Aug 26 '11 at 20:48

The semantic information of the code may get lost, and possible dependencies might break. In short: only the programmer knows which variables are important or may become important, since only the programmer knows the context of the code. I'm afraid you'll have to do the refactoring yourself.

Andreas Grapentin

Things are more likely to break if I do this manually instead of using a computer that does not get tired. – Saideira Aug 26 '11 at 20:52
Oh, it's just a matter of practice, really. Also, keep in mind that fewer lines of code does not necessarily mean better code. – Andreas Grapentin Aug 26 '11 at 20:54

score 1 · Answer 6 · answered Aug 26 '11 at 20:54

Yes, IntelliJ offers this functionality in its Community Edition. Now, to address a more serious issue: I am pretty sure you are mixing up compilation with refactoring. When you compile something, you take a language higher than machine code and convert it into machine code (essentially). What you want is to remove redundant declarations from the high-level source file itself (.c, .java, etc.). It is quite possible that the compiler has already optimized the less-than-great code into what you propose, and there are tools available to see what it is doing. In terms of refactoring, less is typically better, but do not sacrifice maintainability for fewer lines of code.

What is this feature called in IntelliJ? I looked through their entire Refactor menu and did not see anything suitable. – Saideira, Aug 26 '11 at 21:00
It does it automatically with the out-of-the-box configuration (granted, I am on the paid version). – Woot4Moo, Aug 26 '11 at 21:01

Mysticial · Answer 7 · 2011-09-05T18:51:05.433

One possible approach is to put it into a symbolic math program (like Mathematica or Maple) and have it do the simplification for you. It will do it regardless of whether they are constants or not.

The drawback is that you need to convert the code to a different language. (Though it could be mostly copy and paste if the syntax is similar.) Furthermore, it could be dangerous if you expect certain integer types to overflow at a specific size. Symbolic math programs don't care and will optimize it according to the "math". Same thing goes for floating-point round-off errors.
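
The overflow and round-off caveats are easy to reproduce. The sketch below uses two identities that a symbolic tool would happily apply, x * 2 / 2 = x and (a + b) - a = b, and shows inputs for which Java's int and double arithmetic disagree with the algebra. The class and method names are illustrative only:

```java
// Hypothetical class: algebraic identities that fail in machine arithmetic.
class SymbolicPitfalls {
    // Algebraically this is just x, but the multiplication can wrap around
    // in 32-bit int arithmetic before the division happens.
    static int doubleThenHalve(int x) {
        return x * 2 / 2;
    }

    // Algebraically this is just b, but a + b may be rounded to the nearest
    // representable double before a is subtracted again.
    static double addThenSubtract(double a, double b) {
        return (a + b) - a;
    }

    public static void main(String[] args) {
        // Integer.MAX_VALUE * 2 wraps to -2, so the result is -1.
        System.out.println(doubleThenHalve(Integer.MAX_VALUE));
        // 1e16 + 1.0 rounds back to 1e16, so the result is 0.0, not 1.0.
        System.out.println(addThenSubtract(1e16, 1.0));
    }
}
```

For small inputs both methods behave as the algebra predicts, which is what makes this kind of discrepancy easy to miss.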

In your example, if you enter this into Mathematica:

a = 3;
b = 4;
c = a + b;
d = c;
c

Mathematica will output:

7

Of course you can't just copy and paste because it's a different language and different syntax, but it's the best thing I have in mind for your question. I myself use Mathematica to simplify expressions and other math before I throw it into C/C++.

For a more complicated example involving unknowns:

Original C Code:

int a = 3 + x*x;
int b = 4 + y*y;
int c = a + b - 7 + 2*x*y;
int d = c;

Enter this into Mathematica (which is still mostly copy+paste):

a = 3 + x*x;
b = 4 + y*y;
c = a + b - 7 + 2*x*y;
d = c;
FullSimplify[c]

Output:

(x + y)^2

Which transforms back into the following C-code:

d = (x + y);
d = d * d;
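
Because the simplified form is copied back by hand, a quick brute-force comparison against the original is a cheap safeguard. The following is only a sketch in Java with hypothetical names; it relies on Java's wrap-around int arithmetic, whereas in C signed overflow is undefined behaviour, so there the check is only meaningful for inputs that do not overflow:

```java
// Hypothetical class: compare the original and simplified expressions.
class SimplificationCheck {
    // The expression as written before simplification.
    static int original(int x, int y) {
        int a = 3 + x * x;
        int b = 4 + y * y;
        return a + b - 7 + 2 * x * y;
    }

    // The form suggested by the symbolic tool: (x + y)^2.
    static int simplified(int x, int y) {
        int d = x + y;
        return d * d;
    }

    // Returns true if both forms agree for all x, y in [-limit, limit].
    static boolean agreeOnRange(int limit) {
        for (int x = -limit; x <= limit; x++) {
            for (int y = -limit; y <= limit; y++) {
                if (original(x, y) != simplified(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnRange(100));
    }
}
```

This identity uses only addition, subtraction and multiplication, so in Java it also holds when intermediate values wrap around; identities involving division or floating point need more care.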

This is obviously much simpler than the original code. In general, symbolic programs will even handle non-trivial expressions and will do as well as (or even better than) any compiler's internals.

The final drawback is that symbolic math programs like Mathematica or Maple aren't free and are fairly expensive. SAGE is an open-source program, but I hear it is not as good as either Mathematica or Maple.

score 0 · Answer 8 · answered Aug 26 '11 at 20:54

If you're talking about C, you could look at the compiled, optimized assembly code. Then you could refactor your C code to the same structure as the optimized assembly. Like Alfredo said, though, that could lead to more ambiguous code.

Daniel

Nothing like using register names instead of variable names to make the code more readable. – Ira Baxter Aug 26 '11 at 21:40

score 0 · Answer 9 · answered Aug 26 '11 at 20:57

Why not compile the code using an optimizing compiler and then decompile it? This is just a thought; I have not tried it out.

emory

maybe you should look at the optimizing compiler output first. – Saideira Aug 26 '11 at 21:03
It's probably easier just to write better code in the first place. – Jodrell Aug 31 '11 at 14:34
@Saideira, I have no need to look at the output of an optimizing compiler. I assume it is horribly obfuscated, but correct. If it is not correct, then you should get a new compiler. – emory Sep 01 '11 at 01:36
@Jodrell, I agree. I do not see why Saideira needs to do this "code simplification". I tend to view "code simplification" as counter productive to writing good code. – emory Sep 01 '11 at 01:40
