I'm compiling the following code:

#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv)
{
    unsigned int i=atoi(argv[1]);
    unsigned int n=atoi(argv[2]);
    unsigned int r=0;
    unsigned int iter=0;

    while(i<n)
    {
        iter++;
        if (r>0) { i=0; r--; }
        else     { i++;      }
    }
    return iter;
}

with

$ clang -O3 -c -emit-llvm file.c -o file.bc
$ llvm-dis file.bc -o file.ll

And when I inspect the resulting *.ll file I'm pleasantly surprised, because the entire code shrinks down to:

; Function Attrs: nounwind uwtable
define dso_local i32 @main(i32 %argc, i8** nocapture readonly %argv) local_unnamed_addr #0 {
entry:
  %arrayidx = getelementptr inbounds i8*, i8** %argv, i64 1
  %0 = load i8*, i8** %arrayidx, align 8, !tbaa !2
  %call.i = tail call i64 @strtol(i8* nocapture nonnull %0, i8** null, i32 10) #2
  %conv.i = trunc i64 %call.i to i32
  %arrayidx1 = getelementptr inbounds i8*, i8** %argv, i64 2
  %1 = load i8*, i8** %arrayidx1, align 8, !tbaa !2
  %call.i12 = tail call i64 @strtol(i8* nocapture nonnull %1, i8** null, i32 10) #2
  %conv.i13 = trunc i64 %call.i12 to i32
  %2 = icmp ugt i32 %conv.i13, %conv.i
  %3 = select i1 %2, i32 %conv.i13, i32 %conv.i
  %4 = sub i32 %3, %conv.i
  ret i32 %4
}

Which essentially means: return (n>i)?(n-i):0. Is there any way I can translate the optimized bitcode back to C source?
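For reference, the IR above is small enough to translate by hand. The following is a minimal sketch, not the output of any tool: it wraps the original loop and the closed form in two functions whose names (`count_iterations_loop`, `count_iterations_closed`) are hypothetical and chosen only for illustration, so the two can be compared.

```c
/* Original loop from the question, wrapped in a function.
   The name is hypothetical; the body is unchanged apart from
   taking i and n as parameters instead of parsing argv. */
unsigned int count_iterations_loop(unsigned int i, unsigned int n)
{
    unsigned int r = 0;
    unsigned int iter = 0;

    while (i < n)
    {
        iter++;
        if (r > 0) { i = 0; r--; }   /* never taken: r stays 0 */
        else       { i++;        }
    }
    return iter;
}

/* Hand translation of the optimized IR (hypothetical name).
   %2 = icmp ugt n, i ; %3 = select %2, n, i ; %4 = sub %3, i */
unsigned int count_iterations_closed(unsigned int i, unsigned int n)
{
    unsigned int larger = (n > i) ? n : i;  /* unsigned max(n, i) */
    return larger - i;                      /* n - i if n > i, else 0 */
}
```

Both functions should agree for every pair of unsigned inputs, since the loop simply increments i until it reaches n.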

OrenIshShalom
  • https://stackoverflow.com/a/29183452/5218277 – Alex Lop. May 20 '20 at 10:48
  • Why were you surprised? `iter` is never used and identical to `i` so `iter` is compiled out, `r > 0` is always `false`, so the whole `if (r>0) { i=0; r--; }` is compiled out. Thus you are left with `i=0; while (i < n) i++; return i;` which is `return (n>i)?(n-i):0` – Alex Lop. May 20 '20 at 10:53
  • @AlexLop. the entire loop was cancelled. – OrenIshShalom May 20 '20 at 10:57
  • Yes, sometimes the compilers do a great job. Your loop actually said, let's start incrementing `i` by 1 each loop iteration until it equals `n` ==> `n-i` – Alex Lop. May 20 '20 at 11:02
  • I don't know if there's a tool that can decompile .ll files, but you could try using [Ghidra](https://ghidra-sre.org/) and decompile the final binary instead. If you haven't stripped the symbols, the output should be quite readable. – Felix G May 20 '20 at 11:19
  • @FelixG the link is broken, and googling it didn't help. It looks like access to the source code is blocked or something. – OrenIshShalom May 20 '20 at 13:18
  • @OrenIshShalom that's strange, the link works perfectly fine for me. Maybe it's blocked in your country, because it's a reverse engineering tool? If you have access to GitHub you can try the [Ghidra repository](https://github.com/NationalSecurityAgency/ghidra) instead. – Felix G May 20 '20 at 13:44

1 Answer

You can try llvm-cbe, the LLVM C backend. It may be suitable for you.

Note, however, that llvm-cbe was removed from upstream LLVM, as far as I know because it was not working well.

RetDec might also be usable, since it is based on LLVM, but I'm not sure.