18

Is it possible to run the LLVM compiler with 32-bit x86 machine code as input? I have a huge algorithm for which I have no source code, and I want to make it run faster on the same hardware. Can I translate it from x86 back to x86 with optimizations?

This code runs for a long time, so I want to statically recompile it. I can also profile it at runtime and give LLVM hints about which branches are more probable.
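For illustration, branch hints of this kind are usually handed to LLVM as `!prof` metadata carrying `branch_weights`. The sketch below only formats such a metadata node from profile counters; the function names and the counts are hypothetical, and collecting the counters from the running binary is assumed to happen elsewhere.

```python
def branch_weights_metadata(taken: int, not_taken: int, md_id: int = 0) -> str:
    """Format profile counts as an LLVM `!prof` branch_weights metadata node."""
    if taken < 0 or not_taken < 0:
        raise ValueError("counts must be non-negative")
    # Doubled braces emit literal '{' and '}' inside the f-string.
    return f'!{md_id} = !{{!"branch_weights", i32 {taken}, i32 {not_taken}}}'


def taken_probability(taken: int, not_taken: int) -> float:
    """Estimate how likely the branch is taken; 0.5 when there is no data."""
    total = taken + not_taken
    return taken / total if total else 0.5


# Hypothetical profile: the branch was taken 9900 times out of 10000.
print(branch_weights_metadata(9900, 100))
print(taken_probability(9900, 100))
```

The resulting node would be attached to a conditional branch in the lifted IR, for example `br i1 %cond, label %hot, label %cold, !prof !0`.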

The original code is written for x86 + x87 and uses no MMX/SSE/SSE2. After recompilation it has a chance to use x86_64 and/or SSE3. The code would also be regenerated in a form better suited to the hardware decoder.

Thanks.

osgx
  • 1
    This is not an answer, but I remember there were programs for Amiga to "optimize" code compiled for MC68000 to make it work faster on newer processors, at the cost of compatibility. But I know of no such attempt for x86. – liori Jan 08 '11 at 22:44
  • IMO, you might have a better time using something like IDA & hex-rays or Ollydbg to reverse engineer the assembly back into a higher level language (C or C++) – Necrolis Nov 27 '11 at 20:17
  • 1
    [RevGen](http://stackoverflow.com/questions/9359487/the-source-code-of-revgen-tool) is one of the x86->LLVM translators. It can also translate x86 to a static binary. It uses Qemu and a modified MIPS TCG, which generates IR. – osgx Mar 28 '12 at 00:00
  • 1
    There is also http://dagger.repzret.org/ - Dagger, which [can decompile](http://llvm.org/devmtg/2013-04/bougacha-slides.pdf) to LLVM IR. – osgx May 07 '13 at 13:48
  • @osgx I am afraid Dagger may never be released.... – lllllllllllll Jan 12 '14 at 02:36
  • 1
    A common keyword nowadays is "assembly lifting": https://github.com/trailofbits/mcsema | https://github.com/zneak/fcd/ | https://reverseengineering.stackexchange.com/questions/12460/lifting-up-binaries-of-any-arch-into-an-intermediate-language-for-static-analysi – Ciro Santilli OurBigBook.com May 01 '18 at 23:32

3 Answers

13

LLVM can't do this out of the box. You'd have to write an x86 binary to LLVM intermediate representation (IR) converter. That would be a very non-trivial task. If the x86 code was simple enough it might map pretty closely to IR, but some x86 instructions won't map directly, e.g. stack pointer manipulations.
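To make the "simple enough" case concrete, here is a minimal sketch of a lifter for a tiny register-to-register subset. It is a toy under stated assumptions, not how any real tool works: registers are modeled as `i32` stack slots named `%eax`, `%ebx`, and so on (assumed to be created with `alloca` elsewhere), EFLAGS is not modeled at all, and the names `lift`, `REGS` and `OPS` are made up for this example.

```python
import itertools

REGS = {"eax", "ebx", "ecx", "edx"}
# x86 mnemonic -> LLVM IR opcode for simple two-operand integer arithmetic
OPS = {"add": "add", "sub": "sub", "and": "and", "or": "or", "xor": "xor"}


def lift(instructions):
    """Lift `op dst, src` register-to-register instructions to LLVM IR lines."""
    tmp = itertools.count()  # fresh SSA temporaries: %t0, %t1, ...
    ir = []
    for text in instructions:
        mnemonic, operands = text.split(None, 1)
        dst, src = [o.strip() for o in operands.split(",")]
        if dst not in REGS or src not in REGS:
            raise ValueError(f"only register operands are supported: {text!r}")
        if mnemonic == "mov":
            v = next(tmp)
            ir.append(f"%t{v} = load i32, ptr %{src}")
            ir.append(f"store i32 %t{v}, ptr %{dst}")
        elif mnemonic in OPS:
            a, b, r = next(tmp), next(tmp), next(tmp)
            ir.append(f"%t{a} = load i32, ptr %{dst}")
            ir.append(f"%t{b} = load i32, ptr %{src}")
            ir.append(f"%t{r} = {OPS[mnemonic]} i32 %t{a}, %t{b}")
            ir.append(f"store i32 %t{r}, ptr %{dst}")
        else:
            raise ValueError(f"unsupported instruction: {text!r}")
    return ir


for line in lift(["mov eax, ebx", "add eax, ecx"]):
    print(line)
```

Everything the sketch rejects or ignores (memory operands, flags, `push`/`pop`, calls, indirect jumps) is where the real difficulty lies.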

Edit: You could also consider an approach similar to what QEMU does. QEMU translates binaries on the fly; that is, when I run PowerPC code, each basic block is translated into x86 code before it is executed. You could figure out how to break your object file into basic blocks and generate LLVM IR for each block, discarding stuff (like parameter passing, etc.) and replacing it with straight LLVM IR.
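As a rough illustration of the "break it into basic blocks" step, the sketch below applies the classic leader rule to an already-decoded instruction list: a block starts at the first instruction, at every branch target, and right after every branch. The `(address, mnemonic, target)` tuples and the sample program are hypothetical, and indirect jumps (whose targets are unknown statically) are simply not handled.

```python
TERMINATORS = {"jmp", "jz", "jnz", "ret"}


def split_basic_blocks(instrs):
    """Split (address, mnemonic, target) tuples, sorted by address, into blocks."""
    if not instrs:
        return []
    addrs = {addr for addr, _, _ in instrs}
    leaders = {instrs[0][0]}  # the entry point always starts a block
    for i, (_, mnemonic, target) in enumerate(instrs):
        if mnemonic in TERMINATORS:
            if target is not None and target in addrs:
                leaders.add(target)  # a branch target starts a block
            if i + 1 < len(instrs):
                leaders.add(instrs[i + 1][0])  # so does the fall-through
    blocks = []
    for instr in instrs:
        if instr[0] in leaders:
            blocks.append([])
        blocks[-1].append(instr)
    return blocks


# Hypothetical decoded loop: addresses and targets are made up.
program = [
    (0, "mov", None),
    (2, "cmp", None),
    (4, "jz", 10),
    (6, "add", None),
    (8, "jmp", 2),
    (10, "ret", None),
]

for block in split_basic_blocks(program):
    print([f"{addr}:{mnemonic}" for addr, mnemonic, _ in block])
```

Each resulting block could then be lifted to IR on its own, with the branch at its end becoming the block terminator.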

Still a BIG job, though. Probably easier to rewrite the algorithm from scratch.

This exact process is described in "Dynamically Translating x86 to LLVM using QEMU"

user52571
Richard Pennington
  • Are there any projects that do the same? – osgx Jan 08 '11 at 22:49
  • Not likely. There's simply not enough information left in the machine code for LLVM's optimizer to work with. The code would have to be reverse-engineered back to a high-level representation before it could be usefully vectorized and recompiled for 64bit, and compilers just aren't that good at making inferences. You might be able to use an x86 emulator that does dynamic recompilation, but it's not likely to be able to vectorize the math, and the overhead would negate any performance gains. – user57368 Jan 08 '11 at 22:58
  • Regarding overhead: there is some code that I want to run fast. I can spend an hour running the optimizer before I run the new code. The goal is to get faster code out of the slower code. The recompilation is to be done statically, one time. – osgx Jan 08 '11 at 23:11
  • There is also an old project by HP, http://personals.ac.upc.edu/vmoya/docs/bala.pdf, which dynamically recompiles native machine code to make it faster. – osgx Jan 19 '11 at 18:33
  • 3
    To the best of my knowledge there are no such projects, but at one point there was a project to use LLVM to JIT compile code for QEMU (http://code.google.com/p/llvm-qemu/), which is closely related. – Daniel Dunbar Jan 23 '11 at 17:15
1

The MAO project seems to do part of what you want (x86->intermediate language).

Edit: @osgx, you'll need to look at the MAO website for the project status and details of which programs it can handle. (Self-modifying code might be challenging, though.)

akavel
mwb
  • 1
    Hi. What is the status of MAO? Which part of x86/x86_64 can it handle? Can it work with self-modifying code (e.g. UPX-packed)? – osgx Nov 27 '11 at 22:10
0

From what I know, disassembling x86 code 100% correctly is impossible, because data and code are intermingled and because instructions have variable length. The only way to disassemble it properly is to interpret it on the fly.
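A small sketch of why variable-length encoding causes trouble: the same bytes decode differently depending on where decoding starts. The opcode table below is a tiny subset of real x86 encodings (`0x90` nop, `0xC3` ret, `0xB8` mov eax, imm32, `0xEB` jmp rel8); the two decoder functions and the byte sequence are made up for illustration, and the second decoder only follows direct short jumps.

```python
# opcode byte -> (mnemonic, total instruction length in bytes)
OPCODES = {
    0x90: ("nop", 1),
    0xC3: ("ret", 1),
    0xB8: ("mov eax, imm32", 5),
    0xEB: ("jmp rel8", 2),
}


def linear_sweep(code):
    """Decode instructions back to back, ignoring control flow."""
    out, pc = [], 0
    while pc < len(code):
        name, length = OPCODES[code[pc]]
        out.append((pc, name))
        pc += length
    return out


def follow_control_flow(code, pc=0):
    """Decode along the executed path, following direct short jumps."""
    out = []
    while pc < len(code):
        name, length = OPCODES[code[pc]]
        out.append((pc, name))
        if name == "ret":
            break
        if name == "jmp rel8":
            # The displacement is relative to the end of the jump instruction.
            rel = int.from_bytes(code[pc + 1:pc + 2], "little", signed=True)
            pc += length + rel
        else:
            pc += length
    return out


# jmp +1 skips a junk 0xB8 byte that a linear sweep treats as a 5-byte mov.
code = bytes([0xEB, 0x01, 0xB8, 0x90, 0x90, 0x90, 0xC3])

print(linear_sweep(code))         # jmp, then a bogus mov swallowing the rest
print(follow_control_flow(code))  # jmp, nop, nop, nop, ret
```

The linear sweep never sees the `nop`/`ret` instructions that actually execute, because the junk byte desynchronizes it from the real instruction boundaries.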

pythonic
  • Interpreting is needed only for self-modifying code. Static code can be disassembled easily (with any disassembler). Working with dynamic code is possible only if there is a recompiler at runtime OR if the dynamic code can be unpacked into static code (in my case an EXE packer similar to UPX is used, and it can be unpacked). – osgx Apr 29 '12 at 03:15
  • @osgx: it's not true. For example, desynchronization techniques can easily confuse disassemblers. – molnarg Jan 14 '14 at 08:52
  • 1
    Well, technically that is true, but nothing worth engineering is ever 100% possible. So in theory, 100%? Never possible. In practice, 99.98% is very possible. In fact, it's well documented how to overcome the theoretical limitations and produce valuable output. – J. M. Becker Feb 16 '15 at 18:29