31

I would like to translate x86_64, x86, and ARM executables into LLVM IR (disassembly).

What solution do you suggest?

Grzegorz Wierzowiecki

6 Answers

16

McSema is a production-quality binary lifter. It takes x86 and x86-64 binaries and statically "lifts" them to LLVM IR. It's actively maintained, BSD licensed, and has extensive tests and documentation.

https://github.com/trailofbits/mcsema
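
To make "lifting" concrete, here is a minimal sketch of the idea, not of McSema's actual implementation or API. It assumes a hypothetical three-instruction toy ISA (`mov`, `add`, `ret`) given as tuples, models each machine register as a stack slot, and emits LLVM IR text using the opaque `ptr` syntax of recent LLVM releases. The names `lift_block` and `program` are made up for illustration.

```python
# Toy illustration only: a hypothetical three-instruction "ISA", not real x86 decoding.
from itertools import count


def lift_block(instrs, regs=("rax", "rbx")):
    """Translate (op, dst, src) tuples into LLVM IR text for one straight-line block."""
    tmp = count()
    lines = ["define i64 @lifted() {", "entry:"]
    # Model each machine register as a stack slot initialised to zero;
    # LLVM's mem2reg pass could later promote these slots to SSA values.
    for r in regs:
        lines.append(f"  %{r} = alloca i64")
        lines.append(f"  store i64 0, ptr %{r}")

    def operand(src):
        # Integer operands are immediates; register operands are loaded from their slot.
        if isinstance(src, int):
            return str(src)
        t = f"%t{next(tmp)}"
        lines.append(f"  {t} = load i64, ptr %{src}")
        return t

    for op, dst, src in instrs:
        if op == "mov":
            lines.append(f"  store i64 {operand(src)}, ptr %{dst}")
        elif op == "add":
            lhs = operand(dst)
            rhs = operand(src)
            res = f"%t{next(tmp)}"
            lines.append(f"  {res} = add i64 {lhs}, {rhs}")
            lines.append(f"  store i64 {res}, ptr %{dst}")
        elif op == "ret":
            # By convention in this toy, the return value lives in rax.
            lines.append(f"  ret i64 {operand('rax')}")
        else:
            raise ValueError(f"unsupported instruction: {op}")
    lines.append("}")
    return "\n".join(lines)


program = [("mov", "rax", 2), ("mov", "rbx", 40), ("add", "rax", "rbx"), ("ret", None, None)]
ir_text = lift_block(program)
print(ir_text)
```

A real lifter additionally has to decode machine code, model flags and memory, and recover control flow, which is where most of the difficulty lies.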

Dan
11

Consider using the RevGen tool developed within the S2E project. It converts x86 binaries to LLVM IR. The source code can be checked out from the Revgen branch of the Git repository at https://dslabgit.epfl.ch/git/s2e/s2e.git.

bsa2000
  • 1
    I see you've [mentioned here](http://stackoverflow.com/a/9059978/544721) another [paper](http://infoscience.epfl.ch/record/149975/files/x86-llvm-translator-chipounov_2.pdf) related to x86 -> LLVM translation. Thanks for the great references. – Grzegorz Wierzowiecki Jan 31 '12 at 09:54
  • I have problems with the link provided: `git clone https://dslabgit.epfl.ch/git/s2e/s2e.git` fails to clone :/. – Grzegorz Wierzowiecki Jan 31 '12 at 09:55
  • I haven't had any problems on Ubuntu 10.10. Make sure you have git installed and that your firewall/proxy settings are correct. You may also find related documentation on the project's web site _https://s2e.epfl.ch/embedded/s2e/index.html_ – bsa2000 Jan 31 '12 at 13:12
  • OK, now it works. It was probably a temporary network problem or a short server downtime. I'd love to take a look at RevGen asap :). – Grzegorz Wierzowiecki Jan 31 '12 at 17:11
  • Grzegorz Wierzowiecki, is there a ready-to-use converter from x86 binaries into LLVM IR? Which part of s2e.git is RevGen itself? – osgx Feb 20 '12 at 15:12
  • 2
    Hi osgx, you can find it at **/tools/tools/static-translator**. – bsa2000 Feb 29 '12 at 09:19
10

Regarding the RevGen tool mentioned by @bsa2000, the more recent paper "A compiler level intermediate representation based binary analysis and rewriting system" points out some limitations of S2E and Revnic.

I quote them here.

  1. shortcoming of dynamic translation:

    S2E [16] and Revnic [14] present a method for dynamically translating x86 to LLVM using QEMU. Unlike our approach, these methods convert blocks of code to LLVM on the fly which limits the application of LLVM analyses to only one block at a time.

  2. IR incomplete:

    Revnic [14] and RevGen [15] recover an IR by merging the translated blocks, but the recovered IR is incomplete and is only valid for current execution; consequently, various whole program analyses will provide incomplete information.

  3. no abstract stack or promoting information

    Further, the translated code retains all the assumptions of the original binary about the stack layout. They do not provide any methods for obtaining an abstract stack or promoting memory locations to symbols, which are essential for the application of several source-level analyses.

HackNone
2

I doubt there will be a universal solution (think about indirect branches, etc.); LLVM IR is much "higher level" than any assembly language. It is possible, though, to translate on a per-basic-block basis. You might want to check out the llvm-qemu and libcpu projects, among others.
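
As a rough sketch of why indirect branches get in the way of a purely static translation, the snippet below splits a toy instruction stream into basic blocks using the classic "leader" rule. The tuple format `(address, mnemonic, target)` and the mnemonics `jcc` and `jmp_indirect` are assumptions made up for this example; this is not how llvm-qemu or libcpu represent code.

```python
# Toy illustration: instructions are (address, mnemonic, target) tuples, where target
# is an int for direct branches and None otherwise. Not a real disassembler.
def split_basic_blocks(instrs):
    """Return (blocks, unresolved): blocks maps a leader address to its instructions."""
    addrs = [addr for addr, _, _ in instrs]
    leaders = {addrs[0]}  # the first instruction always starts a block
    unresolved = []
    for i, (addr, op, target) in enumerate(instrs):
        if op in ("jmp", "jcc", "jmp_indirect", "ret"):
            # The instruction following any control transfer starts a new block.
            if i + 1 < len(instrs):
                leaders.add(addrs[i + 1])
            if op in ("jmp", "jcc"):
                leaders.add(target)  # statically known branch target
            elif op == "jmp_indirect":
                unresolved.append(addr)  # target is only known at run time
    blocks, current = {}, None
    for ins in instrs:
        if ins[0] in leaders:
            current = ins[0]
            blocks[current] = []
        blocks[current].append(ins)
    return blocks, unresolved


code = [
    (0x00, "cmp", None),
    (0x02, "jcc", 0x08),
    (0x04, "add", None),
    (0x06, "jmp_indirect", None),  # e.g. a jump through a register
    (0x08, "mov", None),
    (0x0A, "ret", None),
]
blocks, unresolved = split_basic_blocks(code)
print(sorted(blocks), unresolved)
```

Each block can be translated to IR on its own, but the successors of the block ending at an address listed in `unresolved` cannot be determined from the instruction stream alone, so a static translator has to guess them, over-approximate them, or fall back to run-time dispatch.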

Anton Korobeynikov
  • 3
    LLVM is able to capture high-level information, but IMHO it is not required. I believe a solution can exist - maybe not one universal approach, but still. Thanks for the great references: llvm-qemu and libcpu look interesting. :) – Grzegorz Wierzowiecki Aug 14 '11 at 15:02
  • 1
    Btw, if [LLVM to JavaScript translation is possible and actually implemented](http://stackoverflow.com/questions/7295922/what-platform-can-i-compile-binaries-for-using-llvm-low-level-virtual-machine/7325817#7325817), translating another assembly language is possible as well ;). The question is who will do it, and when :). – Grzegorz Wierzowiecki Sep 06 '11 at 20:37
1

Just posting a reference on translating ARM binaries to LLVM IR:

disarm - ARM binary to LLVM IR disassembler

https://code.google.com/p/disarm/

However, I have not tried it, so I am not sure about its quality and stability. Can anyone else post additional information about this project?

HackNone
1

There is a new project in its early phases, libbeauty: https://github.com/jcdutton/libbeauty

Article about the project: Libbeauty: Another Reverse-Engineering Tool, 24 December 2013, Michael Larabel - http://www.phoronix.com/scan.php?page=news_item&px=MTU1MTU

It currently supports only a subset of x86_64 as input. One of the project's goals is to be able to compile the generated LLVM IR back to assembly to get a binary with the same functionality.

osgx