12

I'm working on a utility which needs to resolve hex addresses to a symbolic function name and source code line number within a binary. The utility will run on Linux on x86, though the binaries it analyzes will be for a MIPS-based embedded system. The MIPS binaries are in ELF format, using DWARF for the symbolic debugging information.

I'm currently planning to fork objdump, passing in a list of hex addresses and parsing the output to get function names and source line numbers. I have compiled an objdump with support for MIPS binaries, and it is working.
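For reference, the fork-and-parse approach could look roughly like the sketch below. It assumes `objdump -d -l` output in the usual layout (a `name():` line, then a `file:line` line, then tab-separated instruction lines) and 4-byte MIPS instructions. The executable name `mips-linux-gnu-objdump` and the helper names are placeholders of mine, not anything standard.

```python
import re
import subprocess

# Hypothetical name for a MIPS-capable objdump build; substitute your own.
OBJDUMP = "mips-linux-gnu-objdump"

FUNC_RE = re.compile(r"^([A-Za-z_.$][\w.$]*)\(\):$")          # e.g. "main():"
LINE_RE = re.compile(r"^(\S.*):(\d+)(?: \(discriminator \d+\))?$")  # "file.c:36"
INSN_RE = re.compile(r"^\s*([0-9a-fA-F]+):\t")                 # "  400690:\t..."


def parse_objdump_lines(text):
    """Map each disassembled address to (function, source file, line)."""
    result = {}
    func = src = line = None
    for raw in text.splitlines():
        m = INSN_RE.match(raw)
        if m:
            # An instruction inherits the most recent function/line markers.
            result[int(m.group(1), 16)] = (func, src, line)
            continue
        m = FUNC_RE.match(raw)
        if m:
            func = m.group(1)
            continue
        m = LINE_RE.match(raw)
        if m:
            src, line = m.group(1), int(m.group(2))
    return result


def resolve(binary, address, objdump=OBJDUMP):
    """Fork objdump for one address; returns (function, file, line) or None."""
    out = subprocess.run(
        [objdump, "-d", "-l",
         f"--start-address={address:#x}",
         f"--stop-address={address + 4:#x}",  # assumes 4-byte instructions
         binary],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_objdump_lines(out).get(address)
```

The parsing is kept separate from the subprocess call so it can be checked against captured output without the cross-tool installed.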

I'd prefer to have a package that lets me look things up natively from Python code without forking another process. I can find no mention of libdwarf, libelf, or libbfd on python.org, nor any mention of Python on dwarfstd.org.

Is there a suitable module available somewhere?

VividD
DGentry

6 Answers

9

You might be interested in the DWARF library from pydevtools:

>>> from bintools.dwarf import DWARF
>>> dwarf = DWARF('test/test')
>>> dwarf.get_loc_by_addr(0x8048475)
('/home/emilmont/Workspace/dbg/test/main.c', 36, 0)
Philippe Ombredanne
emilmont
5

Please check pyelftools - a new pure Python library meant to do this.
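As a rough sketch of an address-to-line lookup with pyelftools, modeled on the line-program walk in the library's own examples: `find_row` and `addr_to_line` are names chosen here for illustration, and the `row.file - 1` indexing assumes DWARF versions before 5, where file numbers start at 1.

```python
def find_row(states, address):
    """Return the line-table row whose address range covers `address`.

    `states` is any iterable of objects with .address, .file, .line and
    .end_sequence attributes, in line-program order.
    """
    prev = None
    for state in states:
        if prev is not None and prev.address <= address < state.address:
            return prev
        # An end_sequence row closes a range and cannot start a new one.
        prev = None if state.end_sequence else state
    return None


def addr_to_line(path, address):
    """Return (source file name, line) for `address`, or None if not found."""
    # Third-party dependency, imported lazily: pip install pyelftools
    from elftools.elf.elffile import ELFFile

    with open(path, "rb") as f:
        elf = ELFFile(f)
        if not elf.has_dwarf_info():
            return None
        dwarf = elf.get_dwarf_info()
        for cu in dwarf.iter_CUs():
            lineprog = dwarf.line_program_for_CU(cu)
            if lineprog is None:
                continue
            states = (e.state for e in lineprog.get_entries()
                      if e.state is not None)
            row = find_row(states, address)
            if row is not None:
                # Assumption: DWARF < 5, so file indices are 1-based.
                name = lineprog["file_entry"][row.file - 1].name
                if isinstance(name, bytes):
                    name = name.decode(errors="replace")
                return name, row.line
    return None
```

This only recovers the file and line; getting the function name would additionally mean searching the DIE tree for the subprogram whose address range contains the address.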

Eli Bendersky
4

You should give Construct a try. It is very useful for parsing binary data into Python objects.

There is even an example for the ELF32 file format.

Ber
  • I'm looking for something similar and checked out Construct. What's there is quite nice, but the project hasn't been updated in quite some time. – ctuffli May 05 '09 at 23:49
  • Just had a look at Construct, and it seems really terrific. Very impressed. – Craig McQueen Jul 09 '09 at 04:28
3

I've been developing a DWARF parser using Construct. It is currently fairly rough, and parsing is slow, but I thought I should at least let you know. It may suit your needs, with a bit of work.

I've got the code in Mercurial, hosted at bitbucket:

Construct is a very interesting library. DWARF is a complex format (as I'm discovering), and I think it pushes Construct to its limits.

Craig McQueen
  • Hi Craig, do you have any examples of how to use your DWARF parser? I've looked at your repo but couldn't find any. How could I do something like emilmont's dwarf.get_loc_by_addr() example? – Nick Toumpelis Dec 21 '11 at 18:51
  • @NickToumpelis, I haven't done any more work on this for a while, but I'm now just getting back to it since it could be useful at my work. I'm not entirely happy with the Construct-based solution, because it's slow to do the parsing. So, there's currently no high-level API as you requested. It gets as far as parsing the DWARF info into a tree. The next task would be to search the tree for the info you're looking for. DWARF format is so expressive, I'm not sure what would be a good simple API to access the data. – Craig McQueen Dec 27 '11 at 23:37
  • Craig: pyelftools (https://bitbucket.org/eliben/pyelftools) is built on top of `construct`, using it for the low-level API, but adding a feature-full high-level API on top – Eli Bendersky Jan 06 '12 at 07:12
3

I don't know of any, but if all else fails you could use ctypes to directly use libdwarf, libelf or libbfd.
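As a minimal sketch of the ctypes route, here is just the first step for libelf: locating the shared library and calling `elf_version`, which libelf expects before any other call. The helper names are mine, and a real address lookup would need many more functions and structures wrapped by hand.

```python
import ctypes
import ctypes.util

EV_CURRENT = 1  # value of EV_CURRENT in <elf.h>


def load_libelf():
    """Return a ctypes handle to libelf, or None if it cannot be loaded."""
    name = ctypes.util.find_library("elf")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
        # unsigned int elf_version(unsigned int version);
        lib.elf_version.argtypes = [ctypes.c_uint]
        lib.elf_version.restype = ctypes.c_uint
    except (OSError, AttributeError):
        return None
    return lib


def libelf_version():
    """Initialise libelf; returns what elf_version reports, or None."""
    lib = load_libelf()
    if lib is None:
        return None
    return lib.elf_version(EV_CURRENT)
```

A return value of 0 (EV_NONE) from `elf_version` means the library rejected the requested version.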

Douglas Leeder
2

hachoir is another library for parsing binary data.

Brian