
Is there a general way to check the platform (32-bit/64-bit) and architecture (PowerPC, ARM, etc.) of a binary file (which can be ELF, DWARF, PE, etc.)?

I know that almost every binary format (ELF or PE, for example) has a header that says which architecture can execute the file, but is there a generic way to get this information from any binary file?
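
For illustration, reading those header fields by hand might look like the sketch below. It is a minimal example covering only ELF and PE. The function name `describe_binary` and the two lookup tables are hypothetical names of my own, and the tables list only a handful of machine codes, so this is not the generic solution I'm looking for.

```python
import struct

# Partial, illustrative tables: both formats define many more machine codes.
ELF_MACHINES = {0x03: "x86", 0x14: "powerpc", 0x15: "powerpc64",
                0x28: "arm", 0x3E: "x86-64", 0xB7: "aarch64"}
PE_MACHINES = {0x014C: "x86", 0x8664: "x86-64", 0x01C0: "arm", 0xAA64: "aarch64"}


def describe_binary(data):
    """Return (format, bits, arch), or None if the format is not recognized.

    Hypothetical helper: only ELF and PE are handled here.
    """
    # ELF: magic, then EI_CLASS (byte 4), EI_DATA (byte 5), e_machine at offset 18.
    if data[:4] == b"\x7fELF" and len(data) >= 20:
        bits = {1: 32, 2: 64}.get(data[4])
        endian = "<" if data[5] == 1 else ">"  # 1 = little-endian, 2 = big-endian
        (e_machine,) = struct.unpack_from(endian + "H", data, 18)
        return ("ELF", bits, ELF_MACHINES.get(e_machine, hex(e_machine)))

    # PE: DOS stub starts with "MZ"; offset 0x3C points at the "PE\0\0" signature.
    if data[:2] == b"MZ" and len(data) >= 0x40:
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\0\0" and len(data) >= pe_offset + 26:
            # COFF Machine field follows the signature; the optional header
            # magic (PE32 vs PE32+) sits 24 bytes after the signature.
            (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
            (opt_magic,) = struct.unpack_from("<H", data, pe_offset + 24)
            bits = {0x10B: 32, 0x20B: 64}.get(opt_magic)
            return ("PE", bits, PE_MACHINES.get(machine, hex(machine)))

    return None
```

Calling it with the first few hundred bytes of a file (for example `describe_binary(open(path, "rb").read(1024))`) is enough for both formats as long as the PE header offset falls inside that range. Every additional format would need its own branch, which is exactly what I'd like to avoid.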

I tried using magic, which returns all of this information, but only as a single string:

ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib/ld.so.0, stripped

Right now, I'm using a regex to extract the information from it, but I'm not sure I can always count on that, because the string output of magic can differ between files and formats.
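
A simplified sketch of that regex approach is shown below. The pattern and the name `parse_elf_description` are illustrative assumptions rather than my exact code, and the pattern only understands ELF descriptions worded like the one above.

```python
import re

# Assumed pattern for ELF descriptions only; other formats are worded differently.
ELF_RE = re.compile(
    r"ELF (?P<bits>32|64)-bit (?P<endian>LSB|MSB)[^,]*, (?P<arch>[^,]+),"
)


def parse_elf_description(description):
    """Return (bits, arch) from a magic/file description string, or None."""
    match = ELF_RE.search(description)
    if match is None:
        return None
    return int(match.group("bits")), match.group("arch").strip()


description = (
    "ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, "
    "interpreter /lib/ld.so.0, stripped"
)
print(parse_elf_description(description))
```

Anything that is not described in this ELF style makes the function return `None`, which shows the fragility I'm worried about.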

Drxxd
  • @amanb Look, the code doesn't really matter because it's specific to this ELF... I'm asking what's the best way to do it generically (for all of the files and archs).. If there's some sort of library that already does it or something – Drxxd Mar 04 '19 at 12:07
  • That libmagic string looks like the output of the `file` shell command on Unix/Linux, but missing the `for GNU/Linux` part of the output that distinguishes a *BSD ELF executable from a Linux ELF executable, for example. – Peter Cordes Mar 04 '19 at 22:23
  • @PeterCordes Yeah, the libmagic output should be the same as the "file" command output.. But I'm not sure if it's consistent in showing the architecture & platform, or what the best way to retrieve them is – Drxxd Mar 05 '19 at 09:26
  • I don't know python, I'm here for the CPU-architecture tag, sorry. But anyway, it's maybe not identical. e.g. `ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=0dd69c924e36ce85c93964d2364e645509dc82ea, stripped` for `file /bin/ls` on Arch Linux. Are you on OS X? I just noticed your ELF interpreter was `/lib/ld.so.0`, which on a normal multiarch Linux system doesn't exist. And `/lib/ld-linux.so.2` is 32-bit. So maybe libmagic on Linux would print what I got. – Peter Cordes Mar 05 '19 at 09:30
  • The `libmagic` internals are pretty ad hoc, it will populate some fields for some file types but there is not a system or principle behind what it can extract for any particular file type. Reverse-engineering your own detections out of the `libmagic` definitions would be my approach, though it sucks. The `libmagic` rules are not very hard to understand, it's similar to a `scanf` template more or less. – tripleee Mar 05 '19 at 09:34
  • @tripleee So you're suggesting reverse-engineering libmagic to get the proper functionality.. That sounds like too much work.. I'll probably stick to the regex :\ – Drxxd Mar 05 '19 at 09:39
  • @PeterCordes What's your point? Only the architecture and platform matter to me.. The question is how to get this data (architecture&platform) from a binary file generically? – Drxxd Mar 05 '19 at 09:40
  • Hrm, I don't think I have a useful point here, sorry. I was thinking you might be interested in some of the things that `file` printed but `libmagic` didn't, but apparently that's not the case. – Peter Cordes Mar 05 '19 at 09:43
  • Reverse-engineering the possible strings you need your regex to match is just a less precise and more error-prone way to do the same thing. – tripleee Mar 05 '19 at 09:44
