
Is there any way to identify whether a .obj file or an .exe file is 16-bit or 32-bit?

Basically, I want to create a smart linker that automatically identifies which linker the given files need to be passed to.

Preferred Language: C (it can be different, if needed)

I am looking for a solution that can read the bytes of an .exe or the code of an .obj file and then determine whether it's 16-bit or 32-bit. Even an algorithm would do.

Note: I know that object code and an executable are two different entities.

Squashman
  • Er, 16 bit? Surely you mean 32 vs 64 bit? – Shawn Feb 20 '20 at 14:57
  • Not really... @Shawn, but some knowledge on ways to differentiate between 32 and 64 bit obj/exe would be great. – Hello World Feb 20 '20 at 14:57
  • 16bit has been irrelevant outside of the embedded world for 25 years... – Shawn Feb 20 '20 at 14:58
  • True that... But that's my requirement for now. – Hello World Feb 20 '20 at 14:59
  • https://superuser.com/questions/358434/how-to-check-if-a-binary-is-32-or-64-bit-on-windows – Robert Harvey Feb 20 '20 at 14:59
  • https://stackoverflow.com/questions/495244/how-can-i-test-a-windows-dll-file-to-determine-if-it-is-32-bit-or-64-bit – Robert Harvey Feb 20 '20 at 15:00
  • Yes, I understand. None of my Google searches are coming up with what you asked for. See Shawn's comments. – Robert Harvey Feb 20 '20 at 15:00
  • Try this one: https://www.youtube.com/watch?v=PP7Wj4dqD7s – Robert Harvey Feb 20 '20 at 15:00
  • Actually, I was looking for some way to do it programmatically. Maybe some change in some header info of the exe... If I really had to do something like that, I would have directly put the obj/exe in one of the linkers – Hello World Feb 20 '20 at 15:03
  • For an object file `z.o` you can do the following: "objdump -d z.o | grep -qE 'push\s+%bp' && echo 16-bit". Only a heuristic solution, but I doubt that a `push %bp` will occur in 32-bit code – Ctx Feb 20 '20 at 15:17
  • It may even be better to rule out 32/64-bit by doing a `grep -qE '%[er][abcd]x'`. – Ctx Feb 20 '20 at 15:23
  • @Ctx OP is programming on Windows. I believe `objdump` does not correctly support MZ executables, so this won't work. – fuz Feb 24 '20 at 08:11

1 Answer


All of this information is encoded in the binary object according to the relevant Application Binary Interface (ABI).

The current Linux ABI is the Executable and Linkable Format (ELF), and you can query a specific binary file using a tool such as readelf or objdump.

The current Windows ABI is the Portable Executable (PE) format. I'm not familiar with the toolset here, but a quick Google search suggests there are programs that work much like readelf:

http://www.pe-explorer.com/peexplorer-tour.htm

Here's the Microsoft specification of the PE format:

https://learn.microsoft.com/en-us/windows/win32/debug/pe-format

However, neither of those formats supports 16-bit binaries anymore. The older format on Linux is called "a.out", which can be read and queried with objdump (I'm not sure about readelf). The older Windows/DOS formats are called MZ and NE. Again, I'm not familiar with the tool support for these older Windows formats.

Wikipedia has a pretty comprehensive list of all the popular executable file formats that have been used, with links to more info:

https://en.wikipedia.org/wiki/Comparison_of_executable_file_formats

David
  • Sorry for being unclear. I have updated the question description. Thank you for your precious time. – Hello World Feb 20 '20 at 15:20
  • I'm not sure how this doesn't answer your question. For example, you can identify ELF files because the first three characters are "ELF" in ASCII. You can identify MZ files because the first two characters are "MZ" in ASCII. Etc. Parse the file formats to get what you need. – David Feb 20 '20 at 15:22
  • Does it start with 'E' 'L' 'F' or is it found after a specific offset? – Hello World Feb 20 '20 at 15:33
  • I can't really find any difference between the two based on your classification... `0x4d5a` is constant for an exe, and the other bytes change irrespective of the 16/32 specification. Could I please have some of your time @David ? – Hello World Feb 20 '20 at 15:49
  • Re "*The older Windows/DOS formats are called ME and NZ*", Don't you mean [MZ](https://en.wikipedia.org/wiki/DOS_MZ_executable) and [NE](https://en.wikipedia.org/wiki/MS-DOS_4.0_(multitasking))? – ikegami Feb 20 '20 at 17:11
  • @Hello World, 4D = M, 5A = Z. So you have [MZ](https://en.wikipedia.org/wiki/DOS_MZ_executable) files. Aren't those exclusively 16-bit? – ikegami Feb 20 '20 at 17:19
  • @HelloWorld The ELF format always starts with the four bytes 0x7F 'E' 'L' 'F'. You can identify this if you open up a binary executable file in a text editor like vi or emacs. Just finding these is enough to tell you that this is either a 32-bit or 64-bit file, because those are the only ones supported by ELF. The fifth byte, by the way, is the ELF class, which says explicitly whether this is a 32-bit or 64-bit object: https://refspecs.linuxbase.org/elf/gabi4+/ch4.eheader.html#elfid – David Feb 20 '20 at 19:16
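The ELF check described in the comment above can be sketched in C as follows. This is only an illustration under stated assumptions: `elf_class_bits` is a hypothetical helper name, and it expects the caller to have already read at least the first five bytes of the file into a buffer. The class values 1 (32-bit) and 2 (64-bit) are the ones listed in the linked ELF header specification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from any library): returns 32 or 64 for an
 * ELF image, or 0 if the buffer is not ELF or the class byte is invalid. */
int elf_class_bits(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    /* The magic is 0x7F 'E' 'L' 'F'. The literal is split in two so that
     * 'E' is not parsed as part of the \x hex escape. */
    if (len < 5 || memcmp(buf, "\x7f" "ELF", 4) != 0)
        return 0;

    switch (buf[4]) {           /* e_ident[EI_CLASS] */
    case 1:  return 32;         /* ELFCLASS32 */
    case 2:  return 64;         /* ELFCLASS64 */
    default: return 0;          /* ELFCLASSNONE or unknown */
    }
}
```

A caller would typically `fread` the first few bytes of the file and pass them in; a return value of 0 means "not ELF", so the file should be tried against the other formats.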
  • I am unfortunately unfamiliar with the Microsoft file formats (which sounds like what you're working with), but this should just be a matter of looking up the reference documents. Any ABI file will have some method of identification, and some method to describe what architecture the file is designed for. – David Feb 20 '20 at 19:22
  • Even my `excel.exe` starts with `4d5a 9000 0300 0000 0400 0000 ffff 0000` – Hello World Feb 20 '20 at 19:32
  • This is the first line of a random (asm->)exe I wrote long ago: `4d5a 2300 0200 0100 2000 0100 ffff 0000` – Hello World Feb 20 '20 at 19:33
  • I can't really make out the difference...Is it at some specific offset? – Hello World Feb 20 '20 at 19:35
  • The modern PE file format, which I linked above (https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#file-headers), starts with an old MZ-style header, so both modern 32-bit and 64-bit applications will start with MZ. To determine the difference, you actually have to go further and keep parsing the file. In this case, it sounds like reading the 4-byte file offset stored at 0x3C and looking for the "PE\0\0" signature at that offset differentiates between 16-bit (MZ) and 32/64-bit (PE) files. (https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#signature-image-only) – David Feb 20 '20 at 19:44
  • However, other sources have said that there are extensions to MZ that allow the use of 32-bit binaries in the MZ file format, so that test might not be conclusive. What you need to do now is go find those references and do a bunch of reading to convince yourself what the correct approach is. – David Feb 20 '20 at 19:46
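The MZ/PE parsing discussed in the last two comments might look roughly like the C sketch below. It is a heuristic, not a definitive classifier. The names `exe_kind` and `classify_mz` are hypothetical. It assumes the whole header region is already in memory, that the field at 0x3C is meaningful (in a plain DOS program those bytes may be arbitrary), and that only MZ, NE and PE matter; other MZ extensions, such as the 32-bit ones mentioned above, are reported as plain DOS MZ. The optional-header magic values 0x10b (PE32) and 0x20b (PE32+) come from the Microsoft PE specification linked in the answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
enum exe_kind { EXE_UNKNOWN, EXE_DOS_MZ, EXE_NE16, EXE_PE32, EXE_PE64, EXE_PE_OTHER };

static uint16_t rd_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: classify a file image that may start with "MZ". */
enum exe_kind classify_mz(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    if (len < 2 || buf[0] != 'M' || buf[1] != 'Z')
        return EXE_UNKNOWN;
    if (len < 0x40)
        return EXE_DOS_MZ;              /* too short to hold the 0x3C field */

    /* The 4 bytes at 0x3C hold the file offset of the newer header. */
    uint32_t off = rd_le32(buf + 0x3C);
    if (off > len || len - off < 4)
        return EXE_DOS_MZ;              /* offset points outside the file */

    const unsigned char *sig = buf + off;
    if (sig[0] == 'N' && sig[1] == 'E')
        return EXE_NE16;                /* 16-bit "New Executable" */
    if (memcmp(sig, "PE\0\0", 4) != 0)
        return EXE_DOS_MZ;              /* no known extended header found */

    /* Signature (4 bytes) + COFF file header (20 bytes), then the
     * optional header, whose first 2 bytes are the magic number. */
    if (len - off < 4 + 20 + 2)
        return EXE_PE_OTHER;
    uint16_t magic = rd_le16(sig + 24);
    if (magic == 0x10b) return EXE_PE32;    /* PE32: 32-bit image */
    if (magic == 0x20b) return EXE_PE64;    /* PE32+: 64-bit image */
    return EXE_PE_OTHER;
}
```

A smart linker front end could read the first few kilobytes of each input file, call a function like this, and fall back to other checks when the result is `EXE_UNKNOWN` or `EXE_DOS_MZ`.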