6

Let's say that there are 2 possible architectures, ARM and x86. Is there a way to detect what system the code is running on, to achieve something like this from assembly/machine code?

if (isArm)
    jmp to arm machine code
if (isX86)
    jmp to x86 machine code

I know that ARM machine code differs from x86 machine code significantly. What I'm thinking about is some well crafted assembly instructions that would result in the same binary machine code.

Tibi
  • 2
    For starters, even the binary program format differs (architecture field in ELF header) so you won't be able to even run the program unless it's on the appropriate architecture. Theoretically it is possible to write such machine code, but you need a special way to run it. – Jester Jun 27 '16 at 13:56
  • I'm thinking it would run directly, without any OS behind it... like a bootloader. – Tibi Jun 27 '16 at 13:56
  • 1
    Yeah, then it can be done. – Jester Jun 27 '16 at 13:57
  • 1
    The bootstrap has to be for the right processor; with that knowledge you know what processor you are and can just continue booting into the right binary. Obviously you cannot boot generic "I don't know what I am" code. – old_timer Jun 27 '16 at 13:58
  • If you are trying to detect from the binary, ARM does have some hints that make it look like ARM (most instructions start with 0xE and are aligned, if ARM and not Thumb; in Thumb there are a lot of 0x6 and 0x7, but not enough to detect). Since you cannot boot this thing at all without already knowing what it is, it sounds like the problem is solved. – old_timer Jun 27 '16 at 13:59
  • 3
    the magical Google term is "polyglot": https://www.google.co.uk/webhp?#q=arm+x86+polyglot – moonshadow Jun 27 '16 at 13:59
  • @dwelch I'm thinking of putting something like this in the first stage of the bootloader, so it detects the architecture, and then executes the correct code (ie load the correct second stage). – Tibi Jun 27 '16 at 14:02
  • 3
    If your code has no idea what machine it's running on, the ISA is almost the least of your worries; code that doesn't know where RAM is, what peripherals exist and where they are, etc. isn't going to achieve much. There might be some fairly standardised firmware interfaces on x86 PCs (note; not x86 in general), but good luck on the vast variety of ARM machines ;) – Notlikethat Jun 27 '16 at 14:03
  • 1
    If you are putting it in the bootloader, then you already know what architecture it is running on from whatever launches the bootloader. Leave breadcrumbs, like ATAGs are to Linux. As Notlikethat points out, you have other things to worry about: at a minimum you might have two separate entry points depending on the binary, even if you let the binary take care of worrying about what kind of resources there are, how many, and where. – old_timer Jun 27 '16 at 15:12
  • 4
    You're assuming that ARM-based computers boot the same way as x86 PCs, that they read the first sector of disk into memory and jump to it. However, as far as I know of, no ARM-based computer does this. Not even all x86 PCs do this anymore, as some use the much different EFI boot process exclusively. – Ross Ridge Jun 27 '16 at 17:44
  • Somebody found an injection hack and wants it to work on both PCs and cell phones? – evaitl Jul 16 '16 at 12:45

3 Answers

19

Assuming you have already taken care of all other differences [1] and you are left with writing a small polyglot trampoline, you can use these opcodes:

EB 02 00 EA

When placed at address 0, these bytes decode on ARM (non-Thumb) as:

00000000: b 0xbb4
00000004: ...

On x86 (real mode), the same bytes decode as:

0000:0000 jmp 04h
0000:0002 add dl, ch
0000:0004 ...

The x86 jmp skips over the bytes 00 EA (the add dl, ch is never executed), so you can put more elaborate x86 code at address 04h and ARM code at address 0bb4h.

Of course, when relocating the base address, make sure to relocate the jump targets too.
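
To double-check the two decodings above, here is a minimal Python sketch that decodes the same four bytes once as a little-endian A32 `b` and once as an x86 short `jmp`. The function names and the `address` parameter are made up for illustration; it only handles these two encodings and is not a general disassembler.

```python
import struct

# The four polyglot bytes from the answer, in memory order.
POLYGLOT = bytes.fromhex("EB0200EA")


def arm_branch_target(code: bytes, address: int = 0) -> int:
    """Decode a little-endian A32 `b` located at `address`; return its target."""
    (word,) = struct.unpack("<I", code[:4])
    # Bits 27..24 == 0b1010 mark a plain branch (0b1011 would be `bl`).
    if (word >> 24) & 0x0F != 0x0A:
        raise ValueError("not an A32 B instruction")
    imm24 = word & 0x00FFFFFF
    # Sign-extend the 24-bit word offset.
    if imm24 & 0x800000:
        imm24 -= 1 << 24
    # In ARM state the PC reads as the instruction address plus 8.
    return (address + 8 + (imm24 << 2)) & 0xFFFFFFFF


def x86_short_jmp_target(code: bytes, address: int = 0) -> int:
    """Decode an x86 `jmp rel8` (opcode EB) located at `address`; return its target."""
    if code[0] != 0xEB:
        raise ValueError("not a short jmp")
    (rel8,) = struct.unpack("<b", code[1:2])
    # The displacement is relative to the end of the 2-byte instruction;
    # the result wraps at 16 bits as an offset in real mode.
    return (address + 2 + rel8) & 0xFFFF


print(hex(arm_branch_target(POLYGLOT)))     # 0xbb4
print(hex(x86_short_jmp_target(POLYGLOT)))  # 0x4
```

Both branches are PC-relative, so passing a different `address` (say `0x7C00`) shifts both decoded targets by the same amount; the code placed at those targets has to move along with the trampoline.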


[1] For example, ARM starts at address 0 while x86 starts at address 0fffffff0h, so you need specific hardware/firmware support to abstract the boot address.

Cody Gray - on strike
Margaret Bloom
  • 2
    If two architectures start at different addresses then there is no need to craft the binaries to be compatible. Simply load the ARM binary at ROM address 0 and the x86 binary at the appropriate ROM address. Won't really help if the OP wants to boot from thumb drive. – slebetman Jun 27 '16 at 14:30
  • 7
    @slebetman Yes, that's true. I took the question as "Can we make the same binary code be valid ARM and x86 code?". I have no idea what the OP is going to do or need. Just giving him what it asked for. – Margaret Bloom Jun 27 '16 at 14:40
  • 7
    Fun fact: within ARM, a 32/64-bit polyglot entrypoint is pretty straightforward, since an A32 `b` looks like an A64 `adds` (and yes, I have written code which (ab)uses that fact...) – Notlikethat Jun 27 '16 at 21:40
2

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0363g/Beijdcef.html

https://electronics.stackexchange.com/a/232934

How to setup ARM interrupt vector table branches in C or inline assembly?

http://osnet.cs.nchu.edu.tw/powpoint/Embedded94_1/Chapter%207%20ARM%20Exceptions.pdf

ARM Undefined Instruction error

ARM assembly is not my area of expertise, but I have programmed a lot in x86 assembly. I remember having this same question as homework back in college. The solution I found was interrupt 06h (http://webpages.charter.net/danrollins/techhelp/0103.HTM , https://es.wikipedia.org/wiki/Llamada_de_interrupci%C3%B3n_del_BIOS#Tabla_de_interrupciones). This interrupt is fired every time the microprocessor tries to execute an unknown instruction ("invalid opcode").

The 8086 gets stuck when it finds an invalid opcode: the IP (instruction pointer) returns to the same invalid instruction and tries to re-execute it, and this loop hangs the program.

Starting with the 80286, interrupt 06h is fired instead, so the programmer can handle invalid opcodes.

Interrupt 06h helps to detect the CPU architecture: simply try to execute an x64 opcode. If interrupt 06h is fired, the CPU did not recognize it, so it is x86; otherwise it is x64.

This technique can be also used to detect the type of microprocessor :

  • Try to execute an 80286 instruction; if interrupt 06h is not fired, the CPU is at least an 80286.
  • Try to execute an 80386 instruction; if interrupt 06h is not fired, the CPU is at least an 80386.
  • And so on...

http://mtech.dk/thomsen/program/ioe.php

https://software.intel.com/en-us/articles/introduction-to-x64-assembly

Community
  • The question is about x86 vs ARM, not about identifying specific chips (for which I suggest [this](http://www.drdobbs.com/database/cpuid-algorithm-wars/184410005) and [this](http://www.rcollins.org/ddj/Sep96/Sep96.html) and the Intel manuals for x64 vs x86). Using the "#UD technique" is a chicken-and-egg problem: you need to identify the architecture to set up the ISR to identify the architecture. – Margaret Bloom Jun 27 '16 at 18:45
  • @MargaretBloom and Jose: detecting long mode is fundamentally different from detecting extensions supported within a single mode. Since it isn't backwards compatible, just write code that decodes differently, [e.g. use `REX jz` vs `inc eax`/`jz`](http://stackoverflow.com/questions/38063529/x86-32-x86-64-polyglot-machine-code-fragment-that-detects-64bit-mode-at-run-ti) to distinguish long mode from legacy/compat (or even 16bit) mode. – Peter Cordes Jun 27 '16 at 21:34
0

It's not possible in assembly or machine code because the machine code will depend on the architecture. So your if statement must first be compiled into either ARM or x86. If it is compiled as ARM, it cannot run on x86 without an emulator, and if it is compiled as x86, it cannot run on ARM without an emulator.

If you do run the code in an emulator, then the code is basically running in a virtual version of the CPU it was compiled for. Depending on the emulator, you may or may not be able to detect that you are running on one. Even if the emulator lets your code detect it, you may still be unable to detect the underlying CPU and/or OS (for example, you may not be able to tell whether the x86 emulator is running on x86 or ARM).

Now, if you are very lucky, you may find two CPU architectures where the conditional branch or conditional goto instruction of one architecture either does something useful or does nothing on the other architecture, and vice versa. In that case you can construct a binary executable that can run on two different CPU architectures.


How multi-architecture binaries work in real life

In real life, a multi-architecture binary is actually two complete programs with shared resources (icons, images etc.), and the program binary format includes a header or preamble to tell the OS which CPUs are supported and where to find the main() function for each CPU.

One of the best historical examples of this I can think of is the Mac OS. The Mac changed CPUs twice: first from 68k to PowerPC, then from PowerPC to x86. At each stage they had to come up with a file format that contained the binary executables of two CPU architectures.
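
As a rough illustration of such a header, here is a minimal Python sketch that reads the table at the start of a Mach-O universal ("fat") file, the format used for the PowerPC to x86 transition. The layout (big-endian magic 0xCAFEBABE, a slice count, then 20-byte entries of cputype, cpusubtype, offset, size, align) is the 32-bit fat header as I understand it. The function names and the demo header are made up, and the demo bytes are synthetic, not taken from a real file.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # big-endian magic of a 32-bit universal header

# A few cputype values; anything else is reported as a hex number.
CPU_NAMES = {
    7: "x86",
    12: "ARM",
    18: "PowerPC",
    0x01000007: "x86_64",
    0x0100000C: "ARM64",
}


def parse_fat_header(data: bytes) -> list[tuple[str, int, int]]:
    """Return (cpu name, file offset, size) for every slice in the header."""
    magic, count = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal binary header")
    slices = []
    for i in range(count):
        # Each entry: cputype, cpusubtype, offset, size, align (5 x uint32).
        cputype, _subtype, offset, size, _align = struct.unpack_from(
            ">5I", data, 8 + 20 * i
        )
        slices.append((CPU_NAMES.get(cputype, hex(cputype)), offset, size))
    return slices


def build_demo_header() -> bytes:
    """Synthetic two-slice header (PowerPC + x86) with placeholder offsets."""
    header = struct.pack(">II", FAT_MAGIC, 2)
    header += struct.pack(">5I", 18, 0, 4096, 100, 12)  # PowerPC slice
    header += struct.pack(">5I", 7, 3, 8192, 200, 12)   # x86 slice
    return header


for name, offset, size in parse_fat_header(build_demo_header()):
    print(f"{name}: {size} bytes at offset {offset}")
```

The loader picks the entry matching the CPU it is running on and treats the bytes at that offset as an ordinary single-architecture executable; no code in the file has to work out which CPU it is on.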


Note on real-world executables

Real-life programs are almost never raw binary executables. The binary code is contained in another format that also holds metadata and resources. Windows, for example, uses the PE format and Linux uses ELF. But some OSes support more than one type of executable container (though the actual binary machine code can be the same). For example, Linux traditionally supports ELF, COFF and ECOFF.
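
To make the metadata point concrete, here is a minimal Python sketch that reads the architecture field (`e_machine`) from the first 20 bytes of an ELF header. The offsets and the handful of machine codes follow the ELF specification as I remember it. `elf_machine` and `fake_elf_header` are made-up helper names, and the fake header is a synthetic stub, not a loadable program.

```python
import struct

# A few e_machine values; anything else is reported numerically.
EM_NAMES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def elf_machine(header: bytes) -> str:
    """Return the target architecture named in an ELF header."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] (EI_DATA) gives the byte order: 1 = little, 2 = big.
    endian = {1: "<", 2: ">"}[header[5]]
    # e_machine is a 16-bit field at offset 18, right after e_type.
    (e_machine,) = struct.unpack_from(endian + "H", header, 18)
    return EM_NAMES.get(e_machine, f"unknown ({e_machine})")


def fake_elf_header(machine: int, ei_data: int = 1) -> bytes:
    """Build a synthetic 20-byte ELF header stub for illustration."""
    # 16-byte e_ident: magic, EI_CLASS=1, EI_DATA, EI_VERSION=1, padding.
    ident = b"\x7fELF" + bytes([1, ei_data, 1]) + bytes(9)
    endian = "<" if ei_data == 1 else ">"
    # e_type = 2 (ET_EXEC), followed by e_machine.
    return ident + struct.pack(endian + "HH", 2, machine)


print(elf_machine(fake_elf_header(40)))  # ARM
print(elf_machine(fake_elf_header(3)))   # x86
```

On a Linux system the same function can be pointed at a real file, for example `elf_machine(open("/bin/ls", "rb").read(20))`. This field is what lets the loader reject a binary built for the wrong CPU before any of its code runs.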

slebetman
  • 1
    Another way to state your first paragraph is that the target architecture is a compile-time constant, so you could use CPP macros as your branch condition. – Peter Cordes Jun 27 '16 at 16:23
  • 2
    Your third paragraph directly contradicts your first paragraph :/ (not to mention the latter is demonstrably incorrect; most assemblers are happy to let you insert whatever arbitrary values you like into the instruction stream, e.g. [the `.inst` directive in GAS](https://www.sourceware.org/binutils/docs/as/ARM-Directives.html#ARM-Directives)). – Notlikethat Jun 27 '16 at 21:33