This was asked during an interview.

One of my answers was endianness (though this will only tell whether the underlying CPU is running little- or big-endian, not which ISA it is).
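
A minimal sketch of that endianness check, assuming only standard C++11. The function name `is_little_endian` is made up for illustration. As noted, it cannot separate x86 from a typical little-endian ARM system.

```cpp
#include <cstdint>
#include <cstring>

// Returns true when the least significant byte of a multi-byte integer
// is stored at the lowest address (little-endian byte order).
bool is_little_endian() {
    const std::uint32_t probe = 0x01020304u;
    unsigned char bytes[sizeof probe];
    // memcpy is a well-defined way to look at an object's bytes.
    std::memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x04;
}
```

Calling `is_little_endian()` from `main` gives the same answer on x86 and on ARM running in little-endian mode, which is why this test alone is not enough.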

I cannot embed assembly code or use CPUID-style instructions.

I was thinking about memory model as ARM is weakly ordered and x86 is TSO. But I cannot think of a C++ program that will help me differentiate this.

user3219492
  • 1
    Normally you'd use C preprocessor macros since for a normal C implementation, the target ISA is known at compile time. Or ask the OS via something like `system("uname -a")`. If you want to check stuff at run time, most C implementations for most ISAs allow casting a function pointer to `char*`, so you could check machine code for some common start-of-function patterns like x86 `push`. But that's not reliable since the first instruction could be almost anything in an optimized program. Maybe have a simple test function that will reliably compile to an `add` or `lea` and `ret`. – Peter Cordes Jun 08 '23 at 06:01
  • Endianness will let you detect a PDP-11; I don't think many ISAs are PDP-endian or other mixed (https://en.wikipedia.org/wiki/Endianness#Middle-endian) - little-endian 16-bit words, but 32-bit data as a big-endian pair of little-endian words. – Peter Cordes Jun 08 '23 at 06:03
  • @PeterCordes thanks for the idea of writing a function that will reliably have `add` as its first instruction, then reading the bytes at the address the function pointer holds and checking that the opcode matches the one in the x86 or ARM manual. Interesting. – user3219492 Jun 08 '23 at 06:05
  • 3
    ARM mostly runs in little-endian mode nowadays, so it's just the same as x86. You have to use macros because a compiled program will only run on that specific platform. [Detecting CPU architecture compile-time](https://stackoverflow.com/q/152016/995714). Many OSes have an emulator for other platform(s), so it's possible to detect the architecture at run time, but obviously that's not efficient – phuclv Jun 08 '23 at 06:11
  • Oh, I missed that there were only two options, x86 and ARM. That narrows it down a lot, so reading machine code could work since they're both Von Neumann architectures and probably don't overlap in their machine-code bit-patterns for `int foo(int a, int b){ return a+b; }` where x86-64 will use LEA, ARM will use `add r0, r1`. https://godbolt.org/z/a59sdcEsY – Peter Cordes Jun 08 '23 at 06:17
  • If you don't mind using implementation / ABI differences that happen to be true, mainstream C implementations (like GCC and clang, which follow the standard ABI/calling convention) for x86 have signed `char`, but on ARM the standard ABI uses unsigned `char`. So `if ((int)(char)-1 == 0x000000ff) it's arm;` – Peter Cordes Jun 08 '23 at 06:37
  • 2
    Don't assume the interviewer knew either. They may have discovered a trick, or maybe their trick only works with the compiler they are using, or maybe it doesn't actually work. Or maybe they cannot solve it and are trying to get you to do their job. – old_timer Jun 08 '23 at 12:48
  • re: memory model, you'd want to look for a litmus test for LoadLoad or StoreStore reordering. Those are probably the easiest (easier than LoadStore), since you just have one thread doing two stores (with `memory_order_relaxed`) to separate cache lines, the other thread doing two loads (again with `relaxed`). Check that you see *both* possible orders, otherwise compile-time reordering could explain it. – Peter Cordes Jun 08 '23 at 14:38
  • But to repeat the test without a lot of additional synchronization to "start fresh", maybe have one thread storing to an array (of `atomic` with `relaxed`) and then storing an index to another var, which the reader uses, and look for violations of acquire/release semantics, which happen for free on x86 even with `relaxed`. But no, that would also always work on ARM because `relaxed` still works like `consume` (except in tricky cases where optimization makes asm without a data dependency), and the load address would have a data dependency on the index. – Peter Cordes Jun 08 '23 at 14:38
  • e.g. Linux RCU basically uses `relaxed` loads to get the asm they want. So you couldn't observe LoadLoad reordering that way. But actually you still could observe StoreStore. To make it more likely, you'd want the store to miss in cache (maybe all the way to DRAM) but the index to hit. So maybe have the reader spin a bit between pairs of reads so it's not hammering on the cache line holding the index. – Peter Cordes Jun 08 '23 at 14:40
  • @PeterCordes would you mind writing this up as an answer so that I can upvote it? Thanks for the suggestion about std::memory_order_.. I was not aware that C++ supported it. I'm still trying to wrap my head around your answer – user3219492 Jun 08 '23 at 23:06
  • 1
    https://preshing.com/20121019/this-is-why-they-call-it-a-weakly-ordered-cpu/ has a demo of a somewhat different algorithm that breaks on weakly-ordered ISAs when using `relaxed` instead of `acquire`/`release`, rolling your own lock. I thought about writing up an answer, but it's not clear what kind of context this would ever be useful in, since you can just `#ifdef`. IDK what kinds of things you're "allowed" to look for: ABI differences such as `char` being signed aren't fundamental to the ISA, but are an important difference in the software ecosystem. – Peter Cordes Jun 08 '23 at 23:22
  • @PeterCordes thanks a lot for the blog. Makes a lot of sense now. I see your point. Thanks for your help though – user3219492 Jun 09 '23 at 03:51
  • You may find [litmus7](http://diy.inria.fr/) useful for generating litmus tests that stress test an architecture's memory model. It isn't technically an answer to your question though since the tool generates assembly snippets. – hayesti Jun 09 '23 at 08:44
  • 1
    One might be able to use the fact that `ret` in x86 is 1 byte and 4 bytes in arm64 -- the alignment of the functions would need to be controlled to 1 byte. – Aki Suihkonen Jun 19 '23 at 12:52
  • I suppose you could cast a function pointer to `char*`, dereference it, and compare the result to the machine code of `push ebp` vs. `push {r4,fp,ip,lr}` or whatever ARM does in the function prologue. But it's not exactly an elegant solution. – puppydrum64 Jun 30 '23 at 11:18
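
The preprocessor and `char`-signedness suggestions from the comments can be sketched roughly as follows. The names `IsaGuess`, `isa_from_macros` and `plain_char_is_unsigned` are made up for illustration. The macros are the usual GCC/Clang/MSVC predefined spellings. The `char` check is only an ABI heuristic: compiler flags such as `-funsigned-char` change it, and some ARM platforms (Apple's arm64 ABI, as far as I know) keep `char` signed.

```cpp
#include <limits>

enum class IsaGuess { X86, Arm, Unknown };

// Compile-time answer from compiler-predefined macros.
constexpr IsaGuess isa_from_macros() {
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
    return IsaGuess::X86;
#elif defined(__aarch64__) || defined(__arm__) || defined(_M_ARM64) || defined(_M_ARM)
    return IsaGuess::Arm;
#else
    return IsaGuess::Unknown;
#endif
}

// ABI heuristic: converting -1 to plain char and widening to int gives
// a positive value only when plain char is unsigned (typical of the
// standard ARM ABI, not of mainstream x86 ABIs).
constexpr bool plain_char_is_unsigned() {
    return static_cast<int>(static_cast<char>(-1)) > 0;
}
```

If macros are ruled out by the interviewer, `plain_char_is_unsigned()` is the only part that applies, and a `true` result is a hint of ARM rather than a proof.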

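The memory-model litmus test described in the comments could look something like the sketch below. It is a sketch under assumptions, not a definitive test: the names `PaddedCounter`, `g_first`, `g_second` and `count_reorderings` are invented, the 64-byte alignment assumes 64-byte cache lines, and `std::atomic_signal_fence` is used on the assumption that the compiler treats it as a compiler-only barrier (GCC and Clang do in practice), so that any reordering seen comes from the hardware. The writer stores an increasing counter to two variables in order; the reader loads them in the opposite order and counts how often the second store is visible without the first. That can be caused by StoreStore or LoadLoad reordering.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Assumption: 64-byte cache lines; alignment keeps the variables apart.
struct alignas(64) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

PaddedCounter g_first;   // written first by the writer thread
PaddedCounter g_second;  // written second by the writer thread

// Returns how often the reader saw a newer value in g_second than in
// g_first, which cannot happen if stores and loads stay in program order.
std::uint64_t count_reorderings(std::uint64_t iterations) {
    g_first.value.store(0, std::memory_order_relaxed);
    g_second.value.store(0, std::memory_order_relaxed);
    std::uint64_t violations = 0;

    std::thread writer([iterations] {
        for (std::uint64_t i = 1; i <= iterations; ++i) {
            g_first.value.store(i, std::memory_order_relaxed);
            // Compiler-only barrier: emits no CPU fence instruction.
            std::atomic_signal_fence(std::memory_order_seq_cst);
            g_second.value.store(i, std::memory_order_relaxed);
        }
    });

    std::thread reader([iterations, &violations] {
        std::uint64_t second = 0;
        while (second < iterations) {
            second = g_second.value.load(std::memory_order_relaxed);
            std::atomic_signal_fence(std::memory_order_seq_cst);
            const std::uint64_t first =
                g_first.value.load(std::memory_order_relaxed);
            if (first < second) ++violations;  // reordering observed
        }
    });

    writer.join();
    reader.join();
    return violations;  // safe to read: the reader has been joined
}
```

A nonzero result from something like `count_reorderings(10000000)` suggests a weakly-ordered CPU such as ARM. Zero is what x86's TSO model guarantees, but zero on its own does not prove x86: an ARM core may simply not have reordered during the run, for example when both threads end up on the same core.
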
0 Answers