11

I've learned the basics about CPUs/ASM/C and don't understand why we need to compile C code differently for different OS targets. What the compiler does is create Assembler code that then gets assembled to binary machine code. The ASM code of course is different per CPU architecture (e.g. ARM) as the instruction set architecture is different.

But as Linux and Windows run on the same CPU, the machine operations like MOVE/ADD/... should be identical. While I do know that there are OS-specific functions like printing to a terminal, this functionality could be provided by different implementations of stdio.h, for example. And still, I could create a very basic program that just calculates a + b without printing anything, so that I do not need any OS-specific code. Why do I still need to compile for Linux and for Windows instead of just adding an .exe extension to my Linux executable?

Maxbit
  • Because the executable formats are different, the shared libraries (or DLLs in Windows) are different, the libc implementations are different, etc. – Pablo Jan 13 '18 at 00:00
  • In addition to your computations every executable does lots of extra hidden work at startup and termination. This work is OS-specific. This hidden startup and termination code is what has to be generated separately for each OS. – AnT stands with Russia Jan 13 '18 at 00:02
  • But I do not understand why the executable formats are different. In the end, all executables are machine operations which don't differ from OS to OS, but from architecture to architecture. Shared libraries are handled by the OS, so I just call functions like fopen and do not care which OS is handling the call (I imagine this like a microservice architecture). Do you understand my problem, @Pablo? – Maxbit Jan 13 '18 at 00:02
  • The executable formats are different because Microsoft wanted it that way--ask them. There's no reason they couldn't have used existing formats (though there would still be different implementations of the libraries). – Lee Daniel Crocker Jan 13 '18 at 00:04
  • The binary formats are different because they were designed differently. Linux uses [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format), while Windows uses (as far as I know) [Portable Executable](https://en.wikipedia.org/wiki/Portable_Executable), which is a different format. They would use the same assembler instructions but not in the same way. – Pablo Jan 13 '18 at 00:06
  • Each OS vendor may also have a different calling convention (how arguments and results are passed to/from calls), so the Windows and Linux implementations of `fopen` may be exactly the same, but one will expect the first argument in `rdi` and the other in `rcx` or something like that, so you need different machine code to set up the registers before the call. Also, the dynamic linking mechanism and much of the other low-level OS API are vastly different in the details. The C++ standard will shield you from many differences by providing an identical high-level interface, but the internal implementation is different, using a different OS API. – Ped7g Jan 13 '18 at 00:11
  • You might be interested in [windows-subsystem-for-linux](https://stackoverflow.com/tags/windows-subsystem-for-linux/info) – Bo Persson Jan 13 '18 at 01:56
  • In theory, it would have been possible to design a common executable and library format so that a program written for the same framework (like Qt, GTK+ or just the STL) would compile to a single binary that runs on multiple OSes. In practice, nobody did. – Davislor Jan 13 '18 at 02:43
  • Thanks a lot guys! – Maxbit Jan 14 '18 at 11:43
  • Does this answer your question? [Why do you need to recompile C/C++ for each OS?](https://stackoverflow.com/questions/61644911/why-do-you-need-to-recompile-c-c-for-each-os) – phuclv Apr 09 '22 at 06:12

4 Answers

26

Even though the CPU is the same, there are still many differences:

  • Different executable formats.
  • Different calling conventions might be used. For example, Windows x64 passes integer args in different registers than the x86-64 System V ABI and has several other significant differences, including call-preserved xmm6..15 in Windows, unlike other x86-64 ABIs.
  • Different conventions regarding stack structure. Some systems have a concept of a "red zone" to help the compiler generate shorter code. The execution environment has to honor this concept to avoid stack corruption.
  • Programs are linked against different standard libraries with different ABIs: field order might differ, and additional extension fields might be present.
  • In both C and C++, some data types have OS-dependent sizes. For example, on x86_64 long is 8 bytes on Linux, but 4 bytes on Windows. (Type sizes and required alignments are another part of what makes an ABI, along with struct/class layout rules.)
  • Standard libraries can provide different sets of functions. On Linux, libc provides functions like snprintf directly, but on Windows snprintf might be implemented as a static inline function in a header file that actually calls another function from the C runtime. This is transparent to the programmer, but generates a different import list for the executable.
  • Programs interact with the OS in different ways: on Linux a program might make system calls directly, as those are documented and are part of the provided interface, while on Windows they are not documented and programs should instead use the provided functions.
  • Even if two OSes rely on programs making system calls directly, each kernel has its own set of available system calls.
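The `long` example from the list can be checked from C itself. Here is a minimal sketch (the helper name `data_model_name` is made up for illustration, and it assumes 8-bit bytes) that reports which data model the compiler is targeting:

```c
#include <stddef.h>

/* Hypothetical helper: names the data model the compiler targets,
   judged only by the sizes of long and of pointers. */
const char *data_model_name(void)
{
    if (sizeof(void *) == 8 && sizeof(long) == 8)
        return "LP64";   /* e.g. x86-64 Linux: long and pointers are 8 bytes */
    if (sizeof(void *) == 8 && sizeof(long) == 4)
        return "LLP64";  /* e.g. x86-64 Windows: long stays at 4 bytes */
    if (sizeof(void *) == 4 && sizeof(long) == 4)
        return "ILP32";  /* typical 32-bit targets */
    return "other";
}
```

The same source gives a different answer depending on the target it is compiled for, which is why a struct containing a `long` cannot be assumed to have the same layout in a Linux build and a Windows build.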

Even if a Linux program only calls the C library's wrapper functions, a Windows C library wouldn't have POSIX functions like read(), ioctl(), and mmap. Conversely, a Windows program might call VirtualAlloc which isn't available on Linux. (But programs that use OS-specific system calls, not just ISO C/C++ functions, aren't portable even at a source level; they need #ifdef to use Windows system calls only on Windows.)
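As a rough sketch of what that `#ifdef` looks like in practice, here is one way to wrap `VirtualAlloc` and `mmap` behind a common interface. The names `os_alloc` and `os_free` are invented for this example, and the non-Windows branch assumes a POSIX system that provides `MAP_ANONYMOUS`:

```c
#ifndef _WIN32
#define _DEFAULT_SOURCE  /* asks glibc to expose MAP_ANONYMOUS in strict modes */
#endif
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

/* Windows: reserve and commit readable/writable memory. */
void *os_alloc(size_t size)
{
    return VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
}

void os_free(void *ptr, size_t size)
{
    (void)size;  /* MEM_RELEASE frees the whole region, size must be 0 */
    VirtualFree(ptr, 0, MEM_RELEASE);
}
#else
#include <sys/mman.h>

/* POSIX: map anonymous memory that is not backed by any file. */
void *os_alloc(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

void os_free(void *ptr, size_t size)
{
    munmap(ptr, size);
}
#endif
```

The callers are source-portable, but each build still contains only one of the two branches, so the resulting binaries import completely different OS functions.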

  • Not OS related, but programs compiled by different compilers might not be interoperable: different standard libraries might be used, things like C++ name mangling might be different, making it impossible to link libraries against each other, C++ exception implementation might be non-interoperable.
  • Different filesystem structure. Not only is there a difference between "\" on Windows and "/" on Unix-likes, but there are also "special files" that might or might not be present, like "/dev/null".
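The filesystem point tends to show up in source code as a handful of per-OS constants. A small sketch, where the macro and function names (`PATH_SEP`, `NULL_DEVICE`, `join_path`, `open_null_device`) are hypothetical and only the values `NUL` and `/dev/null` come from the platforms themselves:

```c
#include <stdio.h>

#ifdef _WIN32
#define PATH_SEP    '\\'
#define NULL_DEVICE "NUL"        /* Windows name for the discard device */
#else
#define PATH_SEP    '/'
#define NULL_DEVICE "/dev/null"  /* special file on Unix-likes */
#endif

/* Hypothetical helper: joins dir and name with the platform separator.
   Returns 0 on success, -1 if the result does not fit in out. */
int join_path(char *out, size_t out_size, const char *dir, const char *name)
{
    int n = snprintf(out, out_size, "%s%c%s", dir, PATH_SEP, name);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Opens the platform's "discard everything" file for writing. */
FILE *open_null_device(void)
{
    return fopen(NULL_DEVICE, "w");
}
```

Both values are baked into the binary at compile time, so even this tiny program ends up OS-specific.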

In theory, everything listed here can be resolved: custom loaders can be written to support different executable formats, and different conventions and interfaces do not cause problems as long as the whole program uses the same set of them. This is why projects like Wine can run Windows binaries on Linux. The problem is that Wine has to emulate the functionality of the Windows NT kernel on top of what other OSes provide, which makes the implementation less efficient. Such programs also have problems interacting with native programs, because different non-interoperable interfaces are used.

Source-compatibility layers like Cygwin can be inefficient, too, when emulating POSIX system calls like fork() on top of the Windows model. But in general Cygwin has an easier job than WINE: programs need to be recompiled under Cygwin. It doesn't try to run native Linux binaries under Windows.

  • Thank you very much for the detailed list and especially the comment "In theory everything listed here can be resolved". I think I got it now. – Maxbit Jan 14 '18 at 11:43
  • Where did you learn all this? I'm kinda new to programming... I did read three books on C/C++, and none of this stuff was mentioned. I also read a book on compilers; it was more concerned with theory and didn't mention any OS-specific details. I would love to read more about this but have no idea where to look... Thanks – Newbie And Curious Dec 24 '20 at 19:54
  • @NewbieAndCurious Just years of dealing with programming. When you try to write something complex that also needs to work on multiple OSes, differences become obvious. So any book that explains how specific OS works should be fine. For Windows I read "Windows via C/C++" by Jeffrey Richter and Christophe Nasarre which helped me a lot. For Linux there are man pages with syscall descriptions which can help with understanding of how the kernel works. –  Dec 24 '20 at 20:03
  • @StaceyGirl thanks, but are there any books that you'd recommend? that perhaps dive deep into some of these details (os specific calls, or c libraries implementations or even explain the concept of ABIs)? – Newbie And Curious Dec 24 '20 at 20:08
  • `Unix-like system have a concept of "red zone"` it has nothing to do with Unix. [Windows has a redzone on some platforms](https://devblogs.microsoft.com/oldnewthing/20190111-00/?p=100685). So are many other non-Unix platforms. And this just answers the differences between Windows and Linux so many differences between Linux, BSD and macOS for example aren't addressed (calling convention is the same, but different syscalls). Answers in [Why do you need to recompile C/C++ for each OS?](https://stackoverflow.com/q/61644911/995714) are much more complete – phuclv Apr 09 '22 at 06:11
  • @phuclv I have added a part about syscalls being different from kernel to kernel. I never saw Windows on anything but x86 CPUs and so never saw a compiler using the red zone, so I thought that it might not have one. Removed the mention of Unix systems anyway; it was redundant indeed. –  Apr 09 '22 at 11:51
2

In addition to everything else, even with identical instructions the calling conventions can differ: the placement of parameters on the stack or in registers, the order in which parameters are passed, which registers must be preserved across a function call, and how return values are passed from callee to caller.
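One way to see this on a single machine: GCC and Clang on x86-64 accept the function attributes `ms_abi` and `sysv_abi`, which select the Windows x64 or System V convention per function. The sketch below assumes one of those compilers; the macro and function names are made up for illustration, and on other compilers or architectures the attributes are simply left out.

```c
#if defined(__GNUC__) && defined(__x86_64__)
#define MS_ABI   __attribute__((ms_abi))    /* Windows x64 convention */
#define SYSV_ABI __attribute__((sysv_abi))  /* x86-64 System V convention */
#else
#define MS_ABI    /* assumption: no per-function ABI selection available */
#define SYSV_ABI
#endif

/* Windows x64: a, b, c arrive in rcx, rdx, r8. */
MS_ABI int add3_ms(int a, int b, int c)
{
    return a + b + c;
}

/* System V: a, b, c arrive in rdi, rsi, rdx. */
SYSV_ABI int add3_sysv(int a, int b, int c)
{
    return a + b + c;
}
```

The C source of the two functions is identical, but the machine code for the function bodies and for every call site differs, because caller and callee have to agree on where the arguments live.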

SoronelHaetir
1

This is like saying that if two books use the same alphabet, they are the same: a biology textbook and a math textbook are identical because they use the same alphabet, have a cover, have some pages, etc. Or that I have two ski resorts, and because they both use the same alphabet and are both about snow, their posters and brochures are identical.

int main ( void )
{
    return(27);
}

0000000000402cd0 <main>:
  402cd0:   48 83 ec 28             sub    $0x28,%rsp
  402cd4:   e8 d7 e9 ff ff          callq  4016b0 <__main>
  402cd9:   b8 1b 00 00 00          mov    $0x1b,%eax
  402cde:   48 83 c4 28             add    $0x28,%rsp
  402ce2:   c3                      retq   

00000000004003e0 <main>:
  4003e0:   b8 1b 00 00 00          mov    $0x1b,%eax
  4003e5:   c3                      retq   

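Subtle differences, sure (the first listing, with its call to __main and the 0x28 bytes of stack it reserves, looks like a Windows build; the second looks like a Linux build). But the key is that these are two completely different operating systems. The entry and exit of the program differ: there is a TON of code not shown above that varies, not just the wee bitty speck of main code in this program.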

These are different operating systems: they have different calls and different rules. The instruction set being common is somewhat irrelevant. It's like saying that because I am running on Linux and using C as my programming language, a binary made for ARM and a binary made for x86 should be identical and compatible, because two of the three things are the same (programming language and operating system, but not instruction set). In your case it is programming language and instruction set, but not operating system.
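To make "different calls" concrete, here is a hedged sketch of asking the OS for the current process ID. The function name `current_pid` is hypothetical; the Linux branch assumes glibc's `syscall()` wrapper, and the Windows branch uses the Win32 function `GetCurrentProcessId`:

```c
#if defined(__linux__)
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* for the syscall() declaration in glibc */
#endif
#include <unistd.h>
#include <sys/syscall.h>

/* Linux: invoke the kernel's getpid system call by number. */
long current_pid(void)
{
    return syscall(SYS_getpid);
}
#elif defined(_WIN32)
#include <windows.h>

/* Windows: go through the documented Win32 API instead. */
long current_pid(void)
{
    return (long)GetCurrentProcessId();
}
#else
#include <unistd.h>

/* Other POSIX systems: use the C library wrapper. */
long current_pid(void)
{
    return (long)getpid();
}
#endif
```

The instructions that add numbers are the same on both systems, but the sequence that reaches the kernel is not, and neither branch means anything to the other operating system.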

This goes so far that a gcc-compiled program for Windows is not completely compatible across all versions of Windows; you can't just say "Windows". The same goes for Linux. They change within themselves, independent of target, and then there are incompatible differences between the operating systems. Just because the bricks and mortar are the same doesn't make two buildings identical.

This is the purpose of Java, Python, and similar languages: to draw a line. Everything above the line is common and cross-platform; what is below the line can be platform- and target-specific, with no reason to expect any form of cross-platform compatibility. Those languages wouldn't exist if we already had this kind of compatibility across the world of computers, whether from C compilers, from computers all running Linux independent of platform, or from everything running one operating system with one compiler and the same instruction set.

There is a reason that when you download a program like Chrome, 7-Zip, Firefox, or HandBrake, there are different installers and/or binaries based on the operating system and operating system version. The instruction set is often not even listed, as it is assumed to be x86, yet there are different binaries. If it were this trivial, why would the folks who have delivered finished products for so long be delivering several different builds of the product?

old_timer
-1

You might want to check the other answers.

It's kind of a duplicate question, except that it's for C, not C++.

You can check the compilation steps here:

C compilation steps

In short, even though C can be cross-platform to run, because of the compilers it's not cross-platform compilable.

Morse