How do C++ programs import files from ".a" files, but you can't reverse ".a" files yourself?

Question

If you include files packaged as ".a", you can import the files inside it and call their methods. However, from what I can tell, you can't retrieve the source code from the ".a" files. How is it possible that the program knows and can use the source code, but there are no programs to produce the source code in human readable form?

Then how can you call the function in the libraries if it doesn't contain the source code? — itsmarziparzi, Sep 02 '16 at 09:13
There are symbol tables kept in the .a file so that the linker knows what to stitch together. — πάντα ῥεῖ, Sep 02 '16 at 09:15
Then how come you can't use that "linker" to stitch together and retrieve the source code? — itsmarziparzi, Sep 02 '16 at 09:16
See http://stackoverflow.com/questions/3322911/what-do-linkers-do — πάντα ῥεῖ, Sep 02 '16 at 09:17
So a linker has references that tell the program how to interpret what you call. But still it doesn't explain why you can't use it to reverse the .a file (with the .o files inside) into the original source code — itsmarziparzi, Sep 02 '16 at 09:21
For more or less the same reasons as you can't reverse a cake to its original ingredients. — πάντα ῥεῖ, Sep 02 '16 at 09:29

score 3 · Answer 1 · answered Sep 02 '16 at 11:48

The compiler translates source code into binary files known as object files that can be linked directly or stored in an archive (.a, .lib, etc.). The linker combines object files, which can be actual files or the contents of an archive, into an executable file. There is no magic in the archive files; the linker does not turn the binary files back into source code.

As to turning object files back into source files, there are tools to do that; they're known as decompilers, and they don't work very well, because there simply isn't enough information to reconstruct the source code. You can't get back to the pig after you've made sausage.

score 3 · Accepted Answer · answered Sep 02 '16 at 12:39

OK, even with all the explanations about the linker, you are still confused.

In very simple terms, the code that the CPU understands and executes is not the code you write in C++. It does the same thing, but it is not the same.

Assembly is a language that sits between C++ (and other high-level languages) and machine code (the code the CPU executes), and it can still be read by a human, so I will use assembly here. Let's take two C++ functions and look at their assembly (generated by clang with optimizations enabled):

First function:

auto foo(int a, int b, int c)
{
  if (a < b)
    return b;
  if (a > c)
    return c;
  return a;
}

And the generated assembly:

foo(int, int, int):                              # @foo(int, int, int)
        cmpl    %edx, %edi
        cmovlel %edi, %edx
        cmpl    %esi, %edi
        cmovll  %esi, %edx
        movl    %edx, %eax
        retq

The second one:

#include <algorithm>  // std::max, std::min

auto clamp(int value, int interval_left, int interval_right)
{
  return std::max(interval_left, std::min(value, interval_right));
}

with its generated assembly:

clamp(int, int, int):                            # @clamp(int, int, int)
        cmpl    %edi, %edx
        cmovlel %edx, %edi
        cmpl    %esi, %edi
        cmovll  %esi, %edi
        movl    %edi, %eax
        retq

What we can observe:

The two C++ functions do the same thing (for any valid interval, that is, whenever the lower bound is not greater than the upper bound)
The assembly loses a lot of information:
- the original names of the parameters
- even how many variables the functions had
- what kind of control flow the functions had
- what kind of statements the C++ code had
- that the second function actually called other functions
- function names survive only as mangled symbols for the linker, and those can be stripped from the final executable; anything richer requires debug information (-g)
- etc.
The two C++ functions, very different in implementation, generate nearly identical assembly (take my word for it, it differs only in the order in which the tests are carried out)

So you can see there is no reliable way to get back to the original C++ code from the assembly (and even less so from object files).
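The claim that the two functions behave alike can be checked directly. The sketch below repeats both functions with explicit int return types and adds a hypothetical helper, same_behaviour, which is not part of the original answer. It assumes a valid interval (left <= right), since the two versions can disagree when the bounds are swapped.

```cpp
#include <algorithm>
#include <cassert>

// Branch-based version from the answer.
int foo(int a, int b, int c) {
    if (a < b) return b;
    if (a > c) return c;
    return a;
}

// Library-based version from the answer.
int clamp(int value, int interval_left, int interval_right) {
    return std::max(interval_left, std::min(value, interval_right));
}

// Hypothetical helper: compares both versions for every value from
// left - 5 to right + 5. Only meaningful for a valid interval.
bool same_behaviour(int left, int right) {
    assert(left <= right);
    for (int v = left - 5; v <= right + 5; ++v) {
        if (foo(v, left, right) != clamp(v, left, right)) return false;
    }
    return true;
}
```

Two source texts that differ this much still compile to almost the same instructions, which is why a decompiler cannot tell which one you wrote.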

galinette · Answer 3 · 2016-09-02T13:05:01.567

The "program" in charge of assembling multiple .a files is called the linker. It does not have access to the source code inside the .a (and the .a does not even contain source code). It only knows the machine code of the functions, and the entry point of some of them, so that other linked code may call them.

Example with two .a files:

file1.a contains, somewhere, a call to a function foo() which is not defined inside the file. The compiler knew at that point that the way to call this function would be defined later, and it left the call "dangling" inside the .a
file2.a has the compiled machine code for the function foo() and a table of symbols containing the necessary information for calling foo.

The linker merges the machine code of these two files, and it fills the undefined function call in the code from file1.a with a call to the entry point of foo() in file2.a. It has no access to the actual function implementations in C++ code.
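As a rough illustration of that merge step, here is a toy model of symbol resolution. It is only a sketch: the ObjectFile and Resolution types, the resolve function, and the offsets are all invented for the example, and a real linker also handles relocations, sections, and much more.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a compiled file: no source code, only symbol names.
struct ObjectFile {
    std::string name;
    std::map<std::string, int> defined;  // symbol -> offset of its machine code
    std::vector<std::string> undefined;  // symbols called here, defined elsewhere
};

// Where a symbol was found: the defining file and the offset inside it.
struct Resolution {
    std::string file;
    int offset;
};

// Toy "link" step: match every undefined symbol with its definition.
// Returns false if a symbol is defined nowhere (an "undefined reference").
bool resolve(const std::vector<ObjectFile>& objects,
             std::map<std::string, Resolution>& resolved) {
    std::map<std::string, Resolution> table;
    for (const ObjectFile& obj : objects) {
        for (const auto& entry : obj.defined) {
            table[entry.first] = Resolution{obj.name, entry.second};
        }
    }
    for (const ObjectFile& obj : objects) {
        for (const std::string& sym : obj.undefined) {
            auto it = table.find(sym);
            if (it == table.end()) return false;
            resolved[sym] = it->second;
        }
    }
    return true;
}

// Mirrors the example: file1.a calls foo(), file2.a defines it.
// The offsets 0 and 64 are made up.
int entry_point_of_foo() {
    std::vector<ObjectFile> objects = {
        {"file1.a", {{"main", 0}}, {"foo"}},
        {"file2.a", {{"foo", 64}}, {}},
    };
    std::map<std::string, Resolution> resolved;
    bool ok = resolve(objects, resolved);
    assert(ok);
    return resolved["foo"].offset;
}
```

Nothing in this model ever looks at source text: names and addresses are enough to connect a call to its target.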

Actually, if both .a files use the same format, they don't even have to be written in the same language. The first one may be in Fortran, the second one in C++. The linker has no knowledge of these languages; it only knows the .a format, which consists of binary machine code, a list of exported symbols and a list of undefined symbols.
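On the C++ side, what the calling code needs is a declaration, not the implementation. The sketch below uses hypothetical names (add_ints, sum_three). In a real project the definition at the bottom would live in a separate object file or archive, possibly compiled from another language; it is included here only so the example compiles on its own.

```cpp
// Declaration only: tells the compiler how to call add_ints, not how it works.
// extern "C" requests an unmangled symbol name, a common convention for
// calls that cross language boundaries.
extern "C" int add_ints(int a, int b);

// This caller compiles with just the declaration above. Each call is emitted
// as a reference to the symbol "add_ints", which the linker resolves later.
int sum_three(int a, int b, int c) {
    return add_ints(add_ints(a, b), c);
}

// Stand-in definition (assumption: normally supplied by another object file).
extern "C" int add_ints(int a, int b) {
    return a + b;
}
```

This is also how a program "knows" the functions in a library: header files supply declarations like the first line, while the archive supplies only the compiled machine code.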
