What does a function look like in memory?

Question

I'm starting to learn C, and I just learned about pointers to functions. This made me wonder, what does a function look like in memory? I mean, (correct me if I'm wrong) an int variable for example is just a chunk of 2/4 bytes, storing a certain whole number. But what does a function look like in memory? How do you convert operations such as if, while, malloc, etc into numbers or binary?

Take a look at any assembly language and how that translates to raw 0's and 1's that are interpreted by the CPU. — DeiDei, Jul 03 '23 at 17:35
You're asking a fairly broad question there. Consider a few factors implicit in your question: 1.) How are argument values passed to the function? By value or by reference? 2.) How does the function know where to send the instruction pointer back to when it "returns"? Maybe you want to try to narrow your question a bit? — Onorio Catenacci, Jul 03 '23 at 17:35
*How do you convert operations such as if, while, malloc, etc into numbers or binary?* That's the job of the compiler (and linker). — Weather Vane, Jul 03 '23 at 17:41
A function is *also* a chunk of bytes, but instead of holding a value they hold instructions for the CPU. — BoP, Jul 03 '23 at 17:59
See [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116) and https://godbolt.org/, especially the "binary" mode to show disassembly of the machine code instructions. Normal CPUs are von Neumann stored-program computers where the program is bytes in memory. (And at a lower level than that, see [How does an assembly instruction turn into voltage changes on the CPU?](https://stackoverflow.com/q/3706022) re: asm turning into machine code and what happens in logic gates when running machine code, like how computers work.) — Peter Cordes, Jul 04 '23 at 20:17

score 4 · Accepted Answer · answered Jul 03 '23 at 20:12

What does a function look like in memory?

A function is a number of consecutive bytes. For instance in hex a function could be:

A2 32 67 98 34 33 12 23 22 55

These (hex) numbers represent instructions that the hardware, i.e. the CPU, can execute. The CPU decodes the values to find out what it has to do, in other words, which instructions to execute.

Exactly how these numbers translate into instructions differs from CPU family to CPU family. Sometimes a single byte is a whole instruction. Sometimes it takes a number of bytes to make an instruction. To figure out how to translate the bytes into instructions, you need to consult the instruction set of the CPU being used.

In some cases instructions are pretty simple. For instance "ADD register 1 and register 2", "JUMP-IF-ZERO register 3, address", "DECREMENT register 5", "LOAD register 1 from a memory address", "STORE register 1 to a memory address" and so on. But you can also find CPUs with pretty complex instructions.

How do you convert operations such as if, while,

By using a number of the simple instructions from the instruction set.

Example: With the simple instructions I made up above, and assuming that x is an integer located at address 0x100 in memory, the C code:

x = x - 1;
if (x != 0)
{
    x = x - 1;
}

could be something like

    LOAD reg1, 0x100  // Read x from memory address 0x100 into register 1
    DECREMENT reg1
    JUMP-IF-ZERO reg1, LABEL1
    DECREMENT reg1
LABEL1:
    STORE reg1, 0x100 // Write x back to memory

Notice that a given piece of C code can often be implemented in several ways on a specific CPU. Different compilers may generate different sequences of instructions for the same C code.

... different compilers, but also the same compiler with different compilation switches/options. — Clifford, Jul 03 '23 at 23:59

Clifford · Answer 2 · 2023-07-04T18:03:16.800

The answer to your question is machine code.

Run your compiled code in a source-level debugger and switch on the "disassembly view" to see exactly how the source is translated to machine code (typically presented as a "disassembly", which is a human-readable representation of the machine code: "assembly code"). Or use Compiler Explorer, which presents the assembly code generated by many different compilers for various architectures and machine instruction sets. For example, Compiler Explorer shows that the following C source code:

int fn(void)
{
    return 11 ;
}

int main() 
{
    int (*fn_ptr)(void) = fn ;
    int val = fn_ptr() ;
    if( val > 10 )
    {
        val = 10 ;
    }    

    return 0 ;
}

using MSVC v19 x64 compilation translates to:

fn      PROC
        mov     eax, 11
        ret     0
fn      ENDP

val$ = 32
fn_ptr$ = 40
main    PROC
$LN4:
        sub     rsp, 56                             ; 00000038H
        lea     rax, OFFSET FLAT:fn
        mov     QWORD PTR fn_ptr$[rsp], rax
        call    QWORD PTR fn_ptr$[rsp]
        mov     DWORD PTR val$[rsp], eax
        cmp     DWORD PTR val$[rsp], 10
        jle     SHORT $LN2@main
        mov     DWORD PTR val$[rsp], 10
$LN2@main:
        xor     eax, eax
        add     rsp, 56                             ; 00000038H
        ret     0
main    ENDP

Each assembly line represents a single machine language instruction (with the exception of some "pseudo-ops" such as ENDP and PROC which are directives to the assembler tool), and has a corresponding binary representation in memory. For example the machine code corresponding to mov eax,11 in the function fn() is:

b8 0b 00 00 00  // Load (move) the literal value 0x0000000B to register EAX

where b8 (in hexadecimal, base 16, notation) is the opcode for a MOV of a 32-bit immediate value into EAX, and 0b 00 00 00 is the operand 11 as a 32-bit little-endian (least significant byte first) integer. Note that hexadecimal is commonly used to present binary data because a single hex digit corresponds to exactly 4 binary digits (bits), and reading long strings of 1's and 0's is impractical. For example b8 0b 00 00 00 in binary, hexadecimal and decimal is:

1011 1000  0000 1011  0000 0000  0000 0000  0000 0000  // Binary
   B 8        0 B        0 0        0 0        0 0     // Hex (nibbles)
   184         11         0          0          0      // Decimal

See https://www.cs.uaf.edu/2016/fall/cs301/lecture/09_28_machinecode.html for additional examples of x86 machine code and corresponding assembly.

As I mentioned at the start, you can use a source-level debugger to directly inspect the relationship between source code and the machine code translation. For example the following is an example in VS Code showing the source (C++ code in this case, but the principle is the same), and the corresponding assembly code and its corresponding machine code in hexadecimal:

[Image: VS Code debugger showing source alongside disassembly and machine code. Source: https://user-images.githubusercontent.com/80216624/115053737-6a850280-9f1a-11eb-8505-a7c5570af71a.gif]

See how the cursor in the source matches the instruction at the cursor in the disassembly. Note that it is not always that straightforward in optimised code, where there is often a less direct correspondence between source and machine code (which is also why source-level debugging is normally performed on unoptimised code).

Note also that each instruction has an address. The address 0x00000001400BDF1B in this case is the address of the function func(), so would be the value of a pointer-to-func() (i.e. the value of func).

Another example, from Visual Studio 2022, uses the same code as the Compiler Explorer example above. Note that the code generation differs from that in Compiler Explorer: different compiler version, different compiler settings. Also, in this example the generating source is interspersed, with line number references, so you can see exactly how each line is translated, noting that this generally only makes sense for unoptimized code used in debugging.

Further reading on how computers work at the lowest possible level: Code: The Hidden Language of Computer Hardware and Software, by Charles Petzold.

score 2 · Answer 3 · answered Jul 03 '23 at 17:41

C source code is compiled into assembly instructions, which are mostly quite basic things like "Load 8 bytes from memory" or "Add two integers." Those human-readable assembly instructions are then assembled into machine code, which is a compact binary encoding of the assembly. The two steps may also be combined into one, but most compilers will let you do them separately if you wish (so you can read the assembly code).

Machine code is specific to the instruction set architecture (x86-64, ARM, RISC-V and so on). Many machines will have an opcode followed by zero to three operands for typical instructions. The opcode is a somewhat arbitrary number which means something like "Add" or "Load." The opcode encoding may be fixed length (always one byte, so you can encode 256 operations) or variable length (one byte for simple operations, two bytes for complex or esoteric ones).

John Zwinck


Re “… opcode followed by… operands” and “one byte for simple operations, two bytes for complex”: Contrasted with ARM encoding where an instruction may have a few bits selecting some instruction subset, then a few bits for part of an operand, then bits for another operand, then bits for an instruction in the subset, then bits for another operand, and so on. At one time, I had to decode those manually. Argh. – Eric Postpischil Jul 03 '23 at 17:58
*The opcode is a somewhat arbitrary number*. 80x86 opcodes also contain bit fields, as in Eric's comment. – Weather Vane Jul 03 '23 at 18:58
_"...compact binary encoding of the assembly."_ That is somewhat misleading perhaps. Assembly is a human readable representation of the machine code. i.e. the assembly _represents_ the machine code, rather than the machine code being an encoding of the assembly. While a compiler can output assembly for human readability of the generated code, assembly is not a necessary intermediate step of compilation - the machine code can be generated directly. – Clifford Jul 03 '23 at 23:23
Op codes are not really arbitrary. For example consider https://developer.arm.com/documentation/ddi0406/c/Application-Level-Architecture/ARM-Instruction-Set-Encoding/ARM-instruction-set-encoding - (as Eric mentioned) ARM op-codes have a defined _structure_ with semantics applied to specific bits. – Clifford Jul 03 '23 at 23:27
