Linux Kernel - why a function's address in System.map is one byte preceding its address as seen in real time?

Question

In the Linux kernel source, I added these lines to the tasklet_action code:

printk("tasklet_action = %p\n" , *tasklet_action);
printk("tasklet_action = %p\n" , &tasklet_action);
printk("tasklet_action = %p\n" , tasklet_action);

In the output I get:

tasklet_action = c03441a1
tasklet_action = c03441a1
tasklet_action = c03441a1

But when I look it up in the System.map file, the tasklet_action address is c03441a0, so there is an offset of 1 byte.

Why do I have this offset?
Is it always a one-byte offset?

Is the number 1234567 the real result, or a mock-up number for illustrative purposes? — user1284631, Jan 02 '13 at 11:26
My first guess would be that you aren't looking at the System.map file which corresponds to the running kernel. — AProgrammer, Jan 02 '13 at 13:59
Not sure why you get the same output for the three different lines of code. — Maxim Egorushkin, Jan 02 '13 at 14:11
@MaximYegorushkin: at least for the last two lines of code, this is explainable: the same would have happened if tasklet_action had been defined as an array: char tasklet_action[10]; normally, only tasklet_action should mean the address, but compilers (gcc) usually give &tasklet_action the same value (the address); a bit unsure why, still (as &tasklet_action should bear no more meaning than &3.0)... as for the dereferenced tasklet_action, indeed it is a bit strange. — user1284631, Jan 02 '13 at 14:33
@axeoth: An expression of function type (such as a function name) decays to a pointer to the function in most contexts, but not when it's an operand of unary `&`. So `func`, `&func`, and `*func`, as well as `**func` and `***func`, all have the same type and value, namely the address of `func`. There's nothing compiler-specific about it; the C language defines it that way. But the OP's question is why the address he's seeing doesn't match the one in `System.map`. — Keith Thompson, Jan 02 '13 at 16:10
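
To make the point in the comment above concrete, here is a minimal user-space sketch in C. The names `demo_action` and `all_forms_equal` are made up for illustration and are not kernel code; `demo_action` just stands in for a function such as tasklet_action.

```c
/* Hypothetical stand-in for a kernel function such as tasklet_action. */
static void demo_action(void)
{
}

/*
 * Returns 1 when every spelling of the function designator yields the
 * same function pointer, 0 otherwise.
 */
int all_forms_equal(void)
{
    void (*plain)(void)  = demo_action;    /* name decays to a pointer      */
    void (*addr)(void)   = &demo_action;   /* explicit address-of, no decay */
    void (*deref)(void)  = *demo_action;   /* deref, then decays again      */
    void (*deref2)(void) = **demo_action;  /* same thing, twice             */

    return plain == addr && addr == deref && deref == deref2;
}
```

Calling `all_forms_equal()` from a small test program should return 1 with any conforming C compiler, which matches the three identical printk lines in the question.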

score 18 · Accepted Answer · answered Jan 02 '13 at 15:55

My guess is that you are running on ARM in Thumb mode, or on some other architecture that uses the bottom bit of the function pointer to indicate which mode to run in.

If so, the answer is that your function really is located at the address listed in System.map.

The value you get at run time encodes both the location and the mode: bit 0 is the mode flag, and the remaining bits are the address.

Instructions on these kinds of architectures must always be 2- or 4-byte aligned, which would leave the bottom bit always zero. When the architecture grew an extra mode, the designers made use of the 'wasted' bit to encode the mode. It's clever, but confusing, and not just for you: a lot of software, like debuggers, broke in many nasty ways when this was first invented.
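
As a rough sketch of the encoding described above, the following C helpers split a raw function pointer value into its two parts. This assumes the guess about ARM/Thumb is right and that bit 0 is the only flag bit; `THUMB_BIT`, `code_address` and `is_thumb` are names invented for this example, not kernel APIs.

```c
#include <stdint.h>

/* Assumption: bit 0 of a code pointer is the Thumb flag (ARM interworking). */
#define THUMB_BIT ((uintptr_t)1)

/* Address of the first instruction, i.e. the value System.map would list. */
uintptr_t code_address(uintptr_t fn_ptr_value)
{
    return fn_ptr_value & ~THUMB_BIT;
}

/* 1 if the pointer selects Thumb mode, 0 if it selects ARM mode. */
int is_thumb(uintptr_t fn_ptr_value)
{
    return (int)(fn_ptr_value & THUMB_BIT);
}
```

With the value from the question, `code_address(0xc03441a1UL)` gives 0xc03441a0, the System.map entry, and `is_thumb(0xc03441a1UL)` gives 1.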

The concept is particularly confusing for x86 programmers who are used to variable-length instructions with any random alignment.

ams

(+1) Out of interest, what exactly is the "mode" encoded by the bottom bit? – NPE Jan 02 '13 at 16:49
It's ARM mode vs. Thumb mode. – ams Jan 03 '13 at 16:40
Originally there was only ARM mode, then a whole different instruction set was invented called "Thumb" and was meant to make programs smaller (at the expense of some speed). Obviously, it's very important that you know what sort of instructions are used in a function before you get there (or else the program will crash hard) so the bottom bit of the function pointer is set to 1 for Thumb functions. This means functions coded in ARM can call functions coded in Thumb, and it all just works. – ams Jan 03 '13 at 16:46
The BX instruction (and BLX) uses the bottom bit of the address to encode the target mode. Since Thumb is always 2- or 4-byte aligned (Thumb/Thumb-2) and ARM is always 4-byte aligned, the lsbit is a don't-care, so they chose to use that bit to switch modes (for the architectures that have both, not Cortex-M). The BX instruction strips the bit and the PC has the lsbit clean; it is just the execution that uses it. Likewise, a bl from Thumb mode will add that bit to the lr so that the bx lr at the end of the called function works. – old_timer Apr 02 '13 at 13:52
Not just BX; in this case it's function pointers that have the bottom bit set. – ams Apr 02 '13 at 16:38
