
I am reading the documentation for the uprobe tracer, and it contains instructions on how to compute the offset of a function in memory. I quote it here:

Following example shows how to dump the instruction pointer and %ax register at the probed text address. Probe zfree function in /bin/zsh:

# cd /sys/kernel/debug/tracing/
# cat /proc/`pgrep zsh`/maps | grep /bin/zsh | grep r-xp
00400000-0048a000 r-xp 00000000 08:03 130904 /bin/zsh
# objdump -T /bin/zsh | grep -w zfree
0000000000446420 g    DF .text  0000000000000012  Base        zfree

0x46420 is the offset of zfree in object /bin/zsh that is loaded at 0x00400000.

I do not know why, but they took the objdump output 0x446420 and subtracted 0x400000 to get 0x46420. It seemed like an error to me. Why 0x400000?

I tried the same on my Fedora 23 machine with the 4.5.6-200 kernel.

First I turned off memory address randomization

echo 0 > /proc/sys/kernel/randomize_va_space

Then I figured out where the binary is mapped in memory

$ cat /proc/`pgrep zsh`/maps | grep /bin/zsh | grep r-xp
555555554000-55555560f000 r-xp 00000000 fd:00 2387155                    /usr/bin/zsh

Took the offset

marko@fedora:~ $ objdump -T /bin/zsh | grep -w zfree
000000000005dc90 g    DF .text  0000000000000012  Base        zfree

And figured out where zfree is via gdb

$ gdb -p 21067 --batch -ex 'p zfree'
$1 = {<text variable, no debug info>} 0x5555555b1c90 <zfree>

marko@fedora:~ $ python
Python 2.7.11 (default, Mar 31 2016, 20:46:51) 
[GCC 5.3.1 20151207 (Red Hat 5.3.1-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> hex(0x5555555b1c90-0x555555554000)
'0x5dc90'

You see, I've got the same result as in objdump without subtracting anything.

But then I tried the same on another machine running SLES, and there the result matches the uprobe documentation.

Why is there such a difference? How do I compute the correct offset, then?

Marko Kevac
  • The doc says it: in object /bin/zsh that is loaded at 0x00400000. The offset is probably supposed to be relative to the program load address, so you need to subtract the program load address from the function's absolute address. – dbrank0 Jun 17 '16 at 07:05
  • As to why the base address on many systems is 0x400000, see [Why do virtual memory addresses for linux binaries start at 0x8048000](http://stackoverflow.com/questions/2966426). Fedora 23 builds its packages as position-independent executables (`-fPIE -pie`), so the load offset is 0 and the base address is either random (with ASLR) or 0x555555554000. – Mark Plotnick Jun 21 '16 at 00:20

1 Answer


As far as I can see, the difference can only come from how the examined binary was built; more precisely, from whether the ELF file has a fixed load address or not. Let's do a simple experiment. We have this simple test code:

int main(void) { return 0; }

Then, build it in two ways:

$ gcc -o t1 t.c      # image with a fixed load address (toolchains that default to PIE need -no-pie)
$ gcc -o t2 t.c -pie # position-independent image with no fixed load base

Now, let's check the load base addresses for these two images:

$ readelf -l --wide t1 | grep LOAD
  LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x00067c 0x00067c R E 0x200000
  LOAD           0x000680 0x0000000000600680 0x0000000000600680 0x000228 0x000230 RW  0x200000
$ readelf -l --wide t2 | grep LOAD
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x0008cc 0x0008cc R E 0x200000
  LOAD           0x0008d0 0x00000000002008d0 0x00000000002008d0 0x000250 0x000258 RW  0x2000

Here you can see that the first image requires a fixed load address, 0x400000, while the second one has no address requirement at all.
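Reading the LOAD lines by eye works for one binary, but it is easy to script. Below is a minimal sketch in Python, assuming the column layout that `readelf -l --wide` printed above (Offset, VirtAddr, PhysAddr, FileSiz, MemSiz, Flg, Align). The names `parse_load_segments` and `executable_segment` are made up for this illustration, and the sample text is the t1 output copied from above.

```python
# Sample input: the two LOAD lines that readelf printed for t1 above.
T1_LOADS = """\
  LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x00067c 0x00067c R E 0x200000
  LOAD           0x000680 0x0000000000600680 0x0000000000600680 0x000228 0x000230 RW  0x200000
"""


def parse_load_segments(text):
    """Return one dict per LOAD line found in `readelf -l --wide` output."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "LOAD":
            continue
        # Columns 1..5: Offset, VirtAddr, PhysAddr, FileSiz, MemSiz (all hex).
        offset, vaddr, _paddr, filesz, memsz = (int(f, 16) for f in fields[1:6])
        # The flags sit between MemSiz and Align and may contain a space
        # ("R E"), so join everything up to the last column.
        flags = "".join(fields[6:-1])
        segments.append({
            "offset": offset,
            "vaddr": vaddr,
            "filesz": filesz,
            "memsz": memsz,
            "flags": flags,
        })
    return segments


def executable_segment(segments):
    """Pick the first segment that carries the E (execute) flag."""
    for seg in segments:
        if "E" in seg["flags"]:
            return seg
    raise ValueError("no executable LOAD segment found")


if __name__ == "__main__":
    text_seg = executable_segment(parse_load_segments(T1_LOADS))
    print(hex(text_seg["vaddr"]), hex(text_seg["offset"]))
```

For a real binary the input text could come from running `readelf -l --wide` on the file; the sketch only handles the layout shown here.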

Now we can compare the addresses that objdump reports for main:

$ objdump -t t1 | grep ' main'
00000000004004b6 g     F .text  000000000000000b              main
$ objdump -t t2 | grep ' main'
0000000000000710 g     F .text  000000000000000b              main

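As we can see, the address objdump reports is the full virtual address that the first byte of main will occupy if the image is loaded at the address stored in the program header. The second image, of course, will never be loaded at 0x0 but at some other, randomly chosen location, which shifts the function's actual position. So to get the offset, subtract the virtual address of the LOAD segment that contains the function from the address objdump prints: 0x400000 for t1, and 0 for t2.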
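The arithmetic behind both cases can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tracer's own code: the function names `uprobe_offset` and `runtime_address` are made up here, and the offset is taken to be a position within the object file, which is how the quoted documentation reads ("the offset of zfree in object /bin/zsh"). The Fedora zsh segment address of 0 is also an assumption, based on the comment under the question, since no readelf output was shown for it.

```python
def uprobe_offset(symbol_vaddr, seg_vaddr, seg_file_offset=0):
    """Offset of a symbol inside the object file.

    symbol_vaddr    -- address printed by objdump for the symbol
    seg_vaddr       -- VirtAddr of the LOAD segment containing the symbol
    seg_file_offset -- Offset column of that LOAD segment (0 in every
                       example on this page)
    """
    return symbol_vaddr - seg_vaddr + seg_file_offset


def runtime_address(offset, mapping_start, mapping_file_offset=0):
    """Address of the symbol in a running process.

    mapping_start and mapping_file_offset are the first and third columns
    of the matching r-xp line in /proc/<pid>/maps.
    """
    return mapping_start + offset - mapping_file_offset


if __name__ == "__main__":
    # Documentation example: zsh linked at a fixed address.
    print(hex(uprobe_offset(0x446420, 0x400000)))
    # Fedora 23 zsh from the question, assuming its LOAD segment starts at 0.
    print(hex(uprobe_offset(0x5dc90, 0x0)))
    # t1 and t2 from this answer.
    print(hex(uprobe_offset(0x4004b6, 0x400000)))
    print(hex(uprobe_offset(0x710, 0x0)))
    # Cross-check against the gdb output in the question.
    print(hex(runtime_address(0x5dc90, 0x555555554000)))
```

With the numbers from the question, the first call reproduces the documentation's 0x46420, the second leaves 0x5dc90 unchanged, and the last one reproduces the address gdb printed for zfree. That is why the subtraction looked unnecessary on Fedora: the value being subtracted was 0.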

Sergio
  • @MarkoKevac It is useful for security purposes. Using an arbitrary load address makes it harder to exploit various vulnerabilities, because the locations of global objects and code change from run to run and an attacker can no longer rely on constant addresses. For this reason, for example, Android has forbidden non-PIE executables since version 5.0 (Lollipop). – Sergio Jun 22 '16 at 11:01
  • Like what? AFAIK you can't write to .text in a running process, so a fixed function address should not help an attacker. No? – Marko Kevac Jun 22 '16 at 14:45
  • Consider the well-known return address overwrite via a buffer overrun. Say we want to skip authorization and jump straight to privileged code. If the binary has a fixed load base, we can use the same malicious input every time. If not, we have to adjust the input again and again to land on the desired code after the return. This is a very simplified explanation, but it gives an idea of how it works. – Sergio Jun 22 '16 at 17:38