2

i was trying to write a simple debugger in python3 on an 32 bit test linux system (Lubuntu) which should be able to catch all syscalls of an abitrary programm (in this case: /bin/ls). For this i used the ptrace syscall to singlestep through the process. After every step i read the registers to find the instruction pointer eip to read 2 bytes from the next instruction. If those 2 bytes are 0xcd and 0x80 this indicates an int 80 which is the syscall. I know there is also the PTRACE_SYSCALL for this purpose, but i wanted to do this without using it.

In the following i show you the code, and it seems to work, BUT there is some weird behavior:

To figure out if this is working i used strace to compare it's output with my own syscalls. And it seems that my programm shows only the first part of the syscalls, the second part is just missing. To show you i posted the output of my programm and of strace in the following. Does someone have an idea what could be wrong here?

import os               # os interaction
from struct import pack # dealing with bytes (ptrace)
import ctypes           # support c data structures

""" ========================================================== """

# 32 bit reg process structrue
class UserRegsStruct(ctypes.Structure):
    _fields_ = [
        ("ebx", ctypes.c_ulong),
        ("ecx", ctypes.c_ulong),
        ("edx", ctypes.c_ulong),
        ("esi", ctypes.c_ulong),
        ("edi", ctypes.c_ulong),
        ("ebp", ctypes.c_ulong),
        ("eax", ctypes.c_ulong),
        ("xds", ctypes.c_ulong),
        ("xes", ctypes.c_ulong),
        ("xfs", ctypes.c_ulong),
        ("xgs", ctypes.c_ulong),
        ("orig_eax", ctypes.c_ulong),
        ("eip", ctypes.c_ulong),
        ("xcs", ctypes.c_ulong),
        ("eflags", ctypes.c_ulong),
        ("esp", ctypes.c_ulong),
        ("xss", ctypes.c_ulong),
    ]

# ptrace constants
PTRACE_TRACEME = 0
PTRACE_PEEKDATA = 2
PTRACE_SINGLESTEP = 9
PTRACE_GETREGS = 12

CPU_WORD_SIZE = 4   # size of cpu word size (32 bit = 4 bytes)

# for syscalls
libc = ctypes.CDLL('libc.so.6')

# check if child (tracee) is still running
def WIFSTOPPED(status):
    return (status & 0xff) == 0x7f

# read from process memory by PTRACE_PEEKDATA
def ReadProcessMemory(pid, address, size):

    # address must be aligned!!
    offset = address % CPU_WORD_SIZE
    if offset:
        address -= offset
        word = libc.ptrace(PTRACE_PEEKDATA, pid, address, 0)
        wordbytes = pack("i", word)
        subsize = min(CPU_WORD_SIZE - offset, size)
        data = wordbytes[offset:offset + subsize]
        size -= subsize
        address += CPU_WORD_SIZE
    else:
        data = bytes(0)

    while size:
        word = libc.ptrace(PTRACE_PEEKDATA, pid, address, 0)
        wordbytes = pack("i", word)
        if size < CPU_WORD_SIZE:
            data += wordbytes[:size]
            break
        data += wordbytes
        size -= CPU_WORD_SIZE
        address += CPU_WORD_SIZE

    return data

""" ========================================================== """

# extract syscall names
fp = open("/usr/include/i386-linux-gnu/asm/unistd_32.h", "r")
syscalls = [0] * 400

for line in fp:
    if "__NR_" in line:
        a = line.rstrip().split(" ")
        name = a[1].split("NR_")[1]
        number = int(a[2])
        syscalls[number] = name

# "int 80" asm instruction = (0xCD 0x80)
a0 = 0xcd
a1 = 0x80

# create child tracee
pid = os.fork()

if pid == 0:    # in tracee
    libc.ptrace(PTRACE_TRACEME, 0, 0, 0)    # make child traceable
    os.execv("/bin/ls", [":-P"])            # run test programm
else:           # in tracer
    pid, status = os.waitpid(pid, 0)
    regs = UserRegsStruct()

# catch all syscalls
while True:

    libc.ptrace(PTRACE_SINGLESTEP, pid, 0, 0)               # execute next instruction
    pid, status = os.waitpid(pid, 0)                        # wait for tracee
    libc.ptrace(PTRACE_GETREGS, pid, 0, ctypes.byref(regs)) # get register values
    data = ReadProcessMemory(pid, regs.eip, 2)              # read 2 bytes from instruction pointer address

    # now check if this is a syscall
    if data[0] == a0 and data[1] == a1:
        print("HEUREKA! SYSCALL at " + hex(regs.eip) + ": " + syscalls[regs.eax])

    if WIFSTOPPED(status) == False: break # exit loop when tracee stopped

This generated the following output:

HEUREKA! SYSCALL at 0xb7fae2c5: brk
HEUREKA! SYSCALL at 0xb7fa3944: access
HEUREKA! SYSCALL at 0xb7faf7ae: mmap2
HEUREKA! SYSCALL at 0xb7faf689: access
HEUREKA! SYSCALL at 0xb7faf4b5: openat
HEUREKA! SYSCALL at 0xb7faf419: fstat64
HEUREKA! SYSCALL at 0xb7faf7ae: mmap2
HEUREKA! SYSCALL at 0xb7faf755: close
HEUREKA! SYSCALL at 0xb7faa758: access
HEUREKA! SYSCALL at 0xb7faf4b5: openat
HEUREKA! SYSCALL at 0xb7faf57e: read
HEUREKA! SYSCALL at 0xb7faf419: fstat64
HEUREKA! SYSCALL at 0xb7faf7ae: mmap2
HEUREKA! SYSCALL at 0xb7faf7ae: mmap2
HEUREKA! SYSCALL at 0xb7faf7ae: mmap2
HEUREKA! SYSCALL at 0xb7faf755: close
HEUREKA! SYSCALL at 0xb7faa758: access
HEUREKA! SYSCALL at 0xb7faf4b5: openat
HEUREKA! SYSCALL at 0xb7faf57e: read
HEUREKA! SYSCALL at 0xb7faf419: fstat64
HEUREKA! SYSCALL at 0xb7faf7ae: mmap2
HEUREKA! SYSCALL at 0xb7faf822: mprotect
HEUREKA! SYSCALL at 0xb7faf7ae: mmap2
HEUREKA! SYSCALL at 0xb7faf7ae: mmap2
HEUREKA! SYSCALL at 0xb7faf755: close
HEUREKA! SYSCALL at 0xb7faa758: access
HEUREKA! SYSCALL at 0xb7faf4b5: openat
HEUREKA! SYSCALL at 0xb7faf57e: read
HEUREKA! SYSCALL at 0xb7faf419: fstat64
HEUREKA! SYSCALL at 0xb7faf7ae: mmap2
HEUREKA! SYSCALL at 0xb7faf7ae: mmap2
HEUREKA! SYSCALL at 0xb7faf755: close
HEUREKA! SYSCALL at 0xb7faa758: access
HEUREKA! SYSCALL at 0xb7faf4b5: openat
HEUREKA! SYSCALL at 0xb7faf57e: read
HEUREKA! SYSCALL at 0xb7faf419: fstat64
HEUREKA! SYSCALL at 0xb7faf7ae: mmap2
HEUREKA! SYSCALL at 0xb7faf7ae: mmap2
HEUREKA! SYSCALL at 0xb7faf755: close
HEUREKA! SYSCALL at 0xb7faa758: access
HEUREKA! SYSCALL at 0xb7faf4b5: openat
HEUREKA! SYSCALL at 0xb7faf57e: read
HEUREKA! SYSCALL at 0xb7faf419: fstat64
HEUREKA! SYSCALL at 0xb7faf7ae: mmap2
HEUREKA! SYSCALL at 0xb7faf7ae: mmap2
HEUREKA! SYSCALL at 0xb7faf7ae: mmap2
HEUREKA! SYSCALL at 0xb7faf755: close
HEUREKA! SYSCALL at 0xb7faf7ae: mmap2
HEUREKA! SYSCALL at 0xb7f95bd9: set_thread_area
HEUREKA! SYSCALL at 0xb7faf822: mprotect
HEUREKA! SYSCALL at 0xb7faf822: mprotect
HEUREKA! SYSCALL at 0xb7faf822: mprotect
HEUREKA! SYSCALL at 0xb7faf822: mprotect
HEUREKA! SYSCALL at 0xb7faf822: mprotect
HEUREKA! SYSCALL at 0xb7faf822: mprotect
HEUREKA! SYSCALL at 0xb7faf822: mprotect
HEUREKA! SYSCALL at 0xb7faf7ff: munmap
test.py

And here is the output of strace:

execve("/bin/ls", ["/bin/ls"], 0xbfef5e40 /* 45 vars */) = 0
brk(NULL)                               = 0x220c000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f00000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=89915, ...}) = 0
mmap2(NULL, 89915, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7eea000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/i386-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0L\0\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=169960, ...}) = 0
mmap2(NULL, 179612, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7ebe000
mmap2(0xb7ee7000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x28000) = 0xb7ee7000
mmap2(0xb7ee9000, 3484, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7ee9000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20\220\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1942840, ...}) = 0
mmap2(NULL, 1948188, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7ce2000
mprotect(0xb7eb7000, 4096, PROT_NONE)   = 0
mmap2(0xb7eb8000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d5000) = 0xb7eb8000
mmap2(0xb7ebb000, 10780, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7ebb000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/i386-linux-gnu/libpcre.so.3", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360\16\0\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=480564, ...}) = 0
mmap2(NULL, 483512, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7c6b000
mmap2(0xb7ce0000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x74000) = 0xb7ce0000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/i386-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\320\n\0\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=13796, ...}) = 0
mmap2(NULL, 16500, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7c66000
mmap2(0xb7c69000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0xb7c69000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/i386-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\300P\0\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=142820, ...}) = 0
mmap2(NULL, 123544, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7c47000
mmap2(0xb7c62000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a000) = 0xb7c62000
mmap2(0xb7c64000, 4760, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7c64000
close(3)                                = 0
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7c45000
set_thread_area({entry_number=-1, base_addr=0xb7c45780, limit=0x0fffff, seg_32bit=1, contents=0, read_exec_only=0, limit_in_pages=1, seg_not_present=0, useable=1}) = 0 (entry_number=6)
mprotect(0xb7eb8000, 8192, PROT_READ)   = 0
mprotect(0xb7c62000, 4096, PROT_READ)   = 0
mprotect(0xb7c69000, 4096, PROT_READ)   = 0
mprotect(0xb7ce0000, 4096, PROT_READ)   = 0
mprotect(0xb7ee7000, 4096, PROT_READ)   = 0
mprotect(0x469000, 4096, PROT_READ)     = 0
mprotect(0xb7f2d000, 4096, PROT_READ)   = 0
munmap(0xb7eea000, 89915)               = 0

Until here there is complete compliance with my own output, but the remaining syscalls never appear in my programm. So that's the question. I hope someone knows the answer :P If you have any questions, please ask!

set_tid_address(0xb7c457e8)             = 9767
set_robust_list(0xb7c457f0, 12)         = 0
rt_sigaction(SIGRTMIN, {sa_handler=0xb7c4baf0, sa_mask=[], sa_flags=SA_SIGINFO}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {sa_handler=0xb7c4bb80, sa_mask=[], sa_flags=SA_RESTART|SA_SIGINFO}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
ugetrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
uname({sysname="Linux", nodename="p200300D053D7310F22107AFFFE01D58C", ...}) = 0
statfs("/sys/fs/selinux", 0xbffeddb4)   = -1 ENOENT (No such file or directory)
statfs("/selinux", 0xbffeddb4)          = -1 ENOENT (No such file or directory)
brk(NULL)                               = 0x220c000
brk(0x222d000)                          = 0x222d000
brk(0x222e000)                          = 0x222e000
openat(AT_FDCWD, "/proc/filesystems", O_RDONLY|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(3, "nodev\tsysfs\nnodev\trootfs\nnodev\tr"..., 1024) = 401
read(3, "", 1024)                       = 0
close(3)                                = 0
brk(0x222d000)                          = 0x222d000
access("/etc/selinux/config", F_OK)     = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=3365136, ...}) = 0
mmap2(NULL, 2097152, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7a45000
close(3)                                = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=48, ws_col=198, ws_xpixel=0, ws_ypixel=0}) = 0
openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_CLOEXEC|O_DIRECTORY) = 3
fstat64(3, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
getdents64(3, /* 3 entries */, 32768)   = 80
getdents64(3, /* 0 entries */, 32768)   = 0
close(3)                                = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
write(1, "test.py\n", 8test.py
)                = 8
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++


guest
  • 51
  • 4

2 Answers2

5

If you're only looking for int 0x80, you're going to miss normal 32-bit syscalls made with the sysenter instruction (normally via glibc calling into the VDSO page). https://blog.packagecloud.io/eng/2016/04/05/the-definitive-guide-to-linux-system-calls/. (Also on old AMD CPUs, 32-bit syscall is also possible, and might be used by default if they're too old to support sysenter.)

I guess the early ld.so code uses the legacy int 0x80 mechanism instead of calling into the VDSO. (Which makes sense; the VDSO presents itself as an ELF shared object mapped into memory; until the dynamic linker sets up function-pointers into it, it can't use it.)

64-bit mode is simpler: everything uses syscall for the 64-bit ABI.


Note that checking machine code before or after an instruction executes could be spoofed by code trying to hide from your tracing. A 2nd thread could cross-modify the machine code bytes after you look at it, before it executes. (Perhaps have one thread store a flag, which will cause another thread to store as soon as it notices. With the right timing, this could sneak in between your ptrace fetch and when you do the next single-step.)

A similar race condition is a problem in real life for PTRACE_SYSCALL used by strace (or a sandbox / syscall logging or filter tool on code that may be trying to trick it) in 64-bit mode trying to figure out whether the 32 or 64-bit ABI was invoked (because the call numbers are different). Can ptrace tell if an x86 system call used the 64-bit or 32-bit ABI? (or was, until Linux kernel 5.3 added PTRACE_GET_SYSCALL_INFO).

It is possible to invoke int 0x80 in 64-bit code, even though it's basically never a good idea: What is the explanation of this x86 Hello World using 32-bit int 0x80 Linux system calls from _start? has some details on what happens on the kernel side of a system call.


Again, this is only a problem if you care about programs trying to obfuscate their activity from your tracer, e.g. as an anti-debugging measure. Having another thread overwrite code that's executing won't happen by accident. But it's something to be aware of when designing debugging / tracing tools.

The real danger comes if this code is used as a library where someone might try to build a sandboxing system-call filter out of it. e.g. check paths in all file-access system calls, or reject open calls that aren't opening read-only. Then evading the tracing becomes a real security problem. (There are much better ways to do sandboxing in general, of course.)

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
3

Thank you so much!! You were right!

I now updated the code a little bit. I used the /proc/(pid)/auxv file to get the address of the sysenter routine in VDSO by the key AT_SYSINFO. Now i could detect non-legacy syscalls by comparing this address with eip. Actually pretty easy, learned something again ;P

Here is my updated code:

import os                       # os interaction
from struct import pack, unpack # dealing with bytes (ptrace)
import ctypes                   # support c data structures

""" ========================================================== """

# 32 bit reg process structrue
class UserRegsStruct(ctypes.Structure):
    _fields_ = [
        ("ebx", ctypes.c_ulong),
        ("ecx", ctypes.c_ulong),
        ("edx", ctypes.c_ulong),
        ("esi", ctypes.c_ulong),
        ("edi", ctypes.c_ulong),
        ("ebp", ctypes.c_ulong),
        ("eax", ctypes.c_ulong),
        ("xds", ctypes.c_ulong),
        ("xes", ctypes.c_ulong),
        ("xfs", ctypes.c_ulong),
        ("xgs", ctypes.c_ulong),
        ("orig_eax", ctypes.c_ulong),
        ("eip", ctypes.c_ulong),
        ("xcs", ctypes.c_ulong),
        ("eflags", ctypes.c_ulong),
        ("esp", ctypes.c_ulong),
        ("xss", ctypes.c_ulong),
    ]

# ptrace constants
PTRACE_TRACEME = 0
PTRACE_PEEKDATA = 2
PTRACE_SINGLESTEP = 9
PTRACE_GETREGS = 12


AT_SYSINFO = 32     # for getting the syscall entry address by the auxv (/proc/(pid)/auxv

CPU_WORD_SIZE = 4   # size of cpu word size (32 bit = 4 bytes)

# for syscalls
libc = ctypes.CDLL('libc.so.6')

# check if child (tracee) is still running
def WIFSTOPPED(status):
    return (status & 0xff) == 0x7f

# read from process memory by PTRACE_PEEKDATA
def ReadProcessMemory(pid, address, size):

    # address must be aligned!!
    offset = address % CPU_WORD_SIZE
    if offset:
        address -= offset
        word = libc.ptrace(PTRACE_PEEKDATA, pid, address, 0)
        wordbytes = pack("i", word)
        subsize = min(CPU_WORD_SIZE - offset, size)
        data = wordbytes[offset:offset + subsize]
        size -= subsize
        address += CPU_WORD_SIZE
    else:
        data = bytes(0)

    while size:
        word = libc.ptrace(PTRACE_PEEKDATA, pid, address, 0)
        wordbytes = pack("i", word)
        if size < CPU_WORD_SIZE:
            data += wordbytes[:size]
            break
        data += wordbytes
        size -= CPU_WORD_SIZE
        address += CPU_WORD_SIZE

    return data

def GetSyscallEntry(pid):
    # find the syscall entry in vdso
    # read the auxv of the child process
    fd = open("/proc/" + str(pid) + "/auxv", "rb")
    while True:
        k = fd.read(4)
        v = fd.read(4)
        if not k or not v: break
        k = unpack('i', k)[0]
        v = unpack('i', v)[0]
        #print(str(k) + ":" + str(v))
        if k == AT_SYSINFO:
            sc_entry = ctypes.c_ulong(v).value
            #print("found syscall entry: " + hex(sc_entry))
            return sc_entry

""" ========================================================== """

# extract syscall names
fp = open("/usr/include/i386-linux-gnu/asm/unistd_32.h", "r")
syscalls = [0] * 400

for line in fp:
    if "__NR_" in line:
        a = line.rstrip().split(" ")
        name = a[1].split("NR_")[1]
        number = int(a[2])
        syscalls[number] = name

# "int 80" asm instruction = (0xCD 0x80)
a0 = 0xcd
a1 = 0x80

# create child tracee
pid = os.fork()

if pid == 0:    # in tracee
    libc.ptrace(PTRACE_TRACEME, 0, 0, 0)    # make child traceable
    os.execv("/bin/ls", [":-P"])            # run test programm
else:           # in tracer
    pid, status = os.waitpid(pid, 0)
    regs = UserRegsStruct()
    sc_entry = GetSyscallEntry(pid)         # get the syscall entry address in child vdso space
    print("child pid: " + str(pid))

# catch all syscalls
while True:

    libc.ptrace(PTRACE_SINGLESTEP, pid, 0, 0)               # execute next instruction
    pid, status = os.waitpid(pid, 0)                        # wait for tracee
    libc.ptrace(PTRACE_GETREGS, pid, 0, ctypes.byref(regs)) # get register values
    data = ReadProcessMemory(pid, regs.eip, 2)              # read 2 bytes from instruction pointer address

    # now check if this is a syscall
    if data[0] == a0 and data[1] == a1:
        print("HEUREKA! SYSCALL (legacy) at " + hex(regs.eip) + ": " + syscalls[regs.eax])

    if regs.eip == sc_entry:
        print("HEUREKA! SYSCALL (sysenter) at " + hex(regs.eip) + ": " + syscalls[regs.eax])

    if WIFSTOPPED(status) == False: break # exit loop when tracee stopped
guest
  • 51
  • 4
  • In theory a binary could *manually* use `sysenter`, instead of calling into the VDSO. If you don't care about detecting that, then don't worry about it. – Peter Cordes Feb 27 '20 at 21:32
  • So i assume i could take this into consideration by just checking the instruction code again as already with the int 80. The code for the sysenter syscall is 0F 34. Or do you know another way? – guest Feb 27 '20 at 22:00
  • I just added this feature, but now i capture the non-legacy syscalls twice. Analysing the eip addresses showed me, that after jumping into vdso-space, sysenter is called. So out of my sight the direct detection of the sysenter is the better choice. – guest Feb 27 '20 at 22:21
  • Yeah, if you already have code to check the machine code you can use that to check for int 0x80, sysenter, or syscall. (AMD CPUs support syscall in 32-bit mode; the VDSO will use it under 64-bit kernels if it's available but sysenter isn't.) You can normally assume no prefixes, although `rep int 0x80` for example will still execute as `int 0x80`. – Peter Cordes Feb 28 '20 at 01:17
  • But if a prog wants to hide from your tracer, you're screwed anyway without `PTRACE_SYSCALL`. A program could cross-modify its machine code from another thread so an `int 0x80` could execute in one thread but a store from another thread could modify those bytes before your ptrace had a chance to read it. (Very plausible if another thread is spin-waiting on a flag stored right before executing the system call). It's a problem for `strace`, too, distinguishing 32 vs 64-bit ABI: see [Can ptrace tell if an x86 system call used the 64-bit or 32-bit ABI?](//stackoverflow.com/q/53456266) – Peter Cordes Feb 28 '20 at 01:21
  • Err wait, I have that backwards. You're single-stepping, so you're decoding *before* the instruction executes, unlike the `PTRACE_SYSCALL` case. So there's still a possible race with cross-modifying code, but it would have to change the machine code *from* something innocent (like a 2-byte NOP) into `int 0x80`. – Peter Cordes Feb 28 '20 at 02:51
  • ok, so but i think in 99,999 % of all traced programs that would not be the case, or have you ever seen something like this? Is it possible to write into the executable address space of the process at all? i think it's just read and executable? – guest Feb 28 '20 at 10:01
  • This won't happen by accident. But an obfuscated program's anti-debugging scheme, or malware or an exploit, might intentionally be trying to fool your tracer. Linux does allow a page to be writeable + executable. You can even link an ELF binary with `ld --omagic` so the `.text` section = read+write+exec. And you can always `mprotect` or JIT into an mmapped buffer. (OpenBSD enforces W^X, but Linux doesn't.) If you're using this tool in any way that matters for security reasons, it's that 0.001% that intentionally evades you that's a problem! – Peter Cordes Feb 28 '20 at 13:04