Can you please explain what the program break (break point) is and what the sys_brk call returns on Linux on the x86-64 architecture? I used to think that the program break is the last valid address in the program's address space. If I want to extend it, I have to call the brk syscall and pass it the new break address. On success I get back the new break, which can be greater than what was requested, because Linux operates on pages. But I'm confused by the behavior of some simple code.

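.equ sys_call_exit,60 #rdi - exit code
.equ sys_call_brk, 12 #rdi - long break
.section .data
msg_brk:
    .ascii "Your break is here:%ld\n\0"    # %ld: the break is a 64-bit value
msg_addr:
    .ascii "Addr: %ld\n\0"
# Offsets of the three saved break values, relative to %rbp.
# Negative, so they fall inside the 32 bytes reserved below %rbp by subq $32,%rsp.
.equ ST_BRK1,-8
.equ ST_BRK2,-16
.equ ST_BRK3,-24
.equ OFFSET, 65528
.globl _start
.section .text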
_start:
    movq %rsp,%rbp
    subq $32,%rsp

    ################Get current brk
    movq $sys_call_brk,%rax
    movq $0,%rdi
    syscall

    movq %rax,ST_BRK1(%rbp)
    ###############################


    ################Try to extend
    movq ST_BRK1(%rbp),%rdi
    addq $8,%rdi
    movq $sys_call_brk,%rax
    syscall

    movq %rax,ST_BRK2(%rbp)
    ##############################


    ################Get brk after extension
    movq $sys_call_brk,%rax
    movq $0,%rdi
    syscall

    movq %rax,ST_BRK3(%rbp)
    ###############################


    #####print results#############

    movq ST_BRK1(%rbp),%rsi
    movq $msg_brk,%rdi
    xorl %eax,%eax          # variadic call: %al = number of vector registers used (0)
    call printf

    movq ST_BRK2(%rbp),%rsi
    movq $msg_brk,%rdi
    xorl %eax,%eax
    call printf

    movq ST_BRK3(%rbp),%rsi
    movq $msg_brk,%rdi
    xorl %eax,%eax
    call printf

    ##########################

    # Get the last brk point in rbx
    movq ST_BRK3(%rbp),%rbx

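    loop:
        #Print rbx
        movq %rbx,%rsi
        movq $msg_addr,%rdi
        xorl %eax,%eax      # variadic call: no vector register arguments
        call printf

        #Read one byte at rbx; this load faults once rbx reaches an unmapped page
        movb (%rbx),%cl
        incq %rbx
    jmp loop 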

    movq $0,%rdi
    call exit

First I get the current brk, then I try to extend it by 8 bytes, and then I get the current brk again. After that I try to access the memory above the new brk. The results are the following:

Your break is here:22466560
Your break is here:22466568
Your break is here:22466568
......
Addr: 22470380
Addr: 22470381
Addr: 22470382
Addr: 22470383
Addr: 22470384
Segmentation fault (core dumped)

As you can see, I first got 22466560 as the current brk. Then the brk was moved to 22466568, exactly as requested. But Linux usually operates on pages, so I called brk(0) once more to check the real current brk, and I got 22466568 again. After that I tried to access the memory that is above 22466568, and I got a segmentation fault only at 22470385. So I could access 3817 bytes (22470385 - 22466568) above the brk.

My question is: why can I access memory that is above the brk, and how can I get the real break point? Thanks.
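
The page arithmetic behind this observation can be sketched as follows. This is a minimal sketch under assumptions, not a definitive answer: it assumes 4 KiB pages, a hypothetical file name `brk_page.s`, and a build through `gcc` so that `printf` from libc is available. It queries the break, extends it by 8 bytes, and then rounds the result up to the next page boundary, which is where the mapped heap memory would be expected to end. The printing happens only after both brk calls, so libc's own allocations cannot affect the reported values.

```asm
# brk_page.s -- build (assumption): gcc -o brk_page brk_page.s
.equ SYS_BRK, 12                    # brk syscall number on x86-64
.equ PAGE_SIZE, 4096                # assumed page size (4 KiB)

.section .rodata
fmt:
    .asciz "break: %#lx\npage end: %#lx\nbytes above break in that page: %lu\n"

.section .text
.globl main
main:
    pushq %rbx                      # save callee-saved rbx; also aligns rsp to 16 bytes

    movl $SYS_BRK, %eax             # brk(0) fails, so the kernel returns the current break
    xorl %edi, %edi
    syscall

    leaq 8(%rax), %rdi              # request a break 8 bytes higher
    movl $SYS_BRK, %eax
    syscall
    movq %rax, %rbx                 # rbx = break as reported by the kernel (byte precision)

    leaq PAGE_SIZE-1(%rbx), %rdx    # rdx = break rounded up to a page boundary
    andq $-PAGE_SIZE, %rdx
    movq %rdx, %rcx
    subq %rbx, %rcx                 # rcx = bytes between the break and the page end

    leaq fmt(%rip), %rdi            # printf(fmt, break, page_end, bytes)
    movq %rbx, %rsi
    xorl %eax, %eax                 # variadic call: no vector register arguments
    call printf@PLT

    xorl %eax, %eax                 # return 0 from main
    popq %rbx
    ret

.section .note.GNU-stack,"",@progbits   # mark the stack as non-executable
```

For the break 22466568 (0x156d008) from the output above, the same rounding gives 0x156e000 (22470656), that is, 4088 bytes above the break that share its page.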

  • As you said, it's page size. The values returned don't reflect that though. "Current brk" returns what you last asked for, not what you actually got due to operating system internals. – Jester Mar 21 '18 at 18:27
  • What operating system are you programming on? – fuz Mar 21 '18 at 18:36
  • CentOS Linux release 7.4.1708 – Andrew Bolotov Mar 21 '18 at 18:43
  • Use hex for pointer values; it makes it easy to see the page alignment. `22466560` is `0x156d000`. The last address *printed* was `0x156def0`, which is still within the same page, so I think your output is truncated. Did you pipe it into something or redirect to a file? That makes `stdout` full-buffered, and the buffer isn't flushed with an uncaught SIGSEGV. (https://stackoverflow.com/questions/38379553/using-printf-in-assembly-leads-to-an-empty-ouput) – Peter Cordes Mar 22 '18 at 02:09
  • I tried it on my system (with output to a terminal), and piping through `cat` I get output that ends (in the middle of a line) around `0x192ef01` (because I changed the format to `.asciz "Addr: %#x\n"`), but anyway about as close to the end of a page as you. **With output to a terminal (line buffered stdout), the last address printed is the one that faults, and is the first byte of the next page, e.g. `0x1cd5000`**. You could have used `stderr` here for unbuffered output, or simply used a debugger to see which instruction faults without printing, simplifying your loop. – Peter Cordes Mar 22 '18 at 02:10
  • And BTW, you can `#include ` and use `__NR_brk` instead of defining your own `.equ` constants. – Peter Cordes Mar 22 '18 at 02:10
  • TL:DR: you can access the entire 4k page containing the current break. See the bottom of [the man page](http://man7.org/linux/man-pages/man2/brk.2.html) for details on what the Linux system call returns. Apparently the kernel keeps track of the break with byte precision, though, and doesn't round it up / down to a 4k boundary except for deciding which pages to map. – Peter Cordes Mar 22 '18 at 02:16
  • Thanks for the explanation and comments. They are very helpful. I checked out the man page, and it says: "However, the actual Linux system call returns the new program break on success. On failure, the system call returns the current break." So it means that the Linux system call returns the new break, but I can still read above that break. And that is what confuses me. – Andrew Bolotov Mar 22 '18 at 19:19
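
The point made in the comments, that the whole page containing the break is accessible even though the kernel reports the break with byte precision, can be checked without any printing. The following is a minimal sketch, not a definitive test. It assumes 4 KiB pages, a hypothetical file name `probe.S` (capital `S`, so that `gcc` runs the C preprocessor on it), and that `<asm/unistd.h>` is the header providing `__NR_brk` and `__NR_exit_group`. It avoids libc entirely, since libc's own allocations may also move the break.

```asm
/* probe.S -- build (assumption): gcc -nostdlib -static -o probe probe.S */
#include <asm/unistd.h>             /* assumed header: __NR_brk, __NR_exit_group */

#define PAGE_SIZE 4096              /* assumed page size (4 KiB) */

.section .text
.globl _start
_start:
    movl $__NR_brk, %eax            /* brk(0): query the current break */
    xorl %edi, %edi
    syscall

    leaq 8(%rax), %rdi              /* extend the break by 8 bytes */
    movl $__NR_brk, %eax
    syscall                         /* rax = new break on success */

    leaq PAGE_SIZE-1(%rax), %rbx    /* round the break up to a page boundary */
    andq $-PAGE_SIZE, %rbx

    movb -1(%rbx), %cl              /* last byte of the page holding the break */
    /* movb (%rbx), %cl */          /* first byte of the next page: likely faults */

    movl $__NR_exit_group, %eax     /* exit_group(0) */
    xorl %edi, %edi
    syscall
```

If the read of the last byte in the page succeeds, the program exits with status 0 (`echo $?` in a shell). With the commented-out load enabled, the process would be expected to die with SIGSEGV at that instruction, assuming nothing else is mapped directly above the heap; running it under a debugger shows the faulting address.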

0 Answers