
I have implemented a JNA bridge to FDK-AAC. The source code can be found here.

When benchmarking my code, I get hundreds of successful runs on the same input, and then occasionally a C-level crash that kills the entire process and produces a core dump.

The backtrace from the core dump looks like this:

#1  0x00007f3e92e00f5d in __GI_abort () at abort.c:90
#2  0x00007f3e92e4928d in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f3e92f70528 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007f3e92e5064a in malloc_printerr (action=<optimized out>, str=0x7f3e92f6cdee "corrupted size vs. prev_size", ptr=<optimized out>, ar_ptr=<optimized out>) at malloc.c:5426
#4  0x00007f3e92e5304a in _int_free (av=0x7f3de0000020, p=<optimized out>, have_lock=0) at malloc.c:4337
#5  0x00007f3e92e5744e in __GI___libc_free (mem=<optimized out>) at malloc.c:3145
#6  0x00007f3e113921e9 in FDKfree (ptr=0x7f3de009df60) at libSYS/src/genericStds.cpp:233
#7  0x00007f3e1130d7d3 in Free_AacEncoder (p=0x7f3de0115740) at libAACenc/src/aacenc_lib.cpp:407
#8  0x00007f3e1130fbb3 in aacEncClose (phAacEncoder=0x7f3de0115740) at libAACenc/src/aacenc_lib.cpp:1395

This backtrace is reproducible if I repeat the benchmark enough times, though I'm having a hard time understanding what might cause such an error. The memory at pointer 0x7f3de009df60 is allocated inside the C/C++ code as well, and I can guarantee that the same instance that is allocated is the one being freed. The benchmark is, of course, single-threaded.

After reading these:

security checks && internal functions

I'm still having a hard time understanding: what might be a real scenario (an ordinary bug rather than an exploit) that causes the above error? And why does it happen so rarely?

Current suspicion:

Running a detailed backtrace, I get this output:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
        set = {__val = {4, 6378670679680, 645636045657660056, 90523359816, 139904561311072, 292199584, 139903730612120, 139903730611784, 139904561311088, 1460617926600, 47573685816, 4119199860131166208, 
            139904593745464, 139904553224483, 139904561311136, 288245657}}
        pid = <optimized out>
        tid = <optimized out>
#1  0x00007f3e92e00f5d in __GI_abort () at abort.c:90
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7f3de026db10, sa_sigaction = 0x7f3de026db10}, sa_mask = {__val = {139903730540556, 19, 30064771092, 812522497172832284, 139903728706672, 1887866374039011357, 
              139900298780168, 3775732748407067896, 763430436865, 35180077121538, 4119199860131166208, 139904561311552, 139904553065676, 1, 139904561311584, 139904561312192}}, sa_flags = 4096, 
          sa_restorer = 0x14}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007f3e92e4928d in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f3e92f70528 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:181
        ap = {{gp_offset = 40, fp_offset = 32574, overflow_arg_area = 0x7f3e11adf1d0, reg_save_area = 0x7f3e11adf160}}
        fd = <optimized out>
        list = <optimized out>
        nlist = <optimized out>
        cp = <optimized out>
        written = <optimized out>
#3  0x00007f3e92e5064a in malloc_printerr (action=<optimized out>, str=0x7f3e92f6cdee "corrupted size vs. prev_size", ptr=<optimized out>, ar_ptr=<optimized out>) at malloc.c:5426
        buf = "00007f3de009e9f0"
        cp = <optimized out>
        ar_ptr = <optimized out>
        ptr = <optimized out>
        str = 0x7f3e92f6cdee "corrupted size vs. prev_size"
        action = <optimized out>
#4  0x00007f3e92e5304a in _int_free (av=0x7f3de0000020, p=<optimized out>, have_lock=0) at malloc.c:4337
        size = 2720
        fb = <optimized out>
        nextchunk = 0x7f3de009e9f0
        nextsize = 736
        nextinuse = <optimized out>
        prevsize = <optimized out>
        bck = <optimized out>
        fwd = <optimized out>
        errstr = 0x0
        locked = <optimized out>
#5  0x00007f3e92e5744e in __GI___libc_free (mem=<optimized out>) at malloc.c:3145
        ar_ptr = <optimized out>
        p = <optimized out>
        hook = <optimized out>
#6  0x00007f3e113921e9 in FDKfree (ptr=0x7f3de009df60) at libSYS/src/genericStds.cpp:233
No locals.
#7  0x00007f3e1130d7d3 in Free_AacEncoder (p=0x7f3de0115740) at libAACenc/src/aacenc_lib.cpp:407
No locals.
#8  0x00007f3e1130fbb3 in aacEncClose (phAacEncoder=0x7f3de0115740) at libAACenc/src/aacenc_lib.cpp:1395
        hAacEncoder = 0x7f3de009df60
        err = AACENC_OK
  • In frame #6, you can see that the pointer in question is 0x7f3de009df60.
  • In frame #4, you can see that the size is 2720, which is indeed the expected size of the structure being released.
  • However, the address of nextchunk is 0x7f3de009e9f0, which is only 2704 bytes after the pointer being released.
  • I can confirm this is always the case when the error reproduces.
  • Could this be a strong indication of the cause of the error I'm facing?
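
The address arithmetic in the bullets above can be checked with a small sketch. It assumes a 64-bit glibc build, where the chunk header that precedes the pointer returned by malloc is 2 * sizeof(size_t) = 16 bytes; the helper name `expected_next_chunk` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64-bit glibc, where the chunk header that sits in front of
   the user pointer is 2 * sizeof(size_t) = 16 bytes. */
#define CHUNK_HEADER_BYTES 16u

/* Address at which the allocator expects the next chunk to start, given
   the pointer passed to free() and the chunk size stored in the header.
   (Hypothetical helper, not a glibc function.) */
static uint64_t expected_next_chunk(uint64_t user_ptr, uint64_t chunk_size)
{
    assert(user_ptr >= CHUNK_HEADER_BYTES);
    uint64_t chunk_start = user_ptr - CHUNK_HEADER_BYTES;
    return chunk_start + chunk_size;
}
```

Under that assumption, `expected_next_chunk(0x7f3de009df60, 2720)` gives 0x7f3de009e9f0, the nextchunk value from frame #4. The 16-byte gap between 2720 and 2704 would then be the chunk header, so the offset alone may not say much about where the corruption comes from.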
Sheinbergon
  • I recommend taking a few steps back, and constructing a [MCVE] to find the memory management bug in your code. While it's not impossible that analysing addresses will reveal the problem, such low-level antics ought to be a last resort, particularly given the likelihood that your program has UB (and that, therefore, these addresses cannot even be trusted). Either way, without such a MCVE, we won't be debugging here.... – Lightness Races in Orbit Apr 03 '18 at 11:14
  • Use valgrind or Address Sanitizer. – John Zwinck Apr 03 '18 at 11:23
  • @LightnessRacesinOrbit thank you for your detailed response. As generating a MCVE would be quite hard (again, this error is not consistently reproducible), maybe we should start with a simpler question regarding a practical understanding of the error "corrupted size vs. prev_size": do you have any idea what might trigger this specific error in a program? – Sheinbergon Apr 03 '18 at 11:28
  • Yes, generating MCVEs is hard, but nothing worth doing is ever easy. That's the job that you have to do. Debugging is the first step. I fully realise that it's tempting to try to skip this step by accruing more general guidelines, but that's simply not practical until you have homed in on the problem. Good luck! – Lightness Races in Orbit Apr 03 '18 at 11:28

1 Answer


OK, so I've managed to overcome this issue.

First of all, a practical cause of "corrupted size vs. prev_size" is quite simple: the memory chunk control fields in the adjacent following chunk are overwritten by an out-of-bounds access in the code. If you allocate x bytes for pointer p but end up writing beyond x bytes through that pointer, you might get this error, which indicates that the size of the current chunk no longer matches what is recorded in the next chunk's control structure (because it was overwritten).
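
To make that concrete, here is a minimal sketch of the idea using a toy heap inside an ordinary byte array. It is a simplified model and not glibc's real chunk layout or code; `chunk_header`, `toy_heap` and the `toy_*` helpers are all made-up names. Nothing is written outside the array, so it is safe to run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of an allocator chunk header; NOT glibc's real layout. */
typedef struct {
    size_t prev_size;   /* size of the chunk that precedes this one */
    size_t size;        /* size of this chunk, header included */
} chunk_header;

enum { PAYLOAD_BYTES = 32, CHUNK_BYTES = sizeof(chunk_header) + PAYLOAD_BYTES };

/* Two adjacent chunks in one buffer, like neighbours on a real heap. */
typedef struct {
    unsigned char bytes[2 * CHUNK_BYTES];
} toy_heap;

static void toy_heap_init(toy_heap *h)
{
    chunk_header first  = { 0, CHUNK_BYTES };
    chunk_header second = { CHUNK_BYTES, CHUNK_BYTES };
    memset(h->bytes, 0, sizeof h->bytes);
    memcpy(h->bytes, &first, sizeof first);
    memcpy(h->bytes + CHUNK_BYTES, &second, sizeof second);
}

/* Payload of the first chunk, i.e. what a caller of malloc() would get. */
static unsigned char *toy_first_payload(toy_heap *h)
{
    return h->bytes + sizeof(chunk_header);
}

/* Consistency check in the spirit of "size vs. prev_size": the first
   chunk's size must equal the prev_size stored in the following chunk.
   Returns 1 when the two agree, 0 otherwise. */
static int toy_sizes_agree(const toy_heap *h)
{
    chunk_header first, next;
    memcpy(&first, h->bytes, sizeof first);
    assert(first.size + sizeof next <= sizeof h->bytes);
    memcpy(&next, h->bytes + first.size, sizeof next);
    return next.prev_size == first.size;
}
```

Filling exactly `PAYLOAD_BYTES` through `toy_first_payload()` leaves `toy_sizes_agree()` true. Writing even `sizeof(size_t)` bytes more clobbers the neighbour's `prev_size` and the check fails. In this model the check runs on demand; a real allocator only looks at these fields at certain points such as free(), which is one plausible reason a crash shows up far from the faulty write.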

As for the cause of this memory corruption: the structure mapping done in the Java/JNA layer implied different #pragma-related padding/alignment from what the DLL/SO was compiled with. This, in turn, caused data to be written beyond the allocated structure boundary. Disabling that alignment made the issue go away (thousands of executions without a single crash!).
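
A minimal sketch of how such a layout mismatch leads to an overflow, written in C for brevity. The struct and its fields are hypothetical and are not taken from FDK-AAC or the JNA bridge; the point is only that the same field list can have two different sizes depending on packing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout as one side compiles it: packed to 1 byte. */
#pragma pack(push, 1)
typedef struct {
    uint8_t  tag;
    uint32_t value;
    uint8_t  flag;
} packed_view;
#pragma pack(pop)

/* The same fields as the other side lays them out, with the compiler's
   default alignment (padding after tag and after flag on typical ABIs). */
typedef struct {
    uint8_t  tag;
    uint32_t value;
    uint8_t  flag;
} aligned_view;

/* Bytes that a writer using the aligned layout would place past the end
   of a buffer that was allocated for the packed layout. */
static size_t overflow_bytes(void)
{
    assert(sizeof(aligned_view) >= sizeof(packed_view));
    return sizeof(aligned_view) - sizeof(packed_view);
}
```

The packed struct is 6 bytes, while the aligned one is typically 12 on x86-64, so a side that believes in the larger layout writes past a buffer sized for the smaller one. Field offsets differ as well (`value` is at offset 1 when packed), so even in-bounds accesses would read the wrong bytes.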

Sheinbergon
  • This helped me to solve an age old problem! Thank you :) I had used #pragma pack(push, 1) in declaring a structure and had missed #pragma pack(push) in the end of the structure. – Jainam MJ Mar 25 '19 at 14:29
  • @JainamMJ: Wouldn't it be `#pragma pack(pop)` at the end instead? – lepe Apr 10 '19 at 08:26
  • About `#pragma`: https://stackoverflow.com/questions/33437269/what-are-the-differences-between-pragma-packpush-n-pragma-packpop-and-a – lepe Apr 10 '19 at 08:27
  • Yes, #pragma pack(pop). Sorry for the typo. – Jainam MJ Apr 12 '19 at 05:46
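
The push/pop pairing discussed in these comments can be sketched as follows. All struct names are made up for illustration; the sketch shows how a `#pragma pack(pop)` that comes too late (or is missing) silently packs declarations that were never meant to be packed.

```c
#include <assert.h>
#include <stdint.h>

#pragma pack(push, 1)          /* save current packing, then pack to 1 byte */
typedef struct {
    uint8_t  kind;
    uint32_t length;
} wire_header;                 /* intentionally packed: 5 bytes */

/* A pop belongs right after wire_header. Because it is missing here,
   this declaration is packed to 1 byte as well. */
typedef struct {
    uint8_t  state;
    uint64_t counter;
} leaked_packing;
#pragma pack(pop)              /* restore the packing saved by the push */

/* Same fields, declared after the pop: default alignment applies again. */
typedef struct {
    uint8_t  state;
    uint64_t counter;
} default_packing;

static_assert(sizeof(wire_header) == 5, "wire_header should be packed");
static_assert(sizeof(leaked_packing) == 9, "packing leaked past wire_header");
```

On a typical x86-64 build `default_packing` is 16 bytes while `leaked_packing` is 9, so any code compiled with the pop in the right place would disagree about the size and field offsets of that struct.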