
I am working on code that uses the bllipparser Python module, among other things. Fed the same dataset, it crashes intermittently (roughly once in three to ten runs). Stepping through it in lldb, I found that the public field weights of RerankerModel (source), which is apparently set only once (in the constructor), randomly becomes NULL. (I have only one RerankerModel for the duration of my run, so there should be exactly one weights, and it should persist unchanged throughout.) So I set up an ambush (I mean, a watchpoint: I stopped the code in the constructor and ran watchpoint set expression -w write -- &weights), and apparently the culprit that nulls the pointer is tiny_malloc_from_free_list from libsystem_malloc.dylib. Here's the relevant top of the backtrace:

* thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint 4
  * frame #0: 0x00007fff61caf22a libsystem_malloc.dylib`tiny_malloc_from_free_list + 151
    frame #1: 0x00007fff61cae3bf libsystem_malloc.dylib`szone_malloc_should_clear + 422
    frame #2: 0x00007fff61cae1bd libsystem_malloc.dylib`malloc_zone_malloc + 103
    frame #3: 0x00007fff61cad4c7 libsystem_malloc.dylib`malloc + 24
    frame #4: 0x00007fff5faac628 libc++abi.dylib`operator new(unsigned long) + 40
    frame #5: 0x00000001133c904c _CharniakParser.cpython-36m-darwin.so`std::__1::__split_buffer<short, std::__1::allocator<short>&>::__split_buffer(unsigned long, unsigned long, std::__1::allocator<short>&) [inlined] std::__1::__allocate(__size=4) at new:226
    frame #6: 0x00000001133c9040 _CharniakParser.cpython-36m-darwin.so`std::__1::__split_buffer<short, std::__1::allocator<short>&>::__split_buffer(unsigned long, unsigned long, std::__1::allocator<short>&) [inlined] std::__1::allocator<short>::allocate(this=0x0000000135316448, __n=2, (null)=0x0000000000000000) at memory:1747
    frame #7: 0x00000001133c8f44 _CharniakParser.cpython-36m-darwin.so`std::__1::__split_buffer<short, std::__1::allocator<short>&>::__split_buffer(unsigned long, unsigned long, std::__1::allocator<short>&) [inlined] std::__1::allocator_traits<std::__1::allocator<short> >::allocate(__a=0x0000000135316448, __n=2) at memory:1502
    frame #8: 0x00000001133c8f16 _CharniakParser.cpython-36m-darwin.so`std::__1::__split_buffer<short, std::__1::allocator<short>&>::__split_buffer(this=0x00007ffeefbf3b48, __cap=2, __start=1, __a=0x0000000135316448) at __split_buffer:311
    frame #9: 0x00000001133c878d _CharniakParser.cpython-36m-darwin.so`std::__1::__split_buffer<short, std::__1::allocator<short>&>::__split_buffer(this=0x00007ffeefbf3b48, __cap=2, __start=1, __a=0x0000000135316448) at __split_buffer:310
    frame #10: 0x00000001133c869b _CharniakParser.cpython-36m-darwin.so`void std::__1::vector<short, std::__1::allocator<short> >::__push_back_slow_path<short const>(this=0x0000000135316438 size=1, __x=0x00007ffeefbf3caa) at vector:1567
    frame #11: 0x00000001133c4446 _CharniakParser.cpython-36m-darwin.so`Val::extendTrees(Bst&, int) [inlined] std::__1::vector<short, std::__1::allocator<short> >::push_back(this=0x0000000135316438 size=1, __x=0x00007ffeefbf3caa) at vector:1588

I am very much not an expert on C++, but... How is the allocator NULLing the pointer? How does the allocator even know where the pointer is? Why is the allocator NULLing the pointer? I mean, I can see how I might run out of memory, since what I was doing wasn't exactly memory-light, but I'd much sooner expect the allocator to fail to allocate than to randomly deallocate something, and I had no idea it could NULL a pointer. Can anyone explain to me exactly how this works, why it happened, how it happened, why it is always the same pointer in code that has lots and lots of other juicy pointers, and what I can do to make it not happen?

Addendum if needed: Here's where the actual NULLing takes place, the code of tiny_malloc_from_free_list, if anyone can make sense of it...

libsystem_malloc.dylib`tiny_malloc_from_free_list:
    0x7fff61caf193 <+0>:    pushq  %rbp
    0x7fff61caf194 <+1>:    movq   %rsp, %rbp
    0x7fff61caf197 <+4>:    pushq  %r15
    0x7fff61caf199 <+6>:    pushq  %r14
    0x7fff61caf19b <+8>:    pushq  %r13
    0x7fff61caf19d <+10>:   pushq  %r12
    0x7fff61caf19f <+12>:   pushq  %rbx
    0x7fff61caf1a0 <+13>:   pushq  %rax
    0x7fff61caf1a1 <+14>:   movl   %edx, %r15d
    0x7fff61caf1a4 <+17>:   movq   %rsi, %r14
    0x7fff61caf1a7 <+20>:   movq   %rdi, %r12
    0x7fff61caf1aa <+23>:   leal   -0x1(%r15), %ecx
    0x7fff61caf1ae <+27>:   movq   0x18(%r14,%rcx,8), %r13
    0x7fff61caf1b3 <+32>:   testq  %r13, %r13
    0x7fff61caf1b6 <+35>:   je     0x7fff61caf22f            ; <+156>
    0x7fff61caf1b8 <+37>:   movq   0x8(%r13), %rdx
    0x7fff61caf1bc <+41>:   movq   %rdx, %rax
    0x7fff61caf1bf <+44>:   shlq   $0x4, %rax
    0x7fff61caf1c3 <+48>:   shrq   $0x3c, %rdx
    0x7fff61caf1c7 <+52>:   movq   0x278(%r12), %rsi
    0x7fff61caf1cf <+60>:   xorq   %rax, %rsi
    0x7fff61caf1d2 <+63>:   movq   %rsi, %rdi
    0x7fff61caf1d5 <+66>:   shrq   $0x8, %rdi
    0x7fff61caf1d9 <+70>:   addl   %esi, %edi
    0x7fff61caf1db <+72>:   movq   %rsi, %rbx
    0x7fff61caf1de <+75>:   shrq   $0x10, %rbx
    0x7fff61caf1e2 <+79>:   addl   %edi, %ebx
    0x7fff61caf1e4 <+81>:   movq   %rsi, %rdi
    0x7fff61caf1e7 <+84>:   shrq   $0x18, %rdi
    0x7fff61caf1eb <+88>:   addl   %ebx, %edi
    0x7fff61caf1ed <+90>:   movq   %rsi, %rbx
    0x7fff61caf1f0 <+93>:   shrq   $0x20, %rbx
    0x7fff61caf1f4 <+97>:   addl   %edi, %ebx
    0x7fff61caf1f6 <+99>:   movq   %rsi, %rdi
    0x7fff61caf1f9 <+102>:  shrq   $0x28, %rdi
    0x7fff61caf1fd <+106>:  addl   %ebx, %edi
    0x7fff61caf1ff <+108>:  movq   %rsi, %rbx
    0x7fff61caf202 <+111>:  shrq   $0x30, %rbx
    0x7fff61caf206 <+115>:  addl   %edi, %ebx
    0x7fff61caf208 <+117>:  shrq   $0x38, %rsi
    0x7fff61caf20c <+121>:  addl   %ebx, %esi
    0x7fff61caf20e <+123>:  andl   $0xf, %esi
    0x7fff61caf211 <+126>:  cmpq   %rsi, %rdx
    0x7fff61caf214 <+129>:  jne    0x7fff61caf602            ; <+1135>
    0x7fff61caf21a <+135>:  testq  %rax, %rax
    0x7fff61caf21d <+138>:  je     0x7fff61caf2de            ; <+331>
    0x7fff61caf223 <+144>:  movq   (%r13), %rdx
    0x7fff61caf227 <+148>:  movq   %rdx, (%rax)
->  0x7fff61caf22a <+151>:  jmp    0x7fff61caf2f2            ; <+351>
    0x7fff61caf22f <+156>:  movq   $-0x1, %rax
    0x7fff61caf236 <+163>:  shlq   %cl, %rax
    0x7fff61caf239 <+166>:  andq   0x818(%r14), %rax
    0x7fff61caf240 <+173>:  je     0x7fff61caf4f9            ; <+870>
    0x7fff61caf246 <+179>:  bsfq   %rax, %rcx
    0x7fff61caf24a <+183>:  cmpq   $0x3f, %rcx
    0x7fff61caf24e <+187>:  je     0x7fff61caf39d            ; <+522>
    0x7fff61caf254 <+193>:  movq   0x18(%r14,%rcx,8), %r13
    0x7fff61caf259 <+198>:  testq  %r13, %r13
    0x7fff61caf25c <+201>:  je     0x7fff61caf39d            ; <+522>
    0x7fff61caf262 <+207>:  movq   0x8(%r13), %rdx
    0x7fff61caf266 <+211>:  movq   %rdx, %rax
    0x7fff61caf269 <+214>:  shlq   $0x4, %rax
    0x7fff61caf26d <+218>:  shrq   $0x3c, %rdx
    0x7fff61caf271 <+222>:  movq   0x278(%r12), %rsi
    0x7fff61caf279 <+230>:  xorq   %rax, %rsi
    0x7fff61caf27c <+233>:  movq   %rsi, %rdi
    0x7fff61caf27f <+236>:  shrq   $0x8, %rdi
    0x7fff61caf283 <+240>:  addl   %esi, %edi
    0x7fff61caf285 <+242>:  movq   %rsi, %rbx
    0x7fff61caf288 <+245>:  shrq   $0x10, %rbx
    0x7fff61caf28c <+249>:  addl   %edi, %ebx
    0x7fff61caf28e <+251>:  movq   %rsi, %rdi
    0x7fff61caf291 <+254>:  shrq   $0x18, %rdi
    0x7fff61caf295 <+258>:  addl   %ebx, %edi
    0x7fff61caf297 <+260>:  movq   %rsi, %rbx
    0x7fff61caf29a <+263>:  shrq   $0x20, %rbx
    0x7fff61caf29e <+267>:  addl   %edi, %ebx
    0x7fff61caf2a0 <+269>:  movq   %rsi, %rdi
    0x7fff61caf2a3 <+272>:  shrq   $0x28, %rdi
    0x7fff61caf2a7 <+276>:  addl   %ebx, %edi
    0x7fff61caf2a9 <+278>:  movq   %rsi, %rbx
    0x7fff61caf2ac <+281>:  shrq   $0x30, %rbx
    0x7fff61caf2b0 <+285>:  addl   %edi, %ebx
    0x7fff61caf2b2 <+287>:  shrq   $0x38, %rsi
    0x7fff61caf2b6 <+291>:  addl   %ebx, %esi
    0x7fff61caf2b8 <+293>:  andl   $0xf, %esi
    0x7fff61caf2bb <+296>:  cmpq   %rsi, %rdx
    0x7fff61caf2be <+299>:  jne    0x7fff61caf602            ; <+1135>
    0x7fff61caf2c4 <+305>:  movq   %rax, 0x18(%r14,%rcx,8)
    0x7fff61caf2c9 <+310>:  testq  %rax, %rax
    0x7fff61caf2cc <+313>:  je     0x7fff61caf59f            ; <+1036>
    0x7fff61caf2d2 <+319>:  movq   (%r13), %rcx
    0x7fff61caf2d6 <+323>:  movq   %rcx, (%rax)
    0x7fff61caf2d9 <+326>:  jmp    0x7fff61caf5b2            ; <+1055>
    0x7fff61caf2de <+331>:  movl   $0xfffffffe, %edx         ; imm = 0xFFFFFFFE 
    0x7fff61caf2e3 <+336>:  roll   %cl, %edx
    0x7fff61caf2e5 <+338>:  movl   %ecx, %esi
    0x7fff61caf2e7 <+340>:  shrl   $0x5, %esi
    0x7fff61caf2ea <+343>:  andl   %edx, 0x818(%r14,%rsi,4)
    0x7fff61caf2f2 <+351>:  movq   %rax, 0x18(%r14,%rcx,8)
    0x7fff61caf2f7 <+356>:  incl   0x850(%r14)
    0x7fff61caf2fe <+363>:  movzwl %r15w, %esi
    0x7fff61caf302 <+367>:  movl   %esi, %ecx
    0x7fff61caf304 <+369>:  shll   $0x4, %ecx
    0x7fff61caf307 <+372>:  addq   %rcx, 0x858(%r14)
    0x7fff61caf30e <+379>:  movq   %r13, %rax
    0x7fff61caf311 <+382>:  andq   $-0x100000, %rax          ; imm = 0xFFF00000 
    0x7fff61caf317 <+388>:  addl   0xfc098(%rax), %ecx
    0x7fff61caf31d <+394>:  movl   %ecx, 0xfc098(%rax)
    0x7fff61caf323 <+400>:  cmpl   $0xbd060, %ecx            ; imm = 0xBD060 
    0x7fff61caf329 <+406>:  jb     0x7fff61caf335            ; <+418>
    0x7fff61caf32b <+408>:  movl   $0x0, 0xfc090(%rax)
    0x7fff61caf335 <+418>:  cmpl   $0x2, %esi
    0x7fff61caf338 <+421>:  jb     0x7fff61caf344            ; <+433>
    0x7fff61caf33a <+423>:  movq   %r13, %rdi
    0x7fff61caf33d <+426>:  callq  0x7fff61cc5eaa            ; set_tiny_meta_header_in_use
    0x7fff61caf342 <+431>:  jmp    0x7fff61caf38b            ; <+504>
    0x7fff61caf344 <+433>:  movq   %r13, %rcx
    0x7fff61caf347 <+436>:  shrq   $0x4, %rcx
    0x7fff61caf34b <+440>:  movl   %r13d, %edx
    0x7fff61caf34e <+443>:  shrl   $0x8, %edx
    0x7fff61caf351 <+446>:  andl   $0xffe, %edx              ; imm = 0xFFE 
    0x7fff61caf357 <+452>:  movl   $0x1, %esi
    0x7fff61caf35c <+457>:  movl   $0x1, %edi
    0x7fff61caf361 <+462>:  shll   %cl, %edi
    0x7fff61caf363 <+464>:  orl    %edi, 0xfc0a0(%rax,%rdx,4)
    0x7fff61caf36a <+471>:  orl    $0x1, %edx
    0x7fff61caf36d <+474>:  orl    %edi, 0xfc0a0(%rax,%rdx,4)
    0x7fff61caf374 <+481>:  leal   0x1(%rcx), %ecx
    0x7fff61caf377 <+484>:  movl   %ecx, %edx
    0x7fff61caf379 <+486>:  shrl   $0x4, %edx
    0x7fff61caf37c <+489>:  andl   $0xffe, %edx              ; imm = 0xFFE 
    0x7fff61caf382 <+495>:  shll   %cl, %esi
    0x7fff61caf384 <+497>:  orl    %esi, 0xfc0a0(%rax,%rdx,4)
    0x7fff61caf38b <+504>:  movq   %r13, %rax
    0x7fff61caf38e <+507>:  addq   $0x8, %rsp
    0x7fff61caf392 <+511>:  popq   %rbx
    0x7fff61caf393 <+512>:  popq   %r12
    0x7fff61caf395 <+514>:  popq   %r13
    0x7fff61caf397 <+516>:  popq   %r14
    0x7fff61caf399 <+518>:  popq   %r15
    0x7fff61caf39b <+520>:  popq   %rbp
    0x7fff61caf39c <+521>:  retq   
    0x7fff61caf39d <+522>:  movq   0x210(%r14), %r13
    0x7fff61caf3a4 <+529>:  testq  %r13, %r13
    0x7fff61caf3a7 <+532>:  je     0x7fff61caf4f9            ; <+870>
    0x7fff61caf3ad <+538>:  movq   %r13, %rdi
    0x7fff61caf3b0 <+541>:  callq  0x7fff61cb05ab            ; get_tiny_free_size
    0x7fff61caf3b5 <+546>:  movq   0x8(%r13), %rcx
    0x7fff61caf3b9 <+550>:  movq   %rcx, %r8
    0x7fff61caf3bc <+553>:  shlq   $0x4, %r8
    0x7fff61caf3c0 <+557>:  shrq   $0x3c, %rcx
    0x7fff61caf3c4 <+561>:  movq   0x278(%r12), %rsi
    0x7fff61caf3cc <+569>:  movq   %r8, %rdi
    0x7fff61caf3cf <+572>:  xorq   %rsi, %rdi
    0x7fff61caf3d2 <+575>:  movq   %rdi, %rbx
    0x7fff61caf3d5 <+578>:  shrq   $0x8, %rbx
    0x7fff61caf3d9 <+582>:  addl   %edi, %ebx
    0x7fff61caf3db <+584>:  movq   %rdi, %rdx
    0x7fff61caf3de <+587>:  shrq   $0x10, %rdx
    0x7fff61caf3e2 <+591>:  addl   %ebx, %edx
    0x7fff61caf3e4 <+593>:  movq   %rdi, %rbx
    0x7fff61caf3e7 <+596>:  shrq   $0x18, %rbx
    0x7fff61caf3eb <+600>:  addl   %edx, %ebx
    0x7fff61caf3ed <+602>:  movq   %rdi, %rdx
    0x7fff61caf3f0 <+605>:  shrq   $0x20, %rdx
    0x7fff61caf3f4 <+609>:  addl   %ebx, %edx
    0x7fff61caf3f6 <+611>:  movq   %rdi, %rbx
    0x7fff61caf3f9 <+614>:  shrq   $0x28, %rbx
    0x7fff61caf3fd <+618>:  addl   %edx, %ebx
    0x7fff61caf3ff <+620>:  movq   %rdi, %rdx
    0x7fff61caf402 <+623>:  shrq   $0x30, %rdx
    0x7fff61caf406 <+627>:  addl   %ebx, %edx
    0x7fff61caf408 <+629>:  shrq   $0x38, %rdi
    0x7fff61caf40c <+633>:  addl   %edx, %edi
    0x7fff61caf40e <+635>:  andl   $0xf, %edi
    0x7fff61caf411 <+638>:  cmpq   %rdi, %rcx
    0x7fff61caf414 <+641>:  jne    0x7fff61caf602            ; <+1135>
    0x7fff61caf41a <+647>:  movzwl %ax, %edi
    0x7fff61caf41d <+650>:  subl   %r15d, %edi
    0x7fff61caf420 <+653>:  cmpl   $0x40, %edi
    0x7fff61caf423 <+656>:  jl     0x7fff61caf58a            ; <+1015>
    0x7fff61caf429 <+662>:  movl   %r15d, %eax
    0x7fff61caf42c <+665>:  shll   $0x4, %eax
    0x7fff61caf42f <+668>:  addq   %r13, %rax
    0x7fff61caf432 <+671>:  movq   %rax, 0x210(%r14)
    0x7fff61caf439 <+678>:  movq   %rax, %rcx
    0x7fff61caf43c <+681>:  shrq   $0x4, %rcx
    0x7fff61caf440 <+685>:  testq  %r8, %r8
    0x7fff61caf443 <+688>:  je     0x7fff61caf48e            ; <+763>
    0x7fff61caf445 <+690>:  xorq   %rax, %rsi
    0x7fff61caf448 <+693>:  movq   %rsi, %rdx
    0x7fff61caf44b <+696>:  shrq   $0x8, %rdx
    0x7fff61caf44f <+700>:  addl   %esi, %edx
    0x7fff61caf451 <+702>:  movq   %rsi, %rbx
    0x7fff61caf454 <+705>:  shrq   $0x10, %rbx
    0x7fff61caf458 <+709>:  addl   %edx, %ebx
    0x7fff61caf45a <+711>:  movq   %rsi, %rdx
    0x7fff61caf45d <+714>:  shrq   $0x18, %rdx
    0x7fff61caf461 <+718>:  addl   %ebx, %edx
    0x7fff61caf463 <+720>:  movq   %rsi, %rbx
    0x7fff61caf466 <+723>:  shrq   $0x20, %rbx
    0x7fff61caf46a <+727>:  addl   %edx, %ebx
    0x7fff61caf46c <+729>:  movq   %rsi, %rdx
    0x7fff61caf46f <+732>:  shrq   $0x28, %rdx
    0x7fff61caf473 <+736>:  addl   %ebx, %edx
    0x7fff61caf475 <+738>:  movq   %rsi, %rbx
    0x7fff61caf478 <+741>:  shrq   $0x30, %rbx
    0x7fff61caf47c <+745>:  addl   %edx, %ebx
    0x7fff61caf47e <+747>:  shrq   $0x38, %rsi
    0x7fff61caf482 <+751>:  addl   %ebx, %esi
    0x7fff61caf484 <+753>:  shlq   $0x3c, %rsi
    0x7fff61caf488 <+757>:  orq    %rcx, %rsi
    0x7fff61caf48b <+760>:  movq   %rsi, (%r8)
    0x7fff61caf48e <+763>:  movq   (%r13), %rdx
    0x7fff61caf492 <+767>:  movq   %rdx, (%rax)
    0x7fff61caf495 <+770>:  movq   0x8(%r13), %rdx
    0x7fff61caf499 <+774>:  movq   %rdx, 0x8(%rax)
    0x7fff61caf49d <+778>:  movq   %rax, %rdx
    0x7fff61caf4a0 <+781>:  andq   $-0x100000, %rdx          ; imm = 0xFFF00000 
    0x7fff61caf4a7 <+788>:  movl   %eax, %esi
    0x7fff61caf4a9 <+790>:  shrl   $0x8, %esi
    0x7fff61caf4ac <+793>:  andl   $0xffe, %esi              ; imm = 0xFFE 
    0x7fff61caf4b2 <+799>:  andl   $0x1f, %ecx
    0x7fff61caf4b5 <+802>:  movl   $0x1, %ebx
    0x7fff61caf4ba <+807>:  shll   %cl, %ebx
    0x7fff61caf4bc <+809>:  orl    %ebx, 0xfc0a0(%rdx,%rsi,4)
    0x7fff61caf4c3 <+816>:  movl   $0xfffffffe, %ebx         ; imm = 0xFFFFFFFE 
    0x7fff61caf4c8 <+821>:  roll   %cl, %ebx
    0x7fff61caf4ca <+823>:  orl    $0x1, %esi
    0x7fff61caf4cd <+826>:  andl   %ebx, 0xfc0a0(%rdx,%rsi,4)
    0x7fff61caf4d4 <+833>:  movzwl %di, %ecx
    0x7fff61caf4d7 <+836>:  cmpl   $0x2, %ecx
    0x7fff61caf4da <+839>:  jb     0x7fff61caf5ee            ; <+1115>
    0x7fff61caf4e0 <+845>:  movl   %edi, %ecx
    0x7fff61caf4e2 <+847>:  shll   $0x4, %ecx
    0x7fff61caf4e5 <+850>:  andl   $0xffff0, %ecx            ; imm = 0xFFFF0 
    0x7fff61caf4eb <+856>:  movw   %di, -0x2(%rax,%rcx)
    0x7fff61caf4f0 <+861>:  movw   %di, 0x10(%rax)
    0x7fff61caf4f4 <+865>:  jmp    0x7fff61caf2f7            ; <+356>
    0x7fff61caf4f9 <+870>:  movq   0x838(%r14), %rcx
    0x7fff61caf500 <+877>:  movl   %r15d, %eax
    0x7fff61caf503 <+880>:  shll   $0x4, %eax
    0x7fff61caf506 <+883>:  movq   %rcx, %rdx
    0x7fff61caf509 <+886>:  subq   %rax, %rdx
    0x7fff61caf50c <+889>:  jae    0x7fff61caf516            ; <+899>
    0x7fff61caf50e <+891>:  xorl   %r13d, %r13d
    0x7fff61caf511 <+894>:  jmp    0x7fff61caf38b            ; <+504>
    0x7fff61caf516 <+899>:  movl   $0xfc080, %r13d           ; imm = 0xFC080 
    0x7fff61caf51c <+905>:  subq   %rcx, %r13
    0x7fff61caf51f <+908>:  addq   0x848(%r14), %r13
    0x7fff61caf526 <+915>:  movq   %rdx, 0x838(%r14)
    0x7fff61caf52d <+922>:  testq  %rdx, %rdx
    0x7fff61caf530 <+925>:  je     0x7fff61caf2f7            ; <+356>
    0x7fff61caf536 <+931>:  addq   %r13, %rax
    0x7fff61caf539 <+934>:  movq   %rax, %rdx
    0x7fff61caf53c <+937>:  andq   $-0x100000, %rdx          ; imm = 0xFFF00000 
    0x7fff61caf543 <+944>:  movq   %rax, %rcx
    0x7fff61caf546 <+947>:  shrq   $0x4, %rcx
    0x7fff61caf54a <+951>:  shrl   $0x8, %eax
    0x7fff61caf54d <+954>:  andl   $0xffe, %eax              ; imm = 0xFFE 
    0x7fff61caf552 <+959>:  movl   $0x1, %esi
    0x7fff61caf557 <+964>:  movl   $0x1, %edi
    0x7fff61caf55c <+969>:  shll   %cl, %edi
    0x7fff61caf55e <+971>:  orl    %edi, 0xfc0a0(%rdx,%rax,4)
    0x7fff61caf565 <+978>:  orl    $0x1, %eax
    0x7fff61caf568 <+981>:  orl    %edi, 0xfc0a0(%rdx,%rax,4)
    0x7fff61caf56f <+988>:  leal   0x1(%rcx), %ecx
    0x7fff61caf572 <+991>:  movl   %ecx, %eax
    0x7fff61caf574 <+993>:  shrl   $0x4, %eax
    0x7fff61caf577 <+996>:  andl   $0xffe, %eax              ; imm = 0xFFE 
    0x7fff61caf57c <+1001>: shll   %cl, %esi
    0x7fff61caf57e <+1003>: orl    %esi, 0xfc0a0(%rdx,%rax,4)
    0x7fff61caf585 <+1010>: jmp    0x7fff61caf2f7            ; <+356>
    0x7fff61caf58a <+1015>: testq  %r8, %r8
    0x7fff61caf58d <+1018>: je     0x7fff61caf596            ; <+1027>
    0x7fff61caf58f <+1020>: movq   (%r13), %rcx
    0x7fff61caf593 <+1024>: movq   %rcx, (%r8)
    0x7fff61caf596 <+1027>: movq   %r8, 0x210(%r14)
    0x7fff61caf59d <+1034>: jmp    0x7fff61caf5ba            ; <+1063>
    0x7fff61caf59f <+1036>: movl   $0xfffffffe, %eax         ; imm = 0xFFFFFFFE 
    0x7fff61caf5a4 <+1041>: roll   %cl, %eax
    0x7fff61caf5a6 <+1043>: shrq   $0x5, %rcx
    0x7fff61caf5aa <+1047>: andl   %eax, 0x818(%r14,%rcx,4)
    0x7fff61caf5b2 <+1055>: movq   %r13, %rdi
    0x7fff61caf5b5 <+1058>: callq  0x7fff61cb05ab            ; get_tiny_free_size
    0x7fff61caf5ba <+1063>: leal   -0x1(%rax), %ecx
    0x7fff61caf5bd <+1066>: cmpw   %r15w, %cx
    0x7fff61caf5c1 <+1070>: jae    0x7fff61caf5cc            ; <+1081>
    0x7fff61caf5c3 <+1072>: movw   %ax, %r15w
    0x7fff61caf5c7 <+1076>: jmp    0x7fff61caf2f7            ; <+356>
    0x7fff61caf5cc <+1081>: subl   %r15d, %eax
    0x7fff61caf5cf <+1084>: movl   %r15d, %ecx
    0x7fff61caf5d2 <+1087>: shll   $0x4, %ecx
    0x7fff61caf5d5 <+1090>: movq   %r13, %rdx
    0x7fff61caf5d8 <+1093>: addq   %rcx, %rdx
    0x7fff61caf5db <+1096>: movzwl %ax, %ecx
    0x7fff61caf5de <+1099>: movq   %r12, %rdi
    0x7fff61caf5e1 <+1102>: movq   %r14, %rsi
    0x7fff61caf5e4 <+1105>: callq  0x7fff61cafe1b            ; tiny_free_list_add_ptr
    0x7fff61caf5e9 <+1110>: jmp    0x7fff61caf2f7            ; <+356>
    0x7fff61caf5ee <+1115>: testw  %di, %di
    0x7fff61caf5f1 <+1118>: jne    0x7fff61caf2f7            ; <+356>
    0x7fff61caf5f7 <+1124>: movw   $0x0, 0x10(%rax)
    0x7fff61caf5fd <+1130>: jmp    0x7fff61caf2f7            ; <+356>
    0x7fff61caf602 <+1135>: addq   $0x8, %r13
    0x7fff61caf606 <+1139>: movl   0x26c(%r12), %edi
    0x7fff61caf60e <+1147>: movq   %r13, %rsi
    0x7fff61caf611 <+1150>: callq  0x7fff61cc5505            ; free_list_checksum_botch.352
    0x7fff61caf616 <+1155>: ud2    
    0x7fff61caf618 <+1157>: nop    

The offender is 0x7fff61caf227 <+148>: movq %rdx, (%rax), where rax is the address of my weights pointer that gets nulled, and rdx is 0.
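
One way to read the instructions at <+144> and <+148>: they look like the unlink step of a doubly linked free list, where the allocator takes the head block and stores the head's previous link into the first 8 bytes of the next free block. If the list still names a block that the program is actually using, that store lands in the program's data. The sketch below models this under stated assumptions: Block, checksum, encode, unlink_head and demo_stale_free_list are hypothetical names, the cookie is an arbitrary constant, and the link encoding (address shifted right by 4, with a 4-bit checksum in the top bits) is inferred from the disassembly above rather than taken from libmalloc's source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 16-byte tiny block (names are illustrative, not libmalloc's).
// While free, word0 is the free list's `previous` link and word1 the encoded `next`
// link; while in use, the same bytes belong to the program.
struct alignas(16) Block {
    std::uint64_t word0;
    std::uint64_t word1;
};

// Low 4 bits of the byte-wise sum of (address ^ cookie), mirroring the
// shift/add chain at <+60>..<+123>.
std::uint64_t checksum(std::uint64_t address_bits, std::uint64_t cookie) {
    const std::uint64_t mixed = address_bits ^ cookie;
    std::uint64_t sum = 0;
    for (int shift = 0; shift < 64; shift += 8) {
        sum += mixed >> shift;
    }
    return sum & 0xf;
}

// Packs a 16-byte-aligned address as (address >> 4) with the checksum in the top 4 bits.
std::uint64_t encode(const Block* block, std::uint64_t cookie) {
    const auto bits = static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(block));
    assert((bits & 0xf) == 0);  // tiny blocks are 16-byte aligned in this model
    return (bits >> 4) | (checksum(bits, cookie) << 60);
}

// Models the path <+37>..<+148>: verify head's `next` link, then write head's
// `previous` link into the first 8 bytes of the next block.
// Returns false on a checksum mismatch (the real code jumps to free_list_checksum_botch).
bool unlink_head(Block* head, std::uint64_t cookie) {
    const std::uint64_t next_bits = head->word1 << 4;
    if ((head->word1 >> 60) != checksum(next_bits, cookie)) {
        return false;
    }
    if (next_bits != 0) {
        Block* next = reinterpret_cast<Block*>(static_cast<std::uintptr_t>(next_bits));
        next->word0 = head->word0;  // the store at <+148>
    }
    return true;
}

// Returns what remains in the program's pointer slot after the allocator unlinks `head`.
std::uint64_t demo_stale_free_list() {
    const std::uint64_t cookie = 0x1234abcd5678ef01ULL;  // arbitrary stand-in for the zone cookie
    static float weights_data[4] = {0.1f, 0.2f, 0.3f, 0.4f};

    Block victim{};  // memory the program still uses; word0 plays the role of `weights`
    victim.word0 = static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(weights_data));

    Block head{};    // block at the head of the free list
    head.word0 = 0;  // no predecessor, matching rdx == 0 in the question
    head.word1 = encode(&victim, cookie);  // stale link: the list still names `victim` as free

    const bool ok = unlink_head(&head, cookie);
    return ok ? victim.word0 : ~std::uint64_t{0};
}
```

In this model the checksum passes because the stale link itself is intact, so the allocator has no way to notice that the block it writes into is still in use. The sketch does not show how such a block could end up on the free list; a premature or double free elsewhere is one possibility, not a diagnosis.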

Amadan
  • 191,408
  • You should provide a [mcve]. – Ulrich Eckhardt Aug 06 '18 at 04:23
    @UlrichEckhardt: Look at my rep, do you think I haven't tried? If I could, I would. I ran the function where the segfault occurs in an infinite loop for a long time - no error. The example turned out not to be an example. But as I said the problem is intermittent, even I can't consistently generate it on the code I have. If I could consistently replicate it in a controlled environment, it might be I wouldn't need to post a question at all. I'm looking for advice from experts in assembly on how to find the problem in the first place and how to explain the results my experiments so far produced. – Amadan Aug 06 '18 at 04:37
  • Try running your code through a memory debugger (e.g. valgrind) – Leon Aug 06 '18 at 07:43
  • @Leon: Thanks for the suggestion; I've never used that before. Not sure if I'm using it correctly, as everything I tried seems to have memory issues (`valgrind --leak-check=yes python -c "print('hello')"`: 1012 errors from 124 contexts; even 2 errors from a simple `/bin/echo`!). My code dies significantly faster than usual (despite valgrind claiming it should be ~40x slower), to Segfault 11, apparently after an illegal read (which doesn't seem to match what my previous tests showed)... Sigh... – Amadan Aug 06 '18 at 12:13
  • @Amadan Don't use the `--leak-check` option while you are dealing with a more serious problem like a crash. And yes, programs that have not been specifically tested for memory errors may contain hidden bugs (though some false positives are also possible). – Leon Aug 06 '18 at 13:13
  • @Amadan if you're going to valgrind python, you're either going to need a good valgrind suppression file / a custom-built python or, if your python is new enough, you can set an appropriate environment variable to make python valgrind-friendly - https://stackoverflow.com/questions/20112989/how-to-use-valgrind-with-python . – Anon Aug 12 '18 at 06:35
  • @Anon: Thank you for the link, that looks promising. I'd kind of given up as none of my explorations yielded anything of value and I couldn't justify the time I spent on it any more... but that link seems really useful. – Amadan Aug 14 '18 at 04:32
  • It's difficult to answer without your Python code and the dataset you use, but I can make some assumptions. I have noticed some strange, related behavior in parsing buffers from JavaScript to C++, and it might be the same thing here. Can you verify that: 1. Your dataset contains a C++ escape sequence (https://en.cppreference.com/w/cpp/language/escape), more particularly this one: `\0`. 2. The nullified pointer is not always the same one. If these two assumptions are true, I may be able to explain what could be happening here. – Charlie Lutaud Sep 13 '18 at 12:22
  • @CharlieLutaud: The pointer is (almost?) always the same. I don't have any guarantees on the data, but AFAIK there's no funny business there. – Amadan Sep 13 '18 at 12:25
  • @Amadan OK, so I have no more than a few assumptions that could explain what is happening, and they could be totally wrong. – Charlie Lutaud Sep 13 '18 at 12:31

1 Answer


Here is what might be happening:

  1. An escape sequence \0 is present in the dataset. The size of the string is calculated in Python:

    >>> len("a\0aa")
    4
    
  2. Then the string is passed to C++ (CharniakParser), which loops through it to parse it:

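    string a = "a\0aa";                // built from a C string literal, so it stops at the first \0
    const char* b = a.c_str();
    cout << a.size() << endl;          // prints 1: \0 marks the end of the string
    for (size_t i = 0; i < 4; i++)     // 4 is the length Python reported
    {
        const char* c = &b[i];         // for i >= 2 this points past the string's terminator
        do_something_with(c);          // so c no longer refers to valid string data
    }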
    
  3. Because you are close to running out of memory, the corrupted pointer c ends up pointing to a non-null address (hence my assumption that the nullified pointer is not always the same one). Some operation then removes the object it points to, while another part of your code still holds a pointer to that removed object. That would be your bug.

I realize that my explanation is far-fetched, but we don't have much data to go on.
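
The length mismatch in steps 1 and 2 can be checked in isolation. The following is a minimal sketch, not bllipparser's actual code: c_length, copy_until_nul and copy_with_length are hypothetical helpers that only illustrate how the same bytes get a different length depending on whether the C++ side trusts the terminator or an explicitly supplied length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length as a C-string consumer sees it: counting stops at the first '\0'.
std::size_t c_length(const char* data) {
    assert(data != nullptr);
    return std::strlen(data);
}

// Copy that relies on the terminator, as `string a = "a\0aa";` does.
std::string copy_until_nul(const char* data) {
    assert(data != nullptr);
    return std::string(data);
}

// Copy that trusts an externally supplied length (for example Python's len()).
// The caller must guarantee that `length` bytes are readable at `data`.
std::string copy_with_length(const char* data, std::size_t length) {
    assert(data != nullptr);
    return std::string(data, length);
}
```

With the bytes of "a\0aa", c_length and copy_until_nul both report 1, while copy_with_length(..., 4) keeps all 4 bytes. An overrun of the kind described above would only arise if the truncated copy were later indexed using the longer length; reading 4 bytes from the original 4-byte buffer stays in bounds.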

Charlie Lutaud
  • 503
  • Sorry, I'm not great at C++; can you explain again why `c` would be corrupted? Is `&b[i]` not equivalent to `b+i`, pointing at the zero byte? And how does that scenario apply to `weights`, which as `std::vector` should presumably be allowed to have a null byte in it? (as said in the question, `weights` is the vector that gets nuked.) – Amadan Sep 13 '18 at 13:14
  • It's because we iterate 4 times, which is more than the size of `c` (which is 1). In C++, with a C-style array (like `char* []`), doing that walks through memory cell by cell, so you run past the last cell of your array. At that point you are manipulating a completely different object in memory, which is almost random: it could be null, or something totally unrelated to the initial array (`c`). In this case it would be your `weights`, but it could be something else, which is why I asked whether it is always exactly the same line that crashes. – Charlie Lutaud Sep 13 '18 at 13:25
  • But even if string is `"a\0aa"`, the memory allocated for `a` is 4 bytes, so you don't actually address outside the memory allocated for `a`. – Amadan Sep 13 '18 at 13:28
  • Yes, but it seems that CharniakParser is written in Cython (`_CharniakParser.cpython-36m-darwin.so`), so a bug in the size of a string could be introduced there: "a\0aa" is 4 characters for Python, 1 character for C++. – Charlie Lutaud Sep 13 '18 at 13:30
  • (It's just a big assumption; like I said, I could be totally wrong.) – Charlie Lutaud Sep 13 '18 at 13:35
  • Sorry, I made a mistake: if you allocate a string with ("a\0aa"), the memory allocated is 1 byte (it's as if you split the string before "\0"). – Charlie Lutaud Sep 13 '18 at 13:44
  • Thanks, that sounds plausible. Unfortunately I just started vacation and won’t be able to test for at least a week :( – Amadan Sep 13 '18 at 13:52