I have a C++ program that uses a shared C library (namely Darknet) to load and make use of lightweight neural networks.
The program runs flawlessly under Ubuntu Trusty on an x86_64 box, but crashes with a segmentation fault under the same OS on an ARM device. The cause of the crash is that calloc returns NULL when allocating memory for an array. The code looks like the following:
l.filters = calloc(c * n * size * size, sizeof(float));
...
for (i = 0; i < c * n * size * size; ++i)
    l.filters[i] = scale * rand_uniform(-1, 1);
As a result, the application crashes with a segfault as soon as it tries to write the first element.
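A minimal sketch, assuming it is acceptable to patch the Darknet sources: wrap the allocation so that a failure is reported at the calloc call itself, together with the requested size and errno, instead of surfacing later as a write to address 0x0. The helper name checked_calloc_floats is hypothetical and not part of Darknet; it also computes the element count in size_t and rejects products that would overflow.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of Darknet): allocates c * n * size * size
 * zeroed floats. The element count is computed in size_t and checked for
 * overflow; if calloc fails, the requested size and errno are printed to
 * stderr. Returns NULL on failure, so the caller must still check it. */
static float *checked_calloc_floats(size_t c, size_t n, size_t size)
{
    size_t count = c;
    size_t factors[3] = { n, size, size };
    for (int k = 0; k < 3; ++k) {
        if (factors[k] != 0 && count > SIZE_MAX / factors[k]) {
            fprintf(stderr, "element count overflows size_t\n");
            return NULL;
        }
        count *= factors[k];
    }

    errno = 0;
    float *p = calloc(count, sizeof(float));
    if (p == NULL && count != 0) {
        fprintf(stderr, "calloc(%zu, %zu) failed: %s\n",
                count, sizeof(float), strerror(errno));
    }
    return p;
}
```

In make_convolutional_layer the allocation would then become l.filters = checked_calloc_floats(c, n, size), followed by a NULL check that stops the program (for example with exit(EXIT_FAILURE)) before the initialization loop runs.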
In my case, the amount of memory to be allocated is 4.7 MB, while more than 1 GB is available. I also tried running it after a reboot to rule out heap fragmentation, but got the same result.
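calloc can fail when the process runs out of virtual address space or hits a resource limit, not only when physical RAM is exhausted, so free system memory is not necessarily the number that matters. A minimal diagnostic sketch, assuming a POSIX system: print the pointer width (a 32-bit process has at most 4 GiB of virtual address space in total) and the soft RLIMIT_AS and RLIMIT_DATA limits. The function names are hypothetical, chosen for illustration.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Prints one soft resource limit; RLIM_INFINITY means "no limit". */
static void print_limit(const char *name, int resource)
{
    struct rlimit rl;
    if (getrlimit(resource, &rl) != 0) {
        perror(name);
        return;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("%s: unlimited\n", name);
    else
        printf("%s: %llu bytes\n", name, (unsigned long long)rl.rlim_cur);
}

/* Hypothetical diagnostic: reports the pointer width and the limits that
 * can make calloc fail even when physical RAM is free.
 * Returns the pointer width in bits. */
static int report_memory_limits(void)
{
    int bits = (int)(sizeof(void *) * 8);
    printf("pointer width: %d bits\n", bits);
    print_limit("RLIMIT_AS", RLIMIT_AS);
    print_limit("RLIMIT_DATA", RLIMIT_DATA);
    return bits;
}
```

Calling report_memory_limits() at the start of main on both the x86_64 box and the ARM device would show whether the two processes run under different address-space constraints.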
More interestingly, when I try to load a larger network, it works just fine, even though the two networks have the same configuration for the layer where the crash happens.
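Since the larger network loads while the smaller one fails on an identically configured layer, one way to narrow this down is to measure how large a single block the allocator can still provide just before the failing calloc in each run. A minimal sketch, assuming Linux/glibc-style malloc: the hypothetical helper largest_allocatable binary-searches the largest successful malloc up to a cap, freeing each probe without writing to it. Results depend on the allocator and the kernel's overcommit policy, so treat them as a rough comparison between runs, not an exact measurement.

```c
#include <stdio.h>
#include <stdlib.h>

/* Stored through a volatile object so the compiler cannot elide the
 * malloc/free pairs and assume every allocation succeeds. */
static void *volatile probe_sink;

/* Returns 1 if malloc(bytes) succeeds right now, 0 otherwise. */
static int try_alloc(size_t bytes)
{
    probe_sink = malloc(bytes);
    if (probe_sink == NULL)
        return 0;
    free(probe_sink);
    probe_sink = NULL;
    return 1;
}

/* Hypothetical diagnostic (not part of Darknet): largest single block,
 * up to `cap` bytes, that malloc can currently provide, accurate to
 * within `step` bytes. */
static size_t largest_allocatable(size_t cap, size_t step)
{
    if (step == 0)
        step = 1;                /* avoid an endless loop when hi - lo == 1 */
    if (try_alloc(cap))
        return cap;
    size_t lo = 0, hi = cap;     /* lo: succeeded (or 0); hi: failed */
    while (hi - lo > step) {
        size_t mid = lo + (hi - lo) / 2;
        if (try_alloc(mid))
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}
```

For example, printing largest_allocatable((size_t)1 << 30, (size_t)1 << 20) immediately before the calloc in make_convolutional_layer, once with the small network and once with the large one, would show whether the address space is already exhausted at that point in the failing run.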
Valgrind tells me nothing new:
==2591== Invalid write of size 4
==2591== at 0x40C70: make_convolutional_layer (convolutional_layer.c:135)
==2591== by 0x2C0DF: parse_convolutional (parser.c:159)
==2591== by 0x2D7EB: parse_network_cfg (parser.c:493)
==2591== by 0xBE4D: main (annotation.cpp:58)
==2591== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==2591==
==2591==
==2591== Process terminating with default action of signal 11 (SIGSEGV)
==2591== Access not within mapped region at address 0x0
==2591== at 0x40C70: make_convolutional_layer (convolutional_layer.c:135)
==2591== by 0x2C0DF: parse_convolutional (parser.c:159)
==2591== by 0x2D7EB: parse_network_cfg (parser.c:493)
==2591== by 0xBE4D: main (annotation.cpp:58)
==2591== If you believe this happened as a result of a stack
==2591== overflow in your program's main thread (unlikely but
==2591== possible), you can try to increase the size of the
==2591== main thread stack using the --main-stacksize= flag.
==2591== The main thread stack size used in this run was 4294967295.
==2591==
==2591== HEAP SUMMARY:
==2591== in use at exit: 1,731,358,649 bytes in 2,164 blocks
==2591== total heap usage: 12,981 allocs, 10,817 frees, 9,996,704,911 bytes allocated
==2591==
==2591== LEAK SUMMARY:
==2591== definitely lost: 16,645 bytes in 21 blocks
==2591== indirectly lost: 529,234 bytes in 236 blocks
==2591== possibly lost: 1,729,206,304 bytes in 232 blocks
==2591== still reachable: 1,606,466 bytes in 1,675 blocks
==2591== suppressed: 0 bytes in 0 blocks
==2591== Rerun with --leak-check=full to see details of leaked memory
==2591==
==2591== For counts of detected and suppressed errors, rerun with: -v
==2591== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 402 from 8)
Killed
I am really confused about what might be causing this. Could anybody help?