
It was probably killed by the kernel, as suggested in this question. I would like to see why I was killed, something like the function in which the assassination took place. :)

Moreover, is there anything I can do to allow my program to execute normally?


Chronicle

My program executes properly. However, we encountered a big dataset, 1.000.000 x 960 floats, and my laptop at home couldn't take it (it threw a std::bad_alloc).

Now I am in the lab, on a desktop with 9.8 GiB of RAM and a 3.00GHz × 4 processor, which has more than twice the memory of the laptop at home.

At home, the data set could not be loaded into the std::vector where the data is stored. Here, in the lab, the load succeeded and the program continued with building a data structure.

That was the last time I heard from it:

Start building...
Killed

The desktop in the lab runs Debian 8. My program runs as expected for a subset of the data set, in particular 100.000 x 960 floats.


EDIT

strace output is finally available:

...
brk..
brk(0x352435000)                        = 0x352414000
mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f09c1563000
munmap(0x7f09c1563000, 44683264)        = 0
munmap(0x7f09c8000000, 22425600)        = 0
mprotect(0x7f09c4000000, 135168, PROT_READ|PROT_WRITE) = 0
...
mprotect(0x7f09c6360000, 8003584, PROT_READ|PROT_WRITE) = 0
+++ killed by SIGKILL +++

So this tells us I am out of memory, I guess.

gsamaras
  • 1
    Perhaps you are allocating too much memory. – Basile Starynkevitch Apr 20 '15 at 12:07
  • It's the sad truth @Skynet. I am trying to run this dataset for days... – gsamaras Apr 20 '15 at 12:07
  • Really after seeing your question I can't stop laughing :D – Arnab Nandy Apr 20 '15 at 12:07
  • @BasileStarynkevitch I would say that this is sure the case. However I would like to know where I was killed and if there is anything I can do at this computer. – gsamaras Apr 20 '15 at 12:08
  • 2
    Compile your program with `g++ -Wall -Wextra -g`; then use the debugger `gdb` & [valgrind](http://valgrind.org/) & `strace`; but your program has a bug. Also STFW for `linux memory overcommit` – Basile Starynkevitch Apr 20 '15 at 12:08
  • A bug? @BasileStarynkevitch Why do you say this? – gsamaras Apr 20 '15 at 12:10
  • My feeling. A well behaved program should not be killed, but should check for errors and at least show an application specific error message. – Basile Starynkevitch Apr 20 '15 at 12:11
  • 1
    The reason the program is being killed instead of failing more gracefully is *overcommit*. To get a clean memory allocation failure in the program, disable overcommit (that won't make your program work, only help you understand what is happening). As for why overcommit prevents useful diagnoses, you shouldn't have difficulties finding this information now that you know the name of the “feature”. – Pascal Cuoq Apr 20 '15 at 12:11
  • @BasileStarynkevitch Overcommit makes it impossible to write “well behaved programs”. C++ does not offer any interface by which a memory write (`*p = 1;`) can signal allocation failure, and this is how programs fail in presence of overcommit. – Pascal Cuoq Apr 20 '15 at 12:15
  • Oh you mean I don't catch the error, yes I haven't implemented that yet. @PascalCuoq I found google results on how to turn it off, I would like to turn it on again. Can you please provide me with the steps? – gsamaras Apr 20 '15 at 12:15
  • @PascalCuoq. I know that and I mentioned overcommit in my comment. – Basile Starynkevitch Apr 20 '15 at 12:16
  • @BasileStarynkevitch I followed your advice and edited. – gsamaras Apr 20 '15 at 13:28
  • Do you have a swap space configured on your machines? The OOM killer shouldn't start killing programs until all RAM and swaps are exhausted. Given that there are probably some programs (or parts thereof) that just need to be resident, and not actively running, then they can just be paged in and out by the virtual memory manager (VMM). – Lie Ryan Apr 20 '15 at 13:49
  • Yeah I got the idea. Thanks everybody. – gsamaras Apr 20 '15 at 14:16
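
The point made in the comments above about failing with an application-specific message, rather than being killed, can be sketched as follows. This is only a minimal illustration, not the asker's code: `try_allocate` is a hypothetical helper name. It assumes the allocation failure is actually reported to the program; with overcommit enabled the process may still be killed while the memory is being touched, before any exception is thrown.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <vector>

// Hypothetical helper: tries to size `out` to `count` floats.
// Returns true on success, false if the request could not be satisfied.
bool try_allocate(std::vector<float>& out, std::size_t count) {
    try {
        out.resize(count);  // allocates and value-initializes `count` floats
        return true;
    } catch (const std::bad_alloc&) {
        // The allocator reported that it could not provide the memory.
        return false;
    } catch (const std::length_error&) {
        // `count` exceeds what std::vector<float> can ever hold (max_size()).
        return false;
    }
}
```

A caller could print its own error message and exit cleanly when `try_allocate` returns false, instead of letting the exception terminate the program.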

2 Answers

5

In C++, a float is a single (32 bit) floating point number: http://en.wikipedia.org/wiki/Single-precision_floating-point_format

which means that you are allocating (without overhead) 1 000 000 × 960 × 4 = 3 840 000 000 bytes of data,

or roughly 3.58 GiB.

Let's safely assume that the header of the vector is nothing compared to the data, and continue with this number.
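
As a quick check of the arithmetic above, here is a small sketch. The function names are made up for illustration, and it assumes `sizeof(float)` is 4, which holds on common platforms but is not guaranteed by the C++ standard.

```cpp
#include <cstdint>

// Bytes needed for a dense rows x cols matrix of float,
// ignoring any container or allocator overhead.
constexpr std::uint64_t dataset_bytes(std::uint64_t rows, std::uint64_t cols) {
    return rows * cols * sizeof(float);
}

// Converts a byte count to GiB (1 GiB = 2^30 bytes).
constexpr double bytes_to_gib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}

// True if the byte count is below 2^32, the size of a 32-bit address space.
constexpr bool below_32bit_limit(std::uint64_t bytes) {
    return bytes < (std::uint64_t{1} << 32);
}
```

For 1 000 000 x 960 floats, `dataset_bytes` gives 3 840 000 000 bytes. That is still below 2^32 = 4 294 967 296, but it leaves little room for anything else in a 32-bit address space.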

This is a huge amount of data to build up. Linux may treat this as a memory leak and protect itself by killing the application:

https://unix.stackexchange.com/questions/136291/will-linux-start-killing-my-processes-without-asking-me-if-memory-gets-short

I don't think this is an overcommit problem, since you are actually utilizing nearly half the memory in a single application.

But perhaps, just for fun, consider this: are you building a 32-bit application? You are getting close to the 2^32 bytes (4 GiB) of memory that your program can address if it's a 32-bit build.

So in case you have another large vector allocated... bum bum bum

Henrik
  • 1
    bum³ frightens me. The number of bytes is correct. My laptop at home runs on 32 bits. Do you think that there is some connection? +1 for the nice answer. – gsamaras Apr 20 '15 at 12:45
  • I'm not sure, but it's my first suspicion. Consider testing your program on a smaller dataset first (like half the size), and if it runs, then I'll put my money on the application being unable to allocate the necessary memory, and being stopped by the kernel from wrapping its memory space. – Henrik Apr 20 '15 at 12:50
  • I am not sure if you answered my 32 bits question. See my edit for the smaller data set. – gsamaras Apr 20 '15 at 12:56
  • std::vector, unless you reserve the correct amount of space, when you are pushing items into it, will double its memory footprint each time it needs to grow. eg 16 items to 32 items to 64 items to ... so it might be trying to grow well past the size you need. Try to use 'reserve' to get exactly how many items your vector needs. – LawfulEvil Apr 20 '15 at 13:06
  • I am now quite sure that you simply cannot allocate the memory you need, and the probable explanation is that you are trying to allocate more than a 32-bit application is allowed to. As for a solution, LawfulEvil knows his stuff: have you reserved space for the data before filling the vector? If not, that is your solution! Otherwise, consider whether it is absolutely necessary to hold the entire dataset in memory at once, or whether you could load a subset, do the calculations, unload it, load the next subset, and so on. – Henrik Apr 20 '15 at 13:23
  • @LawfulEvil `reserve()` is not a good choice here, better suggest `resize()` to get exactly the memory you require. `reserve()` may waste some space I think. I compiled the code in the lab pc, which is 64bits Henrik. I will check what was suggested. – gsamaras Apr 20 '15 at 13:31
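
The `reserve()` suggestion from the comments above can be sketched roughly like this. `fill_with_reserve` is a hypothetical helper, not code from the question. Note that the growth factor of `std::vector` is implementation-defined, so "doubling" is typical rather than guaranteed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: reserves room for `count` floats up front, then fills
// the vector with push_back. Returns how many times the capacity changed
// during the fill, which is 0 when the reservation was large enough.
std::size_t fill_with_reserve(std::vector<float>& data, std::size_t count) {
    data.clear();
    data.reserve(count);  // one allocation of at least `count` elements

    std::size_t reallocations = 0;
    std::size_t last_capacity = data.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        data.push_back(static_cast<float>(i));
        if (data.capacity() != last_capacity) {
            // A capacity change means the vector had to reallocate and copy.
            ++reallocations;
            last_capacity = data.capacity();
        }
    }
    return reallocations;
}
```

Without the `reserve` call, each growth step briefly needs both the old and the new buffer, so peak memory use can be noticeably higher than the final size of the data.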
0

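First, install a signal handler, for example:

#include <signal.h>
#include <stdio.h>
#include <string.h>

// Forward declaration; the handler itself is defined further down.
static void signal_handler_action(int sig, siginfo_t *siginfo, void *context);

// Installs signal_handler_action for sigNumber; returns true on success.
static bool installSignalHandler(int sigNumber)
{
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    action.sa_flags = SA_SIGINFO;  // use the three-argument sa_sigaction handler
    action.sa_sigaction = signal_handler_action;
    return !sigaction(sigNumber, &action, NULL);
}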

Call it:

installSignalHandler(SIGINT);
installSignalHandler(SIGTERM);

The following code will then be executed when one of those signals arrives:

static void signal_handler_action(int sig, siginfo_t *siginfo, void* content) 
{
    switch(sig) {
        case SIGHUP:
            break;
        case SIGUSR1:
            break;
        case SIGTERM:
            break;
        case SIGINT:
            break;
        case SIGPIPE:
            return;
    }
}

Take a look at the siginfo_t structure for the data you want:

printf("Continue. Signo: %d - code: %d - value: %d - errno: %d - pid: %ld - uid: %ld - addr %p - status %d - band %ld\n",
                      siginfo->si_signo, siginfo->si_code, siginfo->si_value.sival_int, siginfo->si_errno, (long)siginfo->si_pid, (long)siginfo->si_uid, siginfo->si_addr,
                      siginfo->si_status, (long)siginfo->si_band);
Jose Palma
  • No, that is the wrong approach. BTW, [signal(7)](http://man7.org/linux/man-pages/man7/signal.7.html) forbids to call `printf` from inside a signal handler. – Basile Starynkevitch Apr 20 '15 at 12:12
  • I have added printf because I can't put the code I use in productive xD, use syslog – Jose Palma Apr 20 '15 at 12:13
  • 1
    Neither `printf` nor `syslog` are *async-signal-safe-functions* so both are forbidden inside a signal handler – Basile Starynkevitch Apr 20 '15 at 12:14
  • I am afraid @BasileStarynkevitch is correct from what we did in uni. – gsamaras Apr 20 '15 at 12:16
  • Those functions are not forbidden but not recommended, as they can raise a new signal and your code will hang or behave in an undefined way. Sometimes we need to use the right words :-). In any case, you are probably running out of memory and the kernel is killing your program with a SIGKILL/SIGTERM. You can check dmesg or run the program with `strace` – Jose Palma Apr 20 '15 at 12:21
  • @Raistmaj I will run `strace` now. :) – gsamaras Apr 20 '15 at 12:25
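
The async-signal-safety point raised in the comments above can be illustrated with a minimal sketch. The names `on_signal`, `install_handler` and `g_last_signal` are hypothetical. The handler only sets a flag and calls `write(2)`, which is on the POSIX list of async-signal-safe functions. Keep in mind that SIGKILL, the signal shown in the strace output, cannot be caught at all, so no handler will run when the kernel kills the process that way.

```cpp
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

// Set by the handler, inspected later by normal (non-handler) code.
static volatile sig_atomic_t g_last_signal = 0;

// Handler restricted to async-signal-safe operations.
extern "C" void on_signal(int sig) {
    int saved_errno = errno;  // write() may modify errno; restore it afterwards
    g_last_signal = sig;
    const char msg[] = "signal received\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;            // nothing useful can be done on failure here
    errno = saved_errno;
}

// Installs on_signal for `sig`. Returns true on success.
// For SIGKILL and SIGSTOP, sigaction() fails, so this returns false.
bool install_handler(int sig) {
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    action.sa_handler = on_signal;
    sigemptyset(&action.sa_mask);
    return sigaction(sig, &action, NULL) == 0;
}
```

Any formatting or logging would then happen in the main program after it notices that `g_last_signal` is non-zero, not inside the handler.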