
What could cause the following code to take the if branch? As far as I can tell, there is no reason for the if statement to execute.

#include <stdio.h>
#include <stdlib.h>

void main(void) {
    int Nod = 1024 * 8; // Nod contains the number of nodes
    double *MM;         // MM is a square matrix; it can hold a very large number of entries (~10^10)
    MM = calloc(8 * Nod * 8 * Nod, sizeof(double));
    if (MM == NULL) exit(0);
    // MM will then be passed to some other functions, say
    eigenvalue(MM);
}

I'm working with an FEM code that has this check in the middle of a very large program. When I run the code, it behaves anomalously: sometimes the program stops right here, and sometimes it works fine. One thing worth mentioning is that when the program is run with a coarse mesh, i.e. when Nod holds fewer nodes, it works fine. But when a fine mesh is used, the program crashes. The program runs on a mini workstation that has 128 GB of RAM. The program occupies about 1 GB of RAM.

Ahmed
  • 2
    That particular program always returns with a zero status, regardless of branch taken. The main function (and only the main function) has an implicit `return 0;` when execution reaches its closing bracket and it returns. – StoryTeller - Unslander Monica Mar 13 '17 at 09:44
  • 4
    `8 * Nod * 8 * Nod` is 2³² and so integer overflow. Use a larger type than `int`. – mch Mar 13 '17 at 09:45
  • 1
    read the man page for calloc and possible return values. – Sourav Ghosh Mar 13 '17 at 09:46
  • If you are working with a sparse matrix, you can just store the cells that are filled on a map or unordered map. No need to allocate huge amounts of memory. – doron Mar 13 '17 at 09:56
  • A larger type for what? `Nod`? `Nod` only denotes how many nodes to calculate, and this is well within the range of the `int` data type. – Ahmed Mar 13 '17 at 10:38

2 Answers

6

Two obvious problems:

  1. The computation 8 * Nod * 8 * Nod will be of type int, which might not be big enough (on your platform) to hold the result. You probably want size_t Nod instead. And you might want to check for overflow (perhaps with platform-specific functions such as GCC's __builtin_mul_overflow()) if the values are not constant.
  2. You use the result of calloc() without checking that it's not NULL. If the allocator can't find a big enough contiguous block, it will fail, and you should test for that before continuing.

Never ignore the return value from library functions that use it to report errors.
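
Both points can be combined in one small helper. This is only a minimal sketch: `alloc_square_matrix` is a hypothetical name, not something from the question's program, and `__builtin_mul_overflow` is a GCC/Clang extension rather than standard C, so other compilers would need a manual check instead.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a zeroed (8*nod) x (8*nod) matrix of doubles.
   Returns NULL if the element count overflows size_t or if calloc fails. */
double *alloc_square_matrix(size_t nod)
{
    size_t dim;    /* number of rows (and columns): 8 * nod */
    size_t count;  /* total number of elements: dim * dim   */

    /* __builtin_mul_overflow is a GCC/Clang extension, not standard C.
       It returns true when the product does not fit in the result type. */
    if (__builtin_mul_overflow((size_t)8, nod, &dim) ||
        __builtin_mul_overflow(dim, dim, &count)) {
        fprintf(stderr, "matrix size overflows size_t\n");
        return NULL;
    }

    double *mm = calloc(count, sizeof *mm);
    if (mm == NULL) {
        /* Report the failure instead of exiting silently with status 0. */
        fprintf(stderr, "calloc of %zu doubles failed\n", count);
    }
    return mm;
}
```

A caller would test the result before use, for example `double *MM = alloc_square_matrix(Nod); if (MM == NULL) exit(EXIT_FAILURE);`, so that a failed allocation is distinguishable from a normal exit.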

Toby Speight
  • If it was just a datatype overflow, would it not be failing every time? – louigi600 Mar 13 '17 at 10:58
  • @louigi600: it depends. A signed overflow is **undefined**, so you can't depend on anything. In the "best" case (i.e. worst case), you'll end up with a value, which when converted to `size_t` is big enough for the data. If you're lucky, you'll get an immediate fault or an unsatisfiable value, to crash your program and encourage some debugging. – Toby Speight Mar 13 '17 at 11:42
  • @Toby S. OK, but even so, how is he allocating the huge 32 GB block on a 4 GB system with insufficient virtual memory to do that? I checked on a 32-bit machine (with an exit(1) if the malloc fails) and it's exiting with 0 ... how on earth is that happening? I'm puzzled! – louigi600 Mar 13 '17 at 11:46
  • On a PAE-less 32-bit machine the virtual address space can't be bigger than 4 GB ... yet that allocates a hypothetical 32 GB ... something else is going wrong! – louigi600 Mar 13 '17 at 11:58
  • @luigi, that depends a lot on your platform. For example, most Linux installations will over-commit, so the allocation appears to succeed, but will fault if/when you attempt to access all of it. It can be worth setting the appropriate tunables if this matters (I've forgotten how to do this; I think there's a sysctl to set the global values, but there might be a per-process setting too). – Toby Speight Mar 13 '17 at 12:08
  • I tried "echo 2 > /proc/sys/vm/overcommit_memory" but it still exits with 0. It's still crazy how that can happen on a 32-bit PAE-less system! – louigi600 Mar 13 '17 at 12:53
0

From the man page:

   The malloc() and calloc() functions return a pointer to the allocated  memory  that
   is  suitably  aligned  for  any kind of variable.  On error, these functions return
   NULL.  NULL may also be returned by a successful call to malloc() with  a  size  of
   zero, or by a successful call to calloc() with nmemb or size equal to zero.

In your case it's not due to allocating zero-sized memory, so the only other reason for returning NULL is failure to allocate the memory. In the snippet you show, you are allocating 4294967296 elements (1024 * 1024 * 64 * 64), each the size of a double (8 bytes); that's 32 GB of RAM. Your system definitely has that amount of RAM, but at any given time it may not have it all in one contiguous allocatable block, so calloc may be failing for that reason.
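
The arithmetic can be checked with a small sketch. The helper names are hypothetical, and `uint32_t` is used for the narrow version only because unsigned wraparound is well defined, whereas overflow of the original `int` expression is undefined behaviour, so this shows just one plausible outcome.

```c
#include <stdint.h>

/* Element count of the (8*nod) x (8*nod) matrix, computed in 64 bits. */
uint64_t element_count_64(uint64_t nod)
{
    return 8 * nod * 8 * nod;
}

/* Size of that matrix in bytes. */
uint64_t matrix_bytes(uint64_t nod)
{
    return element_count_64(nod) * sizeof(double);
}

/* The same product in unsigned 32-bit arithmetic, which wraps modulo 2^32.
   Illustration only: the question's code uses signed int, where overflow
   is undefined and no particular result is guaranteed. */
uint32_t element_count_32(uint32_t nod)
{
    return (uint32_t)(8u * nod * 8u * nod);
}
```

With `nod` equal to 8192, the 64-bit count is 4294967296 elements, which is 32 GiB when `sizeof(double)` is 8, while the 32-bit product wraps to 0. If the `int` expression happened to wrap the same way, calloc would be asked for zero elements, a case in which the man page quoted above allows a NULL return.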

Another thing to watch out for is memory overcommitment, governed primarily by

/proc/sys/vm/overcommit_memory
or, via sysctl, vm.overcommit_memory

By default overcommit_memory is set to 0, but possibly the safest setting would be 2. See the proc man page or the kernel's Documentation/vm/overcommit-accounting for more detail on this.

vm.overcommit_ratio
vm.overcommit_kbytes
vm.nr_overcommit_hugepages

are also other sysctl settings that govern if/how your system will deal with memory overcommitment.
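
To see which mode a machine is actually running in, the setting can be read from within a program. This is a minimal, Linux-only sketch; `read_overcommit_mode` is a hypothetical helper name, and it only reads the first tunable mentioned above.

```c
#include <stdio.h>

/* Hypothetical helper: returns the current value of vm.overcommit_memory
   (0, 1 or 2 on Linux), or -1 if the file cannot be read or parsed,
   for example on a non-Linux system. */
int read_overcommit_mode(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    if (f == NULL)
        return -1;

    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;   /* unexpected file contents */

    fclose(f);
    return mode;
}
```

Logging this value next to a failed or suspiciously successful calloc makes it easier to tell whether overcommit is involved in a given run.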

Even so, I did my best to disallow memory overcommitment on a 32-bit Linux machine, but I was still able to get the huge 32 GB calloc not to return NULL (which I regard as strange in its own right, since a PAE-less 32-bit machine can only address a total of 4 GB of virtual memory, and even with PAE it could only address 4 GB at a time).

louigi600
  • Just to mention, I'm testing the program right now on my PC, which has only 4 GB of RAM. Just when I posted the question, the program was seen to fail at the stated line. But interestingly, the program is running right now, and with a much denser mesh! I just want a hint as to what is causing this anomalous behaviour. – Ahmed Mar 13 '17 at 10:27
  • Essentially I wrote much the same suggestion as the second point made by Toby Speight, with just a few more words and numbers to show how I came to the 32 GB. It is strange to me that on a machine with 4 GB of RAM you are able to allocate a 32 GB object ... even if the size of long were one byte (and it's definitely not) it would still be allocating 4 GB ... on a 4 GB machine with a live OS I find it hard to believe you can allocate 4 GB. Maybe it's virtual memory: how much swap do you have on the 4 GB machine? – louigi600 Mar 13 '17 at 10:50
  • @AhmedAfifKhan That behavior is in no way "anomalous" - You are actually trying to allocate an enormous amount of memory that might or might not be available at certain times - And as you don't even check for the return value *directly* after the allocation, your program fails. – tofro Mar 13 '17 at 10:51
  • Paging = 1280MB. It's running...! Well I'm also perplexed why it's running on my machine...! I've just mentioned only one variable here just for illustration. There are some several HUGE `double` sized arrays declared...! – Ahmed Mar 13 '17 at 11:00
  • Just out of curiosity: what are you working on, analyzing big data for AI? If you turn off swap on the 4 GB machine, does it still work? – louigi600 Mar 13 '17 at 11:16
  • *** NO. It's an FEM program with huge sparse matrices. I don't have enough time to optimize the code. *** I don't think it needs the paging memory at all. It's using some 900MB of RAM space only. *** It's working today... but may not work tomorrow... this is where I'm hanging. The same code, same machine. But different outcomes at different times. – Ahmed Mar 13 '17 at 11:22
  • But your snippet is allocating a lot more than that ... and I've no idea how calloc is doing that :( I would expect MM = calloc(8 * Nod * 8 * Nod, sizeof(double)); to always fail on a 4 GB machine. – louigi600 Mar 13 '17 at 11:25
  • This partially explains it http://stackoverflow.com/questions/19750796/allocating-more-memory-than-there-exists-using-malloc ... but even with vm.overcommit_memory = 2 I can still overcommit much more than SWAP + RAM(overcommit%/100) – louigi600 Mar 13 '17 at 13:52
  • @Ahmed Afif Khan Thanks for acknowledging my answer, it helps my reputation. Do you want me to include in the answer some of the messages exchanged because there's some interesting stuff concerning vm.overcommit_memory – louigi600 Mar 15 '17 at 11:49
  • Of course, why not! I'd like to point out that my question is more conceptual than technical. I was interested to know whether my understanding of this problem is correct or not. Please ask for any more info if you need it. – Ahmed Mar 15 '17 at 14:22