0

Reading a 2 column text file and storing long int values into an array that is reallocated dynamically, fails when then array grows to over 200 thousand memory reallocations.

    long int load_list_unklen(long int (**ptr), FILE *infname)
    {   long int row =0;
        char buf[64];
        while (fgets (buf, sizeof(buf), infname)) {
            // M is defined to be = 2
            *ptr = (long int *)realloc(*ptr, M * row+1 * sizeof(ptr));
            if(!ptr) {
                perror("Out of memory");
                free(ptr);
                exit( 1); }
            sscanf(buf, "%ld\t%ld", &(*ptr)[row*2],&(*ptr)[row*2+1]);
            row += 1;
        }
        if (ferror(infname)) {
            fprintf(stderr,"Oops, error reading stdin\n");
            abort();
        }
        return  row;
    }

Notice that buf gets a string that has two numbers separated by a tab. The code fails as it tries load a file with over 2mil lines and row increments stop around 221181, thus I wonder if this there a limit where realloc chokes? Should I be calling realloc differently?

Here is how I call the function:

long int *act_nei = (long int *)malloc(M * sizeof (act_nei) );
const long int sz  = load_list_unklen(&act_nei, fp);

Using code from a SO post to realloc memory slots, where my example is for large input files.
Realloc and sscanf into a functionRealloc and scanf

Community
  • 1
  • 1
sAguinaga
  • 638
  • 13
  • 31
  • 3
    Please [see why not to cast](http://stackoverflow.com/q/605845/2173917) the return value of `malloc()` and family in `C`. – Sourav Ghosh Jul 15 '15 at 10:52
  • 1
    It is implementation (compiler & operating system & computer) specific. On Linux, read about [setrlimit(2)](http://man7.org/linux/man-pages/man2/setrlimit.2.html) to change that limit (e.g. `ulimit` in your shell). Use [valgrind](http://valgrind.org/) to hunt memory leaks – Basile Starynkevitch Jul 15 '15 at 10:56
  • 1
    If `realloc` fails to allocate memory, it will return `NULL` as you know, but since you assign to the same pointer you try to allocate you will loose the original pointer, and your call to `free` tries to free `NULL`. – Some programmer dude Jul 15 '15 at 10:59
  • 4
    `realloc(*ptr, M * row+1 * sizeof(ptr));` the size calculation is wrong; it is in units of `long int**`, should be in units of `long int` Easiest syntax is :: `*ptr = realloc(*ptr, M * row+1 * sizeof **ptr);` – wildplasser Jul 15 '15 at 10:59
  • 1
    Regarding the comment by @wildplasser, the original `malloc` size calculation is also wrong for the same reason. – Some programmer dude Jul 15 '15 at 11:00
  • (this is not directly your issue, but) check one of the answers here how to use realloc properly, http://stackoverflow.com/questions/9071566/is-it-safe-to-use-realloc – Giorgi Moniava Jul 15 '15 at 11:02
  • Finally, if you want to store two values then why don't you use an array of arrays (akin to `int act_nei[X][2]`) (which you can allocate dynamically as well if you just think a little). Or why not an array of structures, where each structure contains the two values? – Some programmer dude Jul 15 '15 at 11:02
  • @SouravGhosh: Totally irrelevant to the question, quite useless for anyone with a modern compiler, and a total pain in the arse if you want to run your code as C++ code. – gnasher729 Jul 15 '15 at 11:16
  • @gnasher729 But This isn't C++, this is C, and you shouldn't cast to or from a `void *` in C. And yes it might not be relevant for this question, but casting pointers is almost never the correct solution to anything, and can instead hide other errors or problems that might lead to undefined behavior. – Some programmer dude Jul 15 '15 at 11:19
  • Will consider these comments & suggestions, fix my code, and try again! – sAguinaga Jul 15 '15 at 11:39

4 Answers4

1

You are corrupting the malloc ring by writing beyond the allocated space. There is a missing () and wrong sizeof. Try:

*ptr = (long int *)realloc(*ptr, M * (row+1) * sizeof(**ptr));

meuh
  • 11,500
  • 2
  • 29
  • 45
1

Your calculation of memory usage is very wrong.

First, make your life a lot easier by not using *ptr all the time, but using a local variable. Like

long f (int* *pptr...)
{
    int* ptr = *pptr;
    ...
    *pptr = ptr;
}

Now it becomes obvious that your sizeof (ptr) is wrong. That's the size of an int** which is totally irrelevant. What you want is the size of an int, which would be sizeof(**ptr) in your code, or sizeof (*ptr) if you use a local variable for the pointer.

Of course where things are totally wrong:

M * row+1 * sizeof(ptr)

This calculates M*row, and then it adds 1*sizeof(ptr). You allocate nowhere near enough memory, so your application will crash soon.

gnasher729
  • 51,477
  • 5
  • 75
  • 98
0

A part from the comments already posted re the incorrect use of sizeof() I have a more general comment. I would avoid reallocating the buffer for each line you read.

The semantic of realloc is such that it will return a contiguous area of memory of the requested size. On some OS, when possible, this will be done extending the buffer passed as argument "in place". In other OS, or if there is no free space at the end of the passed buffer, a completely new buffer will be allocated and the content of the original buffer copied to it. If your memory is particularly fragmented, this may result into multiple mallocs and memcpy under the hood.

The typical approach in this case consist in allocating a buffer larger than you need, and keep track internally of how much of it you've already consumed through an integer variable. Whenever you run out of space in your internal buffer, you call realloc adding another chunk of empty space. Some implementation add a constant-size chunk, other double the size of the current chunk, that's up to you and depends on various considerations.

Another problem may arise from the fact that realloc return contiguous memory. Depending on OS and system characteristics, you may not have a free region of contiguous memory of the specified size, but there may be multiple regions free of smaller size. If such scenario is a possibility, one has to consider whether a contiguous buffer is really required. For example, a solution still providing O(1) access time (although not as fast as contiguous buffer) might be to create a first buffer containing pointers to fixed size pages. In this case, what would be addressed as ptr[n] in the contiguous case would become ptr[n / PAGE_SIZE][n % PAGE_SIZE] in the new scheme.

Roberto Attias
  • 1,883
  • 1
  • 11
  • 21
  • I am writing `C` code to process tens of millions of graph vertices and edges, so performance at scale is a concern. This program is being prototyped on a Mac Book Pro, but it will run on a cluster machine (56 cores, 242 GB of ram). Should I have to consider what you recommend on the last 2 lines knowing what I just said? Just wondering what the break point is before I have to consider more optimization. – sAguinaga Jul 15 '15 at 16:02
0

The passing a pointer-to-pointer and returning-the-element-count idiom can be avoided by using a wrapper struct containing both the element count and the buffer. (the caller could discard the wrapper and only use the buffer if he wanted to)

#define handle_error() do{ perror("malloc"); exit(EXIT_FAILURE); } while (0)

struct pairs {
        unsigned size;
        unsigned used;
        struct { long int x; long int y; } *data;
        };

struct pairs *readpairs(FILE *fp)
{
struct pairs *ret;
char *line[80];

ret = malloc (sizeof *ret);
if (!ret) { handle_error();}

ret->size = ret->used = 0;
ret->data = NULL;

while (fgets(line, sizeof line, fp)) {
        int rc;
      if (ret->used >= ret->size) { // resize needed
                void *newp;
                unsigned newsiz;
                newsiz = ret->size ? 2* ret->size : 8;

                newp = realloc(ret->data,  newsiz * sizeof *ret->data);
                if (!newp) { handle_error();}
                ret->data = newp;
                ret->size = newsiz;
                }
        rc = sscanf(line,"%ld\t\%ld"
                , &ret->data[ret->used].x
                , &ret->data[ret->used].y
                );
        if (rc != 2) { handle_error();}
        ret->used += 1;
        }
return ret;
}
wildplasser
  • 43,142
  • 8
  • 66
  • 109
  • IMO this is fancy, have not mastered the use of structures, but I need to. Will implement this ver. too. – sAguinaga Jul 15 '15 at 16:06