0

When reading in data to a string, is it better to use as many realloc's as necessary to get the correct amount of memory, or is it better to use less realloc's but potentially end up with unused memory?

For instance, is this better:

char c;
int count = 1;
char* s = malloc(sizeof(char));
while((c = fgetc(file)) != EOF) {
    char* snew = (char *)realloc(s, count*sizeof(char));
    if(!snew) {
        exit(1);
    }
    s = snew;
    s[count-1] = c;
    count++;
}

Or is this better:

char c;
int count = 0;
int size = 1;
char* s = malloc(sizeof(char));
while((c = fgetc(file)) != EOF) {
    s[count] = c;
    count++;
    if(count >= size) {
        size*=2;
        char* snew = (char *)realloc(s, size*sizeof(char));
        if(!snew) {
            exit(1);
        }
        s = snew;
    }
}
NathanOliver
  • 171,901
  • 28
  • 288
  • 402
aabceh
  • 95
  • 7
  • 3
    Is this really a c++ question? If so, then it would be best (in general) to let std::string to handle the memory allocation for you. – Ben Jones Jul 10 '18 at 16:53
  • It depends. For your use-case, try both and meassure (and try many times to get an average, and with optimized code). – Some programmer dude Jul 10 '18 at 16:54
  • 2
    there's always trade-offs,, what's more important/constrained? Speed or memory? – yano Jul 10 '18 at 16:54
  • 2
    On another note, don't forget that [`fgetc`](http://en.cppreference.com/w/c/io/fgetc) returns an *`int`*. That's actually important for the comparison to `EOF`. – Some programmer dude Jul 10 '18 at 16:54
  • Just map the file into memory. – Maxim Egorushkin Jul 10 '18 at 16:58
  • sizeof(char) is guaranteed to be one, Just use 1 instead of `sizeof(char)`, and `count` instead of `count * sizeof(char)`. Better yet, use `char* s = malloc(sizeof(*s));` and `char* snew = realloc(s, size*sizeof(*snew));` That type of construct saves lots of headaches if you ever decide to change the type of s to something else. – FredK Jul 10 '18 at 20:48

2 Answers2

4

Memory is abundant these days, so having an exact amount of memory isn't quite as important as minimizing your run time.

The second scheme, where you double the allocated memory anytime you need more, is generally considered the better option.

dbush
  • 205,898
  • 23
  • 218
  • 273
2

The latter is absolutely better if you've got unconstrained line length, because the time spent building a string of length n will be O(n) whereas the former is O(n²) due to unnecessary copying. If you want to decrease overallocation, you can make a trade off for a higher constant factor - for example, by always increasing the buffer by say 20 % and 1 character.


P.S. fgetc returns an int - otherwise no amount of memory will be enough on a platform where char is unsigned...