1

I have the following C code:

char* str = (char*)malloc(sizeof(char));
int count = 0;

while ((c = getchar()) != EOF){
    str[count] = c;
    count++;
    str = (char*)realloc(str, sizeof(str) + sizeof(char));
}

But it is throwing the error Unhandled exception at 0x77C8F94D (ntdll.dll) in algorithms.exe: 0xC0000374: A heap has been corrupted. I have been trying to solve this for ages, but can't get it right. Interestingly, it is only an issue when the input stream has a large number of characters to be read.

Is this due to my usage of malloc and realloc?

ggorlen
  • 44,755
  • 7
  • 76
  • 106
Harry Stuart
  • 1,781
  • 2
  • 24
  • 39
  • 6
    You don't mean `sizeof(str)` - that's the size of the pointer, not the length of the string. You probably meant `count`. – Rup Sep 22 '19 at 11:36
  • You probably mean strlen(str) – Irelia Sep 22 '19 at 11:37
  • Given `char* str`, just what do you think `sizeof(str)` returns? – Andrew Henle Sep 22 '19 at 11:38
  • 2
    @Nina - Can't use `strlen(str)` because there's no `nul` terminator! – Adrian Mole Sep 22 '19 at 11:42
  • 1
    Besides not using `sizeof` in the `realloc` call, you also want to multiply there, not add. – Steve Summit Sep 22 '19 at 11:42
  • Oh yes good call @Adrian – Irelia Sep 22 '19 at 11:49
  • 1
    Probably want "`count * sizeof(char)`"; but really, calling `realloc()` for every single character is awful and you should allocate more less often (e.g. like maybe "`if(count % 4096 == 0) { str = realloc(str, (count + 4096) * sizeof(char)); }`". Also note that `realloc()` could return NULL and you should check for and handle that properly instead of crashing unexpectedly. – Brendan Sep 22 '19 at 13:54
  • 1
    @Brendan I agree that calling `realloc` on every single character is awful, but I've heard that it's not as bad as it could be because the OS will give larger blocks of memory than requested, resulting in system calls being automatically amortized and we're mostly left with the in-process function call overhead to `realloc`. Is this true, and if so, to what extent does it improve the situation? In other words, how awful is it? – ggorlen Sep 22 '19 at 17:53
  • 2
    @ggorlen: How expensive each individual `realloc()` is depends on the code responsible (e.g. the implementation of the C system library). For my own personal implementation its fast for some cases (decreasing size) but for increasing the size (beyond the next 32 byte boundary) it falls back to "allocate new block, copy every byte into the new block, free the old block", so as the block gets larger it gets slower and slower. – Brendan Sep 22 '19 at 18:50

1 Answers1

3

Corrected code below:

char* str = (char*)malloc(sizeof(char));
int count = 0;

while ((c = getchar()) != EOF){
    str[count] = c;
    count++;
    str = (char*)realloc(str, count * sizeof(char));
}

However, this does use too many realloc calls! Better to only allocate in blocks, where I've also used the BPC principle, rather than MNC (see comments):

size_t limit = 1024u; // For example!
char* str = malloc(limit);
int count = 0;
while ((c = getchar()) != EOF) {
    str[count] = c;
    if (++count >= limit) {
        limit += 1024u;
        char *str2 = realloc(str, limit); // Shouldn't use same pointer ...
        if (!str2) { // ... failure!
            // <Some error message/flag>
            break;
        }
        else str = str2; // Successful realloc
    }
}
Adrian Mole
  • 49,934
  • 160
  • 51
  • 83
  • 1
    And it's often better still to use a multiplicative growth scheme, e.g. `limit *= 2`. – Steve Summit Sep 22 '19 at 13:23
  • @SteveSummit Yes, in many circumstances multiplicative growth is good; here, I'm not so sure, and I wanted to keep to code simple. – Adrian Mole Sep 22 '19 at 13:25
  • Even better would be just ```str = (char*)realloc(str, limit);```, since ```sizeof(char) == 1```. See https://stackoverflow.com/questions/2215445/are-there-machines-where-sizeofchar-1-or-at-least-char-bit-8 – gstukelj Sep 22 '19 at 13:26
  • @GasperStukelj Of course, but sometimes putting the explicit `sizeof(type)` in calls makes it clearer to readers what exactly you're allocating. And any (half-)decent compiler would just drop the `* 1`, anyway. – Adrian Mole Sep 22 '19 at 13:29
  • 2
    [Do I cast the result of `malloc`?](https://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc). If you do need to use `sizeof(type)`, it's better to use `sizeof(*str)` which doesn't hardcode the type. – ggorlen Sep 22 '19 at 15:13
  • 1
    Which modern C compilers warn if it's not used? That's surprising. – ggorlen Sep 22 '19 at 15:17
  • 1
    You're entitled to that, but if the OP's code has a variety of poor practices, I don't think MNC makes much sense. Why not simply produce the best possible code in any given circumstance (BPC Principle)? – ggorlen Sep 22 '19 at 15:31
  • 1
    Can do "`if(count == limit) { ..`" before the `str[count] = c;` to avoid the chance of (re)allocating memory just before EOF, and also avoid the need for the `malloc()` (just do `char *str = NULL;` and let `realloc()` allocate). – Brendan Sep 22 '19 at 15:53
  • @Brendan On your first point: we don't know what we'll be doing with `str` after we've finished the input loop! If we then append (say) a `nul` character, your suggested optimization would fail. – Adrian Mole Sep 22 '19 at 15:57
  • 2
    @Adrian: LOL. If you're worried about failures, start by checking to see if `realloc()` returned NULL, and maybe think about limiting the max. size (in case some chuckleberry tries to read "`/dev/zero`"). – Brendan Sep 22 '19 at 16:00
  • Agree with Brendan on checking that `malloc`/`realloc` worked. Seems `limit *= 2` is more typical but yes, overflowing `limit` is a risk. I'd definitely remove `sizeof(char)`, it's guaranteed to be 1 and reduces readability (or use `sizeof *str`). But I don't mean to nitpick this small piece of code excessively, it answers OP's question nicely. – ggorlen Sep 22 '19 at 17:36
  • 1
    @ggorlen - OK - I surrender! – Adrian Mole Sep 22 '19 at 17:47