Any Downsides to this Method of String Retrieval?

Question

I saw a function on this site a while ago, that I took and adapted a bit for my use.

It's a function that uses getc and stdin to retrieve a string and allocate precisely as much memory as it needs to contain the string. It then just returns a pointer to the allocated memory which is filled with said string.

My question is are there any downsides (besides having to manually free the allocated memory later) to this function? What would you do to improve it?

char *getstr(void)
{
    char *str = NULL, *tmp = NULL;
    int ch = -1, sz = 0, pt = 0;

    while(ch)
    {
        ch = getc(stdin);
        if (ch == EOF || ch == 0x0A || ch == 0x0D) ch = 0;
        if (sz <= pt)
        {
            sz++; 
            tmp = realloc(str, sz * sizeof(char));
            if(!tmp) return NULL;
            str = tmp;
        }
        str[pt++] = ch;
    }

    return str;
}

After using your suggestions here is my updated code, I decided to just use 256 bytes for the buffer since this function is being used for user input.

char *getstr(void)
{
    char *str, *tmp = NULL;
    int ch = -1, bff = 256, pt = 0;

    str = malloc(bff);
        if(!str) 
        {
            printf(\nError! Memory allocation failed!");
            return 0x00;
        }
    while(ch)
    {
        ch = getc(stdin);
        if (ch == EOF || ch == '\n' || ch == '\r') ch = 0;
        if (bff <= pt)
        {
            bff += 256; 
            tmp = realloc(str, bff);
            if(!tmp) 
            {
                free(str);
                printf("\nError! Memory allocation failed!");
                return 0x00;
            }
            str = tmp;
        }
        str[pt++] = ch;
    }
    tmp = realloc(str, pt);
    if(!tmp)
    {
        free(str);
        printf("\nError! Memory allocation failed!");
        return 0x00;
    }
    str = tmp;

    return str;
}

Reminds me of this one: http://stackoverflow.com/a/8164021/714501 — cnicutar, Aug 22 '12 at 14:55
Yep! That's exactly where I saw this function the first time. — Keith Miller, Aug 22 '12 at 14:59
Your revised version is much more sensible. You might also want to consider multiplying `bff` by 2 for each realloc(), see the discussion between Qnan and me in his answer's comments. Altho if as you say this is just for manual user input, what you have is fine. — CodeClown42, Aug 22 '12 at 15:37

score 2 · Answer 1 · answered Aug 22 '12 at 14:48

2

Yes, the main problem is that realloc is pretty slow and calling it repeatedly for each character is generally a bad idea.

Try allocating a fixed amount of memory to start with, say N=100 characters and when you need more, get something like 2*N, then 4*N and so on. You'll overspend only up to twice the memory but save a lot in running time.

answered Aug 22 '12 at 14:48

Qnan

3,714
18
15

Better Idea: Allocate a large buffer, then if you need to overrun it, use realloc to double your space. If you really want to trim the extra storage space, you can malloc, copy, and free, or even realloc at the end. – FrankieTheKneeMan Aug 22 '12 at 14:49
When you say main problem are you implying that there are others? What would you do differently? – Keith Miller Aug 22 '12 at 14:49
I thought about allocating a large memory chunk at first, but for some reason I find the idea of allocating more memory than needed to be somewhat unsettling but it is very true that calling realloc once at the end would be more efficient than calling multiple instances of realloc. – Keith Miller Aug 22 '12 at 14:54
@KeithMiller Apart from that, wouldn't it return NULL between the lines when reading a CR+LF style document? – Qnan Aug 22 '12 at 14:56
I use this function for user input not reading documents. So I don't think that would matter? – Keith Miller Aug 22 '12 at 14:57
@KeithMiller if most lines are below some fixed length, you could also create a temporary buffer and read into. Then you will only have to resort to realloc when the length of the line exceeds the given limit. – Qnan Aug 22 '12 at 14:57
Even 100 bytes is too little. Use KB sized steps, or if you want the size of the return string to be precise, see my answer below. – CodeClown42 Aug 22 '12 at 14:58
@goldilocks that's user input, as mentioned above. I don't expect most users to type a thousand characters every time. And this is beside the point, really. – Qnan Aug 22 '12 at 15:00
@Qnan: stdin isn't necessarily user input; if it is piped from another process, you could have any amount read in. Methinks mem usage below 1KB is not significant enough to be concerned about. – CodeClown42 Aug 22 '12 at 15:02
@goldilocks see the comment of the OP above, "I use this function for user input not reading documents" – Qnan Aug 22 '12 at 15:03
@goldilocks and if it wasn't user input and one did expect large amounts of data, than doubling the array size would be more efficient then adding KB by KB – Qnan Aug 22 '12 at 15:05
@Qnan: there are definitely a lot of different approaches to take. Higher level languages with intrinsic memory allocation and management (perl, python, etc.) I believe do what you are suggesting as a general purpose, performance oriented solution: when you initialize a variable you get (eg) 1KB, if you use that you get 2, if you use that you get 4, then 8 and so on. – CodeClown42 Aug 22 '12 at 15:32

score 2 · Answer 2 · answered Aug 22 '12 at 14:50

It depends on '\n'=='0xa' and '\r' =='\0d' for no good reason. If you mean \r and \n, use them.
It may be unreasonably slow, reallocating for every character you read.
sizeof(char) is guaranteed to be 1, so it's pointless.
If you've allocated a block of memory, then realloc fails, you're returning NULL without returning or freeing str, thus leaking the memory.
The interface provides no way to indicate partial failure, as in #4. All you can do is return a string or not. Given an immense input string, you have no way to indicate that you've read part but not all of it.

Oops nice catch I didn't see that I forgot to free the value of `str` — Keith Miller, Aug 22 '12 at 14:56

unwind · Answer 3 · 2012-08-22T14:59:26.567

2

Here are the first few observations, other answers include some more:

It's growing the buffer by 1 byte at a time, thus doing needlessly many realloc() calls.
If realloc() fails, the previous buffer is lost.
It's not getline(), although it's more portable of course.
It's also not very portable to hardcode ASCII values for line feed and carriage return, use '\n' and '\r' instead.

edited Aug 22 '12 at 14:59

answered Aug 22 '12 at 14:54

unwind

391,730
64
469
606

score 2 · Accepted Answer · answered Aug 22 '12 at 14:56

It's excessively frugal IMO, and makes the mistake of sacrificing performance in order to save infinitesmal amounts of memory, which is pointless in most settings, I think. Allocation calls like realloc are potentially laborous for the system, and here it is done for every byte.

It would be better to just have a local buffer, say 4KB, to read into, then allocate the return string based on the length of what is actually read into that. Keep in mind that the stack* on a normal system is 4-8MB anyway, whether you use it all or not. If the string read turns out to be longer than 4KB, you could write a similar loop that allocates and copies into the return string. So a similar idea, but heap allocation would occur every 4096 bytes rather than every byte, so, eg, you have the initial buffer of 4096, when that is exhausted you malloc 4096 for the return string and copy in, continue reading into the buffer (from the beginning), and if another 1000 bytes is read you realloc to 5097 and return that.

I think it is a common mistake of beginners to get obsessed with minimizing heap allocation by approaching it byte by byte. Even KB by KB is a little small; the system allocates in pages (4 KB) and you might as well align yourself with that.

*the memory provided for local storage inside a function.

On most systems, the stack has a few megabytes of *address space* allocated, but the actual memory allocated will be the current size, rounded up to an integer number of pages (e.g., 4KB blocks). — Jerry Coffin, Aug 22 '12 at 15:05
@JerryCoffin: Absolutely, the OP should read up on the difference between virtual address space vs. real memory. I think not all OS's handle this the same way, however: some of them take virtual address space as a concrete promise, so if the OS gives a process 4MB of address space, it also sets aside 4MB of real memory to back that promise up (though none of it is allocated in the sense of a virtual <-> real mapping). Others do not (which is why by default, malloc or realloc will never return null on linux). Etc. Just trying to bring out some of the basic principles ;) — CodeClown42, Aug 22 '12 at 15:27

Any Downsides to this Method of String Retrieval?

4 Answers4