To read (and store) an unknown number of characters from stdin, you have two basic options:
- if available, use POSIX getline() to read all characters into a buffer (getline will reallocate memory as required to store the entire line of input, including the nul-terminating character), or
- if you need to write truly portable code, handle the initial memory allocation, and reallocation as needed, using malloc and realloc.
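As a rough sketch of the first option, the helper below wraps getline() so the caller gets back one line with the trailing newline removed. The name read_line_posix and its len parameter are hypothetical, chosen only for this illustration, and the feature-test macro assumes a POSIX.1-2008 system.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getline() on POSIX systems */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>           /* ssize_t */

/* read_line_posix is a hypothetical helper name, not a library function.
 * Reads one line from fp with getline(), strips the trailing '\n', and
 * returns an allocated string the caller must free, or NULL on EOF/error.
 * If len is not NULL, the line length is stored through it. */
char *read_line_posix (FILE *fp, size_t *len)
{
    char *line = NULL;          /* NULL and 0 tell getline to allocate */
    size_t cap = 0;
    ssize_t nread = getline (&line, &cap, fp);

    if (nread == -1) {          /* EOF with nothing read, or error */
        free (line);            /* getline may allocate even on failure */
        return NULL;
    }
    if (nread > 0 && line[nread - 1] == '\n')
        line[--nread] = 0;      /* overwrite '\n' with nul */
    if (len)
        *len = (size_t)nread;
    return line;
}
```

Calling read_line_posix (stdin, &n) in a loop until it returns NULL reads every line of input, with each returned string freed by the caller.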
As you have found, dynamic memory allocation can be a bit daunting at first, but there really isn't any reason it should be. In any case, you simply allocate some initially sized block of memory, assign the starting address to a pointer, and store whatever you need in the block while keeping track of the memory used. When the memory used equals the memory available (i.e. when you fill up the block of memory you allocated), you reallocate more memory using a temporary pointer, validate that your call to realloc succeeded, and then assign the start of the reallocated block of memory to your original pointer. Then you keep going, repeating the process each time you fill up your block of memory.
There are many ways to approach the read, using a character-oriented input function like getchar(), or using some fixed-size buffer and fgets to read a fixed number of characters at a time. It's really up to you. Avoid scanf for a single character; there is no need for it. The underlying read is buffered by the standard I/O library, so there isn't a performance penalty regardless of which you choose. (On Linux, glibc provides an 8192-byte read buffer sized by BUFSIZ, formerly _IO_BUFSIZ; see glibc/libio/stdio.h - #define BUFSIZ 8192, with _IO_BUFSIZ changed to BUFSIZ in glibc commit 9964a14579e5eef9. Windows provides a similar 512-byte buffer.)
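For the fgets approach mentioned above, one possible sketch follows. The helper name read_line_fgets and the CHUNK size are assumptions made only for illustration. It grows the buffer by one chunk per fgets call rather than doubling, which keeps the example short.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 16    /* hypothetical chunk size, chosen only for illustration */

/* read_line_fgets is a hypothetical helper name.
 * Reads one line from fp in CHUNK-sized pieces with fgets(), growing the
 * buffer as needed. Returns an allocated string without the trailing '\n'
 * (caller frees), or NULL on EOF with nothing read or allocation failure. */
char *read_line_fgets (FILE *fp)
{
    char chunk[CHUNK];
    char *buf = NULL;
    size_t used = 0;

    while (fgets (chunk, sizeof chunk, fp)) {
        size_t n = strlen (chunk);
        void *tmp = realloc (buf, used + n + 1);    /* +1 for the nul */
        if (tmp == NULL) {                          /* buf is still valid */
            free (buf);
            return NULL;
        }
        buf = tmp;
        memcpy (buf + used, chunk, n + 1);          /* copy including nul */
        used += n;
        if (used && buf[used - 1] == '\n') {        /* complete line read */
            buf[--used] = 0;                        /* strip the '\n' */
            break;
        }
    }
    return buf;
}
```

A chunk that does not end in '\n' means the line was longer than the chunk (or the input ended), so the loop simply reads the next piece and appends it.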
The key is to take it step-by-step, validate every allocation, and handle the error as required. You use a temporary pointer with realloc because if realloc fails, it returns NULL, and if you were assigning the return of realloc to your original pointer, you would overwrite the address of your original block of memory with NULL, creating a memory leak. By using a temporary pointer, if realloc fails, your existing data is still accessible through your original pointer.
For example, to double the size of a currently allocated buffer with a current allocation size of buffersize, you could naively do:
    buffer = realloc (buffer, 2 * buffersize);  /* wrong - potential memory leak */
    if (buffer == NULL) {           /* validate reallocation */
        perror ("realloc-buffer");  /* output error message */
        /* handle error */
    }
    buffersize *= 2;                /* double buffersize */
Instead, you will do:
    void *tmp = realloc (buffer, 2 * buffersize);  /* use a temporary pointer */
    if (tmp == NULL) {              /* validate reallocation */
        perror ("realloc-buffer");  /* output error message */
        /* handle error, buffer still points to original block */
    }
    buffer = tmp;                   /* on success, point buffer at the new block */
    buffersize *= 2;
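If you find yourself repeating that pattern, it can be wrapped in a small function. The following is only a sketch: grow_buffer is a hypothetical name, and the overflow check on the doubled size is an addition not discussed above.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* grow_buffer is a hypothetical helper name.
 * Doubles the allocation behind *buffer. On success it updates *buffer and
 * *buffersize and returns 1. On failure it returns 0 and leaves both
 * untouched, so the original block is still valid and can still be freed. */
int grow_buffer (char **buffer, size_t *buffersize)
{
    if (*buffersize > SIZE_MAX / 2)     /* doubling would overflow size_t */
        return 0;

    size_t newsize = *buffersize ? 2 * *buffersize : 1;
    void *tmp = realloc (*buffer, newsize);     /* use a temporary pointer */
    if (tmp == NULL) {                          /* validate reallocation */
        perror ("realloc-buffer");
        return 0;
    }
    *buffer = tmp;                  /* only now replace the original pointer */
    *buffersize = newsize;
    return 1;
}
```

The caller checks the return value and decides whether to stop reading, free the buffer, or carry on with the data already stored.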
The way to digest how it works is through a minimal example. The following will read a line of unknown size from stdin using the portable getchar(), malloc and realloc, making use of a reallocation scheme that simply doubles the size of the buffer each time you fill it. (You are free to increment by any amount you like, but avoid reallocating for every character read, as that would be inefficient. Doubling the buffer size, or some similar increase, minimizes the number of times you reallocate.)
#include <stdio.h>
#include <stdlib.h>

#define NCHR 8      /* initial number of characters to allocate */

int main (void) {

    int c;                          /* char to read from stdin */
    size_t ndx = 0,                 /* index/count of characters */
           nchr = NCHR;             /* number of characters allocated in buf */
    char *buf = malloc (nchr);      /* buffer allocated for nchr chars */

    if (buf == NULL) {              /* validate that allocation succeeds */
        perror ("malloc-buf");      /* otherwise handle error */
        return 1;                   /* bail */
    }

    /* read chars from stdin until '\n' or EOF */
    while ((c = getchar()) != '\n' && c != EOF) {
        if (ndx == nchr - 1) {      /* check if reallocation is needed */
            void *tmp = realloc (buf, 2 * nchr);    /* double buf size */
            if (tmp == NULL) {      /* validate realloc succeeds */
                perror ("realloc-buf");     /* handle error */
                break;      /* break don't bail, buf holds chars read */
            }
            buf = tmp;      /* assign newly sized block of mem to buf */
            nchr *= 2;      /* update nchr to new allocation size */
        }
        buf[ndx++] = c;     /* assign char to buf, increment index */
    }
    buf[ndx] = 0;           /* nul-terminate buffer */

    if (c == EOF)           /* if read stopped on EOF */
        putchar ('\n');     /* tidy up outputting \n */

    printf ("length : %zu\ncontent: %s\n", ndx, buf);

    free (buf);     /* don't forget to free what you allocate */
}
(note: the check for EOF, which will be generated by Ctrl + d (or Ctrl + z on Windows), and the output of an additional '\n' when it is encountered; otherwise your next output will begin at the end of your current input. Also note that nchr - 1 in if (ndx == nchr - 1) ensures there is always 1 character available for storing the nul-terminating character after the loop exits.)
Example Use/Output
$ ./bin/getchar_dyn
1234 5678 9012 3456 7890
length : 24
content: 1234 5678 9012 3456 7890
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use; just run your program through one.
$ valgrind ./bin/getchar_dyn
==28053== Memcheck, a memory error detector
==28053== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==28053== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==28053== Command: ./bin/getchar_dyn
==28053==
1234 5678 9012 3456 7890
length : 24
content: 1234 5678 9012 3456 7890
==28053==
==28053== HEAP SUMMARY:
==28053== in use at exit: 0 bytes in 0 blocks
==28053== total heap usage: 3 allocs, 3 frees, 56 bytes allocated
==28053==
==28053== All heap blocks were freed -- no leaks are possible
==28053==
==28053== For counts of detected and suppressed errors, rerun with: -v
==28053== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have further questions.