getline() is a POSIX.1-2008 function that reads lines into a dynamically allocated buffer, allowing lines of any length (limited only by the amount of available memory). It returns the number of characters read, including the newline if the line has one, or -1 if there is no more input or an error occurred.
Here is one example use pattern:
char *line = NULL;
size_t size = 0;
ssize_t len;

while (1) {
    len = getline(&line, &size, stdin);
    if (len < 1)
        break;

    /* Do something with the input line */
}
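One detail worth knowing when you process the line: the length getline() returns counts the trailing newline, when the line has one. A minimal sketch of a wrapper that strips it follows; read_line_no_newline is a hypothetical helper name, not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: reads one line with getline() and removes the
   trailing newline, if any. Returns the remaining length, or -1 when
   getline() itself returned -1 (end of input or error). */
ssize_t read_line_no_newline(char **line, size_t *size, FILE *in)
{
    ssize_t len = getline(line, size, in);

    if (len < 0)
        return -1;

    /* The last line of a file may lack a newline, so check first. */
    if (len > 0 && (*line)[len - 1] == '\n')
        (*line)[--len] = '\0';

    return len;
}
```

It is called just like getline(), with the same line and size variables, so the buffer is still reused from one call to the next.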
At any point, you can release the dynamically allocated buffer using
free(line);
line = NULL;
size = 0;
The reason to reset the pointer to NULL and the size to zero is that you then cannot accidentally access the already freed memory, yet you can still call getline(&line, &size, handle) to read more lines: the call will simply recognize it does not have a buffer, and will allocate a new one.
You can manipulate the dynamic data in any way you wish, if you are careful. For example:
while (1) {
    char *line = NULL;
    size_t size = 0;
    ssize_t len;

    len = getline(&line, &size, stdin);
    if (len < 1) {
        free(line);
        break;
    }

    /* Do something with the contents of the line */

    free(line);
}
will work, but it will be quite slow, because the C library will do at least one malloc() call for every line read, and possibly additional realloc() calls, depending on the line length.
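The reason getline() takes pointers to the buffer pointer and to its size is that this allows reusing the same buffer for any number of lines; the buffer only needs to grow when a line does not fit. If you read several files sequentially, you can reuse the same buffer for all of them. Let's look at a more complex example: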
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <locale.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>
#include <errno.h>

int main(int argc, char *argv[])
{
    unsigned long linenum;
    char *line = NULL, *in, *out, *end;
    size_t size = 0, n;
    ssize_t len;
    FILE *src;
    int arg;

    if (!setlocale(LC_ALL, ""))
        fprintf(stderr, "Warning: Your C library does not support your current locale.\n");

    if (argc < 2) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s FILENAME [ FILENAME ... ]\n", argv[0]);
        fprintf(stderr, "\n");
        exit(EXIT_FAILURE);
    }

    for (arg = 1; arg < argc; arg++) {
        src = fopen(argv[arg], "r");
        if (!src) {
            fprintf(stderr, "%s: %s.\n", argv[arg], strerror(errno));
            free(line);
            exit(EXIT_FAILURE);
        }

        linenum = 0;
        while (1) {
            len = getline(&line, &size, src);
            if (len < 1)
                break;
            linenum++;

            /* First character in the line read: */
            in = line;
            out = line;
            /* Pointer to the end-of-string character on the line: */
            end = line + len;

            /* Skip all leading whitespace characters. */
            while (in < end && isspace((unsigned char)(*in)))
                in++;

            /* Character copy loop. */
            while (in < end)
                if (isspace((unsigned char)(*in))) {
                    /* Replace consecutive whitespace characters with spaces. */
                    *(out++) = ' ';
                    do {
                        in++;
                    } while (in < end && isspace((unsigned char)(*in)));
                } else {
                    /* Copy all other characters as-is. */
                    *(out++) = *(in++);
                }

            /* There may be a single space before out. Backtrack it, if so. */
            if (out > line && out[-1] == ' ')
                out--;

            /* Mark the end of the string at out. */
            *out = '\0';

            /* Calculate the new length, just for output purposes. */
            n = (size_t)(out - line);

            /* Print the line. */
            printf("%s: Line %lu: '%s' (%zu of %zd characters)\n",
                   argv[arg], linenum, line, n, len);
        }

        if (!feof(src) || ferror(src)) {
            fprintf(stderr, "%s: Read error.\n", argv[arg]);
            fclose(src);
            free(line);
            exit(EXIT_FAILURE);
        }

        if (fclose(src)) {
            fprintf(stderr, "%s: Error closing file: %s.\n",
                    argv[arg], strerror(errno));
            free(line);
            exit(EXIT_FAILURE);
        }
    }

    free(line);
    line = NULL;
    size = 0;

    return EXIT_SUCCESS;
}
If we save the above as, say, example.c, compile it using e.g. gcc -Wall -O2 example.c -o example, and run the program with the names of text files as parameters, for example ./example example.c, it will output something like
example.c: Line 1: '#define _POSIX_C_SOURCE 200809L' (31 of 33 characters)
example.c: Line 2: '#include <stdlib.h>' (19 of 20 characters)
example.c: Line 3: '#include <locale.h>' (19 of 20 characters)
example.c: Line 4: '#include <string.h>' (19 of 20 characters)
example.c: Line 5: '#include <stdio.h>' (18 of 19 characters)
example.c: Line 6: '#include <ctype.h>' (18 of 19 characters)
example.c: Line 7: '#include <errno.h>' (18 of 19 characters)
example.c: Line 8: '' (0 of 1 characters)
example.c: Line 9: 'int main(int argc, char *argv[])' (32 of 33 characters)
The program simply reads each specified file line by line, removes any leading and trailing whitespace on each line, and collapses each run of consecutive whitespace into a single space. The smaller character count is the number of characters left (and shown); the larger is the number of characters originally read from the file, including the newline.
Additional notes on the example program, if it happens to interest you
The setlocale(LC_ALL, "") call tells your C library to use the user's locale (usually defined in the LANG or LC_ALL environment variables). This program uses only the character type definitions for the character set used by the current locale (to determine which codes are "whitespace"), so this could also be limited to that, via setlocale(LC_CTYPE, ""). The call will return NULL if the current locale is not supported by the C library. Usually that is because of an error in the user configuration, so it is useful to have the program warn then.
isspace() (and all the other is*() functions declared in <ctype.h>) takes an int whose value must be either representable as an unsigned char, or EOF. Because the char type can be signed or unsigned, we explicitly cast the character to (unsigned char) before passing it to the function. Consider this silly historical baggage that we just have to deal with this way.
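To make the cast concrete, here is a minimal sketch of a loop that classifies every character in a buffer. The function name count_whitespace is hypothetical, chosen only for this illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the whitespace characters among the
   first len chars of s. The cast keeps negative char values (possible
   when char is signed) from being passed to isspace(). */
size_t count_whitespace(const char *s, size_t len)
{
    size_t count = 0;

    for (size_t i = 0; i < len; i++)
        if (isspace((unsigned char)s[i]))
            count++;

    return count;
}
```

Without the cast, a byte such as 0xE4 stored in a signed char would reach isspace() as a negative number other than EOF, which the C standard leaves undefined.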
Because line points to the beginning of the dynamically allocated memory buffer, we must not modify it (except via realloc(), or by calling free() and then setting it to NULL). If we do modify it, any subsequent getline() or free() call using that pointer will likely freak out and crash the program, since they really need the pointer to point to the beginning of the buffer, not just somewhere inside it.
I like using pointers (char *in, *out, *end) instead of indexes. Here, in starts at line, and goes up to but not including line+len, which is where getline() put the end-of-string nul (\0) to indicate the end of the line. That's why I also often use a pointer named end to point to that. The out pointer also starts at line, but only advances when a character is kept on the line.
If you think about a row of lettered tiles, like in Scrabble, out points to the next position where you'll put a kept tile, and in points to the next tile you get.
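The same in/out idea works for any in-place filter. As a minimal sketch, assuming we want to drop every occurrence of one character from a string (remove_char is a hypothetical name, not from the program above):

```c
#include <stddef.h>

/* Hypothetical helper: removes every occurrence of unwanted from the
   len-character string s, in place. Returns the new length. */
size_t remove_char(char *s, size_t len, char unwanted)
{
    const char *in = s;               /* next tile we get */
    const char *const end = s + len;  /* one past the last tile */
    char *out = s;                    /* where the next kept tile goes */

    while (in < end) {
        if (*in != unwanted)
            *out++ = *in;             /* keep this tile */
        in++;                         /* move to the next tile either way */
    }

    *out = '\0';
    return (size_t)(out - s);
}
```

Because out never gets ahead of in, the copy can safely happen within the same buffer.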
When getline() or getdelim() returns -1 (or fgets() returns NULL), it means that either there was no more data to read, or the operation failed for some other reason. A successful getline() call returns at least 1, which is why the len < 1 test in the loops above is enough.
After the loop, (!feof(src) || ferror(src)) checks whether the input stream was read completely without errors. Well, rather, the inverse: the expression is true if an error occurred, or if the entire file was not read.
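The read loop and the check after it can be wrapped into one small function. This is only a sketch under my own naming (count_lines is hypothetical); it shows how a caller can tell a clean end of input from a failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: counts the lines in the stream.
   Returns the count, or -1 if reading stopped for any reason
   other than a clean end of input. */
long count_lines(FILE *in)
{
    char *line = NULL;
    size_t size = 0;
    long count = 0;

    while (getline(&line, &size, in) > 0)
        count++;

    free(line);

    /* Same test as in the example program. */
    if (!feof(in) || ferror(in))
        return -1;

    return count;
}
```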
If I had written data to some file, say FILE *dst, I typically precede this test with an if (fflush(dst)) test. It is true if there is an error writing the last of the data buffered by the C library to the file.
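For an output stream, the flush and close checks might be combined along these lines. This is a sketch only; finish_output is a hypothetical helper name, and how you report the error is up to you.

```c
#include <stdio.h>

/* Hypothetical helper: flushes and closes an output stream.
   Returns 0 if no error was reported at any step, -1 otherwise.
   The stream is closed in both cases. */
int finish_output(FILE *dst)
{
    int failed = 0;

    if (fflush(dst))   /* could the buffered data be written? */
        failed = 1;
    if (ferror(dst))   /* did any earlier write fail? */
        failed = 1;
    if (fclose(dst))   /* did closing report a problem? */
        failed = 1;

    return failed ? -1 : 0;
}
```

On Linux you can see it fail by opening /dev/full for writing, writing something to it, and passing the stream to this function.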
The fclose(src) call closes the file. I personally prefer to verify its return value, because even though currently it can only fail in very specific circumstances, I as a user would definitely prefer to know if the OS had issues writing my data! The test costs basically nothing, but may be crucial for the user. I do not want any program to "forget" to tell me there was a problem when working on my data; my data is important to me.
free(NULL) is safe, and does nothing. (Also, realloc(NULL, size) is equivalent to malloc(size), so if you initialize a pointer to NULL, you don't need an initial malloc(); you can just realloc() it to whatever size you need.)
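That realloc(NULL, size) property is what makes growable arrays so simple to write. A minimal sketch, where the structure, the function name, and the growth policy (start at 16, then double) are all assumptions for illustration:

```c
#include <stdlib.h>

/* Hypothetical growable array of longs. Initialize to { NULL, 0, 0 }. */
struct long_array {
    long *data;    /* NULL until the first append */
    size_t used;   /* number of elements stored */
    size_t size;   /* number of elements allocated */
};

/* Appends value. Returns 0 on success, or -1 if out of memory,
   in which case the array is left unchanged. */
int long_array_append(struct long_array *arr, long value)
{
    if (arr->used >= arr->size) {
        size_t new_size = arr->size ? 2 * arr->size : 16;
        /* When arr->data is NULL, this acts like malloc(). */
        long *new_data = realloc(arr->data, new_size * sizeof arr->data[0]);
        if (!new_data)
            return -1;
        arr->data = new_data;
        arr->size = new_size;
    }

    arr->data[arr->used++] = value;
    return 0;
}
```

When done, free(arr.data) releases the array, and it is safe even if nothing was ever appended.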
I suggest you play with the above code. You can even run it under ltrace (ltrace ./example example.c) to see which standard C library calls are actually performed, and their results; or under strace (strace ./example example.c) to see the syscalls (from the process to the OS kernel proper).
As an example, you could add say
if (linenum == 7) {
    /* We skip line 7, and even destroy it! Muahhahah! */
    free(line);
    line = NULL;
    size = 0;
    continue;
}
just after the linenum++ line, to see what happens to the seventh line of each text file. (They're skipped; and even though the buffer is released, nothing bad happens, because continue starts the next iteration of the while loop body, and the next getline() will just dynamically allocate a new buffer.)
If you decide you want to keep a copy of part of the line, just calculate the length you need, adding one for the end-of-string nul (\0); allocate that many chars for the duplicate (sizeof (char) == 1 in C, always, so malloc() et al. really take the number of chars to allocate); memcpy() the data; and add the terminating nul. For example,
char *sdup(const char *const source, const size_t length)
{
    char *s;

    s = malloc(length + 1);
    if (!s) {
        /* Either return NULL, or: */
        fprintf(stderr, "sdup(): Not enough memory for %zu chars.\n", length + 1);
        exit(EXIT_FAILURE);
    }

    if (length > 0)
        memcpy(s, source, length);

    s[length] = '\0';

    return s;
}
If you want the full string (up to the end-of-string nul), you can use the POSIX.1-2008 strdup() instead.
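Copies matter because the next getline() call overwrites the shared buffer. As a sketch of keeping one line around while reading on, here is a hypothetical longest_line() helper built on strdup():

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: returns a dynamically allocated copy of the
   longest line in the stream (including its newline, if any), or NULL
   if there were no lines or memory ran out. The caller frees it. */
char *longest_line(FILE *in)
{
    char *line = NULL, *best = NULL;
    size_t size = 0;
    ssize_t len, best_len = 0;

    while ((len = getline(&line, &size, in)) > 0) {
        if (len > best_len) {
            /* The buffer is reused by the next call, so keep a copy. */
            char *copy = strdup(line);
            if (!copy) {
                free(best);
                best = NULL;
                break;
            }
            free(best);
            best = copy;
            best_len = len;
        }
    }

    free(line);
    return best;
}
```

Note that strdup() stops at the first nul, so a line with an embedded nul byte would be copied only up to that point.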