I want to read a file line by line without knowing the line length beforehand. Here's what I've got so far:

int ch;
int length = 0;
char buffer[4095];

while ((ch = getc(file)) != '\n' && ch != EOF) {
    buffer[length] = ch;
    length++;
}

printf("Line length: %d characters.", length);

char newbuffer[length + 1];

for (int i = 0; i < length; i++)
    newbuffer[i] = buffer[i];

newbuffer[length] = '\0';    // newbuffer now contains the line.

I can now figure out the line length, but only for lines that are shorter than 4095 characters, plus the two char arrays seem like an awkward way of doing the task. Is there a better way to do this (I already used fgets() but got told it wasn't the best way)?

--Ry

ryyst

5 Answers

You can start with a buffer of some suitable size and then call realloc along the way whenever you need more space:

int CUR_MAX = 4095;
char *buffer = (char*) malloc(sizeof(char) * CUR_MAX); // allocate the initial buffer.
int length = 0;
int ch;

while ( ((ch = getc(file)) != '\n') && (ch != EOF) ) { // read from the stream.
    if (length == CUR_MAX - 1) { // time to expand? (one byte is kept for '\0')
      CUR_MAX *= 2; // double the current size, or grow by anything similar.
      buffer = realloc(buffer, CUR_MAX); // reallocate memory.
    }
    buffer[length] = ch; // store the character in the buffer.
    length++;
}
buffer[length] = '\0'; // terminate the string.
.
.
free(buffer);

You'll have to check for allocation errors after calls to malloc and realloc.
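For reference, here is a minimal sketch of the same grow-as-you-go idea with the allocation checks filled in. The helper name `read_line` and the starting capacity of 16 are assumptions made for illustration, not part of the answer above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line from fp into a heap buffer that the
 * caller must free(). Returns NULL at end of file or on allocation failure. */
char *read_line(FILE *fp)
{
    assert(fp != NULL);

    size_t capacity = 16;   /* assumed starting size; doubles on demand */
    size_t length = 0;
    char *buffer = malloc(capacity);
    if (buffer == NULL)
        return NULL;

    int ch;
    while ((ch = getc(fp)) != '\n' && ch != EOF) {
        if (length + 1 == capacity) {        /* keep one byte for '\0' */
            char *bigger = realloc(buffer, capacity * 2);
            if (bigger == NULL) {
                free(buffer);                /* realloc left the old block intact */
                return NULL;
            }
            buffer = bigger;
            capacity *= 2;
        }
        buffer[length++] = (char)ch;
    }

    if (ch == EOF && length == 0) {          /* nothing left to read */
        free(buffer);
        return NULL;
    }

    buffer[length] = '\0';
    return buffer;
}
```

A caller would loop with something like `while ((line = read_line(file)) != NULL) { ...; free(line); }`. Assigning the result of realloc to a temporary first avoids leaking the old block when the call fails.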

Kevin Reid
codaddict
  • Just as a note, character-by-character reading is extremely slow. You should read it in big chunks (4-16k). – Blindy Mar 28 '10 at 09:45
  • @Blindy: The standard library I/O does buffering, so this isn't (much) slower than reading in chunks. – JaakkoK Mar 28 '10 at 10:12
  • doesn't resetting count to 0 cause buffer overflow? – fbstj Feb 01 '13 at 08:38
  • And as always, please [don't cast the result of `malloc()`](http://stackoverflow.com/a/605858/3233393). – Quentin Mar 10 '15 at 12:57
  • Why reset the count back to 0 after enlarging the memory? Is the previous memory still there? – Roy Li Apr 01 '15 at 14:45
  • @Blindy, reading char-by-char using the stdio functions `getc(3)` or `fgetc(3)` is not a problem, since stdio does full buffering. Just check it, because you are mistaken. BTW, choosing a bad buffer size yourself (contrary to how stdio does it) could lead to worse resource allocation and affect overall program efficiency. – Luis Colorado Oct 26 '19 at 14:00

You might want to look into Chuck B. Falconer's public domain ggets library. If you're on a system with glibc, you probably have a (non-standard) getline function available to you.
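A minimal sketch of the getline route, assuming a POSIX.1-2008 system. The helper name `longest_line` is made up for illustration; the point is that getline allocates the buffer and grows it as needed, so no line-length limit has to be chosen up front.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for getline() on POSIX systems */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>            /* ssize_t */

/* Hypothetical helper: returns the length of the longest line in fp,
 * not counting the line terminator. */
size_t longest_line(FILE *fp)
{
    assert(fp != NULL);

    char *line = NULL;     /* getline() allocates and grows this buffer */
    size_t capacity = 0;   /* size of that buffer, maintained by getline() */
    size_t longest = 0;
    ssize_t nread;

    while ((nread = getline(&line, &capacity, fp)) != -1) {
        line[strcspn(line, "\n")] = '\0';   /* strip the trailing newline, if any */
        size_t length = strlen(line);
        if (length > longest)
            longest = length;
    }

    free(line);            /* one free() covers the buffer reused for every line */
    return longest;
}
```

The same buffer is reused for every line, so copy a line out if you need to keep it past the next call.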

jamesdlin
  • Nice! I believe I can trust most UNIX-like systems to have glibc installed, so this is definitely a great way to read in lines. – ryyst Mar 28 '10 at 10:36
  • Moreover, `getline` has been included in the most recent POSIX standard, so it *is* standard on unix now. Still no guarantee that it is included with c *per se*, however. – dmckee --- ex-moderator kitten Jun 09 '10 at 17:32

You're close. Basically, you want to read chunks of data and check them for \n characters. If you find one, good: you have an end of line. If you don't, you have to grow your buffer (i.e., allocate a new buffer twice the size of the first one, copy the data from the old buffer into the new one, then delete the old buffer and use the new one in its place; or just realloc if you're in C), then read some more until you do find a line ending.

Once you have your ending, the text from the beginning of the buffer to the \n character is your line. Copy it to a buffer or work on it in place, up to you.

After you're ready for the next line, you can copy the "rest" of the input over the current line (basically a left shift) and fill the rest of the buffer with data from the input. You then go again until you run out of data.

This can of course be optimized, with a circular buffer for example, but it should be more than sufficient for any reasonable I/O-bound algorithm.
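The steps above could be sketched roughly as follows. The `ChunkReader` struct, the `chunk_next_line` function, and the 4096-byte starting size are all assumptions made for illustration; treat this as one possible reading of the description, not a reference implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader state; none of these names come from the answer. */
typedef struct {
    FILE  *fp;
    char  *buf;       /* heap buffer holding raw input */
    size_t cap;       /* allocated size of buf */
    size_t len;       /* bytes of input currently held in buf */
    size_t consumed;  /* bytes used up by the line returned last time */
} ChunkReader;

/* Returns the next line without its '\n', or NULL at end of input or on
 * allocation failure. The pointer is only valid until the next call. */
char *chunk_next_line(ChunkReader *r)
{
    assert(r != NULL && r->fp != NULL);

    /* Left shift: move the unread remainder over the previous line. */
    if (r->consumed > 0) {
        r->len -= r->consumed;
        memmove(r->buf, r->buf + r->consumed, r->len);
        r->consumed = 0;
    }

    size_t scanned = 0;   /* bytes already checked for '\n' */
    for (;;) {
        char *nl = NULL;
        if (r->len > scanned)
            nl = memchr(r->buf + scanned, '\n', r->len - scanned);
        if (nl != NULL) {
            *nl = '\0';
            r->consumed = (size_t)(nl - r->buf) + 1;
            return r->buf;
        }
        scanned = r->len;

        /* No newline yet: grow if full (one byte stays free for '\0'). */
        if (r->len + 1 >= r->cap) {
            size_t new_cap = r->cap ? r->cap * 2 : 4096;
            char *bigger = realloc(r->buf, new_cap);
            if (bigger == NULL)
                return NULL;
            r->buf = bigger;
            r->cap = new_cap;
        }

        size_t got = fread(r->buf + r->len, 1, r->cap - 1 - r->len, r->fp);
        r->len += got;
        if (got == 0) {                 /* end of input (or a read error) */
            if (r->len == 0)
                return NULL;
            r->buf[r->len] = '\0';      /* final line had no '\n' */
            r->consumed = r->len;
            return r->buf;
        }
    }
}
```

Start with `ChunkReader r = { .fp = file };`, call `chunk_next_line(&r)` until it returns NULL, then `free(r.buf)`.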

Blindy

This is how I did it for stdin. If you call readLine with *line set to NULL and *length set to 0, the function allocates a 1024-byte buffer for you and lets it grow in steps of 1024. If you call it with *length set to 10, you get a buffer that starts at 10 bytes and grows in steps of 10. If you already have a heap-allocated buffer, you can pass it in along with its size.

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <string.h>

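char *readLine(char **line, size_t *length)
{
    assert(line != NULL);
    assert(length != NULL);

    size_t count = 0;

    *length = *length > 0 ? *length : 1024;
    size_t step = *length; // grow by the initial size each time

    if (!*line)
    {
        *line = calloc(*length, sizeof(**line));
        if (!*line)
        {
            return NULL;
        }
    }
    else
    {
        memset(*line, 0, *length);
    }

    for (int ch = getc(stdin); ch != '\n' && ch != EOF; ch = getc(stdin))
    {
        if (count + 1 == *length) // keep one byte free for the terminating '\0'
        {
            char *grown = realloc(*line, *length + step);
            if (!grown)
            {
                return NULL; // *line is still valid and owned by the caller
            }
            *line = grown;
            *length += step;
        }

        (*line)[count] = (char)ch;

        ++count;
    }

    (*line)[count] = '\0'; // realloc does not zero the memory it adds

    return *line;
}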
dunst0

Consider the scanf '%m' format conversion modifier (POSIX):

char *arr = NULL;
// Read a string of unlimited length, up to (but not including) the newline.
// Similar to a dynamically sized fgets.
if (fscanf(stdin, "%m[^\n]", &arr) == 1) {
    // Do something with arr
    free(arr);
}

Quoting from scanf man page:

An optional 'm' character. This is used with string conversions (%s, %c, %[), and relieves the caller of the need to allocate a corresponding buffer to hold the input: instead, scanf() allocates a buffer of sufficient size, and assigns the address of this buffer to the corresponding pointer argument, which should be a pointer to a char * variable (this variable does not need to be initialized before the call). The caller should subsequently free(3) this buffer when it is no longer required
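Two details are worth handling when using this in a loop: `%m[^\n]` stops before the newline without consuming it, and it matches nothing at all on an empty line. Below is a minimal sketch that deals with both, assuming a C library that supports the 'm' modifier. The helper name `read_line_m` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns the next line from fp as a malloc'd string
 * (without the newline), or NULL at end of file or on allocation failure.
 * The caller must free() the result. */
char *read_line_m(FILE *fp)
{
    assert(fp != NULL);

    char *line = NULL;
    int rc = fscanf(fp, "%m[^\n]", &line);
    if (rc == EOF)
        return NULL;                 /* no more input */

    if (rc == 0) {                   /* empty line: the scanset matched nothing */
        line = malloc(1);
        if (line == NULL)
            return NULL;
        line[0] = '\0';
    }

    /* The conversion stops before the newline, so consume it here. */
    int ch = getc(fp);
    if (ch != '\n' && ch != EOF)
        ungetc(ch, fp);

    return line;
}
```

Usage would look like `while ((line = read_line_m(stdin)) != NULL) { ...; free(line); }`.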

dash-o