
I need to read lines from a file, but I do not know how long a line will be. So far the only thing I could think of was to use getc and realloc:

FILE* cFile = fopen(filename, "r");
....
//some while cycle for going from line to line
....
//now for reading the line itself
char* line = malloc(sizeof(char)); //one empty spot for the '\n'
unsigned int i = 0;
char c = getc(cFile);
while (c != '\n') {
    line[i] = c;
    line = realloc(line, (i+2)*sizeof(char));
    i++;
    c = getc(cFile);
}
}
line[i] = c;

I omitted all the checks for EOL and for whether I really got the allocated memory; this is just an example.

My question is: is there a more efficient method of getting a line of unknown length?

Zerg Overmind
  • This `line[0] = c;` should be `line[i] = c;`. – alk Apr 21 '17 at 14:39
  • And the final `line[i] = c;` should be `line[i] = '\0';` – alk Apr 21 '17 at 14:40
  • "getting a line of unknown length" allows a hacker to overwhelm memory resources of a computer. Defensive programming would ensure a line of input does not exceed some _sane_ size. So use a big buffer and if that is not enough, declare an error. – chux - Reinstate Monica Apr 21 '17 at 14:41
  • Using a huge buffer sounds quite memory inefficient to me. What I mean is whether there is a function, for example, that is going to do this for me. And alk, I don't want to be rude, but none of what you said is helpful to me, and the last thing you said is completely meaningless. The last value of c is the newline character, so it does exactly the same thing. – Zerg Overmind Apr 21 '17 at 14:43
  • Did you see this [answer](http://stackoverflow.com/a/2539777)? – Neil Stoker Apr 21 '17 at 14:43
  • Many systems employ a [_tentative_ allocation](http://stackoverflow.com/q/19991623/2410359), so allocating a `buf = malloc(1024*1204);` is not _memory inefficient_. Real memory is allocated when it is used, not necessarily when `*alloc()` is called. – chux - Reinstate Monica Apr 21 '17 at 14:47
  • This seems somewhat inefficient to me. I would have to allocate space for the longest line, which could mean huge amounts of empty space, as well as somewhat inefficient complexity, as I would not only have to go through the file twice but potentially even load it twice. – Zerg Overmind Apr 21 '17 at 14:50
  • @ZergOvermind: '\0' is not a newline character. – Thomas Padron-McCarthy Apr 21 '17 at 14:50
  • The last character in a file may not be a `'\n'`. "Whether the last line requires a terminating new-line character is implementation-defined." C11 §7.21.2. `char c = getc(cFile); while (c != '\n') {` may then lead to an infinite loop. Better to use `int c` and test for `'\n'` and `EOF`. – chux - Reinstate Monica Apr 21 '17 at 14:51
  • chux, that's actually a good thing to know, thank you, but i can't use that, since i need this in a project and all that will be considered is my own code. Can't really bet on that. – Zerg Overmind Apr 21 '17 at 14:52
  • Yeah, I forgot about the null terminator, my bad. And chux, if you read what is written under my code, you can see that I said I omitted all the checks for end of file and allocation. THIS IS JUST AN EXAMPLE. Sorry for the caps, but half of you ignore my real question and nitpick at code that is clearly just an example of what I want to do. – Zerg Overmind Apr 21 '17 at 14:53
  • Can you assume the file read does not contain _null characters_? If so, `fgets()` is a good building block, otherwise do not use it. – chux - Reinstate Monica Apr 21 '17 at 14:53
  • fgets() requires me to set a specific length for the array, does it not? It doesn't dynamically adjust the size of the array, does it? – Zerg Overmind Apr 21 '17 at 14:56
  • "Reading and storing lines from file into array without limit" matches the function of `getline()`. Just search the net for the source code for this non-standard library function for a very good way to meet your coding goal. – chux - Reinstate Monica Apr 21 '17 at 14:57
  • @ZergOvermind `fgets()` uses a specified length. Yet as I commented, it can be used as a building block, not the entire answer. – chux - Reinstate Monica Apr 21 '17 at 14:59
  • @ZergOvermind Post said "I omnited all the checks for [EOL](https://en.wikipedia.org/wiki/Newline)", yet contradictorily does `while (c != '\n')` - an end-of-line test. Code does not test for [EOF](https://en.wikipedia.org/wiki/End-of-file). AFAIK, that was a contributing problem to your effort and so mentioned it for your benefit. – chux - Reinstate Monica Apr 21 '17 at 15:04
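
To illustrate the "`fgets()` as a building block" idea from the comments above, here is a minimal sketch, not a definitive implementation. The name `read_line_fgets` and the starting capacity of 128 are assumptions made for this example. As noted in the comments, it assumes the file contains no null characters, because it relies on `strlen()` to find how much `fgets()` stored.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (name chosen for this sketch): reads one line with
 * repeated fgets() calls, doubling the buffer whenever a chunk ends without
 * a newline. The '\n' is kept if the line has one. Returns NULL at end of
 * file or on allocation failure. The caller frees the result.
 * Assumption: the input contains no embedded '\0' characters. */
char *read_line_fgets(FILE *fp)
{
    size_t cap = 128;   /* arbitrary starting size for this sketch */
    size_t len = 0;
    char *line = malloc(cap);
    if (line == NULL)
        return NULL;

    for (;;) {
        size_t room = cap - len;
        if (room > INT_MAX)          /* fgets() takes an int size */
            room = INT_MAX;
        if (fgets(line + len, (int)room, fp) == NULL)
            break;                   /* end of file or read error */

        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n')
            return line;             /* got a complete line */

        /* No newline yet: grow the buffer and read the next chunk. */
        char *tmp = realloc(line, cap * 2);
        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;
        cap *= 2;
    }

    if (len == 0) {                  /* nothing was read at all */
        free(line);
        return NULL;
    }
    return line;                     /* last line, without a trailing '\n' */
}
```

Calling `read_line_fgets(cFile)` in a loop until it returns `NULL` walks the file line by line; each returned buffer has to be passed to `free()`.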

2 Answers

It would probably be more efficient to increase the buffer size by more than one character at a time, for example by starting with size 80, doubling the size whenever the buffer is full, and if necessary shrinking it at the end.

But that makes your code more complicated and therefore more error-prone, so remember the two rules of how to hand-optimize code:

  1. Don't do it.
  2. Only for experts: Don't do it yet.

That is, don't do it, because it is probably not worth the effort. You will spend maybe an extra hour "improving" your code, and unless you know that the speedup is actually needed, you probably won't notice the difference. Add to that the risk of getting the more complicated code wrong, and spending maybe hundreds of hours finding the elusive bug that in the end turns out to be memory corruption caused by this little reading function.

And, if you really know what you are doing, and need the extra speed, don't start optimizing this piece of code until you know (that is, have measured) that it actually is here that the execution time is spent.
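
For reference, the "start at 80, double when full, shrink at the end" strategy from the first paragraph might look roughly like the sketch below. The function name `read_line` and the choice to drop the trailing newline are assumptions made for this example; it is one possible shape, not a tuned implementation.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (name chosen for this sketch): reads one line from fp
 * into a malloc'd, null-terminated buffer. The trailing '\n' is not stored.
 * Returns NULL at end of file (nothing read) or on allocation failure.
 * The caller frees the result. */
char *read_line(FILE *fp)
{
    size_t cap = 80;             /* starting size suggested in the answer */
    size_t len = 0;
    char *line = malloc(cap);
    if (line == NULL)
        return NULL;

    int c;                       /* int, so EOF differs from every char value */
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 >= cap) {    /* keep one slot free for the '\0' */
            char *tmp = realloc(line, cap * 2);
            if (tmp == NULL) {
                free(line);
                return NULL;
            }
            line = tmp;
            cap *= 2;
        }
        line[len++] = (char)c;
    }

    if (c == EOF && len == 0) {  /* no more lines */
        free(line);
        return NULL;
    }
    line[len] = '\0';

    /* Optional shrink to the exact size; if it fails, the larger buffer
     * is still valid, so fall back to it. */
    char *fit = realloc(line, len + 1);
    return fit != NULL ? fit : line;
}
```

With doubling, a line of n characters triggers about log2(n/80) reallocations instead of n, which is where the saving over the one-character-at-a-time version comes from.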

Thomas Padron-McCarthy
  • I thought of doing that, but I didn't want to risk it, as I expected to make some kind of error. So does the one-by-one solution seem efficient enough? – Zerg Overmind Apr 23 '17 at 09:50
  • @ZergOvermind: Whether it is efficient enough depends entirely on your application and the time constraints you have. The realloc-one-char-at-a-time solution reads several hundred thousand lines per second on a standard modern desktop computer. Do you need more than that? – Thomas Padron-McCarthy Apr 23 '17 at 18:48

If you're using a POSIX system, use getline(3), which does exactly what you want. Otherwise, you can find a free implementation of getline in many places, such as here or here
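
As a rough illustration of how `getline` is typically called on a POSIX system: the helper name `longest_line` and what it computes are assumptions made up for this example. Only `getline` itself and its buffer-growing behaviour come from POSIX.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX.1-2008 declarations */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>           /* ssize_t */

/* Hypothetical helper (invented for this sketch): reads every line of fp
 * with getline() and returns the length of the longest one, not counting
 * the '\n'. If line_count is non-NULL, the number of lines is stored there. */
size_t longest_line(FILE *fp, size_t *line_count)
{
    char *line = NULL;   /* getline() allocates this and grows it as needed */
    size_t cap = 0;      /* current capacity, maintained by getline() */
    ssize_t len;
    size_t longest = 0;
    size_t count = 0;

    while ((len = getline(&line, &cap, fp)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            len--;                       /* ignore the newline itself */
        if ((size_t)len > longest)
            longest = (size_t)len;
        count++;
    }

    free(line);          /* one buffer is reused for all lines */
    if (line_count != NULL)
        *line_count = count;
    return longest;
}
```

Because the same buffer is reused across calls, a line that must outlive the next `getline` call has to be copied first.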

Chris Dodd