How to correctly buffer lines using fread

Question

I want to use fread() for I/O (for speed, among other reasons). I have a file whose lines have different lengths. With code like:

size_t n;
while ((n = fread(buffer, 1, BUF_SIZE, fp)) > 0) {  // BUF_SIZE is about 500 MB
  // process the n bytes in buffer
}

the last line in the buffer may be read incompletely, so it has to be read again on the next iteration. How can I force fread() to continue from the beginning of that last line?

Or, if possible, how can I force fread() to read past 500 MB until it reaches a \n or another specific character?

Thanks All

Ameer.

If you want lines why aren't you using `fgets()`? And what does `fseek()` have to do with it? or `seekg()`? — user207421, Aug 07 '17 at 12:01
You probably need to just read in chunks. https://stackoverflow.com/questions/34081158/how-do-i-process-a-text-file-in-c-by-chunks-of-lines — doctorlove, Aug 07 '17 at 12:04
@EJP because I want to read a big chunk of the file into RAM and then parse it. for huge file it will be faster than reading line by line using fgets(). because of less IO accesses... Now I need a function that tells fread to change the position of the next read — ameerosein, Aug 07 '17 at 12:04
@doctorlove The link you provided does it manually: it finds the last \n in the buffer and copies the remainder to the beginning of the next buffer. That is not what I am after; I need a way to tell fread to change the reading position. For example, by finding the last \n we can calculate how many bytes have to be read again, then tell fread() to go back that many bytes and read... — ameerosein, Aug 07 '17 at 12:13
I'm not clear why you think re-reading bytes you have already read is important. You could count how many bytes from the end of the last read to the \n character then fseek backwards to there. — doctorlove, Aug 07 '17 at 12:20
*for huge file it will be faster than reading line by line using fgets(). because of less IO accesses* No it won't. If you want to use `FILE *`-type functions, `fread()`/`fgets()` and other functions that use a `FILE *` are buffered. See [`setbuf()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/setbuf.html) and [`setvbuf()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/setvbuf.html) Then use `fgets()` or [`getline()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/getline.html) to read lines. — Andrew Henle, Aug 07 '17 at 12:31
@doctorlove That is exactly the kind of function I mean! I did not know about fseek(), but it seems to be what I want. Thanks, and please describe it in more detail — ameerosein, Aug 07 '17 at 12:38
@AndrewHenle yes it is, you can check this post: http://lemire.me/blog/2012/06/26/which-is-fastest-read-fread-ifstream-or-mmap/ — ameerosein, Aug 07 '17 at 12:38
@ameerosein *yes it is, you can check this post: lemire.me/blog/2012/06/26/...* That's not a definitive "benchmark". He doesn't bother specifying the entire set of hardware he's running on. What disk(s)? What disk controller? What file system? Why didn't he try using direct IO with `O_DIRECT` to bypass the page cache entirely, which is often a faster way to read data sequentially when you know how to do it fast? What about asynchronous IO? Do you **really** think comparing `fread()` to `read()` to read a single `int` at a time has anything to do with reading lines from a large file? — Andrew Henle, Aug 07 '17 at 12:51
OK, you're right, it's not an acceptable benchmark, but can you please answer my question without changing it? How do I read some lines using fread()? How do I handle the last line? — ameerosein, Aug 07 '17 at 12:58

score 0 · Accepted Answer · answered Aug 07 '17 at 13:02

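Assume a buffer holding n bytes (n being the count returned by fread) in which you have searched backwards and found the last \n character at zero-based position pos. You then want to roll back by the number of bytes that follow that newline, which is n - (pos + 1). Call this step.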

You can use fseek to move the file pointer back by this much:

int fseek( FILE *stream, long offset, int origin );

In your case

int ret = fseek(stream, -(long)step, SEEK_CUR);  // relative to the current position, not the end of the file

This will involve re-reading part of the file, and a fair bit of jumping around - the comments have suggested alternative ways that may be quicker.
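The roll-back described above can be sketched as follows. This is a minimal illustration, not a tuned implementation: `read_whole_lines` is a hypothetical helper name, the stream is assumed to be seekable and opened in binary mode ("rb"), since a relative fseek with a non-zero offset is not portable on text streams, and a line longer than the buffer is simply handed back unsplit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads up to cap bytes from fp into buf. If the buffer
   was filled completely, the chunk is cut after its last '\n' and the stream
   is moved back so the partial line is read again by the next call.
   Returns the number of usable bytes in buf (0 at end of file). */
size_t read_whole_lines(FILE *fp, char *buf, size_t cap)
{
    assert(fp != NULL && buf != NULL && cap > 0);

    size_t n = fread(buf, 1, cap, fp);
    if (n < cap)
        return n;               /* short read: end of file (or error), nothing to roll back */

    /* Search backwards for the last newline; end is one past it. */
    size_t end = n;
    while (end > 0 && buf[end - 1] != '\n')
        end--;
    if (end == 0)
        return n;               /* no newline at all: line is longer than the buffer */

    long step = (long)(n - end);    /* bytes that follow the last newline */
    if (step > 0 && fseek(fp, -step, SEEK_CUR) != 0)
        return n;               /* seek failed (e.g. a pipe): return the raw chunk */

    return end;
}
```

A caller would loop with `while ((n = read_whole_lines(fp, buf, cap)) > 0)` and parse the first n bytes of buf on each pass. Only the tail of each chunk is read twice, which is small compared with the chunk itself.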

This way we re-read only a few bytes; compared with reading some megabytes, that is forgivable. That's good, thanks for answering — ameerosein, Aug 07 '17 at 13:27
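For comparison, the alternative raised in the comments (keep the unfinished last line and move it to the front of the buffer before the next fread) avoids seeking altogether. The sketch below is one possible way to write it, under stated assumptions: `for_each_line`, `line_fn` and `sum_lengths` are hypothetical names introduced for illustration, and a line longer than the buffer is delivered in pieces rather than reassembled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: receives one line without its '\n'. */
typedef void (*line_fn)(const char *line, size_t len, void *ctx);

/* Reads fp in chunks of up to cap bytes, calls fn for every complete line,
   and carries an unfinished trailing line over to the next chunk with
   memmove. Returns the number of lines (or fragments) delivered. */
size_t for_each_line(FILE *fp, char *buf, size_t cap, line_fn fn, void *ctx)
{
    assert(fp != NULL && buf != NULL && cap > 0 && fn != NULL);

    size_t have = 0;    /* bytes carried over from the previous chunk */
    size_t lines = 0;

    for (;;) {
        size_t got = fread(buf + have, 1, cap - have, fp);
        size_t total = have + got;
        if (total == 0)
            break;                          /* nothing left to deliver */

        size_t start = 0;
        char *nl;
        while ((nl = memchr(buf + start, '\n', total - start)) != NULL) {
            size_t len = (size_t)(nl - (buf + start));
            fn(buf + start, len, ctx);
            lines++;
            start += len + 1;               /* skip past the newline */
        }

        have = total - start;               /* unfinished tail of this chunk */
        if (got == 0 || have == cap) {
            /* End of file without a final newline, or a line that fills the
               whole buffer: deliver what we have instead of carrying it. */
            if (have > 0) {
                fn(buf + start, have, ctx);
                lines++;
            }
            if (got == 0)
                break;
            have = 0;
        } else {
            memmove(buf, buf + start, have); /* carry the tail to the front */
        }
    }
    return lines;
}

/* Example callback: adds each line's length to the size_t that ctx points to. */
void sum_lengths(const char *line, size_t len, void *ctx)
{
    (void)line;
    *(size_t *)ctx += len;
}
```

Each byte of the file is read from the stream exactly once here; the cost moves to one memmove of at most one partial line per chunk.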
