Define a class for the buffers. Give each one a large buffer space that is some multiple of the page size, start/end indices, a method that reads into the buffer space from a passed-in stream, and a 'lineParse' method that takes another *buffer instance as a parameter.
Make some *buffers and store them on a producer-consumer pool queue. Open the file, get a buffer from the pool and read into its buffer space from the start index (returning a boolean for error/EOF). Get another *buffer from the pool and pass it into the lineParse() of the earlier one. In there, search backwards from the end of the data, looking for a newline. When found, set the end index to it and memcpy the fragment of the last line, if there is one - you might occasionally be lucky :) - into the new, passed *buffer, setting its start index. The first buffer now holds only whole lines and can be queued off to the thread(s) that will process them. The second buffer holds the fragment copied from the first, and more data can be read from disk into its buffer space at its start index.
The line-processing thread(s) can recycle the 'used' *buffers back to the pool.
Keep going until EOF (or error :).
If you can, add a method to the buffer class that does the processing of the buffer.
Using large buffer classes and parsing backwards from the end will be more efficient than continually reading small chunks and scanning for newlines from the start. Inter-thread comms is slow, and the larger the buffers you can pass, the better.
Using a pool of buffers eliminates continual new/delete and provides flow-control - if the disk read thread is faster than the processing, the pool will empty and the disk read thread will block on it until some used buffers are recycled. This prevents memory runaway.
Note that if you use more than one processing thread, the buffers may get processed 'out-of-order' - this may, or may not, matter.
You only gain in this scenario if the advantage of lines being processed in parallel with disk-read latency outweighs the overhead of inter-thread comms - passing small buffers between threads is very likely to be counter-productive.
The biggest speedup would be experienced with networked disks that are fast overall, but have large latencies.