If I were faced with this problem, I'd just use a single-threaded approach. It's not worth putting much effort into threading unless you first speed up the underlying medium, because reading from disk will dominate the run time.
But say you have the file on a ramdisk, or a really fast RAID, or the processing is somehow massively lopsided. Whatever the scenario, line processing now takes the majority of the time.
I'd structure my solution something like this:
class ThreadPool; // encapsulates a set of threads
class WorkUnitPool; // encapsulates a set of threadsafe work unit queues
class ReadableFile; // an interface to a file that can be read from
ThreadPool pool;
WorkUnitPool workunits;
ReadableFile file;
pool.Attach(workunits); // bind threads to (initially empty) work unit pool
file.Open("input.file");
while (!file.IsAtEOF()) workunits.Add(ReadLineFrom(file));
pool.Wait(); // wait for all of the threads to finish processing work units
My "solution" is a generic, high-level design intended to get you thinking about which tools you have available and can adapt to your needs. You will have to think carefully in order to use it, which is what I want.
As with any threaded code, design it very carefully; otherwise you will run into race conditions, data corruption, and all manner of pain. If you can find a thread pool/work unit library that does this for you, by all means use that.
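If you do end up rolling your own, here is a minimal sketch of the same idea using only the standard C++ library. It is one possible shape under some assumptions, not a definitive implementation: the names `WorkQueue`, `ProcessLines`, and `TotalLineLength` are hypothetical, a single shared queue stands in for the work unit pool, and the per-line work is assumed to be thread-safe and not to throw.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <istream>
#include <mutex>
#include <queue>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical thread-safe queue of work units (here, lines of text).
class WorkQueue {
public:
    void Add(std::string line) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(line));
        }
        ready_.notify_one();
    }

    // Signal that no more work will be added, so idle workers can exit.
    void Close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a work unit is available.
    // Returns false once the queue is closed and fully drained.
    bool Take(std::string& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> items_;
    bool closed_ = false;
};

// Reads lines on the calling thread; worker threads run `process` on them.
// Assumption: `process` is safe to call from several threads at once.
void ProcessLines(std::istream& input, unsigned thread_count,
                  const std::function<void(const std::string&)>& process) {
    assert(thread_count > 0);
    WorkQueue queue;
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < thread_count; ++i) {
        workers.emplace_back([&queue, &process] {
            std::string line;
            while (queue.Take(line)) process(line);
        });
    }
    std::string line;
    while (std::getline(input, line)) queue.Add(std::move(line));
    queue.Close();                                // no more work units
    for (auto& worker : workers) worker.join();   // wait for the pool to finish
}

// Example work: sum the lengths of all lines in `text`.
std::size_t TotalLineLength(const std::string& text, unsigned thread_count) {
    std::istringstream input(text);
    std::atomic<std::size_t> total{0};
    ProcessLines(input, thread_count,
                 [&total](const std::string& line) { total += line.size(); });
    return total.load();
}
```

To run it over a file, pass a `std::ifstream` opened on your input to `ProcessLines`. Note that the queue here is unbounded: if the reader outpaces the workers, the whole file can end up buffered in memory, so a real version would probably want to cap the queue size and make `Add` block when it is full.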