
I have a very large file to read (many terabytes). I can easily read the file using one thread, but I noticed that it takes a long time for the program to read the file line by line. I was curious: are there any benefits to reading a large file in chunks using multiple threads? I'm thinking that threads will actually slow things down, because I'm not doing any computations while I'm reading the file, so it's not like I can use a chunk of the file for something.

Would using a single thread be faster in this case than multiple threads?

Y_Y
  • Threads can make code faster because modern machines have more than one CPU core. That file, however, is still on a single logical disk drive with a single disk controller. Using threads will actually make reading the file substantially slower, especially on a spindle drive. Drives do *not* like being jerked around, with threads forcing the head to jump between parts of the file. Disk seeks are by far the slowest operation a drive performs. The only reasonable thing you can do is never wait for the program to complete; a watched pot never boils. – Hans Passant Aug 28 '16 at 13:59
  • @HansPassant *That file however is still on a single logical disk drive with a single disk controller.* There are filesystems that support striping files across multiple LUNs. [IBM's GPFS](https://en.wikipedia.org/wiki/IBM_General_Parallel_File_System) and [Oracle's QFS](https://en.wikipedia.org/wiki/QFS) are two examples. To get maximum performance from such filesystems may *require* the use of multiple threads. I've worked on such filesystems that were so fast they could deliver data faster than the OS could map pages into virtual memory. Processes had to `memset` buffers before reading. – Andrew Henle Aug 28 '16 at 14:53

1 Answer


When reading data from a file, your limiting factor will be the read speed of the hard disk, not the CPU.

Reading data from a file is fastest if you access the file sequentially.
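To illustrate the point, here is a minimal sketch (not from the original post) of reading a large file sequentially in big chunks with a single thread, rather than line by line. The 8 MiB buffer size is an assumed example value, not a recommendation; tune it for your system.

```c
#include <stdio.h>
#include <stdlib.h>

/* Minimal sketch: sequentially read a large file in big chunks with a
 * single thread. The 8 MiB buffer size is an arbitrary example value. */
int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    FILE *fp = fopen(argv[1], "rb");
    if (!fp) {
        perror("fopen");
        return 1;
    }

    size_t bufsize = 8u << 20;            /* 8 MiB per read */
    char *buf = malloc(bufsize);
    if (!buf) {
        perror("malloc");
        fclose(fp);
        return 1;
    }

    unsigned long long total = 0;
    size_t n;
    while ((n = fread(buf, 1, bufsize, fp)) > 0) {
        /* Process the chunk here (split it into lines, etc.). */
        total += n;
    }

    printf("read %llu bytes\n", total);
    free(buf);
    fclose(fp);
    return 0;
}
```

Because the reads are sequential and large, the disk (or the OS read-ahead cache) can stream data at close to its maximum rate without extra threads.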

Daniel