
I want to read and parse a lot of files. Since there are over 10,000 files to parse, I want to speed up the process by using threads.

For example, if I had 5 threads, I would want each of them to read its own share of the files concurrently, so that reading and parsing finish faster. Is this possible? Would I gain any significant speed-up by splitting the work across threads? If so, how can I do this?

P.S. I am not against using external libraries.

I am working with JDK 1.6.

smac89
Mimanshu
  • "all threads to read one file simultaneously. so that each thread ended up parsing 2 files" What does this mean? Do you want two threads to read the same file or one thread to read two files? – Code-Apprentice Sep 07 '14 at 20:18
  • @Code-Apprentice Description updated. It means that you have 10 files and 5 threads, and you want each thread to parse a separate file at the same time. I hope that is clear. – Mimanshu Sep 07 '14 at 22:31
  • 1
    Don't overdo the number of threads. Remember that the hard disk isn't multithreaded. – user207421 Sep 07 '14 at 23:06

2 Answers


If you have many files to read, the better approach is to have no more than one thread read each file. In most cases, the best way of handling many tasks with multiple threads is to use an ExecutorService backed by a thread pool. Submit a task to the service for each file to be read. Make the thread pool large enough to keep the I/O system (which is likely to be the bottleneck) busy, and you will maximize performance.
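A minimal sketch of that approach, assuming JDK 1.6 (so no lambdas, diamond operator, or try-with-resources). The class `ParallelFileParser` and its `parseFile` method are hypothetical: `parseFile` only counts lines as a stand-in for your real parsing. Each file becomes one `Callable` submitted to a fixed-size pool, so no file is ever read by more than one thread.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelFileParser {

    /** Hypothetical "parse" step: here it just counts the lines in one file. */
    static int parseFile(File file) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(file));
        try {
            int lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        } finally {
            reader.close();
        }
    }

    /** Submits one task per file to a fixed-size pool; each file is read by exactly one thread. */
    static List<Integer> parseAll(List<File> files, int poolSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
            for (final File file : files) {
                futures.add(pool.submit(new Callable<Integer>() {
                    public Integer call() throws IOException {
                        return parseFile(file);
                    }
                }));
            }
            // Collect results in submission order; get() blocks until that task is done.
            List<Integer> results = new ArrayList<Integer>();
            for (Future<Integer> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown(); // already-submitted tasks still finish; the pool threads then exit
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: parse every regular file in the directory given as the first argument.
        File dir = new File(args.length > 0 ? args[0] : ".");
        List<File> files = new ArrayList<File>();
        File[] listed = dir.listFiles();
        if (listed != null) {
            for (File f : listed) {
                if (f.isFile()) {
                    files.add(f);
                }
            }
        }
        List<Integer> lineCounts = parseAll(files, 5);
        for (int i = 0; i < files.size(); i++) {
            System.out.println(files.get(i).getName() + ": " + lineCounts.get(i) + " lines");
        }
    }
}
```

The pool size of 5 in `main` is only an example. Because one disk does not read many files at once any faster, it is worth measuring a few pool sizes rather than assuming that more threads help.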

Raedwald

See "How to read all lines of a file in parallel in Java 8" for reading a single file in parallel.

In your case, I'd just launch a pool of threads with as many threads as your process will allow, each with a "read the whole file" request for a file assigned to it, and let the OS decide which files to read in which order.
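One way to "let the OS decide" the order, sketched under JDK 1.6 constraints: every file is handed to the pool as a single read-the-whole-file task, and an `ExecutorCompletionService` returns results in whatever order the reads finish, not in submission order. The class `WholeFileReader`, the holder `FileContents`, and the pool size are hypothetical, and the code assumes each file fits in one `byte[]` (under 2 GB). Recording the byte count stands in for real parsing.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WholeFileReader {

    /** Result of one task: which file, and its full contents. */
    static final class FileContents {
        final File file;
        final byte[] bytes;

        FileContents(File file, byte[] bytes) {
            this.file = file;
            this.bytes = bytes;
        }
    }

    /** Reads an entire file into memory in one request (assumes it fits in a byte[]). */
    static byte[] readWholeFile(File file) throws IOException {
        DataInputStream in = new DataInputStream(new FileInputStream(file));
        try {
            byte[] data = new byte[(int) file.length()];
            in.readFully(data);
            return data;
        } finally {
            in.close();
        }
    }

    /** Consumes results in completion order; returns file path -> byte count. */
    static Map<String, Integer> readAll(List<File> files, int poolSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CompletionService<FileContents> done = new ExecutorCompletionService<FileContents>(pool);
        try {
            for (final File file : files) {
                done.submit(new Callable<FileContents>() {
                    public FileContents call() throws IOException {
                        return new FileContents(file, readWholeFile(file));
                    }
                });
            }
            Map<String, Integer> sizes = new HashMap<String, Integer>();
            for (int i = 0; i < files.size(); i++) {
                // take() blocks until *some* task has finished, whichever one that is.
                FileContents fc = done.take().get();
                sizes.put(fc.file.getPath(), fc.bytes.length); // real parsing would go here
            }
            return sizes;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: pass the file paths as command-line arguments.
        List<File> files = new ArrayList<File>();
        for (String path : args) {
            files.add(new File(path));
        }
        Map<String, Integer> sizes = readAll(files, 4);
        for (Map.Entry<String, Integer> e : sizes.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue() + " bytes");
        }
    }
}
```

Completed results wait in the completion queue until `take()` picks them up. If parsing is slow, whole-file contents can pile up in memory, so a bounded pool size also limits how much data is buffered at once.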

Community
Ira Baxter