
On a previous question of mine I posted:

I have to read several very large txt files and have to either use multiple threads or a single thread to do so depending on user input. Say I have a main method that gets user input, and the user requests a single thread and wants to process 20 txt files for that thread. How would I accomplish this? Note that the below isn't my code or its setup but just what the "idea" is.

Example:

int numFiles = 20;
int threads = 1;

 String[] list = new String[numFiles];
 for(int i = 1; i <= numFiles; i++){
   list[i - 1] = "hello" + i + ".txt";//so the list is hello1.txt, hello2.txt, ..., hello20.txt
 }

 public void run(){
 //processes txt file
 }

So in summary, how would I accomplish this with a single thread? With 20 threads?

And a user suggested using threadPools:

When the user specifies how many threads to use, you'd configure the pool appropriately, submit the set of file-read jobs, and let the pool sort out the executions. In the Java world, you'd use the Executors.newFixedThreadPool factory method, and submit each job as a Callable. Here's an article from IBM on Java thread pooling.

So now I have a method called sortAndMap(String x) that takes a txt file name and does the processing, and for the example above I would have

Executors.newFixedThreadPool(numThreads);

How do I use this with threadPools so that my example above is doable?

user1261445

3 Answers


Ok, bear with me on this, because I need to explain a few things.

First off, unless you have multiple disks, or perhaps a single SSD, it's not recommended to use more than one thread to read from the disk. Many questions on this topic have been posted, and the conclusion was always the same: using multiple threads to read from a single mechanical disk will hurt performance instead of improving it.

This happens because the disk's mechanical head has to seek to each new read position. With multiple threads, each thread that gets a chance to run directs the head to a different section of the disk, so the head bounces between areas inefficiently instead of reading sequentially.

The accepted solution for processing multiple files is a single-producer (one reader thread), multiple-consumer (processing threads) system. A thread pool is the ideal mechanism here: one thread acts as the producer and puts tasks in the pool's queue for the workers to process.

Something like this:

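import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// The enclosing method must declare or catch IOException (readAllLines)
// and InterruptedException (awaitTermination).
int numFiles = 20;
int threads = 4;

ExecutorService exec = Executors.newFixedThreadPool(threads);

// This loop is the single producer: it reads the files one at a time, in order.
for(int i = 1; i <= numFiles; i++){
    List<String> lines = Files.readAllLines(Paths.get("hello" + i + ".txt"));
    String[] fileContents = lines.toArray(new String[0]);
    exec.submit(new ThreadTask(fileContents));   // a pool worker processes it
}

exec.shutdown();   // stop accepting new tasks
exec.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);   // wait for queued tasks to finish
...

class ThreadTask implements Runnable {

   private final String[] fileContents;

   public ThreadTask(String[] fileContents) {
        this.fileContents = fileContents;
   }

   public void run(){
      //processes txt file
   }
}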
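For comparison, the same single-producer, multiple-consumer arrangement can be written by hand with a bounded BlockingQueue instead of an executor. This is a minimal sketch, not the answer's code: the class name ProducerConsumerSketch, the queue capacity of 4, the poison-pill shutdown, and the "processing" (simply counting lines) are all illustrative assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class ProducerConsumerSketch {

    // Stop signal for the workers; compared by reference, so it can never be
    // confused with a real (even empty) file's contents.
    private static final List<String> POISON = new ArrayList<>();

    // The calling thread is the single producer: it reads the files one at a time.
    // `consumers` worker threads process the contents. Returns the total line count.
    static long run(List<Path> files, int consumers) throws IOException, InterruptedException {
        // Bounded queue: put() blocks when full, so the reader cannot get far ahead of the workers.
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(4);
        AtomicLong totalLines = new AtomicLong();

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < consumers; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        List<String> contents = queue.take();
                        if (contents == POISON) {
                            return;
                        }
                        // Hypothetical processing step: just count the lines.
                        totalLines.addAndGet(contents.size());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            workers.add(worker);
        }

        try {
            for (Path file : files) {
                queue.put(Files.readAllLines(file));
            }
        } finally {
            // One stop signal per worker, even if a read failed, then wait for them all.
            for (int i = 0; i < consumers; i++) {
                queue.put(POISON);
            }
            for (Thread worker : workers) {
                worker.join();
            }
        }
        return totalLines.get();
    }

    public static void main(String[] args) throws Exception {
        // Create hello1.txt .. hello20.txt in a temp directory; file i has i lines.
        Path dir = Files.createTempDirectory("hello");
        List<Path> files = new ArrayList<>();
        for (int i = 1; i <= 20; i++) {
            Path file = dir.resolve("hello" + i + ".txt");
            Files.write(file, Collections.nCopies(i, "some text"));
            files.add(file);
        }
        System.out.println(run(files, 4));
    }
}
```

The bounded queue is the main design point: because put() blocks when the queue is full, the reader cannot load every large file into memory ahead of the workers. The queue behind Executors.newFixedThreadPool is unbounded, so the executor version above has no such limit.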
Tudor
  • Great example. I'm going to see some people today to see if I can get mine to work (output looks fine when debugging step by step, but the program is shooting blanks when I run it without the debugger). – user1261445 May 01 '12 at 13:35
  • Everybody has SSDs these days. Does that mean it's better to multithread the read operation? – Shervin Asgari Jun 17 '13 at 07:05
  • @Shervin: I'll have to test with an SSD. I'm not really sure what the behavior is. – Tudor Jun 20 '13 at 07:07

I would start by reading this tutorial on high level concurrency. I recommend reading the whole concurrency tutorial because it sounds like you are new to multithreading.

eabraham

So, the newFixedThreadPool() call will return an instance of ExecutorService. You can reference the JavaDoc, which is pretty comprehensive and contains a workable example. You will want to either submit or invokeAll a number of Callables implementing your file-processing tasks, giving you a number of Futures in return. Their get() methods will give you the result of the task execution upon completion (you have to write that part yourself :))
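A minimal sketch of the invokeAll approach, applied to the question's setup. The pool size is the user's thread count, so processAll(files, 1) runs every file on one worker and processAll(files, 20) gives each of the 20 files its own worker. sortAndMap here is a hypothetical stand-in for the asker's method (it only counts lines and takes a Path rather than a String), and InvokeAllSketch and processAll are likewise illustrative names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

public class InvokeAllSketch {

    // Hypothetical stand-in for the asker's sortAndMap: it only counts the file's lines.
    static long sortAndMap(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.count();
        }
    }

    // Runs one Callable per file on a pool of `threads` workers and returns the
    // results in the same order as `files`.
    static List<Long> processAll(List<Path> files, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> jobs = new ArrayList<>();
            for (Path file : files) {
                jobs.add(() -> sortAndMap(file));
            }
            // invokeAll blocks until every job has finished, normally or with an exception.
            List<Future<Long>> futures = pool.invokeAll(jobs);
            List<Long> results = new ArrayList<>();
            for (Future<Long> future : futures) {
                // get() rethrows a job's exception wrapped in an ExecutionException.
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Create hello1.txt .. hello20.txt in a temp directory; file i has i lines.
        Path dir = Files.createTempDirectory("hello");
        List<Path> files = new ArrayList<>();
        for (int i = 1; i <= 20; i++) {
            Path file = dir.resolve("hello" + i + ".txt");
            Files.write(file, Collections.nCopies(i, "some text"));
            files.add(file);
        }
        System.out.println(processAll(files, 1));   // single thread
        System.out.println(processAll(files, 20));  // one thread per file
    }
}
```

Note that each Callable here reads its own file, so with more than one thread the disk reads run concurrently; on a single mechanical disk, the single-reader layout from the first answer may be the better fit.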

Alexander Pavlov