Java multi-threading for text file processing

Question

I have a Java program that iterates over every text file in a directory, builds a word index for each one (word: the pages it appears on), and writes each file's index into an output directory. I would like to convert it to use multi-threading, starting a new thread for each file. I am pretty new to Java and completely new to multithreading in Java. The program is invoked as: java Index inputFolder outputFolder pageLength

Here is my working code without multi-threading:

import java.io.File;
import java.io.IOException;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;
import java.io.PrintStream;

public class Index {
      public static void main(String[] args) {
        long startTime = System.nanoTime();
        PrintStream stdout = System.out;
        try {
            File folder = new File(args[0]);
            File[] files = folder.listFiles();
            for (File file : files) {
              String name = file.getName();
              int pos = name.lastIndexOf(".");
              if (pos > 0) {
                  name = name.substring(0, pos);
              }
              Scanner sc;
              sc = new Scanner(file);
              Map<String, String> wordCount = new TreeMap<String, String>();
              int count = 0;
              while(sc.hasNext()) {
                  String word = sc.next();
                  word = word.trim().toLowerCase();
                  int len = word.length(); 
                  count = (int) count + len;
                  int pageNumber = (int) Math.ceil(count / Float.valueOf(args[2]));
                  if(!wordCount.containsKey(word))
                      wordCount.put(word, Integer.toString(pageNumber));
                  else
                      wordCount.put(word, wordCount.get(word) + ", " + Integer.toString(pageNumber));
              }

              // show results
              sc.close();
              PrintStream outputFile = new PrintStream(args[1]+"/"+name+"_output.txt");
              System.setOut(outputFile);
              for(String word : wordCount.keySet())
                  System.out.println(word + " " + wordCount.get(word));
            }
        }
        catch(IOException e) {
            System.out.println("Unable to read from file.");
        }
      long endTime   = System.nanoTime();
      long totalTime = endTime - startTime;
      System.setOut(stdout);
      System.out.println(totalTime / 1000000);
    }
}

To reiterate, I would like to adapt this so that each file iteration starts a new thread.

So in your research into how to code with multiple threads, you couldn't find a single example of how to start threads or use thread pools? --- http://idownvotedbecau.se/noresearch/ — Andreas, Nov 19 '18 at 17:20
I did, but they are all too simple (print name of thread, etc... ) for me to figure out how to apply multithreading to the problem at hand. I know I need a class that implements Runnable, and a public void run() for the processing, but I'm stumped how to connect it all together so that it works in this context. — Daniel Gizzi, Nov 19 '18 at 17:28

score 2 · Accepted Answer · answered Nov 19 '18 at 17:28


If you're using Java 1.8+ you could use the streams API.

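.parallelStream() will execute the tasks in parallel. It does not create one thread per task: the work is split across the worker threads of the common ForkJoinPool (sized to roughly the number of CPU cores), plus the calling thread.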
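To see how a parallel stream distributes work, the small sketch below records which threads run the tasks. The class and method names (ParallelThreadsDemo, threadsUsed) are made up for this illustration. The exact thread count depends on the machine, but it is normally far smaller than the number of tasks because the tasks share a pool of workers.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

class ParallelThreadsDemo {

    // Runs `tasks` trivial tasks on a parallel stream and returns the names
    // of the threads that actually executed them.
    static Set<String> threadsUsed(int tasks) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        IntStream.range(0, tasks)
                 .parallel()
                 .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        Set<String> names = threadsUsed(1000);
        System.out.println(names.size() + " threads handled 1000 tasks: " + names);
    }
}
```

On a typical machine this lists the main thread and a handful of ForkJoinPool.commonPool-worker threads.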

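You'll need a List (or any other Collection) to call parallelStream(). For an array, such as the File[] returned by listFiles(), you can wrap it with Arrays.asList(...) or use Arrays.stream(...).parallel() instead.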

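// needs java.io.File, java.util.Arrays and java.util.List
// note: listFiles() returns null if args[0] is not a directory
List<File> files = Arrays.asList(new File(args[0]).listFiles());

files.parallelStream()
     .forEach(file -> {
         // per-file logic goes here; keep all state local to this lambda
     });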

Example Repl.it

Documentation about parallelism
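As a rough sketch of how the question's loop could be moved into a parallel stream (not necessarily what the linked example does): the class and method names below (ParallelIndex, buildIndex, writeIndex, indexFolder) are invented for illustration, and pageLength is assumed to be a whole number. Each file gets its own map and its own PrintStream, because System.setOut changes the output stream for the whole process and would make concurrent tasks write into each other's files.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ParallelIndex {

    // Same rule as the question: the page number is derived from the running
    // character count of the words read so far.
    static Map<String, String> buildIndex(Path file, int pageLength) throws IOException {
        Map<String, String> index = new TreeMap<>();
        int count = 0;
        try (Scanner sc = new Scanner(file)) {
            while (sc.hasNext()) {
                String word = sc.next().toLowerCase();
                count += word.length();
                int page = (int) Math.ceil(count / (double) pageLength);
                // append the page to the existing entry, or create the entry
                index.merge(word, Integer.toString(page), (old, p) -> old + ", " + p);
            }
        }
        return index;
    }

    // Indexes one file and writes <name>_output.txt into outputDir.
    static void writeIndex(Path file, Path outputDir, int pageLength) {
        String name = file.getFileName().toString();
        int pos = name.lastIndexOf('.');
        if (pos > 0) {
            name = name.substring(0, pos);
        }
        Path target = outputDir.resolve(name + "_output.txt");
        try (PrintStream out = new PrintStream(target.toFile())) {
            buildIndex(file, pageLength)
                .forEach((word, pages) -> out.println(word + " " + pages));
        } catch (IOException e) {
            // lambdas passed to forEach cannot throw checked exceptions
            throw new UncheckedIOException(e);
        }
    }

    static void indexFolder(Path inputDir, Path outputDir, int pageLength) throws IOException {
        Files.createDirectories(outputDir);
        List<Path> files;
        try (Stream<Path> listing = Files.list(inputDir)) {
            files = listing.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        // each file is an independent task; no state is shared between them
        files.parallelStream().forEach(f -> writeIndex(f, outputDir, pageLength));
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        indexFolder(Paths.get(args[0]), Paths.get(args[1]), Integer.parseInt(args[2]));
        System.out.println((System.nanoTime() - start) / 1_000_000); // elapsed ms
    }
}
```

It would be run the same way as the original, e.g. java ParallelIndex inputFolder outputFolder pageLength. Whether it is actually faster depends on the number and size of the files, since the work is mostly disk I/O.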


Cheloide


thanks, this is much simpler than what I was originally trying to do – Daniel Gizzi Nov 19 '18 at 18:27
