I have written a program that counts the occurrences of a target string in a file. It is supposed to use parallelism, but I can't figure out how to write run() so that it evaluates only a portion of the file, leaving the rest to a different thread. At least, that is my understanding of parallelism. I've been in the docs and watching videos for a couple of days, and I really just need someone to explain it to me: not a step-by-step solution to my particular problem, but an explanation of multi-threading that goes beyond a main method with a loop that prints the thread id. I know my class needs to implement Runnable and that run() needs to be overridden. What I'm unsure about is how to write run() so that it processes only part of the file when I can't pass it parameters.


    public static void main(String[] args) {
        new Thread(new Test()).start();
        new Thread(new Test()).start();
        System.out.println("My program counts: " + Test.getTotal() + " occurrences of 'the'.");
    }
}
public class Test implements Runnable {

    private File alice = new File(getCurrentDir() + "/alice.txt");
    private String[] words;
    private BufferedReader reader;
    private StringBuilder sb;
    private int count;
    private static int total;

    public void run() {
        getAlice();
        for(int i = 0; i < words.length; i++) {
            if(words[i].toLowerCase().equals("the")) {
                count++;
            }
        }
        total = count;
    }
    public void getAlice() {
        try{
            reader = new BufferedReader(new FileReader(alice));
            sb = new StringBuilder();
            String line = "";
            while((line = reader.readLine()) != null) {
                sb.append(line);
            }
            words = sb.toString().split(" ");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    public String getCurrentDir() {
        String currDir = System.getProperty("user.dir");
        return currDir;
    }
    public String[] getWords() {
        return words;
    }
    static int getTotal() {
        return total;
    }
}
sam
  • `Files.lines(Path.of(System.getProperty("user.dir"), "alice.txt")).parallel().mapToInt(l -> l.split(" ").length).sum();` – Johannes Kuhn Apr 17 '20 at 18:58
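
The one-liner in the comment above sums the number of words per line. A minimal sketch of the same parallel-stream idea, adapted to count only occurrences of one target word, might look like the following. The class name `StreamCount` and the helper `count` are hypothetical, and `Path.of` assumes Java 11 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

public class StreamCount {
    // Counts case-insensitive occurrences of target among whitespace-separated words.
    // The stream framework decides how to divide the lines between worker threads.
    static long count(Path file, String target) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.parallel()
                    .flatMap(line -> Arrays.stream(line.split("\\s+")))
                    .filter(word -> word.equalsIgnoreCase(target))
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes alice.txt sits in the working directory, as in the question.
        Path file = Path.of(System.getProperty("user.dir"), "alice.txt");
        System.out.println(count(file, "the"));
    }
}
```

This hides the thread handling entirely, so it is a way to get the result rather than a way to learn how threads work.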

3 Answers

Without a way to divide up the file cleanly, this would be hard to do: splitting the file at arbitrary positions could cut words in two.

If the file is broken into lines, and if lines never split words, that gives us something to work with.

One design would have a single reader thread and a pool of word counting threads.

The reader thread would obtain an idle counting thread from the pool, take that thread's read buffer, read the next line into it, and then resume the counting thread.

A counting thread would step through its read buffer, which would hold a single line of text, and would finish by adding the count of words on the line to the global word count total. After finishing, a counting thread would put itself back into the pool of available threads.
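
A rough sketch of this design follows. Instead of hand-managed threads with their own read buffers, it swaps in an `ExecutorService` with a fixed thread pool: the calling thread plays the reader, and each line is submitted to the pool as a counting task. The names `LinePoolCount`, `countInLine` and `count` are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LinePoolCount {
    // Counts case-insensitive occurrences of target in a single line.
    static int countInLine(String line, String target) {
        int n = 0;
        for (String word : line.split("\\s+")) {
            if (word.equalsIgnoreCase(target)) {
                n++;
            }
        }
        return n;
    }

    // The calling thread is the single reader; the pool threads do the counting.
    static int count(BufferedReader reader, String target, int workers)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                final String current = line;
                Callable<Integer> task = () -> countInLine(current, target);
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // blocks until that line has been counted
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        String text = "The cat saw the dog\nand the dog saw the cat\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            System.out.println(count(reader, "the", 4)); // prints 4
        }
    }
}
```

Summing the `Future` results on the reader thread avoids a shared global counter, so no extra synchronization is needed.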

Whether this is a performance gain will depend on the relative time spent doing IO compared with the time spent counting words. Counting words might be so much faster than IO that parallelism doesn't speed up processing, and it could even slow things down due to thread-management overhead.

Alternatively, if the file has already been read and split into lines, so that IO is excluded from the measurement, parallel counting might yield a performance gain.

The number of threads, and whether each counting thread takes one line or several at a time, would also likely matter.

Thomas Bitonti

How I'm supposed to write run() to only process a part of the file when I can't pass it parameters?

You can pass parameters, but you pass them to the Test constructor, which then saves them in fields, for the run() method to use.

public class Test implements Runnable {
    private final int partToProcess;

    public Test(int partToProcess) {
        this.partToProcess = partToProcess;
    }

    @Override
    public void run() {
        // use this.partToProcess here
    }
}

You should not call getTotal() until the thread is done processing the file.

To wait for the thread to end, call join().

You also shouldn't use static for the total.

// Create threads
Test test0 = new Test(0);
Test test1 = new Test(1);
Thread thread0 = new Thread(test0);
Thread thread1 = new Thread(test1);

// Start threads
thread0.start();
thread1.start();

// Wait for threads to end
thread0.join();
thread1.join();

// Now we can print result here
int total = test0.getTotal() + test1.getTotal();
System.out.println("My program counts " + total + " occurrences of 'the'.");

If you want to split the file in more than two pieces, you should use arrays to store the Test and Thread instances.
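
As a sketch of the array approach, the example below splits an in-memory word array into index ranges instead of splitting the file itself. `PartCounter` is a hypothetical stand-in for `Test`, and `countInParallel` is an illustrative helper.

```java
public class ArrayOfThreads {
    // Hypothetical worker: counts "the" in words[from, to).
    static class PartCounter implements Runnable {
        private final String[] words;
        private final int from;
        private final int to;
        private int count;

        PartCounter(String[] words, int from, int to) {
            this.words = words;
            this.from = from;
            this.to = to;
        }

        @Override
        public void run() {
            for (int i = from; i < to; i++) {
                if (words[i].equalsIgnoreCase("the")) {
                    count++;
                }
            }
        }

        int getTotal() {
            return count;
        }
    }

    static int countInParallel(String[] words, int parts) throws InterruptedException {
        PartCounter[] counters = new PartCounter[parts];
        Thread[] threads = new Thread[parts];

        // Create and start one thread per slice of the array
        for (int p = 0; p < parts; p++) {
            int from = (int) ((long) words.length * p / parts);
            int to = (int) ((long) words.length * (p + 1) / parts);
            counters[p] = new PartCounter(words, from, to);
            threads[p] = new Thread(counters[p]);
            threads[p].start();
        }

        // Wait for each thread, then add up the per-instance totals
        int total = 0;
        for (int p = 0; p < parts; p++) {
            threads[p].join();
            total += counters[p].getTotal();
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] words = "the quick The fox the end".split(" ");
        System.out.println(countInParallel(words, 4)); // prints 3
    }
}
```

Because each counter's total is read only after `join()` returns for its thread, the `count` field needs no further synchronization.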


To read only the second half of a file, you can't use a FileReader, because it always starts reading at the beginning and cannot seek to an offset.

See e.g. question "How to read a file from a certain offset in Java?" to learn more.
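
One way to read from an offset is `java.io.RandomAccessFile`, which has a `seek` method. The sketch below reads a byte range and decodes it as UTF-8. The class name `ReadFromOffset` and the helper `readRange` are made up for illustration, and the sketch assumes the range fits in memory and does not start or end inside a multi-byte character.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadFromOffset {
    // Reads the bytes in [start, end) of the file and decodes them as UTF-8.
    static String readRange(Path file, long start, long end) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] buffer = new byte[(int) (end - start)];
            raf.seek(start);       // jump straight to the offset
            raf.readFully(buffer); // read exactly end - start bytes
            return new String(buffer, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, "first half|second half".getBytes(StandardCharsets.UTF_8));
        // The text after the '|' starts at byte offset 11
        System.out.println(readRange(tmp, 11, Files.size(tmp))); // prints "second half"
        Files.delete(tmp);
    }
}
```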

Note that reading from 2 different positions in a file at the same time will slow processing down, unless you're using an SSD, because a normal hard disk arm cannot be in two places at the same time. As an exercise in multi-threading, this is fine, but in reality, you probably wouldn't want to do this.

Also note that when you split a file in two by file size, you'll likely be splitting the text of the file in the middle of a word, and if the text file uses a multi-byte encoding like UTF-8, you might even be splitting the bytes of a character, so you need to add code to detect this and work around it.
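
One possible workaround, sketched here for UTF-8 or ASCII text, is to move every tentative split point forward to the byte just after the next whitespace. This is safe in UTF-8 because ASCII bytes such as space and newline never occur inside a multi-byte character. The helper `alignToWordStart` is hypothetical, and it reads one byte at a time, which is simple but slow.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkBoundary {
    // Returns the smallest position >= offset at which a chunk can start
    // without splitting a word: 0, the file length, or the byte right
    // after a whitespace byte.
    static long alignToWordStart(RandomAccessFile raf, long offset) throws IOException {
        if (offset <= 0) {
            return 0;
        }
        long length = raf.length();
        if (offset >= length) {
            return length;
        }
        // Start one byte early: if that byte is whitespace, offset is already fine.
        raf.seek(offset - 1);
        int b;
        while ((b = raf.read()) != -1) {
            if (b == ' ' || b == '\n' || b == '\r' || b == '\t') {
                return raf.getFilePointer();
            }
        }
        return length; // no whitespace before the end of the file
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, "hello world foo".getBytes(StandardCharsets.UTF_8));
        try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "r")) {
            long middle = alignToWordStart(raf, raf.length() / 2);
            // Thread 0 would process bytes [0, middle), thread 1 [middle, length)
            System.out.println("split at byte " + middle);
        }
        Files.delete(tmp);
    }
}
```

If every thread computes its start and end with the same function, neighbouring chunks meet exactly at a word boundary, so no word is counted twice or skipped.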

Andreas

Your use of Thread is basically fine, but to print the result you have to read the value from the instance, and only after the thread has finished.

Test t1  = new Test();
Thread th = new Thread(t1);
th.start();
// Busy-wait until thread th has finished its run() method
while(th.getState() != Thread.State.TERMINATED)
{
   // Possible thread states:
   // NEW, RUNNABLE, BLOCKED, WAITING, TIMED_WAITING, TERMINATED
}
System.out.println(t1.getTotal());

Moreover, private static int total should not be static.
A simpler example:

public class MyT implements Runnable {

    double d;

    public static void main(String[] args) {
        MyT myt = new MyT();
        Thread t1 = new Thread(myt);
        t1.start();
        // Busy-wait, printing the thread's state, until run() has finished
        while (t1.getState() != Thread.State.TERMINATED) {
            System.out.println(t1.getState());
        }
        System.out.println(t1.getState() + "_" + myt.getD());
    }

    @Override
    public void run() {
        for (int i = 0; i < 3; i++) {
            d = Math.random();
            System.out.println(d);
        }
    }

    public double getD() {
        return d;
    }
}

Output

NEW
...
RUNNABLE
...
BLOCKED
0.7175015787267744
0.6915288485156048
0.777565206934673
RUNNABLE
...
TERMINATED_0.777565206934673
Traian GEICU
  • I'm not quite sure what the comments in the while loop mean – sam Apr 17 '20 at 18:51
  • Have a look at the Thread documentation. A thread has a set of states (it works like a finite automaton). After you `start` a thread, you have to wait until its `run` method has finished; only then can you read the result. Look at the output: I get `d` only after `run` has finished, i.e. when the thread's state is TERMINATED. – Traian GEICU Apr 17 '20 at 18:54
  • The `Thread` does not produce its result immediately; `run` needs time to finish. So use the `while` loop to check when the `Thread` has finished (its state is `TERMINATED`). – Traian GEICU Apr 17 '20 at 18:58
  • `t1.join()` does the same as `while(t1.getState() != Thread.State.TERMINATED)`: it waits for `run` to finish. `join` is preferred, but either gives the same result. – Traian GEICU Apr 17 '20 at 19:14