0

I have this task of calculating the frequency of characters in a text file using multithreading in java. The purpose is to evaluate whether the task can be accomplished faster sequentially or with a parallel approach.

The below code, when running with a single thread produces the correct answer, so I am confident the logic is sound. When I use two threads, I get this error I do not know what to do. I searched for this problem and concluded an independent thread would be worthwhile. Please help and let me know any other details I forgot to mention. Many Thanks!

{jdk 10 is being used.}

CODE:

Class SimpleArray

import java.util.Arrays;
public class SimpleArray {
private final int[] array;


public SimpleArray(int size)
{
    array = new int[size];
}


public synchronized void add (String word)
{
    for(int i = 0; i < word.length(); i++)
    {
        char c = word.charAt(i);
        int place = getInt(c);
        if(place < 0 || place > 25)
        {
            //Non characters are ignored.
        }
        else
        {
            try
            {
                array[place]++;
            }
            catch(IndexOutOfBoundsException ex)
            {
                System.err.println("Despite control, out of bound in array.");
            }

        }
    }
}


private int getInt(char c)
{
    int ascii = (int) c;

    if(ascii >= 65 && ascii <= 90)
    {
        return (ascii % 65);
    }
    else if (ascii >= 97 && ascii <= 122)
    {
        return (ascii % 97);
    }
    else
    {
        return -1;
    }

}

//used for outputting the content of shared integer array
public synchronized String toString()
{
    return Arrays.toString(array);
}


}

Class ArrayWriter

import java.lang.Runnable;
import java.util.NoSuchElementException;
import java.util.Scanner;


public class ArrayWriter implements Runnable{

private final SimpleArray sharedSimpleArray;
private final Scanner input;

public ArrayWriter(SimpleArray array, Scanner input)
{
    this.input = input;
    sharedSimpleArray = array;
}


@Override
public void run() 
{
    //add input from file here
    try 
    {
        while(input.hasNext())
        {
            String word = next();
            //System.out.println(word);
            sharedSimpleArray.add(word);
        }
    }
    catch(NoSuchElementException elementException)
    {
        System.err.println("File improperly formed. Terminating...");
        System.exit(2);
    }
    catch(IllegalStateException stateException)
    {
        System.err.println("Error reading from file. Terminating...");
        System.exit(2);
    }

}

private synchronized String next()
{
    return input.next();

}

}

Class Parallel

import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.io.IOException;
import java.nio.file.Paths;
import java.util.Scanner;
import java.util.concurrent.ExecutorService;


public class Parallel 
{
     private Scanner input;
     private ArrayWriter writer1;
     private ArrayWriter writer2;
     SimpleArray sharedSimpleArray;

public Parallel()
{
    // Open file for reading
    openFile();

    // Construct the shared object
    sharedSimpleArray = new SimpleArray(26);

    // Create two tasks to write to the shared simpleArray
    writer1 = new ArrayWriter(sharedSimpleArray, input);
    writer2 = new ArrayWriter(sharedSimpleArray, input);


    // Execute the task with an executor service
    ExecutorService executorService = Executors.newCachedThreadPool();
    executorService.execute(writer1);
    executorService.execute(writer2);

    executorService.shutdown();

    try
    {
        // Wait one minute for both writers to finish waiting
        boolean taskEnded = executorService.awaitTermination(1, TimeUnit.MINUTES);

        if(taskEnded)
        {
            System.out.printf("%nContents of SimpleArray %n");
            System.out.println(sharedSimpleArray);
            closeFile();
        }
        else
        {
            System.out.println("Timed out waiting for the threads to finish.");
            closeFile();
        }
    }
    catch(InterruptedException ex)
    {
        ex.printStackTrace();
    }


}



// open file Dictionary.txt
private void openFile()
{
    try
    {
        input = new Scanner(Paths.get("Dictionary.txt"));
    }
    catch(IOException ioException)
    {
        System.err.println("Error Opening file. Terminating...");
        System.exit(1);
    }
}


// close file and terminate application
private void closeFile()
{
    if(input != null)
    {
        input.close();
    }
}


public static void main(String[] args)
{
    Parallel any = new Parallel();
}
}

All the below is the output of the program.

Exception in thread "pool-1-thread-1" java.nio.BufferOverflowException
    at java.base/java.nio.HeapCharBuffer.put(HeapCharBuffer.java:229)
    at java.base/java.io.Reader.read(Reader.java:106)
    at java.base/java.util.Scanner.readInput(Scanner.java:882)
    at java.base/java.util.Scanner.hasNext(Scanner.java:1446)
    at ArrayWriter.run(ArrayWriter.java:24)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:844)
Contents of SimpleArray 
[295536, 63853, 152946, 113156, 376411, 39234, 82616, 92359, 312954, 5455, 26809, 194887, 105192, 251370, 251561, 113653, 5883, 246109, 250242, 230862, 131483, 33070, 22405, 10492, 70574, 14758]

The answer is a bit off.

1 Answers1

0

I think this is typical race condition that can occur in a program. you are trying to read two threads with the same scanner, and race condition occurs in this logic - Please read my comments in the code.

while(input.hasNext()) //both threads eveluate this condition as true for last character
    {
        String word = next(); //One thread has already advanced. So you are out of buffer
        sharedSimpleArray.add(word);
    }

Of course, I havent tried running this program, so cannot tell you confidently.

However, I would rather suggest you more thing.. No matter what you try to do, I think File IO operations are difficult to parallelize. Disk has limited capacity, and even if you put 10 threads to do the job, You will not be able to go beyond max read speed of disk.

The best option to use multithreading is to use producer/consumer. where producer reads the disk, and feeds it to multiple consumers with the logic to calculate count.

class Producer implements Runnable{

public read() {
    while(input.hasNext())
        {
            String word = next();
            //add to a Q;
        }
    }
}

//Multiple threads of consumer.
Class Consumer implements Runnable{
    void consume() {
        //process Q.peek()
    }
}

Or At least add synchronized around reading logic.

while(input.hasNext())
    {
        String word = null;
        synchronized(input) {
            if(input.hasNext()) {
                word = next(); 
            }
        }
        if (word != null) {
            sharedSimpleArray.add(word);
        }
    }

Its bit ugly, but cant help.

Anand Vaidya
  • 1,374
  • 11
  • 26
  • Thanks. Now I get a different error. But at least the buffer overflow one has disappeared. Sometimes it outputs properly. Sometimes it doesn't. Error below: – Muhammad Mubashirullah Durrani Oct 10 '18 at 15:48
  • Exception in thread "pool-1-thread-1" java.lang.IndexOutOfBoundsException: end at java.base/java.util.regex.Matcher.region(Matcher.java:1514) at java.base/java.util.Scanner.hasTokenInBuffer(Scanner.java:948) at java.base/java.util.Scanner.hasNext(Scanner.java:1443) at ArrayWriter.run(ArrayWriter.java:33) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) at java.base/java.lang.Thread.run(Thread.java:844) – Muhammad Mubashirullah Durrani Oct 10 '18 at 15:50
  • It's a different issue and this might help you - https://stackoverflow.com/a/27619945/4675277 – Anand Vaidya Oct 10 '18 at 17:28