I have to read a huge text file, around 3 GB (and 40 million lines). Just reading it is really fast:
try (BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
    String line;
    while ((line = br.readLine()) != null) {
        // Nothing here
    }
}
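To put a number on "really fast", the read-only loop can be timed in isolation. The following is only a minimal sketch: the class name `ReadTiming` and the helper `countLines` are hypothetical, and the demo in `main` reads from a small in-memory string so it runs anywhere. For the real measurement a `FileReader` for `file.txt` would be passed instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadTiming {

    // Reads every line from the given source and returns how many lines were seen.
    // No parsing is done, so this measures the cost of reading alone.
    static long countLines(Reader source) throws IOException {
        long count = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            while (br.readLine() != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a tiny in-memory input stands in for the 3 GB file.
        // Replace with new FileReader("file.txt") to time the real file.
        Reader source = new StringReader("first\nsecond\nthird\n");

        long start = System.nanoTime();
        long lines = countLines(source);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println(lines + " lines read in " + elapsedMs + " ms");
    }
}
```

Timing the multithreaded variants the same way (same file, same JVM, several runs) makes the comparison with this baseline more reliable than a single run.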
For each line read by the code above, I do some parsing on the string and process it further (a heavy task). I am trying to do that with multiple threads.
A) I have tried a BlockingQueue, like this:
try (BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
String line;
BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);
int numThreads = 5;
Consumer[] consumer = new Consumer[numThreads];
for (int i = 0; i < consumer.length; i++) {
consumer[i] = new Consumer(queue);
consumer[i].start();
}
while ((line = br.readLine()) != null) {
queue.put(line);
}
queue.put("exit");
} catch (FileNotFoundException ex) {
Logger.getLogger(ReadFileTest.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException | InterruptedException ex) {
Logger.getLogger(ReadFileTest.class.getName()).log(Level.SEVERE, null, ex);
}
class Consumer extends Thread {
private final BlockingQueue<String> queue;
Consumer(BlockingQueue<String> q) {
queue = q;
}
public void run() {
while (true) {
try {
String result = queue.take();
if (result.equals("exit")) {
queue.put("exit");
break;
}
System.out.println(result);
} catch (InterruptedException ex) {
Logger.getLogger(ReadFileTest.class.getName()).log(Level.SEVERE, null, ex);
Thread.currentThread().interrupt(); // restore the interrupt flag and stop this consumer
break;
}
}
}
}
This approach took more time than plain single-threaded processing. I am not sure why. What am I doing wrong?
B) I have tried an ExecutorService:
try (BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
String line;
ExecutorService pool = Executors.newFixedThreadPool(10);
while ((line = br.readLine()) != null) {
pool.execute(getRunnable(line));
}
pool.shutdown();
} catch (FileNotFoundException ex) {
Logger.getLogger(ReadFileTest.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
Logger.getLogger(ReadFileTest.class.getName()).log(Level.SEVERE, null, ex);
}
private static Runnable getRunnable(String run){
Runnable task = () -> {
System.out.println(run);
};
return task;
}
This approach also takes more time than printing directly inside the while loop. What am I doing wrong?
What is the correct way to do it?
How can I efficiently process the lines I read with multiple threads?
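For comparison, here is a minimal sketch of a batched variant of approach B. It assumes, without having measured it on the 3 GB file, that handing over one line per task is a large part of the overhead, so it hands over lists of lines instead. The class name `BatchedLineProcessor`, the batch size, and the `parse` method (which just returns the line length as a stand-in for the real work) are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class BatchedLineProcessor {

    // Hypothetical placeholder for the real per-line parsing and processing.
    static int parse(String line) {
        return line.length();
    }

    // Reads all lines, processes them in batches on a fixed number of threads,
    // and returns the sum of the per-line results.
    static long process(Reader source, int threads, int batchSize)
            throws IOException, InterruptedException {
        LongAdder total = new LongAdder();

        // Bounded work queue plus CallerRunsPolicy: when the workers fall behind,
        // the reading thread runs the batch itself, so pending batches cannot
        // pile up in memory without limit.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(threads * 2),
                new ThreadPoolExecutor.CallerRunsPolicy());

        try (BufferedReader br = new BufferedReader(source)) {
            List<String> batch = new ArrayList<>(batchSize);
            String line;
            while ((line = br.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    submit(pool, batch, total);
                    batch = new ArrayList<>(batchSize); // start a fresh batch
                }
            }
            if (!batch.isEmpty()) {
                submit(pool, batch, total); // last, partially filled batch
            }
        } finally {
            pool.shutdown(); // no new tasks; already submitted batches still run
        }
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for the workers to finish
        return total.sum();
    }

    // One task per batch: a single hand-off covers many lines.
    private static void submit(ThreadPoolExecutor pool, List<String> batch, LongAdder total) {
        pool.execute(() -> {
            long sum = 0;
            for (String l : batch) {
                sum += parse(l);
            }
            total.add(sum);
        });
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a small in-memory input stands in for file.txt.
        Reader source = new StringReader("alpha\nbeta\ngamma\n");
        System.out.println("Sum of results: " + process(source, 4, 2));
    }
}
```

Whether this beats the single-threaded loop depends on how expensive the real per-line work is. With `System.out.println` as the only work, as in both test cases above, the threads mostly contend on the shared output stream, so the test may not reflect the real workload.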