1

I'm writing simple html parser with JSoup. I've got about 50 000 links to check, so I thought it's great chance to learn abut threads and concurnecy. I've got 8 tasks registered with ExecutorService: 6 of them parse links to some data stored in ArrayLists and then add it to the BlockingQueues. Two of the tasks are filewriters based on BufferedWriter. The problem is when my 6 tasks finish prase all links, file writers stop write data from BlockingQueue, so I lose part of data. I'm pretty newbie in java, so if you could give me a hand.... The code:

Main file:

public static void main(String[] args) {
    BlockingQueue<ArrayList<String>> units = new ArrayBlockingQueue<ArrayList<String>>(50, true);
    BlockingQueue<ArrayList<String>> subjects = new ArrayBlockingQueue<ArrayList<String>>(50, true);
    File subjectFile = new File("lekarze.csv");
    File unitFile = new File("miejsca.csv");
    ExecutorService executor = Executors.newFixedThreadPool(9);
    executor.submit(new Thread(new FileSaver(subjects, subjectFile)));
    executor.submit(new Thread(new FileSaver(units, unitFile)));
    for(int i = 29323; i < 29400; i++){
        executor.submit(new ParserDocsThread(i, subjects, units, errors));
    }

    executor.shutdown();
}

FileSaver class:

package parser;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.BlockingQueue;

public class FileSaver implements Runnable {
    private BlockingQueue<ArrayList<String>> toWrite = null;
    private File outputFile = null;
    private BufferedWriter writer = null;

    public FileSaver(BlockingQueue<ArrayList<String>> queue, File file){
       toWrite = queue;
       outputFile = file;
   }

    public void run() {
       try {
          writer = new BufferedWriter(new FileWriter(outputFile, true));
          while(true){
              try{
                  save(toWrite.take());
              } catch(InterruptedException e) {
                  e.printStackTrace();
              }
          }
      } catch (IOException e) {
          e.printStackTrace();
      }
  }

  private void save(ArrayList<String> data){
      String temp ="";
      int size = data.size();
      for(int i = 0; i < size; i++){
          temp += data.get(i);
          if(i != size - 1) temp += '\t'; 
      }
      try {
          writer.write(temp);
          writer.newLine();
      } catch (IOException e) {
          e.printStackTrace();
      }
  }
}

In ParserDocsThread I'm only use put() method to add elements to BlockingQueue.

Kamil Cierczek
  • 85
  • 1
  • 3
  • 9
  • 1
    `executor.submit(new Thread(new FileSaver(…)));` — *don’t submit threads to an executor*. The fact that `Thread` implements `Runnable` is misleading, you could say a historical mistake. Just submit your job: `executor.submit(new FileSaver(…));`… – Holger Oct 27 '15 at 23:20

1 Answers1

0

Your consumer threads don't end cleanly because the take() call is waiting for a new array, and are not closing the buffered writer. The ServiceExecutor gives up on waiting for these threads to finish, and kills them. This is causing the last lines in the writer to not be written out to disk.

You should use poll(10, TimeUnit.SECONDS) (but with an appropriate timeout). After that timeout, your consumers will give up on the producers, and you should make sure you close your buffered writer properly so that the last of the buffer is printed out properly.

try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile, true)))
{
    while(true){
        List<String> data = toWrite.poll(10, TimeUnit.SECONDS);
        if (data == null) {
           break;
        }
        save(data, writer);
    }
} catch (...) {
}

I've put the buffered writer here into a try-with-resources (so the try here will automatically close the writer) and passed it to your save method, but you can do it your way, and manually close the writer in a finally block if you want:

try {
    ...
} catch(...) {
} finally {
    writer.close(); // Closes and flushes out the remaining lines
}

You may also want to put in a call to awaitTermination on the executor servier (like so: How to wait for all threads to finish, using ExecutorService?) with a wait time greater than your poll timeout.

Community
  • 1
  • 1
AndyN
  • 2,075
  • 16
  • 25