
For performance reasons, I would like to use the forEach loop of a parallel stream with a lambda to process a Collection in Java. As this runs in a background Service, I would like to use the updateProgress(double, double) method to inform the user about the current progress.

To indicate the current progress, I need a progress indicator in the form of an Integer counter. However, this is not possible, as I can only access final (or effectively final) local variables within the lambda expression.

See the code example below; Collection is only a placeholder for any possible instance of a Collection:

int progress = 0;
Collection.parallelStream().forEach(signer -> {
   progress++;
   updateProgress(progress, Collection.size());     
});

I'm aware that I can solve this problem by using a simple for-loop. However, for performance reasons it would be nice to solve it this way.

Does anybody know a more or less neat solution to this?

Michael Stauffer
  • 1. Why do you think this is actually more performant? 2. I think if this code is actually parallelized, incrementing an integer won't be thread safe; consider AtomicInteger instead. – markspace Nov 07 '14 at 16:55
  • There are several [benchmarks](http://stackoverflow.com/questions/21968918/java-8-nested-loops-with-streams-performance) that show a performance improvement from parallelStream, which is not very surprising when computationally intensive tasks are done in parallel. However, you are right that thread safety can be an issue. – Michael Stauffer Nov 07 '14 at 17:16

2 Answers


As proposed by markspace, using an AtomicInteger is a good solution:

AtomicInteger progress = new AtomicInteger();
Collection.parallelStream().forEach(signer -> {
    progress.incrementAndGet();
    // do some other useful work
});
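
To tie the counter back to the progress report from the question, the value returned by incrementAndGet() can be passed straight on. The following is only a minimal sketch in plain Java: the class AtomicProgressDemo, the method processAll and the reportProgress callback are hypothetical names that stand in for a call such as updateProgress(done, total).

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class; all names here are illustrative only.
class AtomicProgressDemo {

    // Processes every item in parallel and reports (done, total) after each one.
    // reportProgress is an assumed stand-in for something like updateProgress(done, total).
    static int processAll(List<Integer> items, BiConsumer<Integer, Integer> reportProgress) {
        AtomicInteger done = new AtomicInteger();
        int total = items.size();
        items.parallelStream().forEach(item -> {
            // ... do the real work for this item here ...
            int current = done.incrementAndGet(); // atomic, so no increment is lost
            reportProgress.accept(current, total);
        });
        return done.get();
    }

    public static void main(String[] args) {
        List<Integer> items = IntStream.range(0, 1000).boxed().collect(Collectors.toList());
        AtomicInteger highest = new AtomicInteger();
        // Track the highest count that was ever reported by any worker thread.
        int processed = processAll(items, (d, t) -> highest.accumulateAndGet(d, Math::max));
        System.out.println(processed + " items processed, highest reported count " + highest.get());
    }
}
```

Because the callback is invoked from several worker threads, whatever is plugged in there has to be safe to call concurrently.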

I would not use the runLater() variant, as your goal is high performance: if many parallel threads generate JavaFX 'runLater' tasks, you will again create a bottleneck.

For the same reason I would NOT update the ProgressBar for every item, but use a separate JavaFX Timeline to update the progress bar at regular intervals, independently of the processing threads.
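
The sampling idea can be tried without any JavaFX dependency. The sketch below assumes a ScheduledExecutorService as a stand-in for the Timeline; the class SampledProgressDemo and its run method are hypothetical names, and the recorded list of fractions stands in for calls to setProgress.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Hypothetical demo class; the scheduler plays the role of the JavaFX Timeline.
class SampledProgressDemo {

    // Runs `total` dummy work items in parallel (assumes total > 0) while a separate
    // scheduler thread samples the counter every `periodMillis` milliseconds.
    // Returns the sampled progress fractions in the order they were taken.
    static List<Double> run(int total, long periodMillis) {
        AtomicInteger done = new AtomicInteger();
        List<Double> samples = Collections.synchronizedList(new ArrayList<>());
        ScheduledExecutorService sampler = Executors.newSingleThreadScheduledExecutor();
        sampler.scheduleAtFixedRate(
                () -> samples.add((double) done.get() / total),
                0, periodMillis, TimeUnit.MILLISECONDS);

        IntStream.range(0, total).parallel().forEach(i -> {
            // ... real work would go here ...
            done.incrementAndGet(); // workers only touch the cheap atomic counter
        });

        sampler.shutdown(); // cancels the periodic sampling task
        try {
            sampler.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        samples.add((double) done.get() / total); // final sample once all work is done
        return samples;
    }

    public static void main(String[] args) {
        List<Double> samples = run(1_000_000, 10);
        System.out.println(samples.size() + " samples, last = " + samples.get(samples.size() - 1));
    }
}
```

The number of samples depends on how long the work takes, not on the number of items, which is the point of decoupling the display from the workers.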

Here is the full code comparing sequential and parallel processing with a ProgressBar. If you remove the sleep(1) and set the number of items to 10 million, it will still work concurrently and efficiently.

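import java.util.ArrayList;
import java.util.concurrent.atomic.AtomicInteger;

import javafx.animation.KeyFrame;
import javafx.animation.Timeline;
import javafx.application.Application;
import javafx.application.Platform;
import javafx.scene.Scene;
import javafx.scene.control.Button;
import javafx.scene.control.Label;
import javafx.scene.control.ProgressBar;
import javafx.scene.layout.HBox;
import javafx.scene.paint.Color;
import javafx.stage.Stage;
import javafx.util.Duration;

public class ParallelProgress extends Application {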
static class ParallelProgressBar extends ProgressBar {
    AtomicInteger myDoneCount = new AtomicInteger();
    int           myTotalCount;
    Timeline      myWhatcher = new Timeline(new KeyFrame(Duration.millis(10), e -> update()));

    public void update() {
        setProgress(1.0*myDoneCount.get()/myTotalCount);
        if (myDoneCount.get() >= myTotalCount) {
            myWhatcher.stop();
            myTotalCount = 0;
        }
    }

    public boolean isRunning() { return myTotalCount > 0; }

    public void start(int totalCount) {
        myDoneCount.set(0);
        myTotalCount = totalCount;
        setProgress(0.0);
        myWhatcher.setCycleCount(Timeline.INDEFINITE);
        myWhatcher.play();
    }

    public void add(int n) {
        myDoneCount.addAndGet(n);
    }
}

HBox testParallel(HBox box) {
    ArrayList<String> myTexts = new ArrayList<String>();

    for (int i = 1; i < 10000; i++) {
        myTexts.add("At "+System.nanoTime()+" ns");
    }

    Button runp = new Button("parallel");
    Button runs = new Button("sequential");
    ParallelProgressBar progress = new ParallelProgressBar();

    Label result = new Label("-");

    runp.setOnAction(e -> {
        if (progress.isRunning()) return;
        result.setText("...");
        progress.start(myTexts.size());

        new Thread() {
            public void run() {
                long ms = System.currentTimeMillis();
                myTexts.parallelStream().forEach(text -> {
                    progress.add(1);
                    try { Thread.sleep(1);} catch (Exception e1) { }
                });
                Platform.runLater(() -> result.setText(""+(System.currentTimeMillis()-ms)+" ms"));
            }
        }.start();
    });

    runs.setOnAction(e -> {
        if (progress.isRunning()) return;
        result.setText("...");
        progress.start(myTexts.size());
        new Thread() {
            public void run() {
                final long ms = System.currentTimeMillis();
                myTexts.forEach(text -> {
                    progress.add(1);
                    try { Thread.sleep(1);} catch (Exception e1) { }
                });
                Platform.runLater(() -> result.setText(""+(System.currentTimeMillis()-ms)+" ms"));
            }
        }.start();
    });

    box.getChildren().addAll(runp, runs, progress, result);
    return box;
}


@Override
public void start(Stage primaryStage) throws Exception {        
    primaryStage.setTitle("ProgressBar's");

    HBox box = new HBox();
    Scene scene = new Scene(box,400,80,Color.WHITE);
    primaryStage.setScene(scene);

    testParallel(box);

    primaryStage.show();   
}

public static void main(String[] args) { launch(args); }
}
simon04
Jens-Peter Haack
  • Nice solution. It should also be noted that this is more efficient only when progress updates would otherwise occur more often than once every 10 milliseconds. – Tomas Mikula Nov 08 '14 at 15:20
  • Yep, that's correct... if the tasks computed in parallel are 'heavy', this is somewhat overkill. But if the tasks are highly parallel (MANY cores) and small (thousands to millions per second), this becomes useful... – Jens-Peter Haack Nov 08 '14 at 15:43

The naive solution would be to have progress as a field of some surrounding object; then referring to progress from a lambda closure would actually mean this.progress, where this is final, thus the compiler would not complain. However, the resulting code would access the progress field from multiple threads concurrently, which could cause race conditions. I suggest restricting access to the progress field to the JavaFX application thread, by using Platform.runLater. The whole solution then looks like this:

// accessed only on JavaFX application thread
private int progress = 0;

// invoked only on the JavaFX application thread
private void increaseProgress() {
    progress++;
    updateProgress(progress, collection.size());
}

private void processCollection() {
    collection.parallelStream().forEach(signer -> {
        // do the work (on any thread)
        // ...

        // when done, update progress on the JavaFX thread
        Platform.runLater(this::increaseProgress);
    });
}
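
The same thread-confinement pattern can be tried outside JavaFX. The sketch below assumes a single-threaded executor as a stand-in for the JavaFX application thread, so uiThread.execute(...) plays the role of Platform.runLater(...); the class ConfinedCounterDemo and its members are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;

// Hypothetical demo class; each instance is single-use because its executor is shut down.
class ConfinedCounterDemo {

    // Plain int: modified only on the single "UI" thread, so no atomics are needed.
    private int progress = 0;

    // Assumed stand-in for the JavaFX application thread.
    private final ExecutorService uiThread = Executors.newSingleThreadExecutor();

    // Invoked only on the "UI" thread.
    private void increaseProgress() {
        progress++;
    }

    // Processes `total` dummy items in parallel and returns the final counter value.
    int processAll(int total) {
        IntStream.range(0, total).parallel().forEach(i -> {
            // ... do the work (on any thread) ...

            // When done, hand the update over to the single "UI" thread.
            uiThread.execute(this::increaseProgress);
        });
        uiThread.shutdown(); // already queued updates still run
        try {
            uiThread.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return progress; // read only after the "UI" thread has terminated
    }

    public static void main(String[] args) {
        System.out.println(new ConfinedCounterDemo().processAll(1000) + " updates applied");
    }
}
```

Note that this queues one task per processed item, so with very many small items the queue itself can become the limiting factor.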
Tomas Mikula