0

I have a collection of lists, and I need to iterate over each element and put it into another list. The data is very large, so I need to process it in parallel to get good processing time. I also need to preserve the order of the lists. With the code below I lose elements from the list, or sometimes get NULL entries. What would be an efficient way of making the list synchronized or thread safe?

java.util.List<T> metadata = new ArrayList<T>();
sourceValuesIterable.parallelStream().forEach(tblRow ->
{
    // ArrayList is not thread safe, so concurrent add() calls can lose elements
    metadata.add(tblRow);
});

One more question: when you remove NULLs from a collection using Guava's Predicates, does it change the order of the list elements?
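For clarity, this is roughly what I mean (a minimal sketch; the values and variable names are only illustrative):

import com.google.common.base.Predicates;
import com.google.common.collect.Iterables;
import com.google.common.collect.Lists;
import java.util.Arrays;
import java.util.List;

List<String> withNulls = Arrays.asList("a", null, "b", null, "c");

// Iterables.filter returns a lazy view that iterates in the source order,
// so copying it into a new list keeps the surviving elements in their original order.
List<String> withoutNulls =
        Lists.newArrayList(Iterables.filter(withNulls, Predicates.notNull()));

System.out.println(withoutNulls); // [a, b, c]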

Thanks in advance.

Michael
user_a27
    Why not use `map` and `collect` - `sourceValuesIterable.parallelStream().map(...).collect(Collectors.toList());`? – KunLun Jul 22 '20 at 12:52
  • If the data really is huge, I would do everything possible to avoid making a copy. What do you need the copy for, maybe there's a way to remove the need for it? – Joni Jul 22 '20 at 13:11
  • @Joni are you asking why I am adding the sVI data to metadata? It contains raw data, and I need to extract specific data from it. – user_a27 Jul 22 '20 at 14:34
  • So you start with a large list of composite objects, and you want to extract a specific component object from each composite, and create a new list from them? – Joni Jul 22 '20 at 15:28
  • @Joni I don't know much about composite objects. Also, with ArrayList the only concern is thread safety, and there are plenty of ways to achieve that. – user_a27 Jul 22 '20 at 16:17

2 Answers

1

Parallelism requires a single 'stream pipeline' if you want to stand any chance of order being preserved. Fortunately, you can do that here: map your sVI to Ts, then turn the stream into a list by collecting it:

List<T> metadata = sVI.parallelStream()
    .map(tblRow -> new ThingieThatGoesInMetadata())
    .collect(Collectors.toList());

Start there; this way, the ordering is guaranteed.

rzwitserloot
  • `Collectors.toList` doesn't pre-size the ArrayList, so for a huge list this is going to go through many rounds of array copying in order to expand to the necessary capacity – Michael Jul 22 '20 at 12:56
  • @rzwitserloot what does `.map` do there, and does it have any limitations? – user_a27 Jul 22 '20 at 15:03
0

I think it's a mistake to assume that parallelising this task and adding elements one at a time to the new list is automatically going to be the fastest way to copy it.

For starters, you didn't pre-size the new ArrayList, so it's going to continually be resizing as you add elements in order to reach the necessary capacity.
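For illustration, here is one way the resizing could be avoided when the size of the source is known up front. This is a sketch rather than anything from the question; the method and parameter names (`mapWithPreSizedTarget`, `extract`) are placeholders. Note that with a parallel stream the collector's supplier may run once per chunk of work, so pre-sizing like this mainly helps a sequential pipeline.

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

static <T, R> List<R> mapWithPreSizedTarget(Collection<T> sourceValuesIterable,
                                            Function<T, R> extract) {
    return sourceValuesIterable.stream()          // sequential pipeline
            .map(extract)                         // transform each row
            .collect(Collectors.toCollection(
                    // allocate the backing array once at the known final size,
                    // so the list never has to grow while elements are added
                    () -> new ArrayList<>(sourceValuesIterable.size())));
}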

There is also an overhead associated with spinning up a parallel stream and with merging the results.

ArrayList already has a copy constructor which will do an efficient copy. Ultimately, that's just going to be copying the underlying array of references. It's hard to imagine being able to beat that kind of low-level operation for performance.
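As a concrete illustration of that point (a minimal sketch; the element type and method name are arbitrary):

import java.util.ArrayList;
import java.util.List;

static <T> List<T> plainCopy(List<T> source) {
    // The copy constructor sizes the backing array from source.size() and copies
    // the element references in a single pass, preserving the original order.
    return new ArrayList<>(source);
}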

As always with performance-related concerns, your best bet is to profile it, measure the results, and use data to inform your decisions.

Michael