4

in the last days I'm starting to "play" with some Java 8 features, like stream (I studied a bit of documentation and several examples).

In my application I have a Map and I need to get the three element with highest value (the float part).

I tried different modifications to my code (and some of these solutions also: Sort a Map<Key, Value> by values (Java) ), for example:

Map<Long, Float> great = createMapWith20Elements();
Map<Long, Float> small = great.entrySet().stream()
        .sorted(Map.Entry.<Long, Float>comparingByValue().reversed()) 
        .limit(3) 
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

But the reslt is always the same: sometimes the code works fine, other it gives me a

java.lang.ArrayIndexOutOfBoundsException: 19

In rare cases, the index out of the bounds is 18.

This "random" behaviour (18, 19, or correct elaborations) makes me think to a "parallel threading" problem.

I'm sure that great map has always 20 elements... if I print them I receive:

2,-0.5
3,0.0
4,0.0
5,0.0
6,0.0
7,-0.33333334
8,0.0
9,0.0
10,0.0
11,0.0
12,0.5
13,0.0
14,0.0
15,-0.5
18,0.0
19,0.0
21,0.0
22,0.0
23,0.0
24,0.0

I'm conscious that 17 objects are candidate to be the first 3... but it is not a problem for my algorithm.

Can you help me in some way?

Thanks

EDIT:

The method createMapWith20Elements() has a dummy name for better explaining my situation: I'm sure it returns 20 elements because it makes a DB reading... but it should return any matching record.

By the way it ends with

// myIds is an ArrayList<Long>
myIds.parallelStream().forEach(e -> trust.put(e, 0f));
return trust;

Replacing with myIds.stream() it seems working fine... I'm not able to figure how using parallelStream to write to an object (Collection and not Stream), and returning the object itself (Collection), in the calling function it can lead to this kind of problem.

Community
  • 1
  • 1
Pierpaolo Cira
  • 1,457
  • 17
  • 23
  • 3
    since you have multiple method calls chained, why not unchain them and see the results of each call before the next one? A debugger should tell you exactly where your map has less than 20, 19, 18, 17 elements... – blurfus Jun 23 '15 at 22:10
  • I will try... by the way I think the problem is due to .sorted(...) method call, because removing it and refreshig an huge number of time the page, the error doesn't appare anymore... – Pierpaolo Cira Jun 23 '15 at 22:19
  • 3
    Would you please post a reproducible example, it would be much easier for us to help you. Filling a map with the data you gave using your code work fine for me. – Jean-François Savard Jun 23 '15 at 23:24
  • 4
    Can you post the stack trace please? – Stuart Marks Jun 24 '15 at 01:53
  • 2
    Aside: a slightly easier way to sort the entries is `sorted(comparing(Map.Entry::getValue, reverseOrder()))` – Stuart Marks Jun 24 '15 at 02:05
  • +1 for every comment; thank you. I followed any suggestion... and if I made and run a method where there is an hardcoded building the map with the same values I reported everything works fine... I thought the problem is the method that create the map (it access to a datastore and gets and computes results)... in effect it ends with a parallel stream operation (I put it in the main question)... replacing it with a sequencial stream it seems working fine... – Pierpaolo Cira Jun 24 '15 at 07:47
  • 2
    If you say `parallelStream().forEach(…)` you *really* perform the action multi-threaded. So if you call `trust.put` in that action and `trust` is not a thread safe `Map` it’s possible that you corrupt its data structure. – Holger Jun 24 '15 at 09:20
  • @Holger Yes, parallelStream is multithreaded... so do you think that the problem is not that some threads not be properly joined... but that the "put" action is not thread-safe, and two different elements (having the same value of hash(key)) try to be stored in the same moment (and the first one is overwritten instead of linking the second)? ... it seems the right solution... thank you – Pierpaolo Cira Jun 24 '15 at 13:35
  • PS: I'm reviewing any part of my code to be sure to call .parallelStrem only on thread-safe objects (and only when the cost a multithreading operation is really justified)... in the other case .stream will be ok. – Pierpaolo Cira Jun 24 '15 at 13:37
  • 1
    Instead of `forEach`, you should use a Collector to create the map. This is safe even if the stream is parallel and the result map isn't thread-safe. – Stuart Marks Jun 24 '15 at 16:38
  • @StuartMarks thank you... I will follow your suggestion next time... – Pierpaolo Cira Jun 24 '15 at 19:50

2 Answers2

7

I think that the problem is the method createMapWith20Elements().

You are inserting elements in the map (probably a HashMap or a TreeMap) concurrently and both HashMap and TreeMap are not synchronized. So concurrent insertions (put method invocations) break the map structure (you get a corrupted map).

As you mention:

// myIds is an ArrayList<Long>
myIds.parallelStream().forEach(e -> trust.put(e, 0f));
return trust;

sometimes produces the errors. But

// myIds is an ArrayList<Long>
myIds.stream().forEach(e -> trust.put(e, 0f));
return trust;

does not produce the error.

If you want to insert concurrently then you must use a synchronized wrapper. So your code should be:

// myIds is an ArrayList<Long>
Map<Long, Float> syncTrust = Collections.synchronizedSortedMap(trust);
myIds.parallelStream().forEach(e -> syncTrust.put(e, 0f));
return trust;
acesargl
  • 569
  • 3
  • 7
2

The issue is with the progressive resizing of the underlying not synchronized collection, I guess in your case your Map structure is not synchronized. Due to the gradual re-size, not handled correctly in a multi threads context, it can bring to have a thread trying to insert an element out of the range of the current size of the vector.

From the book "Oracle Certified Professional Java SE 8 Programmer II", Sybex:

For an ArrayList object, the JVM internally manages a primitive array of the same type. As the size of the dynamic ArrayList grows, a new, larger primitive array is periodically required. If two threads both trigger the array to be resized at the same time, a result can be lost, producing the unexpected value shown here. As briefly mentioned earlier, and also discussed later in this chapter, the unexpected result of two tasks executing at the same time is a race condition.

Enrico Giurin
  • 2,183
  • 32
  • 30