1

Regarding Codility's lesson OddOccurrencesInArray

I got total of 66% (Performance 0%) using the following code:

public int solution(int[] A) {
    return Arrays
            .stream(A)
            .parallel()
            .boxed()
            .collect(Collectors.groupingBy(n -> n, Collectors.counting()))
            .entrySet()
            .parallelStream()
            .filter(entry -> entry.getValue() % 2 != 0)
            .findAny()
            .map(Map.Entry::getKey)
            .orElseThrow();
}

However, I got total of 100% (Performance 100%) using code from here:

public int solution(int[] A) {
    HashMap<Integer, Integer> histogram = new HashMap<>();
    for (int i = 0; i < A.length; i++) {
        if (histogram.containsKey(A[i])) {
            int occurrences = histogram.get(A[i]);
            occurrences++;
            histogram.put(A[i], occurrences);
        } else {
            histogram.put(A[i], 1);
        }
    }
    Set<Integer> keySet = histogram.keySet();
    for (int currentKey : keySet) {
        int occurrences = histogram.get(currentKey);
        if (occurrences % 2 != 0) return currentKey;
    }
    throw new RuntimeException();
}

The following hybrid solution got total of 88% (1 Performace test failed):

public int solution(int[] A) {
    HashMap<Integer, Integer> histogram = new HashMap<>();
    for (int i = 0; i < A.length; i++) {
        if (histogram.containsKey(A[i])) {
            int occurrences = histogram.get(A[i]);
            occurrences++;
            histogram.put(A[i], occurrences);
        } else {
            histogram.put(A[i], 1);
        }
    }
    return histogram.entrySet()
            .parallelStream()
            .filter(n -> (n.getValue() % 2 != 0))
            .findAny()
            .map(Map.Entry::getKey)
            .orElseThrow();
}

I also tried to change all parallelStream() to stream(). Same results!

Are streams less efficient? or it's just Codility?

jps
  • 20,041
  • 15
  • 75
  • 79
Ahmed Ghoneim
  • 6,834
  • 9
  • 49
  • 79
  • 1
    Parallel is likely to be slower when the task is very simple, such as in this case. There is overhead involved. In particular with parallel findAny should be as fast or faster than findFirst as it can use any match (doesn't have to be sure it is the first). The examples do not do the same thing. Normal version iterates over the keys, the other over the entries. Apart from that I would have guessed the non-parellel streams version to be comparable with the the non-streams version. – ewramner Aug 02 '21 at 10:01
  • Streams allocate a whole lot of extra objects (no that's absolutely not free, despite old claims to it) and all the 2nd order functions make it waaay harder (in practice even impossible) for the compiler to do many optimizations. So yes, while many people are ignorant to it, if you need high performant code streams are a bad choice. In many situations the performance overhead is not that important given the much cleaner code though. That said using parallel here is probably the biggest performance killer. – Voo Aug 02 '21 at 10:12
  • You [shall benchmark](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) to prove them wrong and conclude your learnings. – Naman Aug 02 '21 at 16:17
  • @Naman Microbenchmarks are hard to write, but the listed code runs more than long enough and produces relevant output so that measuring inaccuracies and compiler optimizations are irrelevant. – Voo Aug 02 '21 at 19:32
  • @Voo not sure what you mean by "runs more than long enough" and "measuring inaccuracies and compiler optimizations are irrelevant.", benchmarking would be more about comparing the performances and understanding which one amongst the shared snippet performs better. – Naman Aug 03 '21 at 03:36
  • @Naman The point is that the programs will easily run for a second or more at which point we're out of microbenchmark territory and you can simply compare the runtime. (you can easily also write microbenchmarks that will show that streams have a noticeable performance overhead if you so wish) – Voo Aug 03 '21 at 06:02
  • That depends on what your test data constitutes of @Voo. Just to convey my opinion, I might as well believe you if you have the data specific to the shared use case. :) – Naman Aug 03 '21 at 18:10
  • Streams in general are not as efficient as non-stream based code. There is quite a bit of overhead to them. Imo, they have two main adantages. 1) based on the proficiency of the programmer, they permit verbose tasks to be constructed quickly and more closely follow the problem statement. 2) In a multicore environment with lots of data to process they can spare the programmer the headaches of managing threads. – WJS Aug 14 '21 at 16:36
  • Just to add that codility *probably* runs the tests in a single core container to reduce costs, so any parallel code will just add overhead and bring no benefits. I'm not sure if you can see the output of the call to `Runtime.getRuntime().availableProcessors()` (this is not 100% accurate, but it's the best). – Augusto Nov 07 '22 at 10:12

1 Answers1

1

Codility fixed it, and all solutions are getting total of 100% (Performance 100%)

Ahmed Ghoneim
  • 6,834
  • 9
  • 49
  • 79