9

I have the following code:

mostRecentMessageSentDate = messageInfoList
    .stream()
    .findFirst().orElse(new MessageInfo())
    .getSentDate();

unprocessedMessagesCount = messageInfoList
    .stream()
    .filter(messageInfo -> messageInfo.getProcessedDate() == null)
    .count();

hasAttachment = messageInfoList
    .stream()
    .anyMatch(messageInfo -> messageInfo.getAttachmentCount() > 0);

As you can see, I stream the same list 3 times, because I want to find 3 different values. If I did this in a For-Each loop, I could loop just once.

Is it better, performance wise to do this a for loop then, so that I loop only once? I find the streams much more readable.

Edit: I ran some tests:

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class Main {

public static void main(String[] args) {

    List<Integer> integerList = populateList();

    System.out.println("Stream time: " + timeStream(integerList));
    System.out.println("Loop time: " + timeLoop(integerList));

}

private static List<Integer> populateList() {
    return IntStream.range(0, 10000000)
            .boxed()
            .collect(Collectors.toList());
}

private static long timeStream(List<Integer> integerList) {
    long start = System.currentTimeMillis();

    Integer first = integerList
            .stream()
            .findFirst().orElse(0);

    long containsNumbersGreaterThan10000 = integerList
            .stream()
            .filter(i -> i > 10000)
            .count();

    boolean has10000 = integerList
            .stream()
            .anyMatch(i -> i == 10000);

    long end = System.currentTimeMillis();

    System.out.println("first: " + first);
    System.out.println("containsNumbersGreaterThan10000: " + containsNumbersGreaterThan10000);
    System.out.println("has10000: " + has10000);

    return end - start;
}

private static long timeLoop(List<Integer> integerList) {
    long start = System.currentTimeMillis();

    Integer first = 0;
    boolean has10000 = false;
    int count = 0;
    long containsNumbersGreaterThan10000 = 0L;
    for (Integer i : integerList) {
        if (count == 0) {
            first = i;
        }

        if (i > 10000) {
            containsNumbersGreaterThan10000++;
        }

        if (!has10000 && i == 10000) {
            has10000 = true;
        }

        count++;
    }

    long end = System.currentTimeMillis();

    System.out.println("first: " + first);
    System.out.println("containsNumbersGreaterThan10000: " + containsNumbersGreaterThan10000);
    System.out.println("has10000: " + has10000);

    return end - start;
}
}

and as expected, the for loop is always faster than the streams

first: 0
containsNumbersGreaterThan10000: 9989999
has10000: true
Stream time: 57
first: 0
containsNumbersGreaterThan10000: 9989999
has10000: true
Loop time: 38

But never significantly.

The findFirst was probably a bad example, because it just quits if the stream is empty, but I wanted to know if it made a difference.

I was hoping to get a solution that allowed multiple calculations from one stream. IntSummaryStatistics dont do exactly what I want. I think I'll heed @florian-schaetz and stick to favouring readbility for a marginal performance increase

Somaiah Kumbera
  • 7,063
  • 4
  • 43
  • 44
  • 1
    "Premature optimization is the root of all evil." Don't try to optimize your code unless you know that this code needs optimization. In this case, it depends on the size of your list, the distribution of attachments, how often you call it, etc. - but I would guess, for 99% of all people, a code like this would not need optimization, since the speed gain would be minimal at best. And IF you need to optimize it, don't force yourself to use streams. If you don't need them and they make the code harder to read... don't use them. – Florian Schaetz May 29 '17 at 09:05
  • 1
    You may want to look at `IntSummaryStatistics` and `Collectors.summarizingInt` which do something similar, collecting three different pieces of information in one pass over a stream. But unless your list is very large it probably isn't worth it. – David Conrad May 29 '17 at 09:27
  • 1
    See [How do I write a correct micro-benchmark in Java?](https://stackoverflow.com/q/504103/2711488) – Holger May 29 '17 at 12:55

1 Answers1

5

You don't iterate through the collection three times.

mostRecentMessageSentDate = messageInfoList
        .stream()
        .findFirst().orElse(new MessageInfo())
        .getSentDate();

The above checks if there are any elements in the collection and returns a value depending on this. It doesn't need to go through the whole collection.

unprocessedMessagesCount = messageInfoList
        .stream()
        .filter(messageInfo -> messageInfo.getProcessedDate() == null)
        .count();

This one needs to filter out all elements without a process date and counts them, so this one goes through the whole collection.

hasAttachment = messageInfoList
        .stream()
        .anyMatch(messageInfo -> messageInfo.getAttachmentCount() > 0);

The above only needs to go through the elements until it finds a message with an attachment.

So, of the three streams, only one of them is required to go through the whole collection, in the worst case you do the iteration two times (the second, and potentionally the third stream).

This could probably be done more efficient with a regular For-Each loop, but do you really need it? If your collection only contains a few objects, I wouldn't bother optimizing it.

However, with a traditional For-Each loop, you could combine the last two streams:

int unprocessedMessagesCount = 0;
boolean hasAttachment = false;

for (MessageInfo messageInfo: messageInfoList) {
  if (messageInfo.getProcessedDate() == null) {
    unprocessedMessagesCount++;
  }
  if (hasAttachment == false && messageInfo.getAttachmentCount() > 0) {
    hasAttachment = true;
  }
}

It is really up to you if you think this is a better solution (I also find the streams more readable). I don't see a way to combine the three streams into one, at least not in a more readable way.

Magnilex
  • 11,584
  • 9
  • 62
  • 84
  • 1
    I starred the question because I was hoping for a stream related way, to do 2 read only operations at the same time, without using for each loops, or forEach syntax. – Ryan Leach May 29 '17 at 08:50
  • 1
    @RyanTheLeach Yeah, but as I say in the answer, I don't see a way to do it in a good, readable way. – Magnilex May 29 '17 at 08:52
  • 4
    There is probably a way writing it with streams by abusing `map` or similar, but I highly doubt that it would a good way. Check if your code is actually in NEED of runtime optimization, otherwise remember: Premature optimization is the root of all evil - in this case, you would sacrifice readability for solving a non-problem. Only if it actually IS a performance problem, refactor it, in which case @Magnilex solutions seems better readable than any stream version I can think of. – Florian Schaetz May 29 '17 at 09:02
  • Thanks for the answer @Magnilex I was hoping for an answer that allowed multiple calculations in the same stream (see my edit) – Somaiah Kumbera May 29 '17 at 09:39