
Let's suppose a very common situation in development: I have one Collection and need to map its elements to another type of object. A flatMap scenario.

Example:

We have a method that must return a Set of Source objects:

public Set<Source> getSources(String searchText);

Here is one implementation:

public Set<Source> getSources(String searchText) {
  HashSet<Source> sources = new HashSet<>();

  Set<String> urls = this.crawlerService.getUrls(searchText);

  urls.forEach(url -> sources.add(Source.builder().url(url).build()));

  return sources;
}

Another implementation, using Java streams:

public Set<Source> getSources(String searchText) {

  Set<String> urls = this.crawlerService.getUrls(searchText);

  return urls.stream()
             .flatMap(e -> Stream.of(Source.builder().url(e).build()))
             .collect(Collectors.toSet());

}

I prefer the stream version, but I have some questions: how expensive, in performance terms, is it to convert to a stream and collect to a set? Is it acceptable to use a Stream this way, or is it overkill? Is there a better way to handle this kind of scenario with Java streams?

Mark Rotteveel
Rodrigo Sene
  • Why do you need to use `flatMap` and not `map`? – smac89 Sep 08 '19 at 14:15
  • To add to what @smac89 said, why wouldn't you use `.map(e -> Source.builder().url(e).build())`? – Jaywalker Sep 08 '19 at 14:16
  • Because the original set is of type String and I need to convert it to a different type. When I have this kind of scenario I need to use flatMap, right? – Rodrigo Sene Sep 08 '19 at 14:17
  • @RodrigoSene nope. That's the perfect scenario for `map` – smac89 Sep 08 '19 at 14:18
  • Oh, really? I tested it here and it worked. I didn't know I could use just map. But how costly is this in terms of performance? – Rodrigo Sene Sep 08 '19 at 14:21
  • You use `flatMap` when each element in the stream is mapped to a stream and you want all of the resulting elements together in one stream. You use `map` if each element translates to exactly one element. – RealSkeptic Sep 08 '19 at 14:21
  • "*how expensive in performance terms is convert to stream and collect to set?*" - Do you have performance issues? If not, then do not optimize. ["*premature optimization is the root of all evil*" -- Donald Ervin Knuth: *Computer Programming as an Art* (1974), p. 671](https://dl.acm.org/ft_gateway.cfm?id=361612&ftid=289767) – Turing85 Sep 08 '19 at 14:23
  • So, in this case, without thinking about it prematurely, I can use either way? But how does collect work? Is this code iterating over the set once to map and a second time to collect? – Rodrigo Sene Sep 08 '19 at 14:27
  • @RodrigoSene No, both will iterate the collection once, the stream approach would give you a little overhead (because of building and using its infrastructure) which is unnoticeable on small data sets and insignificant in other cases. – Andrew Tobilko Sep 08 '19 at 14:47
  • @AndrewTobilko It isn't that simple. *which is unnoticeable on small data sets* ... when you do a stream operation for 5 elements ... 10K times per minute, then it becomes noticeable. I agree that Streams provide a nice concise way to express many problems in a functional way. But still: there can be quite a bit of overhead. I don't mind using streams (ignoring "number of elements") when I know my code runs once per minute, or even less frequently. But in a large system, where many programmers add code ... things are slightly different. – GhostCat Sep 08 '19 at 15:07
  • One 5-element stream here, another there, and another one that runs 100 times every minute ... maybe things are adding up then. And then, after a year "oh, gosh, overall performance sucks". Not because of one big issue, but because of a thousand little "how could that one thing matter" added over time. – GhostCat Sep 08 '19 at 15:12

1 Answer


how expensive, in performance terms, is it to convert to a stream and collect to a set? Is it acceptable to use a Stream this way, or is it overkill?

To begin with, a stream is generally somewhat more expensive than simply creating a new set and adding the elements to it in a loop, because the pipeline objects have to be built before any element is processed. You are unlikely to notice this cost unless you benchmark, so go ahead and benchmark both examples.
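A rough way to compare the two approaches is sketched below. It is only a crude timing loop, not a rigorous benchmark: JIT warm-up, garbage collection, and dead-code elimination can all distort the numbers, so a harness such as JMH is the better tool for real measurements. The `Source` class here is a hypothetical stand-in for the question's type (a plain constructor instead of the builder), and the element and iteration counts are arbitrary assumptions.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;
import java.util.stream.Collectors;

class LoopVsStreamTiming {

    // Hypothetical stand-in for the question's Source type.
    static final class Source {
        final String url;

        Source(String url) { this.url = url; }

        @Override public boolean equals(Object o) {
            return o instanceof Source && ((Source) o).url.equals(url);
        }

        @Override public int hashCode() { return url.hashCode(); }
    }

    // Consumed by the timing loop so the JIT cannot discard the work entirely.
    static long sink;

    // Plain loop: create a set and add one Source per URL.
    static Set<Source> withLoop(Set<String> urls) {
        Set<Source> sources = new HashSet<>();
        for (String url : urls) {
            sources.add(new Source(url));
        }
        return sources;
    }

    // Stream pipeline: map each URL to a Source and collect into a set.
    static Set<Source> withStream(Set<String> urls) {
        return urls.stream().map(Source::new).collect(Collectors.toSet());
    }

    // Runs the task repeatedly and returns the total elapsed nanoseconds.
    static long timeNanos(Supplier<Set<Source>> task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += task.get().size();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Set<String> urls = new HashSet<>();
        for (int i = 0; i < 1_000; i++) {
            urls.add("https://example.com/" + i);
        }

        // Warm-up rounds so both variants are JIT-compiled before measuring.
        timeNanos(() -> withLoop(urls), 2_000);
        timeNanos(() -> withStream(urls), 2_000);

        System.out.println("loop   ns: " + timeNanos(() -> withLoop(urls), 2_000));
        System.out.println("stream ns: " + timeNanos(() -> withStream(urls), 2_000));
    }
}
```

The absolute numbers will vary between machines and JVM versions; the point is only to compare the two variants under the same conditions.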

If you take a look at how Java implements streams, you will see that a stream is essentially a flexible pipeline built on top of java.util.Spliterator, a close relative of the long-standing java.util.Iterator. So what you gain from using streams is mainly their flexibility (and sometimes speed, but that shouldn't be the selling point).
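For a sequential stream without stateful operations, each element travels through the whole pipeline before the next one is pulled from the source, so the source is traversed once rather than once per stage. The sketch below records the order of events to illustrate this; the class and method names are made up for the example, and writing to a shared list from the lambdas is only safe here because the stream is sequential.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SinglePassDemo {

    // Returns the order in which the pipeline stages saw each element.
    static List<String> trace(List<String> urls) {
        List<String> events = new ArrayList<>();

        urls.stream()
            // Stage 1: transform the element and log that it was mapped.
            .map(u -> {
                events.add("map " + u);
                return u.toUpperCase();
            })
            // Stage 2: log the element just before it reaches the collector.
            .peek(u -> events.add("collect " + u))
            .collect(Collectors.toList());

        return events;
    }

    public static void main(String[] args) {
        // Prints: [map a, collect A, map b, collect B]
        System.out.println(trace(Arrays.asList("a", "b")));
    }
}
```

The interleaved output shows that mapping and collecting happen element by element in one traversal, not as two separate loops over the data.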


As for your stream example, you are making the pipeline more expensive than necessary by creating an intermediate single-element stream for every element inside flatMap. All flatMap does with each of those streams is unwrap it again and pass its contents downstream, so you might as well use map in the first place.

public Set<Source> getSources(String searchText) {

  Set<String> urls = this.crawlerService.getUrls(searchText);

  return urls.stream()
             .map(e -> Source.builder().url(e).build())
             .collect(Collectors.toSet());
}
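To make the difference concrete, here is a small sketch contrasting the two operations. The string-length and path-splitting examples are assumptions chosen for illustration and are not part of the question's code: `map` handles the one-to-one case, `flatMap` with `Stream.of` produces the same result at the cost of an extra stream per element, and `flatMap` earns its keep only when one input yields several outputs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MapVsFlatMap {

    // One-to-one: each word becomes exactly one length.
    static List<Integer> lengths(List<String> words) {
        return words.stream()
                    .map(String::length)
                    .collect(Collectors.toList());
    }

    // Same result, but wraps every length in a throwaway single-element stream.
    static List<Integer> lengthsViaFlatMap(List<String> words) {
        return words.stream()
                    .flatMap(w -> Stream.of(w.length()))
                    .collect(Collectors.toList());
    }

    // One-to-many: each path contributes zero or more segments to the result.
    static List<String> pathSegments(List<String> paths) {
        return paths.stream()
                    .flatMap(p -> Arrays.stream(p.split("/")))
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lengths(Arrays.asList("ab", "c")));           // [2, 1]
        System.out.println(lengthsViaFlatMap(Arrays.asList("ab", "c"))); // [2, 1]
        System.out.println(pathSegments(Arrays.asList("a/b", "c")));     // [a, b, c]
    }
}
```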

See Java 8: performance of Streams vs Collections

smac89