Using temp variable using collectors

Question

I have the following code. I am trying to understand if it would make any changes to memory.

Approach 1: Using collectors I can directly return map like so:

    List<Customer> customerList = new ArrayList<>();
    customerList.add(new Customer("1", "pavan"));
    customerList.add(new Customer("2", "kumar"));

    return customerList.stream().collect(Collectors.toMap(t->t.getId(), t->t));

Approach 2: Using an explicit map to collect results, like so:

   Map<String,Customer> map = new HashMap<String, Customer>();  
   map = customerList.stream().collect(Collectors.toMap(t->t.getId(), t->t));       
   return map;

Compared to the first, does the second approach make any difference to memory/ GC, if I iterate over a million times?

In your second example, I think you might mean to use .forEach() and put() in the map? (If not, you're instantiating a map for nothing :) ) By the way, Guava offers Maps.uniqueIndex which is a slightly shorter way of writing this same code. I'm pretty sure the overhead for these is virtually identical. — EvanM, Aug 17 '17 at 05:48
Yeah i could do that.But i am trying to understand if the second code has anything to do with GC performance — nanpakal, Aug 17 '17 at 05:54
Yeah - they're virtually identical. The collector creates a HashMap too, but it's a little bit fancier about looking for duplicates. Your second snippet ignores duplicates. Should be almost the exact same GC overhead. — EvanM, Aug 17 '17 at 06:02
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/java/util/stream/Collectors.java has the relevant source. — EvanM, Aug 17 '17 at 06:18

score 1 · Answer 1 · answered Aug 17 '17 at 06:40

1

Aside from instantiating a Map instance that you don't need in the second example, both pieces of code are identical. You immediately replace the Map reference you created with the one returned by the stream. Most likely the compiler would eliminate that as redundant code.

The collect method of the streams API will instantiate a Map for you; the code has been well optimised, which is one of the advantages of using the Stream API over doing it yourself.

To answer your specific question, you can iterate over both sections of code as many times as you like and it won't make any difference to the GC impact.

answered Aug 17 '17 at 06:40

Speakjava

3,177
13
15

putting GC aside for a while; there are differences and *potentially* this code can even be improved as far as I can see... – Eugene Aug 17 '17 at 12:07
oh darn! Simon Ritter :) what a pleasant surprise; I've probably watched all your recent videos on youtube. Still, I find that this code can be improved, I've added an answer for that – Eugene Aug 17 '17 at 12:10
:-). I agree (as you have posted in your answer) that the code could be improved. – Speakjava Aug 17 '17 at 13:17

score 1 · Answer 2 · answered Aug 17 '17 at 12:02

There code is by far not identical; specifically Collectors.toMap says that it will return a Map :

There are no guarantees on the type, mutability, serializability, or thread-safety of the Map returned.

There are absolutely no guarantees what-so-ever that the returned Map is actually a HashMap. It could be anything other - any other Map here; so assigning it to a HashMap is just wrong.

Then there is the way you build the Collector. It could be simplified to:

customerList.stream().collect(Collectors.toMap(Customer::getId, Function.identity()));

The method reference Customer::getId, as opposed to the lambda expression, will create one less method (since lambda expressions are de-sugared to methods and method references are not).

Also Function.identity() instead of t -> t will create less objects if used in multiple places. Read this.

Then there is the fact how a HashMap works internally. If you don't specify a default size, it might have to re-size - which is an expensive operation. By default Collectors.toMap will start with a default Map of 16 entries and a load_factor of 0.75 - which means you can put 12 entries into it before the next resize.

You can't omit that using Collectors.toMap since the supplier of that will always start from HashMap::new - using the default 16 entries and load_factor of 0.75.

Knowing this, you can drop the stream entirely:

Map<String, Customer> map = new HashMap<String, Customer>((int)Math.ceil(customerList.size() / 0.75));
customerList.forEach(x -> map.put(x.getId(), x));

score 0 · Answer 3 · answered Aug 17 '17 at 06:35

In the second approach, you are instantiating a Map, and reassigning the reference to the one returned by the call to stream.collect(). Obviously, the first Map object referenced by "map" is lost.

The first approach does not have this problem.

In short, yes, this makes a minor difference in terms of memory usage, but it is likely negligible considering you have a million entries to iterate over.

Using temp variable using collectors

3 Answers3