
First of all, these are mock data classes. I have a list of phones and need to create a new list that contains only the phone types. I see two options for doing that; can anyone explain which option is better than the other, with a proper reason?

class Phone {
    int id;
    String type;
    Company company;
    // Assume all getters, setters and constructors are there
}

class PhoneType {
    int id;
    String type;
    // Assume all getters, setters and constructors are there
}

and I have a list of Phone objects as below.

List<Phone> phones = new ArrayList<>();
phones.add(new Phone(1,"J7",samsung));
phones.add(new Phone(2,"J5",samsung));
phones.add(new Phone(3,"I5",apple));

I have to get a list of phone types from the list of phones. For that, I have the two ways shown below.

First way:

List<PhoneType> phoneTypes = phones.parallelStream()
    .map(phone -> new PhoneType(phone.getId(), phone.getType()))
    .collect(Collectors.toList());

Second way:

List<PhoneType> phoneTypes = new ArrayList<>();
phones.parallelStream().forEach(
    phone -> phoneTypes.add(new PhoneType(phone.getId(), phone.getType())));
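As a side note on the second way: adding to a plain `ArrayList` from `parallelStream().forEach` is not thread-safe, so elements can be lost or an exception thrown. A minimal self-contained sketch of a safe variant (with `Phone` reduced to `id`/`type` only, and Java records used for brevity — both assumptions, not the original classes) lets the collector do the per-thread accumulation and merging:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: ArrayList is not thread-safe, so mutating it from
// parallelStream().forEach can lose elements. Letting the collector
// accumulate per-thread lists and merge them avoids that entirely.
class SafeParallelMapping {
    record Phone(int id, String type) {}
    record PhoneType(int id, String type) {}

    static List<PhoneType> map(List<Phone> phones) {
        return phones.parallelStream()
                .map(p -> new PhoneType(p.id(), p.type()))
                .collect(Collectors.toList()); // thread-safe accumulation
    }

    public static void main(String[] args) {
        List<Phone> phones = List.of(
                new Phone(1, "J7"), new Phone(2, "J5"), new Phone(3, "I5"));
        System.out.println(map(phones).size()); // prints 3
    }
}
```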

If you have a better way than the ones mentioned above, please mention it as well.

  • In your first approach you're using map, though perhaps not in the way it is intended. While you can stuff all kinds of side effects into it, for side effects you have forEach anyway. But see the answer below. – Curiosa Globunznik Oct 11 '19 at 05:44
  • Phone could have a PhoneType member rather than duplicating the same fields. – swpalmer Oct 11 '19 at 18:18
  • @swpalmer that was mock data –  Oct 12 '19 at 06:02

1 Answer


I'd say this is the canonical approach. No separate list needs to be created, and further operations like filter can be added. Still, that's a matter of taste, I guess.

List<PhoneType> phoneTypes = phones.stream()
  .map(p -> new PhoneType(p.getId(), p.getType()))
  .collect(Collectors.toList());

There are a number of possibilities listed below, with empirical test results gathered on an 8-core/8 GB machine with a 4 GB JVM heap. If there's one takeaway, it's that the differences don't span orders of magnitude, with collection.forEach perhaps coming out somewhat ahead of everything else. Find another similar discussion here, measuring further variations including iterators and arriving at similar conclusions.

As for the different approaches: stream.forEach() does not guarantee the encounter order of the input is preserved (on a parallel stream in particular); everything else does. Here's a detailed discussion of streams vs. collections.
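A minimal sketch (not part of the original answer) illustrating the ordering claim: collecting a parallel stream still yields encounter order, and forEachOrdered forces it where forEach would not.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// forEach on a parallel stream may process elements in any order;
// forEachOrdered and the collectors respect the encounter order.
class OrderingDemo {
    public static void main(String[] args) {
        // Collecting a parallel stream still yields encounter order:
        List<Integer> collected = IntStream.range(0, 1000).parallel()
                .boxed()
                .collect(Collectors.toList());
        System.out.println(collected.get(0) + ".." + collected.get(999)); // 0..999

        // forEachOrdered forces encounter order, at some cost to parallelism:
        IntStream.range(0, 5).parallel()
                .forEachOrdered(n -> System.out.print(n + " ")); // 0 1 2 3 4
    }
}
```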

The first way mentioned above is disqualified because of parallelStream, which proves consistently slower for a simple map operation.

The second is disqualified because collection.forEach does the same job, preserves the order, and is faster too. At this point, without tapping into heavily mined ideological terrain, I'd like to point to the discussion of whether premature optimization is the root of all evil or not.

An intermediate version would be collection.forEach instead of stream().forEach. I still find it less appealing to look at than stream().map; there seems to be some performance gain, though not a huge one, which has to be weighed against the aesthetic drawbacks. A matter of taste, perhaps.
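The intermediate version described above would look like this — a self-contained sketch using records and `id`/`type` accessors in place of the question's full classes (an assumption for brevity):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the collection.forEach variant: no stream is created;
// the list is traversed in order and results appended to a pre-built list.
class CollectionForEachDemo {
    record Phone(int id, String type) {}
    record PhoneType(int id, String type) {}

    static List<PhoneType> map(List<Phone> phones) {
        List<PhoneType> phoneTypes = new ArrayList<>(phones.size());
        phones.forEach(p -> phoneTypes.add(new PhoneType(p.id(), p.type())));
        return phoneTypes;
    }

    public static void main(String[] args) {
        List<Phone> phones = List.of(
                new Phone(1, "J7"), new Phone(2, "J5"), new Phone(3, "I5"));
        System.out.println(map(phones).size()); // prints 3, in input order
    }
}
```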

One could also choose to operate with for-loops and arrays only, circumventing collections and streams altogether. The drawback is that arrays come in fixed lengths only and the code looks messy compared to the slick functional versions.
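A sketch of that loop/array variant (again with simplified record stand-ins for the question's classes):

```java
// Sketch of the loop-and-array variant: fixed-length arrays and a plain
// indexed loop, no collections or streams involved.
class ArrayLoopDemo {
    record Phone(int id, String type) {}
    record PhoneType(int id, String type) {}

    static PhoneType[] map(Phone[] phones) {
        PhoneType[] out = new PhoneType[phones.length]; // length fixed up front
        for (int i = 0; i < phones.length; i++) {
            out[i] = new PhoneType(phones[i].id(), phones[i].type());
        }
        return out;
    }

    public static void main(String[] args) {
        Phone[] phones = { new Phone(1, "J7"), new Phone(2, "J5") };
        System.out.println(map(phones).length); // prints 2
    }
}
```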

I made a series of test runs; find the results below. Surprisingly, collection.forEach came out ahead, followed by the loop versions, with the stream versions last and parallelStream running slowest. The test results are blurred by factors I can't precisely pin down, garbage collection being the main suspect.

You can find code and data producing these results in a gist.

I did 10 runs, in each one running mappings for list/array sizes of 100,000, 1,000,000, 10,000,000, 20,000,000 and 30,000,000.

Before each run there was a warmup run with 30,000,000 elements, which was not measured. Still, there were random outliers scattered all over the place (on the order of 17 seconds against an average of 1.5), which I'd blame on the garbage collector. For the final calculation of the averages I replaced each outlier with the average of its column (including the outlier).

approach            avg time (s)
loop:array→array    1.8
loop:array→list     0.983
collection.forEach  0.8
stream.forEach      1.3
mapSeq              1.8
mapParallel         2.1
– answered by Curiosa Globunznik
  • I put some mock data there; in the real case I will use very large arrays. For performance, I think I have to go with the parallel stream, don't I? –  Oct 11 '19 at 05:44
  • I'd say for a simple map operation like that, the overhead of parallelStream will outweigh any benefit of having more threads process sections of the list. It would be different if heavy computation were done for every element, HTTP requests fetched, or the like. That claim could even hold true if your list fills half the memory of your machine. But just run an experiment, I'd propose. – Curiosa Globunznik Oct 11 '19 at 05:47
  • Add a constructor to PhoneType that takes a Phone, so you can simplify the map operation to a method reference to the constructor: List<PhoneType> phoneTypes = phones.stream().map(PhoneType::new).collect(Collectors.toList()); – swpalmer Oct 11 '19 at 18:17
  • Also, instead of constructing new `PhoneType` objects, I'd probably just use a `PhoneType` passed into the constructor of `Phone` itself. Then the list can be calculated with `phones.stream().map(Phone::getType).collect(Collectors.toList());`, while also allowing for the use of `Stream#distinct` and not creating a bunch of new objects for every phone. – Rogue Oct 14 '19 at 02:21
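The two comment suggestions can be combined into one self-contained sketch. The record shapes and the accessor names (`model()`, `type()`) are assumptions chosen for brevity, not the question's actual classes:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the comment suggestions: give PhoneType a Phone-based
// constructor so the mapping becomes a constructor reference, or store a
// PhoneType inside Phone and map with a getter plus distinct().
class CommentSuggestionsDemo {
    record PhoneType(int id, String type) {
        PhoneType(Phone p) { this(p.id(), p.model()); } // assumed accessors
    }
    record Phone(int id, String model, PhoneType type) {}

    // swpalmer's suggestion: constructor reference
    static List<PhoneType> viaConstructorRef(List<Phone> phones) {
        return phones.stream()
                .map(PhoneType::new)
                .collect(Collectors.toList());
    }

    // Rogue's suggestion: reuse the stored PhoneType, deduplicated
    static List<PhoneType> viaGetter(List<Phone> phones) {
        return phones.stream()
                .map(Phone::type)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        PhoneType j = new PhoneType(0, "J-series");
        List<Phone> phones = List.of(
                new Phone(1, "J7", j), new Phone(2, "J5", j));
        System.out.println(viaConstructorRef(phones).size()); // prints 2
        System.out.println(viaGetter(phones).size());         // prints 1
    }
}
```

The getter variant avoids allocating a new `PhoneType` per phone, and `distinct()` relies on the value equality that records provide for free.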