4

Can somebody please help me in optimising the code below. I don't want to stream on same list 3 times. I have to iterate on same list and apply different mapping functions. Can somebody suggest any better solution for doing this-

List<Dummy> dummy = getDummyData(); //Assume we are getting data from some source
List<NewDummy> newDummyList = dummy.stream().map(eachDummy -> mapper.map(eachDummy, NewDummy.class)).collect(Collectors.toList());

if(someCondition) {
  final BigDecimal amount1 = dummy.stream().filter(eachDummy -> Optional.ofNullable(eachDummy.getAmount1()).isPresent())
                                  .map(Dummy::getAmount1).reduce(BigDecimal.ZERO, BigDecimal::add);
  final BigDecimal amount2 = dummy.stream().filter(eachDummy -> Optional.ofNullable(eachDummy.getAmount2()).isPresent())
                                  .map(Dummy::getAmount2).reduce(BigDecimal.ZERO, BigDecimal::add);

  return new DummyObject(newDummyList, amount1, amount2);
} else {
    return new DummyObject(newDummyList);
}
dubpp
  • 85
  • 1
  • 7

2 Answers2

6

This seems like the ideal use case for a custom collector. But before that, I think you can simplify the sums of the amounts as follows:

BigDecimal amount1 = dummy.stream()
    .map(Dummy::getAmount1)
    .filter(Objects::nonNull)
    .reduce(BigDecimal::add).orElse(BigDecimal.ZERO);

Now, the custom collector. You could accumulate instances of Dummy into an instance of a dedicated local class inside a static utility method:

static Collector<Dummy, ?, DummyObject> toDummyObject(
        Function<Dummy, NewDummy> mapper, 
        boolean someCondition) {

    class Accumulator {
        List<NewDummy> newDummyList = new ArrayList<>();
        BigDecimal amount1 = BigDecimal.ZERO;
        BigDecimal amount2 = BigDecimal.ZERO;

        public void add(Dummy dummy) {
            newDummyList.add(mapper.apply(dummy));
        }

        public void addAndSum(Dummy dummy) {
            if (dummy.getAmount1() != null) amount1 = amount1.add(dummy.getAmount1());
            if (dummy.getAmount2() != null) amount2 = amount2.add(dummy.getAmount2());
            add(dummy);
        }

        public Accumulator merge(Accumulator another) {
            newDummyList.addAll(another.newDummyList);
            return this;
        }

        public Accumulator mergeAndSum(Accumulator another) {
            amount1 = amount1.add(another.amount1);
            amount2 = amount2.add(another.amount2);
            return merge(another);
        }

        public DummyObject finish() {
            return someCondition ?
                new DummyObject(newDummyList, amount1, amount2) :
                new DummyObject(newDummyList);
        }
    }

    return Collector.of(
        Accumulator::new, 
        someCondition ? Accumulator::addAndSum : Accumulator::add,
        someCondition ? Accumulator::mergeAndSum : Accumulator::merge,
        Accumulator::finish);
}

Now we're ready to go:

dummy.stream().collect(toDummyObject(
    eachDummy -> mapper.map(eachDummy, NewDummy.class), 
    someCondition));
fps
  • 33,623
  • 8
  • 55
  • 110
  • @federico thanks for suggesting a great alternative. In terms of performance, dont you think use to Reduce to calculate the sum in original method will outperform the code you have suggested here. Basically, if i want to compare your suggested method vs the one in question, how should i do it? – dubpp Sep 06 '18 at 21:11
  • @prats thanks for your words. Search for jmh microbenchmark here, you'll find very useful info. I believe (theoretically) that my version will behave slightly better than yours, only just a bit. Both versions have the same time complexity. I think that it's OK to have 3 separate loops or streams, maybe you could refactor the sum of the amounts to a private method. – fps Sep 06 '18 at 21:47
  • After debugging the code, I realised that combiner (mergeAndSum or merge) will never be called since this is a sequential stream. So we might change it to parallel stream to take advantage of combiners – dubpp Sep 07 '18 at 14:01
  • @prats you can try that and measure, but I doubt you'll find any improvement – fps Sep 07 '18 at 14:17
4

I agree with Federico that a Collector seems the best choice here.

However, instead of implementing a very specialized Collector, I prefer to implement only some generic "building blocks", and then use those blocks to compose the Collector that I need in a given case.

Assuming:

interface Mapper<T> {
    T map(Dummy dummy, Class<T> type);
}

this is how the construction of DummyObject looks like when using my solution:

Collector<Dummy, ?, DummyObject> dummyObjectCollector = someCondition
        ? toDummyObjectWithSums(mapper)
        : toDummyObjectWithoutSums(mapper);
return dummy.stream().collect(dummyObjectCollector);

Here is how I compose the use-case-specific Collectors:

private static Collector<Dummy, ?, DummyObject> toDummyObjectWithoutSums(Mapper<NewDummy> mapper) {
    return Collectors.collectingAndThen(toNewDummyList(mapper), DummyObject::new);
}

private static Collector<Dummy, ?, List<NewDummy>> toNewDummyList(Mapper<NewDummy> mapper) {
    return Collectors.mapping(dummy -> mapper.map(dummy, NewDummy.class), Collectors.toList());
}

private static Collector<Dummy, ?, DummyObject> toDummyObjectWithSums(Mapper<NewDummy> mapper) {
    return ExtraCollectors.collectingBoth(
            toNewDummyList(mapper),
            sumGroupCollector(),
            (newDummyList, amountSumPair) -> new DummyObject(
                    newDummyList, amountSumPair.getAmountSum1(), amountSumPair.getAmountSum2()
            )
    );
}

private static Collector<Dummy, ?, AmountSumPair> sumGroupCollector() {
    return ExtraCollectors.collectingBoth(
            summingAmount(Dummy::getAmount1),
            summingAmount(Dummy::getAmount2),
            AmountSumPair::new
    );
}

static Collector<Dummy, ?, BigDecimal> summingAmount(Function<Dummy, BigDecimal> getter) {
    return Collectors.mapping(getter,
            ExtraCollectors.filtering(Objects::nonNull,
                    ExtraCollectors.summingBigDecimal()
            )
    );
}

private static class AmountSumPair {
    private final BigDecimal amountSum1;
    private final BigDecimal amountSum2;

    // constructor + getters
}

Finally, we come to the generic "building blocks" (which I placed inside ExtraCollectors class):

  • summingBigDecimal: quite obvious
  • filtering: also rather obvious (corresponds to Stream.filter)
  • collectingBoth: this is the most interesting one:
    1. it takes two Collectors (both operating on T but returning different results, i.e. Collector<T, ?, R1> and Collector<T, ?, R2>)
    2. and it combines them using a BiFunction<R1, R2, R> into a single Collector<T, ?, R>

Here is the ExtraCollectors class:

final class ExtraCollectors {

    static Collector<BigDecimal, ?, BigDecimal> summingBigDecimal() {
        return Collectors.reducing(BigDecimal.ZERO, BigDecimal::add);
    }

    static <T, A, R> Collector<T, A, R> filtering(
            Predicate<T> filter, Collector<T, A, R> downstream) {
        return Collector.of(
                downstream.supplier(),
                (A acc, T t) -> {
                    if (filter.test(t)) {
                        downstream.accumulator().accept(acc, t);
                    }
                },
                downstream.combiner(),
                downstream.finisher(),
                downstream.characteristics().toArray(new Collector.Characteristics[0])
        );
    }

    static <T, R1, R2, R> Collector<T, ?, R> collectingBoth(
            Collector<T, ?, R1> collector1, Collector<T, ?, R2> collector2, BiFunction<R1, R2, R> biFinisher) {
        return collectingBoth(new BiCollectorHandler<>(collector1, collector2), biFinisher);
    }

    // method needed to capture A1 and A2
    private static <T, A1, R1, A2, R2, R> Collector<T, ?, R> collectingBoth(
            BiCollectorHandler<T, A1, R1, A2, R2> biCollectorHandler, BiFunction<R1, R2, R> biFinisher) {
        return Collector.<T, BiCollectorHandler<T, A1, R1, A2, R2>.BiAccumulator, R>of(
                biCollectorHandler::newBiAccumulator,
                BiCollectorHandler.BiAccumulator::accept,
                BiCollectorHandler.BiAccumulator::combine,
                biAccumulator -> biAccumulator.finish(biFinisher)
        );
    }
}

And here is the BiCollectorHandler class (used internally by ExtraCollectors.collectingBoth):

final class BiCollectorHandler<T, A1, R1, A2, R2> {

    private final Collector<T, A1, R1> collector1;
    private final Collector<T, A2, R2> collector2;

    BiCollectorHandler(Collector<T, A1, R1> collector1, Collector<T, A2, R2> collector2) {
        this.collector1 = collector1;
        this.collector2 = collector2;
    }

    BiAccumulator newBiAccumulator() {
        return new BiAccumulator(collector1.supplier().get(), collector2.supplier().get());
    }

    final class BiAccumulator {

        final A1 acc1;
        final A2 acc2;

        private BiAccumulator(A1 acc1, A2 acc2) {
            this.acc1 = acc1;
            this.acc2 = acc2;
        }

        void accept(T t) {
            collector1.accumulator().accept(acc1, t);
            collector2.accumulator().accept(acc2, t);
        }

        BiAccumulator combine(BiAccumulator other) {
            A1 combined1 = collector1.combiner().apply(acc1, other.acc1);
            A2 combined2 = collector2.combiner().apply(acc2, other.acc2);
            return new BiAccumulator(combined1, combined2);
        }

        <R> R finish(BiFunction<R1, R2, R> biFinisher) {
            R1 result1 = collector1.finisher().apply(acc1);
            R2 result2 = collector2.finisher().apply(acc2);
            return biFinisher.apply(result1, result2);
        }
    }
}
Tomasz Linkowski
  • 4,386
  • 23
  • 38
  • 3
    +1 for the work (I bet it took quite some time to write) and btw `BiCollector` might be coming to jdk-12. – Eugene Sep 07 '18 at 12:42
  • 2
    @Eugene Ha, [Tagir Valeev came up with this](http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-June/053800.html) before me ;) Good to know, thanks for the information! I hope this feature makes it to the JDK 12. And yes, it took a while to write - a bit longer than I expected, but this is the problem with SO: solving problems like the one here gets addictive once you start ;) – Tomasz Linkowski Sep 07 '18 at 12:52
  • This is a great work, Tomasz, and I agree in that having these building blocks is a much better approach. I also liked your initial anser very much, I think that it showed the other side of the coin, i.e. `you don't need a specialized collector for this, just refactor the sums of the amounts into a private method and you will be OK streaming 3 times over the list`. – fps Sep 07 '18 at 15:28
  • 2
    @FedericoPeraltaSchaffner Thank you! :) About my initial answer: it was actually wrong because I misunderstood the question and was streaming `List` only once and `List` twice. Of course, triple streaming of `List` might also be an option, but since the OP wrote "assume we are getting data from some source", I assume such streaming is to be considered expensive. – Tomasz Linkowski Sep 07 '18 at 15:48
  • *Update*: the "BiCollector" functionality is coming to JDK12 as `Collectors.teeing`: [JDK-8209685](https://bugs.openjdk.java.net/browse/JDK-8209685). – Tomasz Linkowski Nov 12 '18 at 06:42