10

I want to determine the minimum area required to display a collection of points. The easy way is to loop through the collection like this:

int minX = Integer.MAX_VALUE;
int maxX = Integer.MIN_VALUE;
int minY = Integer.MAX_VALUE;
int maxY = Integer.MIN_VALUE;
for (Point point: points) {
    if (point.x < minX) {
        minX = point.x;
    }
    if (point.x > maxX) {
        maxX = point.x;
    }
    if (point.y < minY) {
        minY = point.y;
    }
    if (point.y > maxY) {
        maxY = point.y;
    }
}

I am getting to know streams. To do the same, you can do the following:

int minX = points.stream().mapToInt(point -> point.x).min().orElse(-1);
int maxX = points.stream().mapToInt(point -> point.x).max().orElse(-1);
int minY = points.stream().mapToInt(point -> point.y).min().orElse(-1);
int maxY = points.stream().mapToInt(point -> point.y).max().orElse(-1);

Both give the same result. However, although the streams approach is elegant, it is much slower (as expected).

Is there a way to get minX, maxX, minY and maxY in a single stream operation?

Andrew Tobilko
  • 48,120
  • 14
  • 91
  • 142
MWB
  • 1,830
  • 1
  • 17
  • 38
  • 1
    Create a `Collector` with a custom type that describes all the positions, not much better than your iterative approach imo (extracted into its own method for example). – Sotirios Delimanolis Jan 03 '19 at 20:34
  • 1
    @AkinerAlkan: `maxX` is actually set to `Integer.MIN_VALUE`. That means that the condition `if (point.x > maxX)` will definitely become `true` (unless all x-values are equal to `Integer.MIN_VALUE`, which is highly unlikely). – MWB Jan 03 '19 at 20:45

5 Answers5

12

JDK 12 and above has Collectors.teeing (webrev and CSR), which collects to two different collectors and then merges both partial results into a final result.

You could use it here to collect to two IntSummaryStatistics for both the x coordinate and the y coordinate:

List<IntSummaryStatistics> stats = points.stream()
    .collect(Collectors.teeing(
             Collectors.mapping(p -> p.x, Collectors.summarizingInt()),
             Collectors.mapping(p -> p.y, Collectors.summarizingInt()),
             List::of));

int minX = stats.get(0).getMin();
int maxX = stats.get(0).getMax();
int minY = stats.get(1).getMin();
int maxY = stats.get(1).getMax();

Here the first collector gathers statistics for x and the second one for y. Then, statistics for both x and y are merged into a List by means of the JDK 9 List.of factory method that accepts two elements.

An aternative to List::of for the merger would be:

(xStats, yStats) -> Arrays.asList(xStats, yStats)

If you happen to not have JDK 12 installed on your machine, here's a simplified, generic version of the teeing method that you can safely use as a utility method:

public static <T, A1, A2, R1, R2, R> Collector<T, ?, R> teeing(
        Collector<? super T, A1, R1> downstream1,
        Collector<? super T, A2, R2> downstream2,
        BiFunction<? super R1, ? super R2, R> merger) {

    class Acc {
        A1 acc1 = downstream1.supplier().get();
        A2 acc2 = downstream2.supplier().get();

        void accumulate(T t) {
            downstream1.accumulator().accept(acc1, t);
            downstream2.accumulator().accept(acc2, t);
        }

        Acc combine(Acc other) {
            acc1 = downstream1.combiner().apply(acc1, other.acc1);
            acc2 = downstream2.combiner().apply(acc2, other.acc2);
            return this;
        }

        R applyMerger() {
            R1 r1 = downstream1.finisher().apply(acc1);
            R2 r2 = downstream2.finisher().apply(acc2);
            return merger.apply(r1, r2);
        }
    }

    return Collector.of(Acc::new, Acc::accumulate, Acc::combine, Acc::applyMerger);
}

Please note that I'm not considering the characteristics of the downstream collectors when creating the returned collector.

Naman
  • 27,789
  • 26
  • 218
  • 353
fps
  • 33,623
  • 8
  • 55
  • 110
  • 1
    Thank you for the answer. These backport implementations are very good to use while we are unable to upgrade. :) – Naman May 23 '21 at 03:37
  • @Naman Still on Java 8 and we'll go "modern" soon... to Java 11 (And jdk 17 will be released on September) – fps May 23 '21 at 07:34
9

You could divide by two the iterations with summaryStatistics() while keeping a straight code :

IntSummaryStatistics stat = points.stream().mapToInt(point -> point.x).summaryStatistics();
int minX = stat.getMin();
int maxX = stat.getMax();

And do the same thing with point.y.
You could factor out in this way :

Function<ToIntFunction<Point>, IntSummaryStatistics> statFunction =
        intExtractor -> points.stream()
                              .mapToInt(p -> intExtractor.applyAsInt(pp))
                              .summaryStatistics();

IntSummaryStatistics statX = statFunction.apply(p -> p.x);
IntSummaryStatistics statY = statFunction.apply(p -> p.y);

A custom collector is a possibility but note that you should implement the combiner part that will make your code harder to read.
So but if you need to use parallel stream you should stay to the imperative way.
While you could improve your actual code by relying the Math.min and Math.max functions :

for (Point p : points) {
    minX = Math.min(p.x, minX);
    minY = Math.min(p.y, minY);
    maxY = Math.max(p.x, maxX);
    maxY = Math.max(p.y, maxY);
}
davidxxx
  • 125,838
  • 23
  • 214
  • 215
  • Alternatively, instead of creating a `Function` object you could simply extract a method : `public static IntSummaryStatistics getSummaryStatistics(Collection points, ToIntFunction getter) { return points.stream().mapToInt(getter).summaryStatistics(); }` – Ricola Jan 03 '19 at 22:03
  • @Ricola Of course. It is a possibility sometimes very fine. But in some other cases, it happens that many methods of your class rely on extracting functions. Defining a method (even private) for each one could make the code reading harder by demultiplying the number of methods. – davidxxx Jan 04 '19 at 09:05
  • I understand, but I would say that if extracting methods in your class makes your class too big or hard to understand, then maybe the issue it that your class does too much and should be split into smaller components. – Ricola Jan 04 '19 at 09:09
  • @Ricola That may be the reason indeed. But sometimes it is not the issue. Suppose you have "just" 5 public methods that needs a specific parameterizable function. By adding 5 private methods that replace the local variable functions, you increase their scope. As a consequence it create reading indirection in the code but note that it also may create side effects such as invoking the "wrong" method. That's why If the function as a local variable is already straight readable I wonder on the relevance to extract it into a method. After it is of course subjective. – davidxxx Jan 04 '19 at 09:16
9

By analogy with IntSummaryStatistics, create a class PointStatistics which collects the information you need. It defines two methods: one for recording values from a Point, one for combining two Statistics.

class PointStatistics {
    private int minX = Integer.MAX_VALUE;
    private int maxX = Integer.MIN_VALUE;

    private int minY = Integer.MAX_VALUE;
    private int maxY = Integer.MIN_VALUE;

    public void accept(Point p) {
        minX = Math.min(minX, p.x);
        maxX = Math.max(maxX, p.x);

        minY = Math.min(minY, p.y);
        maxY = Math.max(minY, p.y);
    }

    public void combine(PointStatistics o) {
        minX = Math.min(minX, o.minX);
        maxX = Math.max(maxX, o.maxX);

        minY = Math.min(minY, o.minY);
        maxY = Math.max(maxY, o.maxY);
    }

    // getters
}

Then you can collect a Stream<Point> into a PointStatistics.

class Program {
    public static void main(String[] args) {
        List<Point> points = new ArrayList<>();

        // populate 'points'

        PointStatistics statistics = points
                    .stream()
                    .collect(PointStatistics::new, PointStatistics::accept, PointStatistics::combine);
    }
}

UPDATE

I was completely baffled by the conclusion drawn by OP, so I decided to write JMH benchmarks.

Benchmark settings:

# JMH version: 1.21
# VM version: JDK 1.8.0_171, Java HotSpot(TM) 64-Bit Server VM, 25.171-b11
# Warmup: 1 iterations, 10 s each
# Measurement: 10 iterations, 10 s each
# Timeout: 10 min per iteration
# Benchmark mode: Average time, time/op

For each iteration, I was generating a shared list of random Points (new Point(random.nextInt(), random.nextInt())) of size 100K, 1M, 10M.

The results are

100K

Benchmark                        Mode  Cnt  Score   Error  Units

customCollector                  avgt   10  6.760 ± 0.789  ms/op
forEach                          avgt   10  0.255 ± 0.033  ms/op
fourStreams                      avgt   10  5.115 ± 1.149  ms/op
statistics                       avgt   10  0.887 ± 0.114  ms/op
twoStreams                       avgt   10  2.869 ± 0.567  ms/op

1M

Benchmark                        Mode  Cnt   Score   Error  Units

customCollector                  avgt   10  68.117 ± 4.822  ms/op
forEach                          avgt   10   3.939 ± 0.559  ms/op
fourStreams                      avgt   10  57.800 ± 4.817  ms/op
statistics                       avgt   10   9.904 ± 1.048  ms/op
twoStreams                       avgt   10  32.303 ± 2.498  ms/op

10M

Benchmark                        Mode  Cnt    Score     Error  Units

customCollector                  avgt   10  714.016 ± 151.558  ms/op
forEach                          avgt   10   54.334 ±   9.820  ms/op
fourStreams                      avgt   10  699.599 ± 138.332  ms/op
statistics                       avgt   10  148.649 ±  26.248  ms/op
twoStreams                       avgt   10  429.050 ±  72.879  ms/op
Andrew Tobilko
  • 48,120
  • 14
  • 91
  • 142
2

Thanks everyone for all the suggestions and answers. This is very helpful and I learned a lot!

I decided to give most of your solutions a go (except for the JDK12 solution). For some of them you already provide me with the code. In addition, I made my own Collector.

class extremesCollector implements Collector<Point, Map<String, Integer>, Map<String , Integer>> {

    @Override
    public Supplier<Map<String, Integer>> supplier() {
        Map<String, Integer> map = new HashMap<>();
        map.put("xMin", Integer.MAX_VALUE);
        map.put("yMin", Integer.MAX_VALUE);
        map.put("xMax", Integer.MIN_VALUE);
        map.put("yMax", Integer.MIN_VALUE);
        return () -> map;
    }

    @Override
    public BiConsumer<Map<String, Integer>, Point> accumulator() {
        return (a, b) -> {
            a.put("xMin", Math.min(a.get("xMin"), b.x));
            a.put("yMin", Math.min(a.get("yMin"), b.y));
            a.put("xMax", Math.max(a.get("xMax"), b.x));
            a.put("yMax", Math.max(a.get("yMax"), b.y));
        };
    }

    @Override
    public Function<Map<String, Integer>, Map<String, Integer>> finisher() {
        return Function.identity();
    }

    @Override
    public BinaryOperator<Map<String, Integer>> combiner() {
        return (a, b) -> {
            a.put("xMin", Math.min(a.get("xMin"), b.get("xMin")));
            a.put("yMin", Math.min(a.get("yMin"), b.get("yMin")));
            a.put("xMax", Math.max(a.get("xMax"), b.get("xMax")));
            a.put("yMax", Math.max(a.get("yMax"), b.get("yMax")));
            return a;
        };
    }

    @Override
    public Set<Characteristics> characteristics() {
        Set<Characteristics> characteristics = new HashSet<>();
        characteristics.add(Characteristics.UNORDERED);
        characteristics.add(Characteristics.CONCURRENT);
        characteristics.add(Characteristics.IDENTITY_FINISH);
        return characteristics;
    }
}

Results

I gave all of them a try and compared the results. Good news: for all of them, I got the same result as far as the values are concerned!

With respect to the speed, here is the ranking:

  1. for-loops
  2. four individual streams
  3. stream with self-made Collector
  4. parallel stream with self-made Collector
  5. statistics approach provided by Andrew Tobilko

Number 2 and 3 are actually very close in terms of speed. The parallel version is probably slower because my dataset is too small.

MWB
  • 1,830
  • 1
  • 17
  • 38
  • Sorry, I did the quick and dirty way. I measured the time (System.nanotime) before and after running the code fragment 50 times and took the average. Did this for each approach. – MWB Jan 04 '19 at 11:20
  • 2
    If speed is your concern, I'd rather use an array of 4 elements or a custom object instead of a HashMap. – Ricola Jan 04 '19 at 16:20
1

You are able to use 2 Streams using Stream::reduce to get a point with a minimum and a point with a maximum. I don't recommend to concatenate the results to a one stream since there might be hard to distinguish the difference between the minimum, maximum and the coordinates.

Point min = points
    .stream()
    .reduce((l, r) -> new Point(Math.min(l.y, r.y), Math.min(l.y, r.y))
    .orElse(new Point(-1, -1));

Point max = points
    .stream()
    .reduce((l, r) -> new Point(Math.max(l.y, r.y), Math.max(l.y, r.y))
    .orElse(new Point(-1, -1));

As the BinaryOperator<Point> use the two subsequent Points and a ternary operator to find out the minimum/maximum which is passed to a new object Point and returned using Optional::orElse with the default -1, -1 coordinates.

Nikolas Charalambidis
  • 40,893
  • 16
  • 117
  • 183