Group sequences of values

Question

I'm wondering if there's any nifty way to use the new Stream APIs to "group" sequences of values.

For example, split a series of integers into groups of integers, where each group is an ascending number sequence:

IntStream seq = IntStream.of(1, 2, 3, -1, -1, 1, 2, 1, 2);
IntFunction next = i -> i + 1;

// DESIRED OUTPUT: [[1,2,3], [-1], [-1], [1,2], [1,2]]

Shouldn't the output look like this: `[[1,2,3], [-1], [-1,1,2], [1,2]]`? — Flown, Feb 05 '16 at 21:58

score 8 · Accepted Answer · answered Feb 05 '16 at 21:59

Unfortunately, the Stream API is not very well suited to problems that involve dependent operations on the Stream elements, like this one.

However, you can use the StreamEx library for this:

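import java.util.List;
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;
import one.util.streamex.IntStreamEx;

public static void main(String[] args) {
    IntStream seq = IntStream.of(1, 2, 3, -1, -1, 1, 2, 1, 2);
    IntUnaryOperator next = i -> i + 1;

    // Keep adjacent elements in the same group while the second equals next(first).
    List<List<Integer>> result =
        IntStreamEx.of(seq).boxed().groupRuns((i1, i2) -> next.applyAsInt(i1) == i2).toList();

    System.out.println(result); // prints "[[1, 2, 3], [-1], [-1], [1, 2], [1, 2]]"
}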

This groups consecutive integers into a List for as long as each one is equal to the next function applied to the one before it. Finally, the resulting Stream of groups is collected into a List.
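For readers who cannot add a dependency, the grouping rule described above can be approximated with a plain loop. This is only a minimal sketch of the same idea, not how StreamEx implements it: the class name `GroupRunsSketch` and the helper `groupRuns` are hypothetical names chosen for illustration, and the sketch is sequential only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

public class GroupRunsSketch {

    // Hypothetical helper (not part of StreamEx): starts a new group whenever
    // sameGroup.test(previous, current) is false for two adjacent elements.
    public static <T> List<List<T>> groupRuns(List<T> input,
                                              BiPredicate<? super T, ? super T> sameGroup) {
        List<List<T>> groups = new ArrayList<>();
        List<T> current = null;   // group currently being filled
        T previous = null;        // last element added to that group
        for (T element : input) {
            if (current == null || !sameGroup.test(previous, element)) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(element);
            previous = element;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Integer> seq = Arrays.asList(1, 2, 3, -1, -1, 1, 2, 1, 2);
        // prints "[[1, 2, 3], [-1], [-1], [1, 2], [1, 2]]"
        System.out.println(groupRuns(seq, (a, b) -> a + 1 == b));
    }
}
```

The loop makes the dependency between adjacent elements explicit, which is exactly what the Stream API lacks a built-in operation for.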

Not so elegant as your idea, but it really can be done with pure Java-8 streams. — Andremoniy, Feb 05 '16 at 22:47
Thanks! Looks like StreamEx will save me a lot of headaches! — rednoah, Feb 06 '16 at 06:37

score 7 · Answer 2 · edited May 23 '17 at 12:26

If you're willing to operate on an in-memory data structure, such as an array or list, it's possible to do this in standard Java 8 in just a couple of steps. This can be done using array programming techniques such as those illustrated in my answer to this question. Using some clever conditionals similar to those used in Flown's answer to this question takes care of the edge cases in a neat way.

The key insight is to realize that a new segment (or group) begins at every point where the desired predicate is not met. That is, a new segment begins at every index i where seq[i-1] + 1 != seq[i]. Let's run an IntStream over the indexes of the input, filter them for this property, and store the result in some array x:

    int[] seq = { 1, 2, 3, -1, -1, 1, 2, 1, 2 };
    int[] x = IntStream.range(1, seq.length)
                       .filter(i -> seq[i-1] + 1 != seq[i])
                       .toArray();

resulting in

    [3, 4, 5, 7]

This only gives us the interior boundaries of the segments. To get the starts and ends of the segments, we need to tack on the start of the first segment and the end of the last segment. We adjust the index range and add some conditionals to the filter:

    int[] x = IntStream.rangeClosed(0, seq.length)
                       .filter(i -> i == 0 || i == seq.length ||
                                    seq[i-1] + 1 != seq[i])
                       .toArray();

    [0, 3, 4, 5, 7, 9]

Now every adjacent pair of indexes is a subrange of the original array. We can use another stream to extract those subranges, giving the desired result:

    int[][] result =
        IntStream.range(0, x.length - 1)
                 .mapToObj(i -> Arrays.copyOfRange(seq, x[i], x[i+1]))
                 .toArray(int[][]::new);

    [[1, 2, 3], [-1], [-1], [1, 2], [1, 2]]

This can be extracted into a function that itself takes a "next" function that computes the next value in the segment. That is, for any element, if the element to its right matches the result of the next-function, the elements are in the same segment; otherwise it's a segment boundary. Here's the code:

int[][] segments(int[] seq, IntUnaryOperator next) {
    int[] x = IntStream.rangeClosed(0, seq.length)
                       .filter(i -> i == 0 || i == seq.length ||
                               next.applyAsInt(seq[i-1]) != seq[i])
                       .toArray();

    return  IntStream.range(0, x.length - 1)
                     .mapToObj(i -> Arrays.copyOfRange(seq, x[i], x[i+1]))
                     .toArray(int[][]::new);
}

You'd call it like this:

    int[] seq = { 1, 2, 3, -1, -1, 1, 2, 1, 2 };
    System.out.println(Arrays.deepToString(segments(seq, i -> i + 1)));

    [[1, 2, 3], [-1], [-1], [1, 2], [1, 2]]

Changing the next-function allows splitting the segments in a different way. For example, to split an array into segments of equal values, you'd do this:

    int[] seq = { 2, 2, 1, 3, 3, 1, 1, 1, 4, 4, 4 };
    System.out.println(Arrays.deepToString(segments(seq, i -> i)));

    [[2, 2], [1], [3, 3], [1, 1, 1], [4, 4, 4]]

The difficulty with using a next-function like this is that the condition for values belonging to a segment is limited. It would be nicer to provide a predicate that compares two adjacent values to test if they're in the same segment. We can do that using a BiPredicate<Integer, Integer> if we're willing to pay the cost of boxing:

int[][] segments(int[] input, BiPredicate<Integer, Integer> pred) {
    int[] x = IntStream.rangeClosed(0, input.length)
                       .filter(i -> i == 0 || i == input.length ||
                               !pred.test(input[i-1], input[i]))
                       .toArray();

    return  IntStream.range(0, x.length - 1)
                     .mapToObj(i -> Arrays.copyOfRange(input, x[i], x[i+1]))
                     .toArray(int[][]::new);
}

This allows gathering segments using a different criterion, for example, monotonically increasing segments:

    int[] seq = { 3, 1, 4, 1, 5, 9, 2, 6, 5, 3 };
    System.out.println(Arrays.deepToString(segments(seq, (a, b) -> b > a)));

    [[3], [1, 4], [1, 5, 9], [2, 6], [5], [3]]

This could be specialized to use a primitive bi-predicate over two int values, or it could be generalized to allow using a BiPredicate of any type over input of any type.
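The two variations mentioned above might look roughly like the following. This is a sketch under assumptions rather than part of the original answer: `IntBiPredicate` is a hypothetical interface defined here (java.util.function has no such type), the method names `segmentsInt` and `segmentsGeneric` are made up for illustration, and the generic version assumes a random-access List and returns `subList` views instead of copies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SegmentsSketch {

    // Hypothetical primitive counterpart of BiPredicate<Integer, Integer>.
    @FunctionalInterface
    public interface IntBiPredicate {
        boolean test(int a, int b);
    }

    // Primitive specialization: same boundary-index technique, no boxing.
    public static int[][] segmentsInt(int[] input, IntBiPredicate pred) {
        int[] x = IntStream.rangeClosed(0, input.length)
                           .filter(i -> i == 0 || i == input.length ||
                                   !pred.test(input[i-1], input[i]))
                           .toArray();

        return IntStream.range(0, x.length - 1)
                        .mapToObj(i -> Arrays.copyOfRange(input, x[i], x[i+1]))
                        .toArray(int[][]::new);
    }

    // Generalization to any element type; each segment is a view of the input list.
    public static <T> List<List<T>> segmentsGeneric(List<T> input,
                                                    BiPredicate<? super T, ? super T> pred) {
        int n = input.size();
        int[] x = IntStream.rangeClosed(0, n)
                           .filter(i -> i == 0 || i == n ||
                                   !pred.test(input.get(i-1), input.get(i)))
                           .toArray();

        return IntStream.range(0, x.length - 1)
                        .mapToObj(i -> input.subList(x[i], x[i+1]))
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int[] seq = { 3, 1, 4, 1, 5, 9, 2, 6, 5, 3 };
        // prints "[[3], [1, 4], [1, 5, 9], [2, 6], [5], [3]]"
        System.out.println(Arrays.deepToString(segmentsInt(seq, (a, b) -> b > a)));

        List<String> words = Arrays.asList("a", "b", "cc", "dd", "e");
        // prints "[[a, b], [cc, dd], [e]]"
        System.out.println(segmentsGeneric(words, (s, t) -> s.length() == t.length()));
    }
}
```

Both methods use distinct names here only to keep the sketch unambiguous; they could equally be overloads of segments.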

score 2 · Answer 3 · edited Jan 13 '23 at 18:12

Not so elegant as @Tunaki's solution, but using "pure" Java-8 streams:

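import static java.util.Collections.singleton;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.stream.IntStream;

IntStream seq = IntStream.of(1, 2, 3, -1, -1, 1, 2, 1, 2);

// Start with a single empty group.
Deque<Deque<Integer>> r = new ArrayDeque<>(singleton(new ArrayDeque<>()));

// The filter has a side effect: an element that continues the last group is added to it
// and dropped from the stream; any other element passes through and starts a new group.
seq.filter(i -> !r.getLast().isEmpty() && r.getLast().getLast() + 1 != i || !r.getLast().add(i))
            .forEach(i -> r.add(new ArrayDeque<>(singleton(i))));

System.out.println(r); // prints: [[1, 2, 3], [-1], [-1], [1, 2], [1, 2]]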

I use Deque here only to keep the code compact: it provides the getLast() method, whereas the same logic with a List would be more verbose.

edited Jan 13 '23 at 18:12

answered Feb 05 '16 at 22:46

Andremoniy


It should be noted that such a solution abuses the API (in particular, the `Predicate` passed to `.filter` must be stateless according to the [spec](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#filter-java.util.function.Predicate-)). As a consequence, this solution cannot be parallelized. – Tagir Valeev Feb 06 '16 at 03:30
@TagirValeev Can Tunaki's one be parallelized? – Andremoniy Feb 06 '16 at 06:20
Yeah, and you're actually likely to get a speedup on large input. Every StreamEx feature handles parallelization correctly and most of them actually benefit from parallelization. – Tagir Valeev Feb 06 '16 at 07:16
[Here](https://gist.github.com/amaembo/b12599b8e61d805dcc26)'s the gist with a benchmark and results on a 4-core machine. Well, not a very impressive speedup, but still significant. The serial version works at roughly the same speed as yours. – Tagir Valeev Feb 06 '16 at 07:38
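To illustrate the statelessness point raised in these comments, one way to stay within the Stream contract in plain Java 8 is the three-argument `collect`, where all state lives in the result container and a combiner stitches together runs that were split across chunks. The following is only a sketch under that assumption: the class `ParallelGroupSketch` and its `groupRuns` method are hypothetical names, this is not how StreamEx implements its operation, and no performance claim is made.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

public class ParallelGroupSketch {

    // Hypothetical helper: groups runs using a mutable reduction, so the
    // lambdas passed to the stream only touch the container they are given.
    public static List<List<Integer>> groupRuns(IntStream seq, IntUnaryOperator next) {
        return seq.boxed().<List<List<Integer>>>collect(
            ArrayList::new,
            // Accumulator: extend the last group or start a new one.
            (groups, i) -> {
                if (!groups.isEmpty()) {
                    List<Integer> last = groups.get(groups.size() - 1);
                    if (next.applyAsInt(last.get(last.size() - 1)) == i) {
                        last.add(i);
                        return;
                    }
                }
                List<Integer> group = new ArrayList<>();
                group.add(i);
                groups.add(group);
            },
            // Combiner: if a run was split between two chunks, merge the
            // last group of the left part with the first group of the right.
            (left, right) -> {
                if (!left.isEmpty() && !right.isEmpty()) {
                    List<Integer> last = left.get(left.size() - 1);
                    List<Integer> first = right.get(0);
                    if (next.applyAsInt(last.get(last.size() - 1)) == first.get(0)) {
                        last.addAll(first);
                        left.addAll(right.subList(1, right.size()));
                        return;
                    }
                }
                left.addAll(right);
            });
    }

    public static void main(String[] args) {
        IntStream seq = IntStream.of(1, 2, 3, -1, -1, 1, 2, 1, 2).parallel();
        // prints "[[1, 2, 3], [-1], [-1], [1, 2], [1, 2]]"
        System.out.println(groupRuns(seq, i -> i + 1));
    }
}
```

Because the stream is ordered, the combiner always receives its arguments in encounter order, so the result is the same for sequential and parallel execution.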
