
I want to take a stream of strings and turn it into a stream of word pairs, e.g.:

I have: { "A", "Apple", "B", "Banana", "C", "Carrot" }

I want: { ("A", "Apple"), ("Apple", "B"), ("B", "Banana"), ("Banana", "C"), ("C", "Carrot") }

This is nearly the same as zipping, as outlined in the question "Zipping streams using JDK8 with lambda (java.util.stream.Streams.zip)".

However, that produces: { (A, Apple), (B, Banana), (C, Carrot) }

The following code works, but it is clearly the wrong way to do it (it relies on shared mutable state, isn't thread safe, etc.):

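import java.util.stream.Stream;

public class BufferedPairs {
    // Shared mutable state: this is what makes the approach not thread safe
    static String buffered = null;

    static void output(String s) {
        // The first element has no predecessor, so nothing is printed for it
        if (buffered != null) {
            System.out.println(buffered + "," + s);
        }
        buffered = s;
    }

    public static void main(String[] args) {
        Stream<String> testing = Stream.of("A", "Apple", "B", "Banana", "C", "Carrot");
        testing.forEach(s -> output(s));
    }
}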
Nick Lothian
  • I would go with something like `zip(stream, stream[1:])` (that's Python, but the idea is the same: create a second stream that skips the first item; `stream.skip(1)` should do it) – njzk2 Aug 23 '15 at 02:02
  • Do you want your word pairs to be a `String` separated by a comma, or a tuple-like object? – durron597 Aug 23 '15 at 02:07
  • A tuple-like object would be nice, but either will do. – Nick Lothian Aug 23 '15 at 02:08

4 Answers

If you:

  1. Don't like the idea of creating a list with all strings from your stream
  2. Don't want to use external libraries
  3. Like to get your hands dirty

Then you can create a method that groups elements from a stream using the Java 8 low-level stream-building classes StreamSupport and Spliterator:

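import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class StreamUtils {
    public static <T> Stream<List<T>> sliding(int size, Stream<T> stream) {
        return sliding(size, 1, stream);
    }

    public static <T> Stream<List<T>> sliding(int size, int step, Stream<T> stream) {
        if (size <= 0 || step <= 0) {
            throw new IllegalArgumentException("size and step must be positive");
        }
        Spliterator<T> spliterator = stream.spliterator();
        long estimateSize;

        if (!spliterator.hasCharacteristics(Spliterator.SIZED)) {
            estimateSize = Long.MAX_VALUE;
        } else if (size > spliterator.estimateSize()) {
            estimateSize = 0;
        } else {
            estimateSize = (spliterator.estimateSize() - size) / step + 1;
        }

        // Only ORDERED and SIZED still hold for the windows; passing SORTED or DISTINCT
        // through would be wrong (a SORTED claim even makes the pipeline call getComparator())
        int characteristics = spliterator.characteristics() & (Spliterator.ORDERED | Spliterator.SIZED);

        return StreamSupport.stream(
                new Spliterators.AbstractSpliterator<List<T>>(estimateSize, characteristics) {
                    List<T> buffer = new ArrayList<>(size);

                    @Override
                    public boolean tryAdvance(Consumer<? super List<T>> consumer) {
                        // Fill the current window from the source
                        while (buffer.size() < size && spliterator.tryAdvance(buffer::add)) {
                            // Nothing to do
                        }

                        if (buffer.size() == size) {
                            // Keep the overlapping tail (if any) as the start of the next window
                            List<T> keep = step < size
                                    ? new ArrayList<>(buffer.subList(step, size))
                                    : new ArrayList<>(size);
                            consumer.accept(buffer);
                            buffer = keep;
                            // When step > size, drop the elements that fall between windows
                            for (int i = size; i < step && spliterator.tryAdvance(t -> { }); i++) {
                                // Nothing to do
                            }
                            return true;
                        }
                        return false;
                    }
                }, stream.isParallel());
    }
}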

The method and parameter names were inspired by their Scala counterparts.

Let's test it:

Stream<String> testing = Stream.of("A", "Apple", "B", "Banana", "C", "Carrot");
System.out.println(StreamUtils.sliding(2, testing).collect(Collectors.toList()));

[[A, Apple], [Apple, B], [B, Banana], [Banana, C], [C, Carrot]]

What about windows that don't repeat elements:

Stream<String> testing = Stream.of("A", "Apple", "B", "Banana", "C", "Carrot");
System.out.println(StreamUtils.sliding(2, 2, testing).collect(Collectors.toList()));

[[A, Apple], [B, Banana], [C, Carrot]]

And now with an infinite Stream:

StreamUtils.sliding(5, Stream.iterate(0, n -> n + 1))
        .limit(5)
        .forEach(System.out::println);

[0, 1, 2, 3, 4]
[1, 2, 3, 4, 5]
[2, 3, 4, 5, 6]
[3, 4, 5, 6, 7]
[4, 5, 6, 7, 8]

Helder Pereira

This should do what you want, based on @njzk2's comment about using the stream twice and skipping the first element in the second copy. It uses the zip method from the question you linked.

public static void main(String[] args) {
  List<String> input = Arrays.asList("A", "Apple", "B", "Banana", "C", "Carrot");
  List<List<String>> paired = zip(input.stream(),
                                  input.stream().skip(1),
                                  (a, b) -> Arrays.asList(a, b))
                              .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
  System.out.println(paired);
}

This prints a List<List<String>> with the contents:

[[A, Apple], [Apple, B], [B, Banana], [Banana, C], [C, Carrot]]
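The `zip` method itself lives in the linked question and isn't repeated here. If you want something that runs on its own, the sketch below is a hypothetical iterator-based helper with the same call shape; it is an assumption, not the exact code from that question. It is sequential only and stops as soon as either stream runs out, which is why the skipped copy yields one pair fewer than there are words.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.BiFunction;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class ZipPairs {
    // Hypothetical zip helper: combines elements of two streams pairwise until either runs out
    static <A, B, C> Stream<C> zip(Stream<A> a, Stream<B> b,
                                   BiFunction<? super A, ? super B, ? extends C> combiner) {
        Iterator<A> itA = a.iterator();
        Iterator<B> itB = b.iterator();
        Iterator<C> zipped = new Iterator<C>() {
            @Override
            public boolean hasNext() {
                return itA.hasNext() && itB.hasNext();
            }

            @Override
            public C next() {
                return combiner.apply(itA.next(), itB.next());
            }
        };
        // Sequential, ordered stream of unknown size over the zipped iterator
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(zipped, Spliterator.ORDERED), false);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("A", "Apple", "B", "Banana", "C", "Carrot");
        List<List<String>> paired = zip(input.stream(),
                                        input.stream().skip(1),
                                        (x, y) -> Arrays.asList(x, y))
                .collect(Collectors.toList());
        System.out.println(paired); // [[A, Apple], [Apple, B], [B, Banana], [Banana, C], [C, Carrot]]
    }
}
```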

In the comments, you asked how to do this if you already have a Stream. Unfortunately, that's difficult: a Stream can be consumed only once, and its operations look at one element at a time, so there isn't really a concept of the "adjacent" element in a Stream. There is a good discussion on this here.
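A concrete way to see why a single existing Stream can't simply be zipped with itself: once `skip(1)` has been attached to a Stream, that Stream object counts as used, and any further operation on it (such as the `iterator()` call a zip helper would make) throws an `IllegalStateException`. A small sketch; the class and method names are illustrative only.

```java
import java.util.stream.Stream;

public class SingleUseStream {
    // Returns true if reusing a Stream after attaching skip(1) throws,
    // which is what zip(s, s.skip(1)) would run into
    static boolean reuseThrows() {
        Stream<String> s = Stream.of("A", "Apple", "B", "Banana");
        Stream<String> shifted = s.skip(1); // attaching an operation marks s as linked
        try {
            s.iterator(); // a second use of the same Stream object
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(reuseThrows()); // true
    }
}
```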

I can think of two ways to do it, but I don't think you're going to like either of them:

  1. Convert the Stream to a List, and then do my solution above. Ugly, but works as long as the Stream isn't infinite and performance doesn't matter very much.
  2. Use @TagirValeev's answer below, which wraps the stream in a StreamEx, if you're willing to add a dependency on a third-party library.
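The first option doesn't even need the `zip` helper: collect the stream into a List once, then pair each index with the next one. This is a sketch under the same caveats (finite stream, all elements held in memory); `adjacentPairs` is a hypothetical name, not an existing API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class ListPairs {
    // Hypothetical helper: materializes the (finite) stream once, then pairs index i with i + 1
    static <T> List<List<T>> adjacentPairs(Stream<T> stream) {
        List<T> list = stream.collect(Collectors.toList());
        // With 0 or 1 elements the range is empty, so no pairs are produced
        return IntStream.range(0, list.size() - 1)
                .mapToObj(i -> Arrays.asList(list.get(i), list.get(i + 1)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<String> testing = Stream.of("A", "Apple", "B", "Banana", "C", "Carrot");
        System.out.println(adjacentPairs(testing));
        // [[A, Apple], [Apple, B], [B, Banana], [Banana, C], [C, Carrot]]
    }
}
```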

Also relevant to this discussion is the question "Can I duplicate a Stream in Java 8?". It's not good news for your problem, but it's worth reading and may have a solution that's more appealing to you.

durron597
  • Ok, I see how this works and it makes sense to me now. Is there a way to run it from a single existing stream rather than two copies of the same stream? – Nick Lothian Aug 23 '15 at 04:49
  • @NickLothian Edited in response – durron597 Aug 23 '15 at 05:33
  • Thanks for the edit. I understand the complications of doing it just using streams, but I think there must be a solution. The key is keeping state, and I don't know how to do it. However, things like averaging and counting on streams do it, so there must be a way. – Nick Lothian Aug 23 '15 at 10:27
  • @NickLothian both averaging and counting are reductions (`.reduce`), which obviously would not work here. – durron597 Aug 23 '15 at 12:00

You can use my StreamEx library, which enhances the standard Stream API. It has a pairMap method that does exactly what you need:

StreamEx.of("A", "Apple", "B", "Banana", "C", "Carrot")
        .pairMap((a, b) -> a+","+b)
        .forEach(System.out::println);

Output:

A,Apple
Apple,B
B,Banana
Banana,C
C,Carrot

The pairMap argument is the function which converts the pair of adjacent elements to something which is suitable to your needs. If you have a Pair class in your project, you can use .pairMap(Pair::new) to get the stream of pairs. If you want to create a stream of two-element lists, you can use:

List<List<String>> list = StreamEx.of("A", "Apple", "B", "Banana", "C", "Carrot")
                                    .pairMap((a, b) -> StreamEx.of(a, b).toList())
                                    .toList();
System.out.println(list); // [[A, Apple], [Apple, B], [B, Banana], [Banana, C], [C, Carrot]]

This works with any element source (you can use StreamEx.of(collection), StreamEx.of(stream), and so on), works correctly if you have more stream operations before pairMap, and is very friendly to parallel processing (unlike solutions that involve stream zipping).

If your input is a List with fast random access and you actually want a List<List<String>> as the result, there's a shorter and somewhat different way to achieve this in my library using ofSubLists:

List<String> input = Arrays.asList("A", "Apple", "B", "Banana", "C", "Carrot");
List<List<String>> list = StreamEx.ofSubLists(input, 2, 1).toList();
System.out.println(list); // [[A, Apple], [Apple, B], [B, Banana], [Banana, C], [C, Carrot]]

Behind the scenes, input.subList(i, i+2) is called for each position in the input list, so your data is not copied into new lists; instead, sublists are created that refer to the original list.

Tagir Valeev
  • In `ofSubLists`, what happens to the `List<List<String>>` if the original list is modified? – durron597 Aug 23 '15 at 03:23
  • @durron597, the result will be modified as well, of course. That's why it's "somewhat different". If you want to use `ofSubLists` but want to copy the lists, you can always add a step like `.map(lst -> StreamEx.of(lst).toList())` or even `.map(ArrayList::new)`. This step is not added implicitly because if you don't need copying (which is most cases), you get more efficient code. – Tagir Valeev Aug 23 '15 at 03:27
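The sublist behaviour discussed above can be reproduced with the plain JDK, which also makes the answer to the modification question visible. This is a sketch of the same idea, not StreamEx's actual implementation; `windowViews` is a hypothetical helper. A non-structural change such as `set` shows through every window covering that position; after a structural change (adding or removing elements), the `List.subList` contract leaves the views' behaviour undefined.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SubListPairs {
    // Hypothetical helper: every window is a subList view of the source list, so nothing is copied
    static <T> List<List<T>> windowViews(List<T> input, int length) {
        return IntStream.rangeClosed(0, input.size() - length)
                .mapToObj(i -> input.subList(i, i + length))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("A", "Apple", "B", "Banana", "C", "Carrot");
        List<List<String>> views = windowViews(input, 2);
        System.out.println(views); // [[A, Apple], [Apple, B], [B, Banana], [Banana, C], [C, Carrot]]

        // A non-structural change to the source shows through every view covering that position
        input.set(1, "Apricot");
        System.out.println(views.get(0)); // [A, Apricot]
        System.out.println(views.get(1)); // [Apricot, B]
    }
}
```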

Here's a minimal amount of code that creates a List<List<String>> of the pairs:

List<List<String>> pairs = new LinkedList<>();
testing.reduce((a, b) -> { pairs.add(Arrays.asList(a, b)); return b; });
Bohemian
  • Note that while it de facto works, it [violates](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#reduce-java.util.function.BinaryOperator-) the documented contract of the `reduce` method (the supplied function must be associative, non-interfering and stateless). – Tagir Valeev Aug 24 '15 at 06:40
  • @TagirValeev it's not actually a reduction (the result is discarded/ignored); it's just a convenient way to get consecutive elements passed to a method, so there's nothing to "violate". It is, however, non-interfering (the only object mutated is external to both the stream and the lambda), so it won't do any damage. It is remarkably terse for what it achieves, which IMHO makes it valuable code. – Bohemian Aug 24 '15 at 07:59
  • You can say that this function is indeed *associative* regarding the unused result, but relying on the execution order for a side effect clearly counteracts the intention. The reason the reduction function should be associative is that it should not depend on the order of execution. – Holger Aug 24 '15 at 08:40
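Following up on the contract concern in these comments: one way to get the same pairs while staying within the documented rules is a custom `Collector` whose combiner joins two partial results by adding the single pair that straddles the split point. This is an illustrative sketch, not code from any of the answers; `PairCollector`, `Acc` and `adjacentPairs` are hypothetical names. Like the other list-based approaches, it keeps all pairs in memory, so it suits finite streams.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class PairCollector {
    // Partial result for one chunk of the stream: its first and last element plus the pairs inside it
    static final class Acc<T> {
        T first;
        T last;
        boolean empty = true;
        final List<List<T>> pairs = new ArrayList<>();

        void add(T t) {
            if (empty) {
                first = t;
                empty = false;
            } else {
                pairs.add(Arrays.asList(last, t));
            }
            last = t;
        }

        Acc<T> combine(Acc<T> right) {
            if (right.empty) {
                return this;
            }
            if (empty) {
                return right;
            }
            // The one pair that straddles the boundary between the two chunks
            pairs.add(Arrays.asList(last, right.first));
            pairs.addAll(right.pairs);
            last = right.last;
            return this;
        }
    }

    // Hypothetical collector: adjacent elements as two-element lists, also correct for ordered parallel streams
    static <T> Collector<T, Acc<T>, List<List<T>>> adjacentPairs() {
        return Collector.of(Acc<T>::new, Acc<T>::add, Acc<T>::combine, acc -> acc.pairs);
    }

    public static void main(String[] args) {
        List<List<String>> pairs = Stream.of("A", "Apple", "B", "Banana", "C", "Carrot")
                .collect(adjacentPairs());
        System.out.println(pairs); // [[A, Apple], [Apple, B], [B, Banana], [Banana, C], [C, Carrot]]
    }
}
```

Because the combiner rebuilds the boundary pair instead of relying on execution order, the same collector gives identical results on `parallelStream()` for an ordered source.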