21

If I have a Stream<T>, I can easily use skip(long) to skip the first few elements of a stream. However, there seems to be no equivalent for skipping a given number of elements at the end of the stream.

The most obvious solution is to use limit(originalLength - elementsToRemoveAtEnd), but that requires knowing the initial length beforehand, which isn't always the case.

Is there a way to remove the last few elements of a stream of unknown length without having to collect it into a Collection, count the elements and stream it again?
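For reference, the collect-and-restream workaround I want to avoid looks roughly like this (class and method names are mine, for illustration):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CollectWorkaround {
    // Buffers the whole stream just to learn its length -- the very thing
    // I'd like to avoid for long or lazily generated streams.
    static <T> Stream<T> skipLastByCollecting(Stream<T> s, int k) {
        List<T> all = s.collect(Collectors.toList());
        return all.stream().limit(Math.max(0, all.size() - k));
    }

    public static void main(String[] args) {
        skipLastByCollecting(Stream.of("a", "b", "c", "d"), 2)
            .forEach(System.out::println); // prints a, then b
    }
}
```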

BambooleanLogic
  • The reason there is no such method is probably the one you are giving. There is normally no way to know during streaming when a `Stream` will end, so you can't really remove the last few elements until you are done. – Keppil Oct 16 '14 at 11:38
  • How do you determine the point at which you want to skip the remaining elements? Couldn't you implement your own Stream that wraps the original Stream? – user Oct 16 '14 at 11:50
  • How do you know an element is last if you don't know the length of the stream? – assylias Oct 16 '14 at 13:59

2 Answers

11

There is no general storage-free solution for streams that may have an unknown length. However, you don't need to collect the entire stream; you only need storage as large as the number of elements you want to skip:

import java.util.ArrayDeque;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

static <T> Stream<T> skipLastElements(Stream<T> s, int count) {
    if(count<=0) {
      if(count==0) return s;
      throw new IllegalArgumentException(count+" < 0");
    }
    ArrayDeque<T> pending=new ArrayDeque<>(count+1);
    Spliterator<T> src=s.spliterator();
    return StreamSupport.stream(new Spliterator<T>() {
        public boolean tryAdvance(Consumer<? super T> action) {
            // keep up to count+1 elements buffered; once more than count are
            // pending, the oldest cannot be part of the tail and is safe to emit
            while(pending.size()<=count && src.tryAdvance(pending::add));
            if(pending.size()>count) {
              action.accept(pending.remove());
              return true;
            }
            // source exhausted: the remaining buffered elements are the skipped tail
            return false;
        }
        public Spliterator<T> trySplit() {
            return null; // no parallel splitting
        }
        public long estimateSize() {
            return src.estimateSize()-count;
        }
        public int characteristics() {
            return src.characteristics();
        }
    }, false);
}
public static void main(String[] args) {
    skipLastElements(Stream.of("foo", "bar", "baz", "hello", "world"), 2)
    .forEach(System.out::println); // prints foo, bar, baz
}
Holger
  • The above has a serious hidden problem! Calling `Stream.spliterator()` can return a `StreamSpliterators.WrappingSpliterator`. The method `initPartialTraversalState()` stores all of the stream elements into a `Collection` (i.e. `SpinedBuffer`)! This could blow up the heap for streams based on file inputs. – Nathan Oct 31 '17 at 20:36
1

The following code uses an `ArrayDeque` to buffer n elements, where n is the number of elements to skip at the end. The trick is `skip(n)`: every element is first added to the `ArrayDeque` via `peek`, but the downstream stages only start running once n elements have been buffered. From then on, each incoming element pops the element buffered n positions earlier. When the end of the stream is reached, the last n elements are still stuck in the `ArrayDeque` and are discarded.

`ArrayDeque` does not allow null elements. The code below maps null to `NULL_VALUE` before adding to the `ArrayDeque` and maps `NULL_VALUE` back to null after popping.

private static final Object NULL_VALUE = new Object();

@SuppressWarnings("unchecked") // NULL_VALUE stands in for null and is cast to T
public static <T> Stream<T> skipLast(Stream<T> input, int n)
{
   ArrayDeque<T> queue;

   if (n <= 0)
      return(input);

   queue = new ArrayDeque<>(n + 1);

   input = input.
      map(item -> item != null ? item : (T)NULL_VALUE). // shield nulls
      peek(queue::add).                                 // buffer every element
      skip(n).                                          // downstream starts once n are buffered
      map(item -> queue.pop()).                         // emit the element n positions back
      map(item -> item != NULL_VALUE ? item : null);    // restore nulls

   return(input);
}
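Putting it together, a self-contained sketch of this approach (class name is mine), showing that nulls survive the round-trip:

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SkipLastDemo {
    private static final Object NULL_VALUE = new Object();

    @SuppressWarnings("unchecked")
    static <T> Stream<T> skipLast(Stream<T> input, int n) {
        if (n <= 0)
            return input;
        ArrayDeque<T> queue = new ArrayDeque<>(n + 1);
        return input
            .map(item -> item != null ? item : (T) NULL_VALUE) // shield nulls
            .peek(queue::add)         // buffer every element as it passes
            .skip(n)                  // downstream only runs once n are buffered
            .map(item -> queue.pop()) // emit the element buffered n positions back
            .map(item -> item != NULL_VALUE ? item : null);    // restore nulls
    }

    public static void main(String[] args) {
        List<Integer> result = skipLast(Stream.of(1, null, 3, 4, 5), 2)
            .collect(Collectors.toList());
        System.out.println(result); // [1, null, 3]
    }
}
```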
Nathan
  • Well, I prefer a solution that might be inefficient, depending on how Streams are implemented, over a solution that might break, depending on how Streams are implemented. Your solution relies on the current implementation's inability to fuse a `skip` with the stream source. A future version might be able to truly skip all preceding steps that do not change the stream size, just like Java 9's `count()` may not evaluate these functions… – Holger Nov 01 '17 at 11:03
  • I have heard those rumors but those optimizations will have to be limited in scope of application or a lot of code will end up broken. `peek()` should stop such optimizations because its lambda is almost always going to have side-effects. Thus, `skip()` and `count()` cannot optimize it away. – Nathan Nov 02 '17 at 16:44
  • Java 9's `count()` *does* already break such code. As already discussed in [In Java streams is peek really only for debugging?](https://stackoverflow.com/q/33635717/2711488), you should not use a method contrary to its purpose. Since this method's primary purpose is to debug, i.e. detect *whether* elements are processed at this stage, it would be nonsensical to force processing at this stage *because* of this method. You can not only use the method to detect the presence of such optimization, you may detect that streams sometimes process more elements than necessary, under Java 8 *and* Java 9. – Holger Nov 02 '17 at 17:15