50

It can be useful to iterate over multiple variables at once, overlapping (slice::windows), or not (slice::chunks).

This only works for slices; is it possible to do this for iterators, using tuples for convenience?

Something like the following could be written:

for (prev, next) in some_iter.windows(2) {
    ...
}

If not, could it be implemented as a trait on existing iterators?

Gaurang Tandon
  • 6,504
  • 11
  • 47
  • 84
ideasman42
  • 42,413
  • 44
  • 197
  • 320
  • 2
    You could easily do an `iter_pairs`, `iter_triples` once you decide what to do if there aren't enough items at the end, but not a generic "any size tuple" one with Rust at the moment. – Chris Emerson Feb 09 '17 at 11:35
  • 1
    If there aren't enough it would do nothing, as with slice functions. – ideasman42 Feb 09 '17 at 12:01
  • 1
    This was pointed out to me on IRC `https://docs.rs/itertools/*/itertools/trait.Itertools.html#method.tuple_windows` would like to look into its code before posting answer though. – ideasman42 Feb 09 '17 at 12:02

3 Answers3

48

It's possible to take chunks of an iterator using Itertools::tuples, up to a 4-tuple:

use itertools::Itertools; // 0.9.0

fn main() {
    let some_iter = vec![1, 2, 3, 4, 5, 6].into_iter();

    for (prev, next) in some_iter.tuples() {
        println!("{}--{}", prev, next);
    }
}

(playground)

1--2
3--4
5--6

If you don't know that your iterator exactly fits into the chunks, you can use Tuples::into_buffer to access any leftovers:

use itertools::Itertools; // 0.9.0

fn main() {
    let some_iter = vec![1, 2, 3, 4, 5].into_iter();

    let mut t = some_iter.tuples();
    for (prev, next) in t.by_ref() {
        println!("{}--{}", prev, next);
    }
    for leftover in t.into_buffer() {
        println!("{}", leftover);
    }
}

(playground)

1--2
3--4
5

It's also possible to take up to 4-tuple windows with Itertools::tuple_windows:

use itertools::Itertools; // 0.9.0

fn main() {
    let some_iter = vec![1, 2, 3, 4, 5, 6].into_iter();

    for (prev, next) in some_iter.tuple_windows() {
        println!("{}--{}", prev, next);
    }
}

(playground)

1--2
2--3
3--4
4--5
5--6

If you need to get partial chunks / windows, you can get

Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
  • Would it work with a tuple of 3 elements? Looking at the doc it seems it could be possible. – Matthieu M. Feb 09 '17 at 15:23
  • 1
    @MatthieuM. yep, but the number of implementations *is* limited to a 4-tuple (which I've added). – Shepmaster Feb 09 '17 at 15:31
  • Yes, well... in the absence of variadics it's otherwise painful to implement I guess (and bulky too). – Matthieu M. Feb 09 '17 at 15:49
  • Was this removed from the latest version of itertools? I can't see it [here](https://docs.rs/itertools/0.7.6/itertools/). – dshepherd Feb 04 '18 at 18:14
  • 1
    @dshepherd I continue to see both methods. I've updated the links to the documentation and provided playground links. – Shepmaster Feb 04 '18 at 18:19
  • 1
    Ah, I was looking at the list of free functions rather than the list of functions on the itertools trait. – dshepherd Feb 04 '18 at 18:46
  • This does not handle the case where the input is not divisible by the chunk length, right? So it is **not** an equivalent of `chunk`. – Bakuriu Sep 20 '20 at 16:01
  • @Bakuriu you can get the remaining pieces via [`Tuples::into_buffer`](https://docs.rs/itertools/0.9.0/itertools/structs/struct.Tuples.html#method.into_buffer) – Shepmaster Sep 20 '20 at 19:28
31

TL;DR: The best way to have chunks and windows on an arbitrary iterator/collection is to first collect it into a Vec and iterate over that.


The exact syntax requested is impossible in Rust.

The issue is that in Rust, a function's signature is depending on types, not values, and while Dependent Typing exists, there are few languages that implement it (it's hard).

This is why chunks and windows return a sub-slice by the way; the number of elements in a &[T] is not part of the type and therefore can be decided at run-time.


Let's pretend you asked for: for slice in some_iter.windows(2) instead then.

Where would the storage backing this slice live?

It cannot live:

  • in the original collection because a LinkedList doesn't have a contiguous storage
  • in the iterator because of the definition of Iterator::Item, there is no lifetime available

So, unfortunately, slices can only be used when the backing storage is a slice.


If dynamic allocations are accepted, then it is possible to use Vec<Iterator::Item> as the Item of the chunking iterator.

struct Chunks<I: Iterator> {
    elements: Vec<<I as Iterator>::Item>,
    underlying: I,
}

impl<I: Iterator> Chunks<I> {
    fn new(iterator: I, size: usize) -> Chunks<I> {
        assert!(size > 0);

        let mut result = Chunks {
           underlying: iterator, elements: Vec::with_capacity(size)
        };
        result.refill(size);
        result
    }

    fn refill(&mut self, size: usize) {
        assert!(self.elements.is_empty());

        for _ in 0..size {
            match self.underlying.next() {
                Some(item) => self.elements.push(item),
                None => break,
            }
        }
    }
}

impl<I: Iterator> Iterator for Chunks<I> {
    type Item = Vec<<I as Iterator>::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.elements.is_empty() {
            return None;
        }

        let new_elements = Vec::with_capacity(self.elements.len());
        let result = std::mem::replace(&mut self.elements, new_elements);

        self.refill(result.len());

        Some(result)
    }
}

fn main() {
    let v = vec!(1, 2, 3, 4, 5);

    for slice in Chunks::new(v.iter(), 2) {
        println!("{:?}", slice);
    }
}

Will return:

[1, 2]
[3, 4]
[5]

The canny reader will realize that I surreptitiously switched from windows to chunks.

windows is more difficult, because it returns the same element multiple times which require that the element be Clone. Also, since it needs returning a full Vec each time, it will need internally to keep a Vec<Vec<Iterator::Item>>.

This is left as an exercise to the reader.


Finally, a note on performance: all those allocations are gonna hurt (especially in the windows case).

The best allocation strategy is generally to allocate a single chunk of memory and then live off that (unless the amount is really massive, in which case streaming is required).

It's called collect::<Vec<_>>() in Rust.

And since the Vec has a chunks and windows methods (by virtue of implementing Deref<Target=[T]>), you can then use that instead:

for slice in v.iter().collect::<Vec<_>>().chunks(2) {
    println!("{:?}", slice);
}

for slice in v.iter().collect::<Vec<_>>().windows(2) {
    println!("{:?}", slice);
}

Sometimes the best solutions are the simplest.

Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
  • 2
    Sorry to downvote, but *The exact syntax requested is impossible in Rust* is not true; please check [my answer](http://stackoverflow.com/a/42139758/155423). Most of the rest of your analysis makes sense though. – Shepmaster Feb 09 '17 at 14:50
  • 2
    @Shepmaster: Your answer does not have the exact syntax requested either. The request is `for (prev, next) in some_iter.windows(2)`, with 2 a runtime parameter, which I interpret as meaning that I could pass 3 and have `for (n0, n1, n2) in some_iter.windows(3)` and that's impossible. You chose to focus on `(prev, next)` and ignore the runtime parameter, it may be OK with the OP, but as far as I am concerned this is not what they asked for (and I don't read minds). – Matthieu M. Feb 09 '17 at 15:10
  • A good point. specifying *both* the tuple size and an argument to `windows` would not make sense, especially if there was a mismatch. I might encourage you to draw explicit attention to that in your answer - perhaps adding an example? – Shepmaster Feb 09 '17 at 15:17
  • @Shepmaster: I am not quite sure what kind of example you mean; I already cited that types cannot depend on values unless one uses Dependent Typing and I am at a loss at how to illustrate it to be honest. Maybe it's not that important since your answer is clearly better. – Matthieu M. Feb 09 '17 at 15:26
  • Your answer makes a lot of sense, but I cannot help but feel that the lack of `chunks` and `windows` on iterators is an issue. As you had already so, it's actually really easy to implement `chunks` with a dynamic allocation of only c elements where c is the size of the chunk; in contrast using `collect::>` and then chunking means we need a dynamic allocation of n elements where n is the total number of elements produced by the iterators. Your `chunks` example can be optimized to do a single allocation, so performance won't hurt. – kccqzy Nov 18 '18 at 10:14
  • 1
    > `This is left as an exercise to the reader.` the `windows` functionality described here is exactly what I am looking for but I am not sure how to implement it, still new to Rust. Is there an example? – user5359531 Aug 13 '19 at 19:17
20

On nightly

The chunks version is now available on nightly under the name array_chunks

#![feature(iter_array_chunks)]

for [a, b, c] in some_iter.array_chunks() {
    ...
}

And it handles remainders nicely:

#![feature(iter_array_chunks)]

for [a, b, c] in some_iter.by_ref().array_chunks() {
    ...
}

let rem = some_iter.into_remainder();

On stable

Since Rust 1.51 this is possible with const generics where the iterator yields constant size arrays [T; N] for any N.

I built two standalone crates which implement this:

use iterchunks::IterChunks; // 0.2

for [a, b, c] in some_iter.array_chunks() {
    ...
}
use iterwindows::IterWindows; // 0.2

for [prev, next] in some_iter.array_windows() {
    ...
}

Using the example given in the Itertools answer:

use iterchunks::IterChunks; // 0.2

fn main() {
    let some_iter = vec![1, 2, 3, 4, 5, 6].into_iter();

    for [prev, next] in some_iter.array_chunks() {
        println!("{}--{}", prev, next);
    }
}

This outputs

1--2
3--4
5--6

Most times the array size can be inferred but you can also specific it explicitly. Additionally, any reasonable size N can be used, there is no limit like in the Itertools case.

use iterwindows::IterWindows; // 0.2

fn main() {
    let mut iter = vec![1, 2, 3, 4, 5, 6].into_iter().array_windows::<5>();
    println!("{:?}", iter.next());
    println!("{:?}", iter.next());
    println!("{:?}", iter.next());
}

This outputs

Some([1, 2, 3, 4, 5])
Some([2, 3, 4, 5, 6])
None

Note: array_windows() uses clone to yield elements multiple times so its best used for references and cheap to copy types.

Ross MacArthur
  • 4,735
  • 1
  • 24
  • 36