4

I'm trying to write a function that receives a vector of vectors of strings and returns all vectors concatenated together, i.e. it returns a vector of strings.

The best I could do so far has been the following:

fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
    let vals : Vec<&String> = vecs.iter().flat_map(|x| x.into_iter()).collect();
    vals.into_iter().map(|v: &String| v.to_owned()).collect()
}

However, I'm not happy with this result, because it seems I should be able to get Vec<String> from the first collect call, but somehow I am not able to figure out how to do it.

I am even more interested to figure out why exactly the return type of collect is Vec<&String>. I tried to deduce this from the API documentation and the source code, but despite my best efforts, I couldn't even understand the signatures of functions.

So let me try and trace the types of each expression:

- vecs.iter(): Iter<T=Vec<String>, Item=Vec<String>>
- vecs.iter().flat_map(): FlatMap<I=Iter<Vec<String>>, U=???, F=FnMut(Vec<String>) -> U, Item=U>
- vecs.iter().flat_map().collect(): (B=??? : FromIterator<U>)
- vals was declared as Vec<&String>, therefore 
      vals == vecs.iter().flat_map().collect(): (B=Vec<&String> : FromIterator<U>). Therefore U=&String.

I'm assuming above that the type inferencer is able to figure out that U=&String based on the type of vals. But if I give the expression the explicit types in the code, this compiles without error:

fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
    let a: Iter<Vec<String>> = vecs.iter();
    let b: FlatMap<Iter<Vec<String>>, Iter<String>, _> = a.flat_map(|x| x.into_iter());
    let c = b.collect();
    print_type_of(&c);
    let vals : Vec<&String> = c;
    vals.into_iter().map(|v: &String| v.to_owned()).collect()
}

Clearly, U=Iter<String>... Please help me clear up this mess.

EDIT: thanks to bluss' hint, I was able to achieve one collect as follows:

fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
    vecs.into_iter().flat_map(|x| x.into_iter()).collect()
}

My understanding is that by using into_iter I transfer ownership of vecs to IntoIter and further down the call chain, which allows me to avoid copying the data inside the lambda call and therefore - magically - the type system gives me Vec<String> where it used to always give me Vec<&String> before. While it is certainly very cool to see how the high-level concept is reflected in the workings of the library, I wish I had any idea how this is achieved.

EDIT 2: After a laborious process of guesswork, looking at API docs and using this method to decipher the types, I got them fully annotated (disregarding the lifetimes):

fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
    let a: Iter<Vec<String>> = vecs.iter();
    let f : &Fn(&Vec<String>) -> Iter<String> = &|x: &Vec<String>| x.into_iter();
    let b: FlatMap<Iter<Vec<String>>, Iter<String>, &Fn(&Vec<String>) -> Iter<String>> = a.flat_map(f);
    let vals : Vec<&String> = b.collect();
    vals.into_iter().map(|v: &String| v.to_owned()).collect()
}
Community
  • 1
  • 1
kirillkh
  • 181
  • 3
  • 12
  • `- vecs.iter(): Iter, Item=Vec>`: this is incorrect. There is no associated type named `Item` on struct `Iter` (only traits may have associated types). `vecs.iter()` is of type `Iter>`, but [this type implements `Iterator>`](http://doc.rust-lang.org/stable/std/slice/struct.Iter.html#implementations) (note the `&`). When you `flat_map` this, you turn `&Vec` into `&String`. – Francis Gagné Jul 01 '15 at 02:19

1 Answers1

6

I'd think about: why do you use iter() on the outer vec but into_iter() on the inner vecs? Using into_iter() is actually crucial, so that we don't have to copy first the inner vectors, then the strings inside, we just receive ownership of them.

We can actually write this just like a summation: concatenate the vectors two by two. Since we always reuse the allocation & contents of the same accumulation vector, this operation is linear time.

To minimize time spent growing and reallocating the vector, calculate the space needed up front.

fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
    let size = vecs.iter().fold(0, |a, b| a + b.len());
    vecs.into_iter().fold(Vec::with_capacity(size), |mut acc, v| {
        acc.extend(v); acc
    })
}

If you do want to clone all the contents, there's already a method for that, and you'd just use vecs.concat() /* -> Vec<String> */


The approach with .flat_map is fine, but if you don't want to clone the strings again you have to use .into_iter() on all levels: (x is Vec<String>).

vecs.into_iter().flat_map(|x| x.into_iter()).collect()

If instead you want to clone each string you can use this: (Changed .into_iter() to .iter() since x here is a &Vec<String> and both methods actually result in the same thing!)

vecs.iter().flat_map(|x| x.iter().map(Clone::clone)).collect()

bluss
  • 12,472
  • 1
  • 49
  • 48
  • This gives me some errors: ` build.rs:104:9: 104:12 error: cannot borrow immutable local variable `acc` as mutable build.rs:104 acc.extend(v); v ^~~ note: in expansion of closure expansion build.rs:103:53: 105:6 note: expansion site build.rs:104:24: 104:25 error: use of moved value: `v` [E0382] build.rs:104 acc.extend(v); v` – kirillkh Jun 29 '15 at 21:55
  • maybe because I didn't test compile it until after I submitted, but then I fixed it – bluss Jun 29 '15 at 21:55
  • How can I independently learn about vecs.concat()? It's not on the API page for std::vec::Vec. I mean how am I supposed to find it if looking at the API doesn't give any hints? Even now that I empirically know it's there I can't figure out where it is defined. – kirillkh Jun 29 '15 at 22:02
  • As to your solution with fold(), thank you. Still, what I'm after is learning how to achieve the needed result independently, so I would really like to use my code as the starting point and see how it can be improved to only use one collect() rather than rewritten. – kirillkh Jun 29 '15 at 22:09
  • Well, this answers my main question, and I would really like to mark your answer as solution, but OTOH I also would like to get an answer to the meta-questions (how to figure out types by looking at the API). Perhaps my question is too broad? – kirillkh Jun 29 '15 at 22:40
  • The remark on `x.into_iter()` not being the method you think might be important – bluss Jun 29 '15 at 23:13
  • Instead of `flat_map(|x| x.into_iter())`, just do `flat_map(|x| x)`. – Veedrac Jun 29 '15 at 23:23
  • @kirillkh `concat` utilizes automatic dereferencing, so is actually a function on `*Vec`, which is the slice `[T]`. https://doc.rust-lang.org/std/primitive.slice.html#method.concat – Veedrac Jun 29 '15 at 23:29
  • Thanks Veedrac. How would you go about finding `concat` if you didn't know it was there? I even tried to Google it with no luck. – kirillkh Jun 30 '15 at 05:28