
I'm having trouble writing a function that takes a collection of strings as a parameter. My function looks like this:

type StrList<'a> = Vec<&'a str>;

fn my_func(list: &StrList) {
    for s in list {
        println!("{}", s);
    }
}

All goes well if I pass a Vec<&'a str> to the function, as expected. However, if I pass a Vec<String> the compiler complains:

error[E0308]: mismatched types
  --> src/main.rs:13:13
   |
13 |     my_func(&v2);
   |             ^^^ expected &str, found struct `std::string::String`
   |
   = note: expected type `&std::vec::Vec<&str>`
   = note:    found type `&std::vec::Vec<std::string::String>`

This is the main function I use:

fn main() {
    let v1 = vec!["a", "b"];
    let v2 = vec!["a".to_owned(), "b".to_owned()];
    my_func(&v1);
    my_func(&v2);
}

My function is not able to take vectors of owned strings. Conversely, if I change the StrList type into:

type StrList = Vec<String>;

The first call fails, and the second works.

A possible solution is to produce a Vec<&'a str> from v2 in this way:

let v2_1 : Vec<_> = v2.iter().map(|s| s.as_ref()).collect();

But it seems very odd to me. my_func should not care about the ownership of the strings.

What kind of signature should I use for my_func to support both vectors of owned strings and string references?

Shepmaster
mbrt

2 Answers


Although String and &str are very closely related, they are not identical. Here's what your vectors look like in memory:

v1 ---> [ { 0x7890, // pointer to "a" + 7 unused bytes
            1 }     // length of "a"
          { 0x7898, // pointer to "b" + 7 unused bytes
            1 } ]   // length

v2 ---> [ { 0x1230 // pointer to "a" + 7 unused bytes (a different copy)
            8      // capacity
            1 }    // length
          { 0x1238 // pointer ...
            8      // capacity
            1 } ]  // length

Here each line is the same amount of memory (four or eight bytes, depending on pointer size). You can't take the memory of one of these vectors and treat it as the other, because the items have different sizes and different layouts: a &str is two words (pointer and length), while a String is three (pointer, capacity, and length). For example, with four-byte pointers, if v1 stores its items starting at address X and v2 stores its items starting at address Y, then v1[1] is at address X + 8 but v2[1] is at address Y + 12. With eight-byte pointers the offsets are X + 16 and Y + 24.
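The size difference can be checked directly with std::mem::size_of. This is only a minimal sketch: element_sizes is a hypothetical helper name, and the two-word and three-word sizes are what current compilers produce in practice, not a documented layout guarantee for String.

```rust
use std::mem::size_of;

/// Hypothetical helper: size in bytes of one `&str` element and one
/// `String` element, as they would be stored inside a `Vec`.
fn element_sizes() -> (usize, usize) {
    (size_of::<&str>(), size_of::<String>())
}

fn main() {
    // One machine word: 4 bytes on 32-bit targets, 8 bytes on 64-bit targets.
    let word = size_of::<usize>();
    let (str_ref, owned) = element_sizes();

    // `&str` is a fat pointer: pointer + length.
    println!("&str:   {} bytes ({} words)", str_ref, str_ref / word);
    // `String` wraps a `Vec<u8>`: pointer + capacity + length.
    println!("String: {} bytes ({} words)", owned, owned / word);
}
```

Because the element sizes differ, the stride between v1[0] and v1[1] differs from the stride between v2[0] and v2[1], which is why one vector cannot be reinterpreted as the other.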

What you can do is write a generic function like this:

fn my_func<T: AsRef<str>>(list: &[T]) {
    for s in list {
        println!("{}", s.as_ref());
    }
}

Then the compiler can generate appropriate code for both &[String] and &[&str] as well as other types if they implement AsRef<str>.
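As a rough illustration of the same signature with both vector types, here is a sketch using a hypothetical variant named join_all. It returns the items joined by commas instead of printing them, so the result is easy to inspect. It is not part of the original function.

```rust
/// Hypothetical variant of `my_func`: joins the items with commas
/// instead of printing them, so the result can be inspected.
fn join_all<T: AsRef<str>>(list: &[T]) -> String {
    // `as_ref()` turns each `&str` or `String` item into a `&str`.
    let parts: Vec<&str> = list.iter().map(|s| s.as_ref()).collect();
    parts.join(",")
}

fn main() {
    let v1: Vec<&str> = vec!["a", "b"];
    let v2: Vec<String> = vec!["a".to_owned(), "b".to_owned()];
    let arr = ["a", "b"];

    // `&Vec<T>` is coerced to `&[T]` at the call site (deref coercion).
    println!("{}", join_all(&v1));
    println!("{}", join_all(&v2));
    // A reference to an array coerces to a slice as well.
    println!("{}", join_all(&arr));
}
```

The function is monomorphized once per element type, so no intermediate Vec<&str> has to be built at the call site for the Vec<String> case.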

Shepmaster
  • It actually works. Can you explain better why `&[T]` works? The `AsRef` is clear. – mbrt Sep 23 '15 at 06:37
  • @mbrt You know slices `&[T]` and only wonder why the function accepts `&[T]` but `main` passes in `&Vec`, right? The answer is [deref coercions](http://doc.rust-lang.org/book/deref-coercions.html). Because `&Vec` is a much less general type than `&[T]` (you can get the latter from many sources that aren't vectors) it is preferred to write functions to accept `&[T]` rather than `&Vec`. For ergonomics, `foo(&vec)` automatically constructs the slice from the vector. –  Sep 23 '15 at 08:45

To build on delnan's great answer, I want to point out one more level of generics that you can add here. You said:

a collection of strings

But there are more types of collections than slices and vectors! In your example, you care about forward-only, one-at-a-time access to the items. This is a perfect example of an Iterator. Below, I've changed your function to accept any type that can be transformed into an iterator. You can then pass many more types of things. I've used a HashSet as an example, but note that you can also pass in v1 and v2 instead of &v1 or &v2, consuming them.

use std::collections::HashSet;

fn my_func<I>(list: I)
    where I: IntoIterator,
          I::Item: AsRef<str>,
{
    for s in list {
        println!("{}", s.as_ref());
    }
}

fn main() {
    let v1 = vec!["a", "b"];
    let v2 = vec!["a".to_owned(), "b".to_owned()];
    let v3 = {
        let mut set = HashSet::new();
        set.insert("a");
        set.insert("b");
        set.insert("a");
        set
    };
    let v4 = {
        let mut set = HashSet::new();
        set.insert("a".to_owned());
        set.insert("b".to_owned());
        set.insert("a".to_owned());
        set
    };

    my_func(&v1);
    my_func(v1);
    my_func(&v2);
    my_func(v2);
    my_func(&v3);
    my_func(v3);
    my_func(&v4);
    my_func(v4);
}
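
Accepting IntoIterator also means lazy iterator adapters can be passed without collecting them first. The sketch below assumes a hypothetical variant named total_len, which sums the byte lengths of the items instead of printing them. The name and the sample text are made up for illustration.

```rust
/// Hypothetical variant of the iterator-based `my_func`: returns the
/// total length in bytes of all items instead of printing them.
fn total_len<I>(list: I) -> usize
where
    I: IntoIterator,
    I::Item: AsRef<str>,
{
    list.into_iter()
        .map(|s| {
            // Works for `&str`, `String`, `&String`, `&&str`, ...
            let item: &str = s.as_ref();
            item.len()
        })
        .sum()
}

fn main() {
    let text = "alpha beta\ngamma";

    // `lines()` and `split_whitespace()` yield `&str` items lazily.
    println!("{}", total_len(text.lines()));
    println!("{}", total_len(text.split_whitespace()));
    // An adapter chain that produces owned `String`s works too.
    println!("{}", total_len(text.lines().map(|l| l.to_uppercase())));
}
```

The trade-off is that the function can only walk the items once, front to back. If it needs indexing or a known length, the slice-based signature is the better fit.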
Community
Shepmaster
  • This is actually even better! @delnan actually answers my question, however this version is more generic. – mbrt Sep 24 '15 at 13:48