
Within a function that takes a &str and returns impl Iterator<Item = char>, I am trying to convert the input to lowercase and then filter and map the characters of the lowercased string. I have been stuck for some time on the following error when using str.to_lowercase():

  --> src/lib.rs                                                                                                                      
   |                                                                                                                                        
   |        cipher                                                                                                                          
   |   _____^                                                                                                                               
   |  |_____|                                                                                                                               
   | ||                                                                                                                                     
   | ||         .to_lowercase()                                                                                                             
   | ||_______________________- temporary value created here                                                                                
   | |          .chars()                                                                                                                    
   | |          .filter(|c| c.is_alphanumeric() && c.is_ascii())                                                                            
...  |                                                                                                                                      
   | |              }                                                                                                                       
   | |          })                                                                                                                          
   | |___________^ returns a value referencing data owned by the current function    

The function in its original form:

pub fn decode_to_iter(cipher: &str) -> impl Iterator<Item = char> {
    cipher
        .to_lowercase()
        .chars()
        .filter(|c| c.is_alphanumeric() && c.is_ascii())
        .map(|c| {
            if c.is_alphabetic() {
                (((b'z' - (c as u8)) + b'a') as char)
            } else {
                c
            }
        })
}

I came across a couple of very similar questions online about how to return an owned value that has been transformed with .to_lowercase(), but none of the posted solutions work for me.

I am trying to avoid using &char and stick with char in my return type.

I've tried to use functions like .to_owned() to take ownership of the reference but have come up empty-handed.

Ultimately, I was able to get my function to compile and pass my tests using char.to_ascii_lowercase(). The working version of my function is:

pub fn decode_to_iter<'a>(cipher: &'a str) -> impl Iterator<Item = char> + 'a {
    cipher
        .chars()
        .filter(|c| c.is_alphanumeric() && c.is_ascii())
        .map(|c| {
            if c.is_alphabetic() {
                (((b'z' - (c.to_ascii_lowercase() as u8)) + b'a') as char)
            } else {
                c.to_ascii_lowercase()
            }
        })
}

One of the things confusing me the most is the difference between str.to_lowercase() and char.to_ascii_lowercase(). The documentation for .to_ascii_lowercase() under Primitive Type char shows:

pub fn to_ascii_lowercase(&self) -> char

while the documentation for .to_lowercase() under Primitive Type str shows:

pub fn to_lowercase(&self) -> String

Unless I'm misunderstanding, both functions seem to return an owned value so I am unsure why only char.to_ascii_lowercase() works.

I am wondering:

  1. how to properly return an impl Iterator value which uses .to_lowercase() rather than .to_ascii_lowercase()?

  2. what the difference is between str.to_lowercase() and char.to_ascii_lowercase()?

AC-5

1 Answer


The issue here is that str::to_lowercase allocates a new String value as the lowercased version of your string, and then the str::chars method borrows from that new String value. (You can tell it borrows from the String value by looking at the std::str::Chars struct, which has a lifetime parameter referring to the string whose characters it is iterating over.)

So why is this problematic? The String allocated by to_lowercase is a temporary value created as part of your iterator chain, and it is dropped at the end of your function's scope (the compiler's error message points at it with "temporary value created here"). The compiler is therefore preventing a use-after-free bug. If it let you return the iterator, callers could read from a String that had already been deallocated, which violates memory safety.
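To make the borrow concrete, here is a minimal sketch. The helper name `letters` is hypothetical and not part of the original post. It shows that an iterator from `str::chars` is fine as long as the string it borrows from is owned by something that outlives it, in this case the caller.

```rust
use std::str::Chars;

// Hypothetical helper (not from the original post): the `'a` on `Chars<'a>`
// says the iterator borrows from `s` and must not outlive it.
fn letters<'a>(s: &'a str) -> Chars<'a> {
    s.chars()
}

fn main() {
    // The caller owns the lowercased `String`, so it outlives the iterator.
    let lowered: String = "ABC".to_lowercase();
    let collected: String = letters(&lowered).collect();
    assert_eq!(collected, "abc");

    // By contrast, creating `lowered` inside `letters` and returning
    // `lowered.chars()` would be rejected: the `String` would be dropped
    // when the function returns while the iterator still points into it.
}
```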

Your variant that uses char::to_ascii_lowercase works because you never allocate that intermediate String value. You therefore wind up returning an iterator that borrows from the input to the function, which is valid, and is why you needed to add a lifetime parameter. (Otherwise, the compiler assumes the lifetime on an impl Trait is 'static, which is not the case here. The lifetime of your returned value is tied to the lifetime of the input to the function.)

You can fix this by avoiding the allocation of a temporary String, which should hopefully be more efficient. The trick is to realize that char has a method char::to_lowercase which returns an iterator over the lowercase equivalent of the given character, and not a String. Therefore, you can just read from this directly:

pub fn decode_to_iter<'a>(cipher: &'a str) -> impl Iterator<Item = char> + 'a {
    cipher
        .chars()
        .flat_map(|c| c.to_lowercase())
        .filter(|c| c.is_alphanumeric() && c.is_ascii())
        .map(|c| {
            if c.is_alphabetic() {
                (((b'z' - (c as u8)) + b'a') as char)
            } else {
                c
            }
        })
}

The only real trick here is to use flat_map, which is like a normal map, but it lets you return an iterator that is then flattened into the original iterator (if you used a normal map here, you'd wind up with an iterator of iterators).
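As a small illustration of the difference, the sketch below contrasts `map` and `flat_map` over `char::to_lowercase`. The helper name `lower_flat` is made up for this example. It also shows why `char::to_lowercase` returns an iterator at all: a single character can lowercase to more than one character.

```rust
// Hypothetical helper (not from the original post): lowercases via
// `flat_map`, so every `char` produced by `to_lowercase` ends up in
// one flat stream.
fn lower_flat(s: &str) -> String {
    s.chars().flat_map(|c| c.to_lowercase()).collect()
}

fn main() {
    // U+0130 (capital I with dot above) lowercases to two chars.
    let dotted: Vec<char> = '\u{130}'.to_lowercase().collect();
    assert_eq!(dotted, vec!['i', '\u{307}']);

    // With a plain `map`, each item is itself an iterator; here each
    // inner iterator is collected into its own `String`.
    let nested: Vec<String> = "AB"
        .chars()
        .map(|c| c.to_lowercase().collect::<String>())
        .collect();
    assert_eq!(nested, vec!["a".to_string(), "b".to_string()]);

    // With `flat_map`, the inner iterators are flattened away.
    assert_eq!(lower_flat("AbC"), "abc");
}
```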

With that said, if you really only care about ASCII codepoints here (as your filter predicate suggests), then you don't need the full Unicode-aware lowercasing mechanism. So I'd probably write it similarly to your second variant, with char::to_ascii_lowercase:

pub fn decode_to_iter<'a>(cipher: &'a str) -> impl Iterator<Item = char> + 'a {
    cipher
        .chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .map(|c| c.to_ascii_lowercase())
        .map(|c| {
            if c.is_alphabetic() {
                (((b'z' - (c as u8)) + b'a') as char)
            } else {
                c
            }
        })
}

And here's a playground link showing the code.
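For a quick sanity check, the ASCII-only version can be exercised as follows. The function is restated so the snippet compiles on its own, with only the redundant outer parentheses dropped. The sample inputs are made up for illustration.

```rust
pub fn decode_to_iter<'a>(cipher: &'a str) -> impl Iterator<Item = char> + 'a {
    cipher
        .chars()
        // Keep only ASCII letters and digits.
        .filter(|c| c.is_ascii_alphanumeric())
        .map(|c| c.to_ascii_lowercase())
        .map(|c| {
            if c.is_alphabetic() {
                // Mirror the letter within 'a'..='z' ('a' <-> 'z', 'b' <-> 'y', ...).
                ((b'z' - (c as u8)) + b'a') as char
            } else {
                c
            }
        })
}

fn main() {
    // Letters are lowercased and mirrored, digits pass through,
    // and spaces and punctuation are filtered out.
    let decoded: String = decode_to_iter("Gsv 123!").collect();
    assert_eq!(decoded, "the123");

    // Non-ASCII characters are dropped by the filter.
    let decoded: String = decode_to_iter("Z\u{e9}").collect();
    assert_eq!(decoded, "a");
}
```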

BurntSushi5
  • Thanks @BurntSushi5 for the explanation and for pointing out the use of `flat_map`. You make a good point about avoiding an unnecessary String allocation by using `char` lowercase methods but I'm still wondering if it is possible to take ownership of the temporary string from `str::to_lowercase` within the iterator chain to avoid having it dropped at the end of the function scope? – AC-5 Jul 24 '19 at 12:35
  • @AC-5 That would be answered by [How can I store a Chars iterator in the same struct as the String it is iterating on?](https://stackoverflow.com/q/43952104/3650362) – trent Jul 24 '19 at 13:08
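On the follow-up about keeping the lowercased data alive: one possible workaround, which is not taken from the answer or the linked question, is to collect the lowercased characters into an owned Vec<char> and return its owning iterator. This is only a sketch. The name `decode_to_iter_owned` is hypothetical, and it pays for both the String and the Vec allocation, so the `flat_map` approach is likely still preferable.

```rust
// Hypothetical variant (not from the original post): the returned iterator
// owns its data, so it does not borrow from a temporary `String`.
pub fn decode_to_iter_owned(cipher: &str) -> impl Iterator<Item = char> {
    // The temporary `String` lives until the end of this statement, which is
    // long enough to copy its chars into an owned `Vec<char>`.
    let lowered: Vec<char> = cipher.to_lowercase().chars().collect();
    lowered
        .into_iter() // consumes the Vec; the iterator now owns the chars
        .filter(|c| c.is_ascii_alphanumeric())
        .map(|c| {
            if c.is_ascii_alphabetic() {
                // Mirror the letter within 'a'..='z'.
                (b'z' - (c as u8) + b'a') as char
            } else {
                c
            }
        })
}

fn main() {
    let decoded: String = decode_to_iter_owned("Gsv 123!").collect();
    assert_eq!(decoded, "the123");
}
```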