1

I have a piece of homework to get the longest substring of consecutive equal characters with the signature fn(s: &str) -> Option<&str>. However, my attempt yields a compiler error:

pub fn longest_sequence(s: &str) -> Option<&str> {
    let s_len = s.len();
    if s_len == 0 {
        return None
    }
    let mut cur_char = s.chars().nth(0);
    let mut cur_occ = 0;
    let mut max_char = s.chars().nth(0);
    let mut max_occ = 0;

    for i in 0..s_len {
        let c = s.chars().nth(i);
        if c == cur_char {
            cur_occ = cur_occ + 1;
        } else {
            if cur_occ > max_occ {
                max_occ = cur_occ;
                max_char = cur_char;
            }
            cur_char = c.clone();
            cur_occ = 1;
        }
    }
    if cur_occ > max_occ {
        max_occ = cur_occ;
        max_char = cur_char;
    }
    let cr = max_char.unwrap();
    let charstr = cr.to_string();
    let string = charstr.repeat(max_occ);
    let strr = string.as_str();
    let some = Some(strr.clone());
    println!("in {:?}",some);
    some // Compilation error
//  None  // if returning None, then it compiles and runs as expected
}

fn main () {
    println!("out {:?}",longest_sequence(&"aaabbbcccc"));
}
error[E0597]: `string` does not live long enough
  --> r.rs:31:16
   |
31 |     let strr = string.as_str();
   |                ^^^^^^ borrowed value does not live long enough
...
35 | }
   | - borrowed value only lives until here
   |
note: borrowed value must be valid for the anonymous lifetime #1 defined on the function body at 1:1...
  --> r.rs:1:1
   |
1  | / pub fn longest_sequence(s: &str) -> Option<&str> {
2  | |     let s_len = s.len();
3  | |     if s_len == 0 {
4  | |         return None
...  |
34 | |     some
35 | | }
   | |_^

If I return None instead of some, the code compiles and the output is as you would expect:

in Some("cccc")
out None

How can I make this work? Can someone help me understand the issue?

kmdreko
  • 42,554
  • 6
  • 57
  • 106
Ruixing Wang
  • 75
  • 1
  • 7

2 Answers2

3

I think this would make a lot more sense if you kept track of indexes into the passed in string and then returned a slice of that string with:

s.get(start_index..stop_index)

Rust makes this a little tricky because you need to index the bytes of utf-8 characters properly, but you don't just want the longest string of bytes, you want the longest run of characters.

Still, I think it's easier to understand and cleaner to do something like this: Keep track of the length of the current longest sequence and start and stop bytes. Keep a character count of the current run of characters, then use get() to return the slice. This avoids the problem of making a new string in the function.

pub fn longest_sequence(s: &str) -> Option<&str> {
    if s.is_empty() {
        return None
    }

    let mut chars = s.char_indices();
    let (mut char_start, mut curr) = chars.next().unwrap(); // get initial character
    let mut longest = (char_start,curr.len_utf8(), 1);      // (start(byte), stop(byte), len(characters)
    let mut curr_char_count = 1;                            // number of chars in current run

    for (idx, c) in chars {
        match c {
            c if c == curr => {
                curr_char_count += 1;
                if curr_char_count > longest.2 {
                    longest = (char_start, idx + c.len_utf8(), curr_char_count);
                }
            }
            n => {
                curr = n;
                curr_char_count = 1;
                char_start = idx;
            }
        }
    }
    s.get(longest.0..longest.1)
}

This will allow non-ascii strings to work and still return a slice of the original string:

fn main() {     
    let s= longest_sequence("BBmmm!");
    println!("{:?}", s);

    // Some("")
}
Mark
  • 90,562
  • 7
  • 108
  • 148
2

The fundamental problem is you're creating a heap allocated string in the middle of the function, and you're returning a reference to the string. When the function exits, any values that were owned by the function (such as the string) will be dropped (and the reference will be invalidated). Rust just saved you from a memory error!

For example, this line here:

let string = charstr.repeat(max_occ);

repeats a character and creates a string but it needs somewhere to put the string. Hence it creates a string on the heap. However, the string must be deallocated at some point (otherwise you'll have a memory leak!) so Rust has the concept of ownership. Any values that are owned by the current function must also be dropped or deallocated by the function.

Here you are taking a borrow of the string:

let strr = string.as_str();

Borrows do not transfer ownership. Hence, the string is still owned by the current function. This explains the error message: error[E0597]: `string` does not live long enough. You can't return the borrow because the owned value gets dropped at the end of the function (or more specifically, the lifetime of the borrow is only for the current function).

There are two ways to fix this:

  1. Return the string directly:

pub fn longest_sequence(s: &str) -> Option<String>

If you return String instead of &str, you're transferring the ownership of the string to the caller.

  1. Return a borrow of the string in the function parameter

You can modify the algorithm to take the indexes of the substring you want, and then return a slice of s. This is fine because the return value won't outlive any owned values. Specifically, it will have the same lifetime as s.

Finally, if you have a variable of type String anywhere in the function, you'll generally have issues with lifetimes (with string processing anyway) when returning &str.

Technocoder
  • 363
  • 4
  • 9