Suppose I have a char in the variable c and a positive int in the variable n. I want to build the str containing c occurring n times. How can I do it?

I tried building it as a String, and maybe I just got dizzy trying to read the documentation on strings, but I couldn't see how to convert it to a str. But then if I'm trying to just build it as a str directly then I couldn't see how to do that either.

For context, here is the full function I'm trying to implement. It takes a string and finds the longest run of consecutive identical characters (breaking ties by taking the first one that occurs).

pub fn longest_sequence(s: &str) -> Option<&str> {
    if s.len() == 0 { return None; }
    let mut current_c = s.as_bytes()[0] as char;
    let mut greatest_c = s.as_bytes()[0] as char;
    let mut current_num = 0;
    let mut greatest_num = 0;
    for ch in s.chars() {
        if current_c == ch {
            current_num += 1;
            if current_num > greatest_num {
                greatest_num = current_num;
                greatest_c = current_c;
            }
        } else {
            current_num = 1;
            current_c = ch;
        }
    }
    // Now build the output str ...
}
Addem
  • For your particular problem, you don't need to build a new string, note that you could return a substring of your original string `s`. Once you have your start/end indexes of the longest sequence, you just return `s[start..end]` – effect Dec 02 '22 at 00:59
  • Note that `String`/`str` in Rust are designed to handle UTF-8, which makes how they work quite different than languages like C/C++. See this question for more details: https://stackoverflow.com/questions/24158114/what-are-the-differences-between-rusts-string-and-str – effect Dec 02 '22 at 01:03
  • @effect Right, so I tried something like that. I didn't return a slice of the original string (because that seems hard, and it seems easier to just identify the character I want and its frequency, rather than tracking the indices at which they occur) but I tried making a new string and then returning a string slice of that. But then I ran into ownership issues with that, that I don't fully understand. – Addem Dec 02 '22 at 01:04
  • Ok, if you want to do it that way, then you probably want to return an `Option<String>` instead of an `Option<&str>`. You can't return a `&str` reference to a `String` you create within the function, since the `String` goes out of scope at the end of the function, making your reference invalid. – effect Dec 02 '22 at 01:14
  • @effect Ah ... wow, this ownership stuff ... ok, I think you're right then, maybe I should just do whatever it takes to slice the input string instead. Ok, thank you! – Addem Dec 02 '22 at 02:16
  • @Addem Creating a string in other languages and returning a reference to it is undefined behaviour as well .. in general, creating a temporary variable and returning a reference to it is undefined behaviour. Rust just tells you that it is a problem ;) – Finomnis Dec 02 '22 at 06:19
  • *"but I couldn't see how to convert it to a `str`"* - to me, the more important question is why are you trying to do this? Why do you require a `str` as a return value? I agree with @effect that returning a `String` would probably be the right way to go. – Finomnis Dec 02 '22 at 06:22

2 Answers

I think there are a couple of misconceptions about str vs String.

  • str can never exist alone. It is always used behind a pointer, such as &str (or Box<str> or *const str, but in your case those shouldn't matter).
  • &str does not own any data. It is merely a reference to (parts of) string data stored elsewhere, for example in a String or a string literal.
  • String actually holds data.
  • So when you want to return data, use String; if you want to reference existing data, return &str.
  • There is no way to return a &str that borrows from a String local to the function. The data has to be stored somewhere, and &str doesn't store it. (For completeness' sake: yes, you could leak it, but that would create a permanent string in memory that never goes away.)
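
A minimal sketch of the distinction in the points above. The function names (`first_word`, `shout`, `broken`) are hypothetical and chosen only for illustration:

```rust
// Borrowing from the input: the returned slice points into `s`,
// whose data is owned by the caller and outlives this call.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// Producing new data: the caller receives ownership of the String.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

// This version is rejected by the compiler: `owned` is dropped at the
// end of the function, so the returned reference would dangle.
//
// fn broken(s: &str) -> &str {
//     let owned = s.to_uppercase();
//     &owned // error: cannot return reference to local variable `owned`
// }
```

`first_word` can return `&str` because the data it refers to already lives in the caller; `shout` has to return `String` because the uppercase copy is created inside the function.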

So in your case there are two ways:

  • Reference the input &str, because somewhere its data is already stored.
  • Return a String instead.

As a side note: do not use s.as_bytes()[0] as char, as it only gives the right result when the first character is ASCII. Rust strings are defined to be UTF-8, so a single character can span several bytes.
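
To illustrate the side note, here is a small sketch comparing the two ways of reading the first character. The helper names are hypothetical:

```rust
// Interprets only the first byte as a character. This is correct for
// ASCII input, but wrong for any character encoded as multiple bytes.
fn first_byte_as_char(s: &str) -> Option<char> {
    s.as_bytes().first().map(|&b| b as char)
}

// Decodes the first full UTF-8 character.
fn first_char(s: &str) -> Option<char> {
    s.chars().next()
}
```

For "€uro", the euro sign is encoded as the three bytes 0xE2 0x82 0xAC, so `first_byte_as_char` yields 'â' (U+00E2) while `first_char` yields '€'.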

Here is one possible solution:

pub fn longest_sequence(s: &str) -> Option<&str> {
    let mut current_c = s.chars().next()?;
    let mut current_start = 0;
    let mut current_len = 0;
    let mut greatest: &str = "";
    let mut greatest_len = 0;
    for (pos, ch) in s.char_indices() {
        if current_c == ch {
            current_len += 1;
        } else {
            if greatest_len < current_len {
                greatest = &s[current_start..pos];
                greatest_len = current_len;
            }

            current_len = 1;
            current_c = ch;
            current_start = pos;
        }
    }

    if greatest_len < current_len {
        greatest = &s[current_start..];
    }

    Some(greatest)
}

pub fn main() {
    let s = "€€";

    let seq = longest_sequence(s);
    println!("{:?}", seq);
}
Some("€€")

Some explanations:

  • No need to check for an empty string. s.chars().next()? does so automatically, returning None if there is no first character.
  • Use s.chars().next() instead of s.as_bytes()[0] as char, as the second one is not UTF-8 compatible.
  • I explicitly store greatest_len instead of using greatest.len(), because greatest.len() gives you the size of the string in bytes, not in chars.
  • You stored the new largest string whenever a new char of the same value was found; I had to move that to the case where the char changes (and once more after the loop), because until then we don't yet know where the current run ends. Again, note that &s[current_start..current_start+current_len] wouldn't work, because &s[ .. ] wants indices in bytes, but current_len is in chars. So we need to wait for the next char to know where the previous one ended.
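
The difference between byte offsets and char counts mentioned above can be sketched as follows. The helper names are hypothetical:

```rust
// Byte offset at which each character starts, paired with the character.
fn char_starts(s: &str) -> Vec<(usize, char)> {
    s.char_indices().collect()
}

// Returns the prefix of `s` containing at most `n` characters.
// Slicing needs a byte offset, so the count `n` is first translated
// into the byte position where character number `n` begins.
fn first_n_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        Some((pos, _)) => &s[..pos],
        None => s, // fewer than or exactly n characters: take everything
    }
}
```

For "a€b", `s.len()` is 5 while `s.chars().count()` is 3, and the characters start at byte offsets 0, 1 and 4. Slicing at a byte offset that falls inside a multi-byte character panics, which is why the offsets must come from `char_indices`.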

Another solution, based on your code, would be:

pub fn longest_sequence(s: &str) -> Option<String> {
    let mut current_c = s.chars().next()?;
    let mut greatest_c = current_c;
    let mut current_num = 0;
    let mut greatest_num = 0;
    for ch in s.chars() {
        if current_c == ch {
            current_num += 1;
            if current_num > greatest_num {
                greatest_num = current_num;
                greatest_c = current_c;
            }
        } else {
            current_num = 1;
            current_c = ch;
        }
    }

    // Build the output String
    Some(std::iter::repeat(greatest_c).take(greatest_num).collect())
}

pub fn main() {
    let s = "€€";

    let seq = longest_sequence(s);
    println!("{:?}", seq);
}
Some("€€")
Finomnis
  • Interesting -- you're definitely right about some of my misconceptions. I'll need to study your post for a minute, but this is definitely a big help, thank you! – Addem Dec 02 '22 at 16:29
  • @Addem *very* recommended watch: https://www.youtube.com/watch?v=rDoqT-a6UFg I recommend the entire video, and in one section it talks about what `String` vs `str` is behind the scenes, laid out in memory. For some people (like me) having it visual like that really makes it click, maybe that works for you as well. – Finomnis Dec 02 '22 at 19:50

To convert a String to &'static str you need to leak it like this:

fn leak(s: String) -> &'static str {
    let ptr = s.as_str() as *const str;
    core::mem::forget(s);
    unsafe {&*ptr}
}

And char to String:

fn cts(c: char, n: usize) -> String {
    (0..n)
        .map(|_| c)
        .collect()
}
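
For comparison, here is a sketch of two other standard-library ways to build the same String. The function names are hypothetical; `std::iter::repeat`, `Iterator::take` and `str::repeat` are the standard-library items being used:

```rust
// Repeat the char with an iterator and collect it into a String.
fn repeat_with_iter(c: char, n: usize) -> String {
    std::iter::repeat(c).take(n).collect()
}

// Turn the char into a one-character String, then repeat that string.
fn repeat_with_str(c: char, n: usize) -> String {
    c.to_string().repeat(n)
}
```

Both return an owned String, so the result can be handed to the caller without any lifetime concerns.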

So char to &'static str basically will look like this:

fn conv(c: char, n: usize) -> &'static str {
    leak(cts(c, n))
}

I do not recommend leaking the String, though; just use it as is.
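
If leaking really is wanted, it can also be done without `unsafe`. The following is only a sketch using the standard `String::into_boxed_str` and `Box::leak`; the function name `leak_string` is hypothetical:

```rust
// Converts the String into a Box<str> and leaks the box. The memory is
// never freed, which is what makes the 'static lifetime valid.
fn leak_string(s: String) -> &'static str {
    // Box::leak returns a mutable reference, which coerces to a shared one here.
    Box::leak(s.into_boxed_str())
}
```

Every call permanently leaks the string's memory, so this only makes sense for values that are meant to live for the rest of the program anyway.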

Miiao
  • @Finomnis, calling `forget` basically does it. Using `Box::leak` is a wrong way. And no, I don’t create a pointer to the stack. I’ve also said that memory leak is unnecessary here. – Miiao Dec 02 '22 at 06:43
  • @Miiao I stand corrected. I forgot that `s` already puts its data on the heap. My bad. – Finomnis Dec 02 '22 at 06:45
  • I still stand to the fact that this leaks memory every time this function is called, so I recommend against it. – Finomnis Dec 02 '22 at 06:46
  • @Finomnis, it’s ok. Sorry if my reply was rude. And yes, I agree that memory leak is a bad approach. – Miiao Dec 02 '22 at 06:47
  • @Miiao: Why would `Box::leak` be wrong? Btw, you want `as *const str`, not `u8` to compile. – Caesar Dec 02 '22 at 06:47
  • @Caesar It's not wrong per se, but it leaks the `String` itself, which internally references a `str`, so while it works, you then leaked two things while you would only have required to leak one. – Finomnis Dec 02 '22 at 06:48
  • Either way, if you really **want** to leak a `String`, imo the only 'correct' way to do it is through [`String::into_boxed_str`](https://doc.rust-lang.org/std/string/struct.String.html#method.into_boxed_str): `let s_static: &'static str = Box::leak(s.into_boxed_str());` – Finomnis Dec 02 '22 at 06:50
  • @Caesar, my bad. Well, `Box::leak` does almost exactly what `forget` does, but returns a `&'static mut` while we need `&'static`. It can be easily fixed using deref, and probably without overhead. – Miiao Dec 02 '22 at 06:51
  • @Miiao `&'static mut` can be directly stored in `&'static` without any overhead. – Finomnis Dec 02 '22 at 06:52
  • I don’t want to focus on leaking `String`s anyway, I just wanted to show how to repeat something using iterators. – Miiao Dec 02 '22 at 06:55
  • Ok after looking at godbolt, I agree that your 'leak' is the least-overhead way to leak a string. I apologize ;) Although your cast should say `as *const str` instead of `as *const u8`. Otherwise you get a compilation error. – Finomnis Dec 02 '22 at 06:59
  • Yeah, I haven’t done this for a long time. It’s a poor excuse probably, but I didn’t sleep well. Thanks for the conversation. – Miiao Dec 02 '22 at 07:04