1

I'm trying to reverse a string. I used the solution from this post, and it works. But I would like to try it with bytes instead of grapheme clusters, as shown below:

fn reverse2(input: &str) -> String {
    input.as_bytes().iter().rev().collect()
}

Unfortunately, I can't call collect() after rev(), and I don't know which method to use instead. How would you do it?

kmdreko
Caladay
  • `String`s in Rust are always valid UTF-8 strings, and reversing the bytes doesn't always produce valid UTF-8 strings. What you want is to reverse the `char`s (not bytes). – kotatsuyaki Dec 04 '22 at 09:43
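The point made in the comment above can be checked with a small sketch. The helper name `reversed_bytes` is made up for this illustration; it reverses the raw bytes and leaves the UTF-8 validation to `String::from_utf8`:

```rust
/// Hypothetical helper: reverses the raw bytes of `input` without any validity check.
fn reversed_bytes(input: &str) -> Vec<u8> {
    input.bytes().rev().collect()
}

fn main() {
    // ASCII only: every character is a single byte, so the result is still valid UTF-8.
    let ascii = reversed_bytes("abc");
    println!("{:?}", String::from_utf8(ascii));

    // U+00E9 (e with acute accent) is encoded as the two bytes 0xC3 0xA9.
    // Reversed, the continuation byte 0xA9 comes first, which is not valid UTF-8.
    let non_ascii = reversed_bytes("\u{e9}");
    println!("{:?}", String::from_utf8(non_ascii));
}
```

The first call should print an `Ok(..)` value and the second an `Err(..)`, which is why a byte-wise reversal cannot simply be collected into a `String`.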

3 Answers

2

As you explicitly ask about not using chars(), you have to restrict yourself to ASCII strings.

pub fn reverse2(input: &str) -> String {
    // Reversing on byte-level only works with ASCII strings.
    assert!(input.is_ascii());

    let reversed_bytes: Vec<u8> = input.as_bytes().iter().copied().rev().collect();
    let reversed_string = unsafe {
        // SAFETY: This is guaranteed to be a valid UTF8 string, because:
        // - the input string is a valid ASCII string
        // - a reversed ASCII string is still a valid ASCII string
        // - an ASCII string is a valid UTF8 string
        String::from_utf8_unchecked(reversed_bytes)
    };

    return reversed_string;
}

You can also use the checked version, if you don't like the unsafe, but it comes with a little bit of overhead:

pub fn reverse2(input: &str) -> String {
    // Reversing on byte-level only works with ASCII strings.
    assert!(input.is_ascii());

    let reversed_bytes: Vec<u8> = input.as_bytes().iter().copied().rev().collect();
    let reversed_string = String::from_utf8(reversed_bytes).unwrap();

    return reversed_string;
}
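If a panic on non-ASCII input is not acceptable, the same idea can be written so that it reports the problem to the caller. This is only a sketch; the name `try_reverse_ascii` and the choice of `Option` as return type are assumptions made for the example, not part of the answer above:

```rust
/// Hypothetical variant: returns `None` instead of panicking on non-ASCII input.
fn try_reverse_ascii(input: &str) -> Option<String> {
    // Reversing on byte level only works with ASCII strings.
    if !input.is_ascii() {
        return None;
    }

    let reversed_bytes: Vec<u8> = input.bytes().rev().collect();

    // The checked conversion cannot fail for ASCII input, but `.ok()`
    // keeps the function free of `unwrap()` and `unsafe`.
    String::from_utf8(reversed_bytes).ok()
}

fn main() {
    println!("{:?}", try_reverse_ascii("abc"));
    println!("{:?}", try_reverse_ascii("\u{e9}"));
}
```

The caller can then decide whether to fall back to a `chars()`-based reversal or to reject the input.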

Optimization:

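Checking is_ascii() adds some overhead, and it is not strictly required.

UTF-8 has one useful property: every byte that belongs to a non-ASCII character has a value of 128 or above. So technically it is enough to replace all values equal to or above 128 with a substitute character; the result is then guaranteed to be pure ASCII. Note that a non-ASCII character turns into one substitute byte per encoded byte: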

pub fn reverse2(input: &str) -> String {
    let reversed_bytes: Vec<u8> = input
        .as_bytes()
        .iter()
        .rev()
        .map(|&val| {
            if val < 128 {
                val
            } else {
                0x1a // replacement char
            }
        })
        .collect();

    let reversed_string = unsafe {
        // SAFETY: This is guaranteed to be a valid UTF8 string, because:
        // - `reversed_bytes` is guaranteed to be an ASCII string
        // - an ASCII string is a valid UTF8 string
        String::from_utf8_unchecked(reversed_bytes)
    };

    return reversed_string;
}

fn main() {
    let s = "abcde😃fghij";
    println!("{:?}", s.as_bytes());

    let reversed = reverse2(s);
    println!("{}", reversed);
    println!("{:?}", reversed.as_bytes());
}

Output:

[97, 98, 99, 100, 101, 240, 159, 152, 131, 102, 103, 104, 105, 106]
jihgfedcba
[106, 105, 104, 103, 102, 26, 26, 26, 26, 101, 100, 99, 98, 97]
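
The property that the optimization relies on can be inspected directly. The following sketch (the helper `count_non_ascii_bytes` is a made-up name) prints the bytes of a string containing U+1F603 and counts how many of them are 128 or above:

```rust
/// Hypothetical helper: counts the bytes of `input` outside the ASCII range (value >= 128).
fn count_non_ascii_bytes(input: &str) -> usize {
    input.bytes().filter(|&b| b >= 128).count()
}

fn main() {
    // U+1F603 is encoded as the four bytes 240, 159, 152, 131.
    let s = "e\u{1F603}f";
    for b in s.bytes() {
        // The binary form shows that the highest bit is set for every non-ASCII byte.
        println!("{:>3} = {:#010b}  ascii: {}", b, b, b.is_ascii());
    }
    println!("non-ASCII bytes: {}", count_non_ascii_bytes(s));
}
```

Both the leading byte and the continuation bytes of a multi-byte character have the highest bit set, so a plain `val < 128` test is enough to tell them apart from ASCII.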

Additional remark:

Consider using .bytes() instead of .as_bytes().iter().
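
As a short illustration of that remark: `.as_bytes().iter()` yields `&u8` references, so it needs `.copied()` before collecting into a `Vec<u8>`, while `.bytes()` yields `u8` values directly. It is also likely the reason the `collect()` in the question does not compile, since `String` cannot be collected from an iterator of `&u8`. The two function names below are made up for the comparison:

```rust
/// Uses `str::bytes()`, which yields `u8` values.
fn reverse_with_bytes(input: &str) -> Vec<u8> {
    input.bytes().rev().collect()
}

/// Uses `as_bytes().iter()`, which yields `&u8` and therefore needs `.copied()`.
fn reverse_with_iter(input: &str) -> Vec<u8> {
    input.as_bytes().iter().copied().rev().collect()
}

fn main() {
    let s = "abcdefghij";
    // Both variants produce the same reversed byte vector.
    println!("{:?}", reverse_with_bytes(s));
    println!("{:?}", reverse_with_iter(s));
}
```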

Finomnis
  • ASCII, as I remember, contains 127 symbols, not 128. Proof: c_char’s values are -128..=127. Also isn’t cp1251 still valid? – Miiao Dec 04 '22 at 12:25
  • @Miiao Yes, 127. I added `equal to or` in the text to fix that. Thanks. Everywhere else it was already correct. *"Also isn’t cp1251 still valid?"* - Valid for what? Rust's `str` type is defined as UTF-8, so if you are talking about that, no, `str` cannot be cp1251 encoded. – Finomnis Dec 04 '22 at 12:27
  • Why `unsafe`? Use `from_utf8().unwrap()`, unless there is a perf bottleneck. – Chayim Friedman Dec 04 '22 at 12:52
  • @ChayimFriedman Yes, that's exactly what I wrote directly after the code example ;) The reason I used the maximum performance version first is because I can't imagine a reason why anyone wouldn't use `chars()` except for maximum performance. – Finomnis Dec 04 '22 at 18:33
1

Well, firstly you should use .bytes() instead of .as_bytes().iter(). Secondly, you need to reverse characters, not bytes, because a &str is UTF-8 and may contain multi-byte characters, so use .chars() instead of .bytes(). Thirdly, you don’t need to collect into a variable and then return that variable; just return the result of collecting. Fourthly, you don’t need an explicit return.

Putting all of that together:

pub fn reverse2(input: &str) -> String {
    input.chars()
         .rev()
         .collect()
}
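
A quick sketch of how the `chars()` version behaves on non-ASCII input (the function is renamed to `reverse_chars` here only to keep the example self-contained). It also shows the limitation that motivates the grapheme-cluster approach mentioned in the question: `chars()` reverses Unicode scalar values, so a combining mark ends up attached to a different letter.

```rust
fn reverse_chars(input: &str) -> String {
    input.chars().rev().collect()
}

fn main() {
    // A precomposed character (U+00E9) is a single `char`, so it survives the reversal.
    println!("{}", reverse_chars("h\u{e9}llo"));

    // 'e' followed by the combining acute accent U+0301 is two `char`s.
    // After the reversal the accent follows the 'x' instead of the 'e'.
    println!("{}", reverse_chars("e\u{301}x"));
}
```

The result is always a valid `String`, but it is only visually correct as long as every user-perceived character consists of a single `char`.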
Finomnis
Miiao
0

Here is a solution that converts the input string into a byte vector, so that the reverse method available on Vec can be used:

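pub fn reverse2(input: &str) -> String {
    // Copy the bytes into an owned vector so they can be reversed in place.
    let mut v = input.as_bytes().to_vec();
    v.reverse();
    // Panics if the reversed bytes are not valid UTF-8 (any non-ASCII input).
    String::from_utf8(v).unwrap()
}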

The input string may only contain ASCII characters; otherwise the reversed bytes are not valid UTF-8 and unwrap() panics.

Kaplan