Rust: Most efficient way to iterate over chars of an ASCII string

Question

My original approach:

pub fn find_the_difference(s: String, t: String) -> char {
    let mut c: u8 = 0;
    // chars().nth(i) walks the string from the start on every call,
    // so this loop is quadratic in the length of s.
    for i in 0..s.chars().count() {
        c ^= t.chars().nth(i).unwrap() as u8 ^ s.chars().nth(i).unwrap() as u8;
    }
    return (c ^ t.chars().nth(s.chars().count()).unwrap() as u8) as char;
}

But it was too slow, and it took an absurd amount of code just to replace t[i] ^ s[i] (see the original C++ function below). So I looked for an alternative and found that collecting each string into a Vec<char> gave good results (runtime went from 8 ms to 0 ms).

pub fn find_the_difference(s1: String, t1: String) -> char {
    let mut c: u8 = 0;
    let s: Vec<char> = s1.chars().collect();
    let t: Vec<char> = t1.chars().collect();

    for i in 0..s1.chars().count() {
        c ^= t[i] as u8 ^ s[i] as u8;
    }
    return (c ^ t[s1.chars().count()] as u8) as char;
}

But perhaps there is no need to collect, and I don't care about the index either; I just want to iterate over one char after another. My current attempt:

pub fn find_the_difference(s1: String, t1: String) -> char {
        let mut c:u8 = 0;
        let mut s = s1.chars();
        let mut t = t1.chars();
        let n = s.count();
        
        for i in 0..n {
            c ^= t.next().unwrap() as u8 ^ s.next().unwrap() as u8; // c ^= *t++ ^ *s++ translated in C++
        }
        return (c ^ t.next().unwrap() as u8) as char;
        
    }

I get the following error message:

Line 9, Char 44: borrow of moved value: `s` (solution.rs)
   |
4  |         let mut s = s1.chars();
   |             ----- move occurs because `s` has type `std::str::Chars<'_>`, which does not implement the `Copy` trait
5  |         let mut t = t1.chars();
6  |         let n = s.count();
   |                 - value moved here
...
9 |             c ^= t.next().unwrap() as u8 ^ s.next().unwrap() as u8;
   |                                            ^ value borrowed here after move
error: aborting due to previous error

Is it possible to write this kind of code (c = *t++) in Rust?

NB: s1.chars().count() = t1.chars().count() - 1, and the goal is to find the extra letter in t1.

NB2: original C++ function:

char findTheDifference(string s, string t) {
    char c = 0;
    for (int i = 0; t[i]; i++)
        c ^= t[i] ^ s[i];
    return c;
}

How about `for char in str.chars() {...}`? Do you want to iterate over two strings at the same time? — Schwern, Mar 11 '21 at 21:34
Note that a Rust `String` is UTF-8 and a `char` is a 1-4 byte Unicode scalar value. If you want the same behaviour and performance as the C++ code then you probably want to use a `Vec<u8>` instead. — Peter Hall, Mar 11 '21 at 21:45
I tried Vec at first but there was a problem with conversion — Antonin GAVREL, Mar 11 '21 at 21:47
@PeterHall my attempt: `let s: Vec<u8> = s1.chars().collect() as Vec<u8>` — Antonin GAVREL, Mar 11 '21 at 21:59
It's unclear what the behavior of that C++ function should be when you take multibyte characters into account. (This is a common problem when translating APIs that don't make a clear distinction between characters and bytes.) You can translate it fairly straightforwardly as `fn find_the_difference(s: &[u8], t: &[u8]) -> u8`, but that might not give "interesting" results when applied to UTF-8 strings. Or you might write `fn find_the_difference(s: &str, t: &str) -> u32` (treat the characters as if UTF-32 encoded), but that is even more different with non-ASCII text. — trent, Mar 11 '21 at 22:42
sorry I should have stated that the strings input are only made of ascii characters (which you could have guessed as I cast as u8). I agree it is not a realistic function, but that's another topic. Also why the downvote? — Antonin GAVREL, Mar 12 '21 at 01:10
@darksv Thanks for the suggestion, I tried it but unfortunately got `an implementation of `std::ops::BitXor` might be missing for `std::str::Bytes<'_>`` — Antonin GAVREL, Mar 12 '21 at 01:26

score 6 · Accepted Answer · answered Mar 11 '21 at 22:53

I think you're confused about the differences between C and Rust string handling, and the distinctions between Rust's str, String, &[u8], char, and u8 types.

That said, here is how I'd implement your function:

fn find_the_difference(s: &[u8], t: &[u8]) -> u8 {
    assert!(t.len() > s.len());
    let mut c: u8 = 0;
    for i in 0..s.len() {
        c ^= s[i] ^ t[i];
    }
    c ^ t[s.len()]
}

If your data is currently String, you can get a &[u8] view of it using the as_bytes() method. Like this:

let s: String = ...some string...;
let t: String = ...some string...;

let diff = find_the_difference(s.as_bytes(), t.as_bytes());
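
As a rough sketch of how this fits the String-based signature from the question: the wrapper below (the name find_the_difference_strings is made up for illustration) borrows both strings as bytes and casts the result back to a char. It assumes ASCII-only input, as stated in the comments on the question.

```rust
// Byte-slice version from the answer above, unchanged in behavior.
fn find_the_difference(s: &[u8], t: &[u8]) -> u8 {
    assert!(t.len() > s.len());
    let mut c: u8 = 0;
    for i in 0..s.len() {
        c ^= s[i] ^ t[i];
    }
    c ^ t[s.len()]
}

// Hypothetical wrapper (not part of the original answer) that keeps the
// String -> char signature used in the question. Assumes ASCII input.
fn find_the_difference_strings(s: String, t: String) -> char {
    // as_bytes() only borrows the existing UTF-8 buffer; nothing is copied.
    find_the_difference(s.as_bytes(), t.as_bytes()) as char
}
```

Calling find_the_difference_strings("abcd".to_string(), "abcde".to_string()) should give 'e'.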

score 2 · Answer 2 · Schwern · 2021-03-11T21:55:48.257

Zip the two iterators together.

As Peter Hall comments, comparing chars is also safer: you can't assume characters are 1 byte. Just use !=.

fn main() {
    let a = "☃ Thiñgs";
    let b = "☃ Thiñks";

    let both = a.chars().zip(b.chars());
    
    for pair in both {
        if pair.0 != pair.1 {
            println!("Different {} {}", pair.0, pair.1);
        }
    }
}

This will stop when either iterator is exhausted.
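
Because zip stops at the shorter input, the extra trailing character of the longer string is never visited. For the XOR problem from the question, pairing is not actually required: XOR-ing every byte of both strings cancels the shared letters and leaves the extra one. A minimal sketch, assuming ASCII input and exactly one extra byte in t (the function name find_extra_byte is hypothetical):

```rust
// Hypothetical helper: XOR all bytes of both strings together.
// Assumes ASCII input where `t` is `s` plus exactly one extra byte.
fn find_extra_byte(s: &str, t: &str) -> char {
    // chain() visits every byte of `s`, then every byte of `t`,
    // so nothing is cut off the way it is with zip().
    s.bytes().chain(t.bytes()).fold(0u8, |acc, b| acc ^ b) as char
}
```

This works regardless of where the extra letter sits in t, since XOR does not depend on order.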

If you want the indices as well, use char_indices.
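
For illustration, here is one way that could look. The helper name first_mismatch is invented for this sketch; note that the index yielded by char_indices is a byte offset into the string, not a character count.

```rust
// Hypothetical helper: returns the byte offset in `a` and the two
// differing chars for the first position where the strings disagree.
fn first_mismatch(a: &str, b: &str) -> Option<(usize, char, char)> {
    a.char_indices()
        .zip(b.chars())
        // Keep only the first pair whose chars differ.
        .find(|((_, x), y)| x != y)
        .map(|((i, x), y)| (i, x, y))
}
```

Like the loop above, this stops as soon as either string runs out, so a pure length difference is not reported.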

Iterators are a key feature of Rust and one of its "zero-cost abstractions": the compiler optimizes them for you, so they are generally as fast as or faster than hand-written loops.
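
As a sketch of the iterator style applied to the c ^= *t++ ^ *s++ loop from the question, assuming ASCII input: iterate over s with a for loop and advance t by hand with next(). The borrow error in the question comes from count(), which takes the iterator by value and consumes it; this version never calls it. The function name find_the_difference_next is made up for this example.

```rust
// Hypothetical rewrite of the question's third attempt.
// Assumes ASCII input and that `t1` has exactly one more byte than `s1`.
fn find_the_difference_next(s1: &str, t1: &str) -> char {
    let mut c: u8 = 0;
    let mut t = t1.bytes();
    for a in s1.bytes() {
        // Each next() call advances `t` by one byte, much like *t++ in C++.
        c ^= a ^ t.next().expect("t must be longer than s");
    }
    // One byte of `t` is left over after the loop: XOR it in as well.
    (c ^ t.next().expect("t must be longer than s")) as char
}
```

No Vec is collected and no index is used; both strings are read once, front to back.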

edited Mar 11 '21 at 21:55

answered Mar 11 '21 at 21:42

Schwern


Sorry, I forgot to add: in this specific case the input strings are made of 1-byte chars. I really like your solution, but the goal is to go through all the letters; in your example you will stop after processing "HiWo". – Antonin GAVREL Mar 11 '21 at 21:57
@AntoninGAVREL If you're upset about halting early when one string is shorter than the other, you should clarify your C++ code since it would exhibit an *out-of-bounds access* if the second string is longer. – kmdreko Mar 11 '21 at 23:55
You are wrong @kmdreko, the second string is longer by one character, so there is no out-of-bounds access: we XOR the `\0` at the end of the first string with the last non-null character of the second string. It is stated in my question. – Antonin GAVREL Mar 12 '21 at 00:58
@AntoninGAVREL Could you explain the full problem you're trying to solve in more detail? And if they're 1 byte your function should not pretend it works with Strings and characters, it takes u8 slices and returns a u8. Rust is not C++. – Schwern Mar 12 '21 at 01:00
https://leetcode.com/problems/find-the-difference/description/ not "my function" ;) – Antonin GAVREL Mar 12 '21 at 01:02
I am not here to discuss the problem or the algorithm, only the most efficient way to iterate over a string containing only ASCII characters (1 byte each). – Antonin GAVREL Mar 12 '21 at 01:03
@AntoninGAVREL What happens if the second string is longer by two characters? – Don Hosek Oct 06 '22 at 03:32
