
I have a function that parses strings gathered from command-line arguments. The function checks whether the single argument it receives matches a Unicode codepoint notation (like U+20AC, or a special case for non-BMP characters, U-000020AC), and if it does, converts that to a char. If the argument does not match this notation, every character in that argument is converted to a char.

The function returns an Iterator<Item = char> containing all Unicode characters found in the input, regardless of how they were specified: for example, as U+20AC, U-000020AC, or €.

#![feature(trait_alias)]

use std::iter;

trait CharIterator = Iterator<Item = char>;

fn to_chars(input: &str) -> impl CharIterator {
    if input.starts_with("U+") || input.starts_with("U-") {
        // A Unicode codepoint reference.
        let cp = &input[2..];
        let c = u32::from_str_radix(cp, 16)
            .ok()
            .and_then(std::char::from_u32);
        match c {
            Some(c) => Box::new(iter::once(c)) as Box<dyn CharIterator>,
            // For now, just ignore erroneous input.
            _ => Box::new(iter::empty::<char>()) as Box<dyn CharIterator>,
        }
    } else {
        // Characters as-is.
        Box::new(input.chars().collect::<Vec<_>>().into_iter()) as Box<dyn CharIterator>
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn to_chars_test() {
        assert_eq!(vec!('a'), to_chars("a").collect::<Vec<_>>());
        assert_eq!(vec!('a', 'b'), to_chars("ab").collect::<Vec<_>>());
        assert_eq!(vec!('a'), to_chars("U+0061").collect::<Vec<_>>());
        assert_eq!(vec!('漢', '字'), to_chars("漢字").collect::<Vec<_>>());
        assert_eq!(vec!('漢'), to_chars("U+6F22").collect::<Vec<_>>());
        assert_eq!(None, to_chars("U+9999999").next());
    }
}

The function can return three kinds of iterator:

  • Valid Unicode codepoint notation: iterator with a single item
  • Bogus Unicode codepoint notation: empty iterator
  • Just a sequence of characters: iterator containing said characters
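The three cases listed above map naturally onto an enum with one variant per case, which is one of the alternatives raised in the comments. The following is a minimal sketch, not the author's code. It assumes a hypothetical `CharIter` type that wraps each concrete iterator and forwards `next` to it, plus a hypothetical `to_chars_enum` that builds it. Borrowing `input` through `Chars<'a>` also removes the need to collect into a `Vec`.

```rust
use std::iter::{Empty, Once};
use std::str::Chars;

// Hypothetical enum: one variant per kind of result described in the question.
enum CharIter<'a> {
    Codepoint(Once<char>), // valid U+/U- notation: exactly one char
    Invalid(Empty<char>),  // bogus notation: nothing
    Literal(Chars<'a>),    // plain text: every char, borrowed from the input
}

impl Iterator for CharIter<'_> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        // Forward to whichever concrete iterator this variant holds.
        match self {
            CharIter::Codepoint(it) => it.next(),
            CharIter::Invalid(it) => it.next(),
            CharIter::Literal(it) => it.next(),
        }
    }
}

// Hypothetical replacement for `to_chars` that returns the enum directly.
fn to_chars_enum(input: &str) -> CharIter<'_> {
    match input.strip_prefix("U+").or_else(|| input.strip_prefix("U-")) {
        Some(hex) => match u32::from_str_radix(hex, 16).ok().and_then(char::from_u32) {
            Some(c) => CharIter::Codepoint(std::iter::once(c)),
            None => CharIter::Invalid(std::iter::empty()),
        },
        None => CharIter::Literal(input.chars()),
    }
}

fn main() {
    assert_eq!(to_chars_enum("U+0061").collect::<String>(), "a");
    assert_eq!(to_chars_enum("漢字").collect::<String>(), "漢字");
    assert_eq!(to_chars_enum("U+9999999").next(), None);
}
```

The `Invalid` variant could just as well be a unit variant whose arm returns `None`. Wrapping `Empty` here only mirrors the original code's use of `iter::empty`.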

In Rust, I can't seem to return different Iterator implementations from different branches of the same match. The solution people usually suggest is to return a Box<dyn Trait>.
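Here is a minimal sketch of the Box approach on stable Rust, without the nightly-only `trait_alias` feature. It uses a hypothetical `to_chars_boxed` rather than the original function. Because the boxed trait object is written as the return type, each branch's `Box::new(...)` is coerced to it, so the `as` casts are not needed. The `+ '_` bound lets the boxed iterator borrow `input`, so collecting into a `Vec` is also unnecessary. (A plain `Box<dyn Trait>` defaults to `'static`, which is why the original code cannot simply box `input.chars()`.)

```rust
// Hypothetical stable-Rust variant of `to_chars` using a boxed trait object.
fn to_chars_boxed(input: &str) -> Box<dyn Iterator<Item = char> + '_> {
    if let Some(hex) = input.strip_prefix("U+").or_else(|| input.strip_prefix("U-")) {
        // A Unicode codepoint reference: one char, or nothing if invalid.
        match u32::from_str_radix(hex, 16).ok().and_then(char::from_u32) {
            Some(c) => Box::new(std::iter::once(c)),
            None => Box::new(std::iter::empty::<char>()),
        }
    } else {
        // Characters as-is, borrowed from `input` thanks to the `+ '_` bound.
        Box::new(input.chars())
    }
}

fn main() {
    assert_eq!(to_chars_boxed("U+20AC").collect::<Vec<_>>(), vec!['€']);
    assert_eq!(to_chars_boxed("ab").collect::<String>(), "ab");
}
```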

The above code works, but is it idiomatic?

Is there a more elegant way?

Some background: I'm a novice in Rust, but have experience in Java programming. In Java it is a good practice to code to an interface, not an implementation, thus returning interfaces is common.

Am I approaching programming in Rust in the wrong way by automatically trying to apply this notion?
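For comparison, the Rust counterpart of Java's "code to an interface" often shows up on the consuming side as a generic parameter bounded by a trait, rather than as a trait-typed return value. The following is a small illustration with a hypothetical helper, `describe`. It accepts any `Iterator<Item = char>`, including a `Box<dyn Iterator<Item = char>>`, because the standard library implements `Iterator` for `Box<I>` whenever `I` implements it.

```rust
// Hypothetical helper written against the Iterator trait, much like a Java
// method that takes an interface-typed parameter.
fn describe<I: Iterator<Item = char>>(chars: I) -> String {
    chars
        .map(|c| format!("U+{:04X}", c as u32))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // Static dispatch: the concrete type std::str::Chars.
    assert_eq!(describe("ab".chars()), "U+0061 U+0062");

    // Dynamic dispatch: a boxed trait object is itself an Iterator.
    let boxed: Box<dyn Iterator<Item = char>> = Box::new(std::iter::once('€'));
    assert_eq!(describe(boxed), "U+20AC");
}
```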

  • The usual alternative is either a `Box` or an enum of the possible iterator types. I wouldn't say either one is more Rusty than the other; it really depends on which is best for your use case. – Peter Hall Aug 14 '19 at 11:45
  • An enum will give callers the ability to inspect which type of iterator they've got (you might want this or you might want to hide this information), but will also be hard to express because you'll need to include the exact iterator types, which can be clunky if you use iterator combinators. – Peter Hall Aug 14 '19 at 11:47
  • The other option is to return a single iterator type, which handles all of the possibilities. Perhaps that can be done with combinators, or might need to be custom. – Peter Hall Aug 14 '19 at 11:49
  • By 'single iterator type', do you mean a concrete implementation of the `Iterator` trait? – JeroenHoek Aug 14 '19 at 11:50
  • Yes, that's what I meant. – Peter Hall Aug 14 '19 at 11:55
  • I guess I could simply return `Vec`, but since I'm using the returned iterator in a `flat_map` the iterator seems to make sense. Returning an empty or single item vector feels a bit wrong compared to `iter::empty` and `iter::once(n)`. – JeroenHoek Aug 14 '19 at 11:56
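Here is one possible sketch of the "single iterator type" idea from these comments, assuming a hypothetical `to_chars_chain`. Every case is expressed as an `Option<char>` chained with a `&str`'s `Chars`. Valid notation gives `(Some(c), "")`, bogus notation gives `(None, "")`, and plain text gives `(None, input)`. All three cases therefore share the concrete type `Chain<option::IntoIter<char>, Chars>`, and no `Box` or enum is needed. The `args` vector in `main` is a stand-in for `std::env::args().skip(1)`, which shows the `flat_map` usage mentioned above.

```rust
// Hypothetical single-type version: every branch yields
// Chain<std::option::IntoIter<char>, std::str::Chars<'_>>.
fn to_chars_chain(input: &str) -> impl Iterator<Item = char> + '_ {
    let (codepoint, literal) = match input.strip_prefix("U+").or_else(|| input.strip_prefix("U-")) {
        // Notation: at most one parsed char (None if bogus), no literal text.
        Some(hex) => (u32::from_str_radix(hex, 16).ok().and_then(char::from_u32), ""),
        // Plain text: nothing parsed, every character passed through.
        None => (None, input),
    };
    codepoint.into_iter().chain(literal.chars())
}

fn main() {
    // Stand-in for std::env::args().skip(1).collect::<Vec<String>>().
    let args = vec!["U+20AC".to_string(), "ab".to_string(), "U+9999999".to_string()];
    let all: String = args.iter().flat_map(|a| to_chars_chain(a)).collect();
    assert_eq!(all, "€ab");
}
```

The trade-off is that the concrete type is named only implicitly through `impl Iterator`. Callers get static dispatch and no allocation, but they cannot tell which case produced the iterator.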
