I have a function that parses strings gathered from command-line arguments. The function checks whether the single argument it receives matches a Unicode codepoint notation (like U+20AC, or the special form for non-BMP characters, U-000020AC) and, if it does, converts it to a char. If the argument does not look like this notation, every character in that argument is converted to a char.
The function returns an Iterator<Item = char> containing all Unicode characters found in the input, regardless of how they were specified: for example, as U+20AC, U-000020AC, or €.
    #![feature(trait_alias)]

    use std::iter;

    trait CharIterator = Iterator<Item = char>;

    fn to_chars(input: &str) -> impl CharIterator {
        if input.starts_with("U+") || input.starts_with("U-") {
            // A Unicode codepoint reference.
            let cp = &input[2..];
            let c = u32::from_str_radix(cp, 16)
                .ok()
                .and_then(std::char::from_u32);
            match c {
                Some(c) => Box::new(iter::once(c)) as Box<dyn CharIterator>,
                // For now, just ignore erroneous input.
                _ => Box::new(iter::empty::<char>()) as Box<dyn CharIterator>,
            }
        } else {
            // Characters as-is.
            Box::new(input.chars().collect::<Vec<_>>().into_iter()) as Box<dyn CharIterator>
        }
    }

    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn to_chars_test() {
            assert_eq!(vec!('a'), to_chars("a").collect::<Vec<_>>());
            assert_eq!(vec!('a', 'b'), to_chars("ab").collect::<Vec<_>>());
            assert_eq!(vec!('a'), to_chars("U+0061").collect::<Vec<_>>());
            assert_eq!(vec!('漢', '字'), to_chars("漢字").collect::<Vec<_>>());
            assert_eq!(vec!('漢'), to_chars("U+6F22").collect::<Vec<_>>());
            assert_eq!(None, to_chars("U+9999999").next());
        }
    }
The function can return three kinds of iterator:
- Valid Unicode codepoint notation: iterator with a single item
- Bogus Unicode codepoint notation: empty iterator
- Just a sequence of characters: iterator containing said characters
In Rust, I can't seem to return different Iterator implementations from the branches of the same match. The solution people suggest is to return a Box<dyn Trait>.
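To show the boxing approach in isolation, here is a minimal sketch on stable Rust, without the trait alias. The function name pick and its kind parameter are made up for this example and are not part of my program; the point is only that each branch builds a different concrete iterator type and is boxed into the same trait object type.

```rust
use std::iter;

// Hypothetical helper for illustration only: the three branches produce
// three different concrete types (Empty<char>, Once<char> and
// std::vec::IntoIter<char>), so each is boxed into one common
// trait object type.
fn pick(kind: u8) -> Box<dyn Iterator<Item = char>> {
    match kind {
        // Bogus notation: nothing to yield.
        0 => Box::new(iter::empty::<char>()),
        // Valid notation: exactly one character.
        1 => Box::new(iter::once('a')),
        // Plain text: every character of the input.
        _ => Box::new(vec!['a', 'b'].into_iter()),
    }
}
```

Without the Box, the branches would have mismatched types and the match would not compile. With it, the caller only sees an Iterator<Item = char> and can, for instance, call pick(2).collect::<Vec<char>>().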
The above code works, but is it idiomatic?
Is there a more elegant way?
Some background: I'm a novice in Rust, but have experience in Java programming. In Java it is good practice to code to an interface rather than an implementation, so returning interfaces is common.
Am I approaching programming in Rust in the wrong way by automatically trying to apply this notion?