I'm writing a parser in Rust and creating tokens from a Vec<char>. Currently, my code looks like this:

match &source[..] {
    ['l', 'e', 't', ..] => ...,
    ['t', 'r', 'u', 'e', ..] => ...,
    _ => ...
}

Obviously this is a lot more verbose than I'd like, and not easy to read. Is there any way I can convert "let" to ['l', 'e', 't'] at compile time (with a macro or const function) in order to pattern match on it like this?

clubby789
  • Does this answer your question? [How to match a String against string literals?](https://stackoverflow.com/questions/25383488/how-to-match-a-string-against-string-literals) – Elias Holzmann Sep 06 '21 at 17:39
  • @EliasHolzmann No, because I’m matching against only the start of a char vec, and the patterns are of variable size. – clubby789 Sep 06 '21 at 17:45
  • Damn, you're right. Skimmed the title too quickly, sorry. – Elias Holzmann Sep 06 '21 at 17:45
  • The obvious solution seems to be to use `&str` as your token type, or maybe `String` if you want to avoid dealing with the reference lifetimes and don't mind the allocations. – Sven Marnach Sep 06 '21 at 18:27
  • or use a parsing crate like nom – Stargateur Sep 06 '21 at 18:28
  • Your problem is obviously caused by `Vec<char>`. `char` is a completely useless type, there's no use-case for it. In particular, it doesn't represent any character. As for `Vec<char>`, it's just a memory-inefficient `String`. For a parser, `Vec<u8>` would be much more useful, and then you can use byte literals. Or just use `String` with string literals. – mcarton Sep 06 '21 at 21:47

1 Answer

I don't think that you can do that with the macros from the Rust standard library, but you could write your own macro:

// Needs the `syn` crate as a dependency of the proc-macro crate.
use proc_macro::{Delimiter, Group, Literal, Punct, Spacing, TokenStream, TokenTree};
use syn::{parse_macro_input, LitStr};

/// Expands a string literal into a slice pattern over its chars,
/// e.g. `charize!("let")` becomes `['l', 'e', 't', ..]`.
#[proc_macro]
pub fn charize(input: TokenStream) -> TokenStream {
    // Tokens reused below: a `,` separator and the trailing `..` rest pattern.
    let comma_token = TokenTree::Punct(Punct::new(',', Spacing::Alone));
    let rest_tokens = [
        TokenTree::Punct(Punct::new('.', Spacing::Joint)),
        TokenTree::Punct(Punct::new('.', Spacing::Alone)),
    ];

    // Parse the macro input as a single string literal.
    let string_to_charize: String = parse_macro_input!(input as LitStr).value();

    // Emit one char literal per character, each followed by a comma,
    // then finish with `..`. An empty string therefore yields `[..]`.
    let pattern_tokens = string_to_charize
        .chars()
        .flat_map(|c| [TokenTree::Literal(Literal::character(c)), comma_token.clone()])
        .chain(rest_tokens);

    // Wrap everything in `[...]` to form the slice pattern.
    TokenTree::Group(Group::new(Delimiter::Bracket, pattern_tokens.collect())).into()
}

Macro in action:

match &['d', 'e', 'm', 'o', 'n', 's', 't', 'r', 'a', 't', 'i', 'o', 'n'][..] {
    charize!("doesn't match") => println!("Does not match"),
    charize!("demo") => println!("It works"),
    charize!("also doesn't match") => println!("Does not match"),
    _ => panic!("Does not match")
}
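
To make the expansion concrete: `charize!("demo")` produces the slice pattern `['d', 'e', 'm', 'o', ..]`. As a rough sketch, this is what the match from the question looks like once the macro calls are expanded by hand. The function name `keyword` and its return type are made up for illustration only.

```rust
/// Hand-expanded equivalent of matching on `charize!("let")` and
/// `charize!("true")`. `keyword` is a hypothetical helper name.
fn keyword(source: &[char]) -> Option<&'static str> {
    match source {
        // What `charize!("let")` expands to.
        ['l', 'e', 't', ..] => Some("let"),
        // What `charize!("true")` expands to.
        ['t', 'r', 'u', 'e', ..] => Some("true"),
        // Anything else, including inputs shorter than the keyword.
        _ => None,
    }
}
```

The trailing `..` is what lets each arm match only the start of the slice, so the rest of the input can have any length.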

Note that this is a procedural macro, so it must live in its own crate with `proc-macro = true` set under `[lib]` in that crate's Cargo.toml.
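
If a separate crate is more setup than your parser warrants, a plain function combined with match guards might get you most of the way without any macro. This is only a sketch of that alternative, not part of the macro above; `has_prefix` and `classify` are hypothetical names introduced here.

```rust
/// Returns true if `source` starts with the characters of `prefix`.
/// (`has_prefix` is a hypothetical helper, not a standard library function.)
fn has_prefix(source: &[char], prefix: &str) -> bool {
    let mut rest = source.iter();
    // Every char of `prefix` must equal the next char of `source`;
    // running out of `source` early yields `None` and fails the check.
    prefix.chars().all(|expected| rest.next() == Some(&expected))
}

/// Example dispatcher using match guards instead of slice patterns.
fn classify(source: &[char]) -> &'static str {
    match source {
        s if has_prefix(s, "let") => "let",
        s if has_prefix(s, "true") => "true",
        _ => "other",
    }
}
```

The trade-off is that guards are checked at runtime and the compiler cannot warn about unreachable or overlapping arms the way it can with real slice patterns.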

Elias Holzmann