
I have an enum in Rust like:

enum List {
    Rather,
    Long,
    List,
    Of,
    Items,
}

And I have a &str which can look like:

let a = "InMyList";
let b = "InMyRather";
let c = "NotContain";

Is there an efficient way of doing a.contains(List)?

I can go through each enum item and do a.contains(List::Rather) || a.contains(List::Long) etc. But if the list is long enough, that's a lot of boilerplate code!

ohboy21
  • Does this answer your question? [Can I convert a string to enum without macros in Rust?](https://stackoverflow.com/questions/39070244/can-i-convert-a-string-to-enum-without-macros-in-rust) – joshmeranda Jul 17 '21 at 05:14
  • Not really. The &str can contain one enum variant but doesn't have to. I want to go through each enum and check if it's part of the &str. – ohboy21 Jul 17 '21 at 05:15
  • This looks like it might be useful to you: [strum crate](https://crates.io/crates/strum). With that you could iterate over the variant names, piecing together a regular expression you can use to scan strings with. – Todd Jul 17 '21 at 07:33
  • I put together an example on how to efficiently match strings and convert strings to variants, etc. below. – Todd Jul 18 '21 at 00:27

2 Answers


As you said:

I can go through each enum item and do a.contains(List::Rather) || a.contains(List::Long) etc. But if the list is long enough, that's a lot of boilerplate code!

So to avoid typing out many "OR" checks, which is both tedious and error-prone, you need a way to loop over all the enum variants. Take a look at the strum crate and also this discussion.
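As a rough, dependency-free sketch of that loop: the `ALL` constant and the `name` method below are hand-written stand-ins (hypothetical names, not part of any crate) for the variant list that a derive macro such as strum's would generate for you, and `find_variant` is likewise just an illustrative helper.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum List {
    Rather,
    Long,
    List,
    Of,
    Items,
}

impl List {
    // Hypothetical, hand-maintained list of all variants. With strum,
    // a derive would provide the equivalent without the boilerplate.
    const ALL: [List; 5] = [
        List::Rather,
        List::Long,
        List::List,
        List::Of,
        List::Items,
    ];

    // Hypothetical helper: the variant's name as it would appear in a string.
    fn name(self) -> &'static str {
        match self {
            List::Rather => "Rather",
            List::Long => "Long",
            List::List => "List",
            List::Of => "Of",
            List::Items => "Items",
        }
    }
}

// Returns the first variant (in declaration order) whose name
// occurs somewhere in `haystack`, or None if no name occurs.
fn find_variant(haystack: &str) -> Option<List> {
    List::ALL
        .iter()
        .copied()
        .find(|variant| haystack.contains(variant.name()))
}

fn main() {
    for s in ["InMyList", "InMyRather", "NotContain"] {
        println!("{}: {:?}", s, find_variant(s));
    }
}
```

Each call scans the string once per variant in the worst case, so the cost grows with the number of variants.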

Note: If optimal performance is your goal, regex can be a faster solution than simply looping over the enum variants. Please see Todd's answer and the comments that follow it.

at54321

It'll be more efficient to do the string matching with a regular expression than iterating over each variant name and trying to find it as a substring in the target strings. I put together an example of how this can be done using the strum and regex crates.

First, the enum with a long list of variants and a set of target strings to search:

use std::str::FromStr;
use lazy_static::lazy_static;
use regex::Regex;
use strum::{EnumString, EnumVariantNames, VariantNames};

#[derive(EnumString, EnumVariantNames, Debug)]
enum Thing {
    Foo,
    Bar,
    Baz,
    Qux,
    Quux,
    Quuz,
    Corge,
    Grault,
    Garply,
    Waldo,
    Fred,
    Plugh,
    Xyzzy,
    Thud,
}

// Target strings to look for variant names in.
//
static STRINGS: [&str; 10] = [
    "fdskQuuzjfkds", "fkjdFoo", "Fred", "fkdXyzzy", "Plughfkdjs",
    "QuuxQuux", "GraultGarply", "TTTThud", "CCorgee", "Waldo",
];

Next, set up the regular expression as a static so that it is compiled only once, on first use at runtime. The pattern is built from the variant names that strum exposes through the VariantNames trait (Thing::VARIANTS), joined with | into a single alternation.

lazy_static! {

    // A Regular Expression used to find variant names in target strings. 
    //
    static ref THING_EXPR: Regex = {
    
        // Piece together the expression from Thing's variant names.
        let expr_str = Thing::VARIANTS.join("|");
        
        Regex::new(&expr_str).unwrap()  
    };
}

The example below iterates over the target strings and detects a variant name in each. The matched substring is taken from the capture result (group 0 is the whole match) and then converted to the enum variant of the same name with from_str.

fn main() {
    
    for target in STRINGS {
        if let Some(captures) = THING_EXPR.captures(target) {
        
            // Get the substring that matched one of the variants.
            let variant_name = &captures[0];
            
            // Convert the string to the actual variant.
            let variant = Thing::from_str(variant_name).unwrap();
            
            println!("variant name: {:<8} --  variant: {:?}", 
                     variant_name, variant);
        }                                     
    }
}

Add these dependencies to the Cargo.toml file:

[dependencies]
lazy_static = "1.4"
strum = { version = "0.21", features = ["derive"] }
regex = "1.5"

output:

variant name: Quuz     --  variant: Quuz
variant name: Foo      --  variant: Foo
variant name: Fred     --  variant: Fred
variant name: Xyzzy    --  variant: Xyzzy
variant name: Plugh    --  variant: Plugh
variant name: Quux     --  variant: Quux
variant name: Grault   --  variant: Grault
variant name: Thud     --  variant: Thud
variant name: Corge    --  variant: Corge
variant name: Waldo    --  variant: Waldo
Todd
  • There are moments when regular expressions can be very useful, but they are not a panacea. Why do you say your solution is more efficient? More efficient how? If you mean faster, I believe one gigantic union-regex will probably be slower than a loop over strings (in theory, a trie-regex *might* perform better, but that's another story). And in terms of code complexity, I don't think a loop-based solution would be any worse than a regex. – at54321 Jul 19 '21 at 14:48
  • @at54321, I put together a timeit test for each approach - [code here](https://gist.github.com/ttappr/4e03273457823be7406c4c51bf2efd29). Running with `cargo run --release` I'm seeing the regex approach is about 5x faster than the looping approach on my system. The debug build shows opposite results. – Todd Jul 19 '21 at 19:48
  • Todd, This is interesting. I ran your tests and on my environment the difference was smaller, but still in favor of the regex. I then tested with 100 enum variants and the difference became much bigger: more than 20 times. Clearly, the regex does something much smarter than what I was expecting. I'll edit my answer. Thanks! – at54321 Jul 19 '21 at 21:18
  • @at54321, this is pure speculation, but I believe the advantage the compiled regex has over the `.contains()` operation is that `.contains()` may scan the entire string, or more of it. It may not give up even when there aren't enough characters left to satisfy a match, and it'll do that for every variant. The regex, as it scans, tries to satisfy each branch representing a variant "in parallel" and knows when to stop earlier. – Todd Jul 20 '21 at 00:23
  • My tests showed that increasing the number of enum variants leads to roughly linear slowdown in the loop-based solution (which is totally understandable, of course), whereas the regex solution showed almost no slowdown going from 10 to 100. That proves it's more about regex being efficient than `.contains()` being less-than-optimal. I suppose the regex is compiled to a "trie" or something like that. – at54321 Jul 20 '21 at 07:06
  • I did a little digging which ultimately led me to this article on [Thompson's construction](https://en.wikipedia.org/wiki/Thompson%27s_construction). The regex README mentions [RE2](https://github.com/google/re2/wiki/WhyRE2) as the inspiration for its design, which lists some interesting links to other articles (at the bottom of last link). "Thompson NFA" is brought up near the start of the first article listed. I understand this to be a large influence on `regex`'s design/implementation along with other articles listed on the RE2 link. – Todd Jul 24 '21 at 00:32
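
To make the "trie" idea from the comments concrete, here is a minimal sketch of matching all names in a single pass per start position. It is only an illustration of why one combined structure can scale better than one `.contains()` call per variant; it is not how the `regex` crate is implemented. The `Trie` and `Node` types and their methods are hypothetical names invented for this example.

```rust
use std::collections::HashMap;

// One trie node: outgoing edges keyed by byte, plus an optional
// index into the name list if some name ends exactly here.
#[derive(Default)]
struct Node {
    children: HashMap<u8, Node>,
    terminal: Option<usize>,
}

struct Trie {
    root: Node,
}

impl Trie {
    // Builds a trie containing every name in `names`.
    fn new(names: &[&str]) -> Self {
        let mut root = Node::default();
        for (i, name) in names.iter().enumerate() {
            let mut node = &mut root;
            for &b in name.as_bytes() {
                node = node.children.entry(b).or_default();
            }
            node.terminal = Some(i);
        }
        Trie { root }
    }

    // Scans `haystack` left to right and returns the index of the first
    // name found. At each start position all names are checked together
    // by walking the trie, so the work per position does not grow with
    // the number of names.
    fn find(&self, haystack: &str) -> Option<usize> {
        let bytes = haystack.as_bytes();
        for start in 0..bytes.len() {
            let mut node = &self.root;
            for &b in &bytes[start..] {
                match node.children.get(&b) {
                    Some(next) => node = next,
                    None => break, // no name continues with this byte
                }
                if let Some(i) = node.terminal {
                    return Some(i);
                }
            }
        }
        None
    }
}

fn main() {
    let names = ["Foo", "Bar", "Quux", "Quuz", "Thud"];
    let trie = Trie::new(&names);
    for target in ["fdskQuuzjfkds", "fkjdFoo", "TTTThud", "nothing"] {
        match trie.find(target) {
            Some(i) => println!("{}: found {}", target, names[i]),
            None => println!("{}: no variant name found", target),
        }
    }
}
```

Adding more names makes the trie larger but leaves the scan itself roughly unchanged, which is consistent with the near-flat timings reported above for the regex going from 10 to 100 variants.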