
I'm trying to get a fairly simple regular expression working, but I cannot work out why it fails. I thought I knew my way around regex, but apparently not :D

Here is the expression I want to match:

interval=4|termination=2012-09-18 22:00:00|days=3

From that, I want a match result that looks roughly like this:

match = array({"interval" => "4", "termination" => "2012-09-18 22:00:00", "days" => "3"});
//(pseudocode)

I'm using it in C#, so I want named groups. I tried this pattern:

(.*)((termination=(?<termination>(.{19})))|(interval=(?<interval>(\d*)))|(days=(?<days>(\d*))))*(.*)

Can anybody point out where I fail?

Thanks in advance.

Reblochon Masque
Giehl Man
  • Have you tested it using a regex tool? Many tools decompose your resulting matches (or missing matches), which can help. I use the free RAD Regex Designer (http://www.radsoftware.com.au/regexdesigner/), but there are others. – Arjan Einbu Feb 12 '13 at 12:31
  • I can recommend regex101.com – David S. Feb 12 '13 at 12:34

3 Answers

I believe you are coming from a PHP background. You can use string.Split and project the result into a dictionary like this:

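// Requires: using System.Collections.Generic; using System.Linq;
string str = "interval=4|termination=2012-09-18 22:00:00|days=3";
// Split into key=value pairs, then split each pair at the first '=' only.
Dictionary<string, string> dict = str.Split('|')
                                     .Select(r => r.Split(new[] { '=' }, 2))
                                     .ToDictionary(t => t[0], t => t[1]);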

The resulting dictionary maps "interval" to "4", "termination" to "2012-09-18 22:00:00" and "days" to "3".

Habib
  • Thank you for that answer, that is certainly one way to do it in this case, but I'm seeking a regex approach to build up more sophisticated usages. – Giehl Man Feb 12 '13 at 12:34

| is a special character in regex: it means alternation. Since you want to match a literal |, you need to escape it.

interval=(?<interval>\d*)\|termination=(?<termination>.{19})\|days=(?<days>\d*)

I have also taken the liberty of removing the capturing groups that you don't seem to need, and modified the regex so that it works with the Regex.Matches() method.

I assume that the input appears in the order specified.
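As a rough illustration of how this pattern could be used from C#, here is a minimal sketch. The class name FixedOrderDemo and the helper Parse are made up for this example; only the pattern itself comes from the answer, and the input must still arrive in the fixed order interval, termination, days.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FixedOrderDemo
{
    // Pattern from the answer; the literal | separators are escaped.
    private static readonly Regex Pattern = new Regex(
        @"interval=(?<interval>\d*)\|termination=(?<termination>.{19})\|days=(?<days>\d*)");

    // Hypothetical helper: returns the three values in order,
    // or null when the input does not match the pattern.
    public static string[] Parse(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success) return null;
        return new[]
        {
            m.Groups["interval"].Value,
            m.Groups["termination"].Value,
            m.Groups["days"].Value
        };
    }

    public static void Main()
    {
        string[] parts = Parse("interval=4|termination=2012-09-18 22:00:00|days=3");
        Console.WriteLine(string.Join(", ", parts)); // 4, 2012-09-18 22:00:00, 3
    }
}
```

With the fields in a different order, Parse returns null, which is the limitation noted above.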

nhahtdh
  • To OP: I am not sure what you are trying when you use `|` in your regex. If the input comes in the same order as shown, then the regex I wrote will work. Otherwise, it needs more modification. – nhahtdh Feb 12 '13 at 12:36
  • Hi! Thanks for that suggestion. Yeah, I want to be insensitive to the order. – Giehl Man Feb 12 '13 at 12:41

What are the results you're getting? I'm betting that (.*), being greedy, will consume the whole string, while the other parts (suffixed by *) will be matched zero times. So the match will succeed, but the capturing groups will be empty. Is that what you're experiencing?
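A small sketch that checks this guess against the pattern from the question. The class name GreedyDemo and the helper Captured are invented for the example; the pattern and the input are copied from the question unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // The pattern from the question, unchanged.
    public const string Original =
        @"(.*)((termination=(?<termination>(.{19})))|(interval=(?<interval>(\d*)))|(days=(?<days>(\d*))))*(.*)";

    public const string Input = "interval=4|termination=2012-09-18 22:00:00|days=3";

    // Hypothetical helper: true when the named group took part in the first match.
    public static bool Captured(string pattern, string groupName)
    {
        return Regex.Match(Input, pattern).Groups[groupName].Success;
    }

    public static void Main()
    {
        Match m = Regex.Match(Input, Original);
        Console.WriteLine(m.Success);                    // True: the overall match succeeds
        Console.WriteLine(m.Groups[1].Value == Input);   // True: the leading (.*) took the whole string
        Console.WriteLine(m.Groups["interval"].Success); // False: the starred group ran zero times
    }
}
```

Because the engine accepts the first way of matching that succeeds, it never has to backtrack out of the leading (.*), so none of the named groups gets a chance to capture.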

My suggestion would be to go with Split as suggested by Habib, but if you want to fix your regex then:

  • Make the first group non-greedy (lazy): (.*?)
  • Fix the order of your fields, and escape | as suggested by nhahtdh, or:
  • If the fields can come out of order, you might need to repeat them to accept zero, one or more (not the best job for a regex, but doable):

    (
        (termination=(...)|interval=(...)|days=(...))
        (\| (termination=(...)|interval=(...)|days=(...)) )*
    )?
    

    (spaces and newlines added for readability)
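One way the order-insensitive shape above might be spelled out in C# is sketched below. Several things here are assumptions rather than part of the answer: the names AnyOrderDemo, Field and Parse, the use of \d+ instead of \d*, non-capturing groups for the wrappers, and ^ and $ anchors in place of the outer optional group (so an empty input is rejected). It relies on .NET allowing the same group name to appear more than once in a pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnyOrderDemo
{
    // One field: any of the three key=value forms (assumed value formats).
    private const string Field =
        @"(?:interval=(?<interval>\d+)|termination=(?<termination>.{19})|days=(?<days>\d+))";

    // A field, followed by zero or more "|field" repetitions, anchored to the whole string.
    private static readonly Regex Pattern =
        new Regex("^" + Field + @"(?:\|" + Field + ")*$");

    // Hypothetical helper: returns the fields that were present; empty when nothing matches.
    public static Dictionary<string, string> Parse(string input)
    {
        var result = new Dictionary<string, string>();
        Match m = Pattern.Match(input);
        if (!m.Success) return result;

        foreach (string name in new[] { "interval", "termination", "days" })
        {
            Group g = m.Groups[name];
            if (g.Success) result[name] = g.Value; // last capture wins if a key repeats
        }
        return result;
    }

    public static void Main()
    {
        var parsed = Parse("days=3|termination=2012-09-18 22:00:00|interval=4");
        foreach (var pair in parsed)
            Console.WriteLine(pair.Key + " = " + pair.Value);
    }
}
```

The pattern does not stop a key from appearing twice or require all three keys; if that matters, checking the returned dictionary afterwards is probably simpler than encoding it in the regex.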

mgibsonbr
  • That is exactly what I'm experiencing. Tons of empty arrays, I guess there is some error in my thinking. Another example I want to get working is the processing of program parameters at the invocation of my cli-program (e.g. start.exe /help /param1 5 /param2 "hello") – Giehl Man Feb 12 '13 at 12:33
  • @GiehlMan While it's nice to have a good grasp of regexes, I tend to use them only when simpler alternatives aren't available. For parsing command-line arguments, the preferred approach AFAIK is using [`OptionSet`](http://stackoverflow.com/q/491595/520779). But if what you want is to learn, check my comments and link above about greedy vs. lazy quantifiers, and you might also be interested in [non capturing groups](http://stackoverflow.com/q/3512471/520779) (to make your results cleaner). Using `^` and `$` to force a full string match is also helpful. – mgibsonbr Feb 12 '13 at 12:46