
I'm trying to use a regex to parse arguments from a string such as `-a 1 -b -5.1`.

The output should contain two flags with values: flag `a` with value `1`, and flag `b` with value `-5.1`.

When I try the regular expression `(-(?<flag>[a-zA-Z])(?<value> .+)?(?!-[a-zA-Z]))*`, it returns only flag `a` with the value `1 -b -5.1`.

Why doesn't it stop at -b?

Roman Gudkov
  • Why not split on space ` ` (into `{"-a", "1", "-b", "-5.1"}`) and then treat even items as names while odd as values? – Dmitry Bychenko Sep 03 '18 at 09:52
  • @DmitryBychenko Just a guess, values might contain spaces. – Sweeper Sep 03 '18 at 09:54
  • Possible duplicate of [Tempered Greedy Token - What is different about placing the dot before the negative lookahead](https://stackoverflow.com/questions/30900794/tempered-greedy-token-what-is-different-about-placing-the-dot-before-the-negat) – Sebastian Proske Sep 03 '18 at 09:55
  • @Sweeper: if value can *contain space* we have an *ambiguity*: `-a 1 -b 5.1` can be either `[{"-a", "1 -b 5.1"}]` or `[{"-a", "1"}, {"-b", "5.1"}]` – Dmitry Bychenko Sep 03 '18 at 10:06
  • @DmitryBychenko Judging from the OP's regex attempt, whenever `-[a-zA-Z]` is seen, that will be the start of a new flag. – Sweeper Sep 03 '18 at 10:08
  • @DmitryBychenko In my case, a flag can be without a value. That's why taking even items wouldn't work. And, as @Sweeper mentioned, `-b` should be treated as the start of a new flag. – Roman Gudkov Sep 03 '18 at 10:35

1 Answer


You need to make (?<value> .+) lazy and turn the negative lookahead into a positive lookahead.

Here is my try:

-(?<flag>[a-zA-Z]) (?<value>.+?)(?=$| -[a-zA-Z])


Explanation:

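You are probably wondering why a positive lookahead is used instead of a negative one. In your pattern the greedy `.+` runs to the end of the string, and the negative lookahead then succeeds there because nothing follows, so the match never stops at `-b`. A lazy `+?` instead consumes as few characters as possible and stops as soon as the rest of the pattern can match. That is why we look ahead for `$| -[a-zA-Z]`: as soon as the end of the string or a space followed by a new flag comes next, `+?` stops matching.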
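
To make the difference concrete, here is a minimal sketch in Python (an assumption on my part; the question does not name a language). Python spells named groups as `(?P<name>...)` rather than `(?<name>...)`, and the outer `(...)*` from the question is dropped here because `finditer` already walks over successive matches. The variable names are only illustrative.

```python
import re

text = "-a 1 -b -5.1"

# The question's inner pattern, in Python's (?P<name>...) syntax.
# The greedy " .+" consumes the rest of the string, and the negative
# lookahead then succeeds at the end of the input, where nothing follows.
greedy = re.compile(r"-(?P<flag>[a-zA-Z])(?P<value> .+)?(?!-[a-zA-Z])")

# The answer's pattern: a lazy ".+?" bounded by a positive lookahead.
lazy = re.compile(r"-(?P<flag>[a-zA-Z]) (?P<value>.+?)(?=$| -[a-zA-Z])")

greedy_pairs = [(m["flag"], m["value"]) for m in greedy.finditer(text)]
lazy_pairs = [(m["flag"], m["value"]) for m in lazy.finditer(text)]

print(greedy_pairs)  # [('a', ' 1 -b -5.1')]
print(lazy_pairs)    # [('a', '1'), ('b', '-5.1')]
```

Note that `-5.1` is not mistaken for a flag, because the lookahead requires a letter after the dash.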

I have also moved the space character outside of the value group, so the captured value no longer starts with a space. I assume that is what you want?

Sweeper
  • The only problem is that it doesn't work for flags without values. For example, for `-a -b -c -d` this regex returns 2 matches, `-a -b` and `-c -d`, when it should return `-a`, `-b`, `-c` and `-d`. – Roman Gudkov Sep 04 '18 at 08:12
  • `-(?<flag>[a-zA-Z])( (?<value>.+?))??(?=$| -[a-zA-Z])` will correctly parse flags without values – Roman Gudkov Sep 04 '18 at 08:24
  • @RomanGudkov Right. You didn't mention that in the question, so I didn't notice. So problem solved now? – Sweeper Sep 04 '18 at 08:26
  • You are right, I didn't mention this originally - wanted to keep it short. Decided to add a comment for those who might have similar additional requirement. Solved now. – Roman Gudkov Sep 04 '18 at 08:44
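
For anyone who wants to try the variant from the comments, here is a rough Python sketch. It assumes Python's `(?P<name>...)` spelling for named groups, and `parse_flags` is a hypothetical helper name, not something from the thread. The optional lazy group `( ... )??` first tries to match no value at all and only captures one when the lookahead would otherwise fail.

```python
import re

# The comment's pattern in Python syntax: the value part is wrapped in an
# optional lazy group, so a flag may be followed directly by the next flag.
pattern = re.compile(r"-(?P<flag>[a-zA-Z])( (?P<value>.+?))??(?=$| -[a-zA-Z])")


def parse_flags(text):
    """Return (flag, value) pairs; value is None when a flag has no value."""
    return [(m["flag"], m["value"]) for m in pattern.finditer(text)]


print(parse_flags("-a -b -c -d"))
print(parse_flags("-a 1 -b -5.1 -c"))
```

With this pattern a flag without a value yields `None` for the `value` group, so callers can tell "no value" apart from an empty string.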