For the following line, I need a regex to get the values outside the double quotes, namely: 0.0 and 100.5.

"VAL_ 344 PedalPos 0.0 \"% (0.0 ... 100.0)\" 100.5 \"Invalid - Undefined (100.5 ... 127.5)\";"

Using this rule Regex.Match(line, "\"\\s[0-9]+\\s\""), I am getting one group and that's the first value: 0.0. I can't figure out how to extend the search to include all the following values.

As for the [0-9] part, I think it only matches integer values, so I added a dot to the class ([0-9.]) and that matched the entire decimal numbers. Is this the correct way to go?
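To illustrate the concern, here is a minimal sketch comparing the loose class [0-9.] with a stricter pattern. The class name and the sample tokens are made up for the example and do not come from the real data.

```csharp
using System;
using System.Text.RegularExpressions;

class CharClassDemo
{
    static void Main()
    {
        // Loose: any run of digits and dots, so malformed tokens also pass.
        var loose = new Regex(@"^[0-9.]+$");
        // Stricter: digits, then optionally one dot followed by more digits.
        var strict = new Regex(@"^[0-9]+(?:\.[0-9]+)?$");

        // Hypothetical sample tokens, chosen to show where the two differ.
        foreach (var token in new[] { "100.5", "344", "1.2.3", "..." })
        {
            Console.WriteLine("{0}: loose={1}, strict={2}",
                token, loose.IsMatch(token), strict.IsMatch(token));
        }
    }
}
```

Both patterns accept 100.5 and 344, while 1.2.3 and ... pass only the loose one, so [0-9.] works as long as the input is known to contain well-formed numbers.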

Wiktor Stribiżew
Olaru Mircea
  • It looks like you also want to extract `344`, right? – Wiktor Stribiżew May 29 '15 at 08:04
  • @stribizhev, no .. that value is an Id. It shouldn't be added. – Olaru Mircea May 29 '15 at 08:05
  • And what is the criterion for the IDs? Always after a `VAL_`? Please clarify. – Wiktor Stribiżew May 29 '15 at 08:06
  • \s?\d+(\.\d{1,2}) this should work, tested here ... http://regexpal.com/ – Dreamweaver May 29 '15 at 08:08
  • @stribizhev a format like this : VAL_ 344 PedalPos is found on every line, VAL_ as a constant string, an integer value as the ID and the next is a string, meaning the friendly name for that ID. And after them, pairs like : 0.0 "% (0.0 ... 100.0)" or 100.5 "Invalid - Undefined (100.5 ... 127.5)" are found. – Olaru Mircea May 29 '15 at 08:09
  • In your case the following regex will work: `([\d\.]+)\s+\\"` After character escaping: `([\\d\\.]+)\\s+\\\\"` You will get two captured groups, which you can access by index. I've checked it on http://regexr.com/ – kreig May 29 '15 at 08:12

3 Answers

Try "\s(\d+\.?\d*)\s" (string regex = "\"\\s(\\d+\\.?\\d*)\\s\""; in code) and take the first group's result.

Binkan Salaryman

I suggest the following approach:

1) Remove all the quoted strings,

2) Extract all numbers that are not preceded by VAL_.

var txt = "VAL_ 344 PedalPos 0.0 \"% (0.0 ... 100.0)\" 100.5 \"Invalid - Undefined (100.5 ... 127.5)\";";
txt = Regex.Replace(txt, @"""[^""]*""", string.Empty);
var results = Regex.Matches(txt, @"(?<!VAL_\s+)-?\b\d*\.?\d+\b");

Output: two matches, 0.0 and 100.5.

Regex explanation:

  • "[^"]*" - Match a quoted string
  • (?<!VAL_\s+)-?\b\d*\.?\d+\b:
    • (?<!VAL_\s+) - A negative lookbehind that checks that the number is not preceded by the constant VAL_ string and 1 or more whitespace characters
    • -? - An optional minus sign, so negative values are matched too
    • \b\d*\.?\d+\b - Match a whole word that is a floating-point number (a bit simplified; because of the leading \b, a .04-like value is matched as 04, without the dot).
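The two steps can be wrapped into a small helper, sketched below. The helper name ExtractValues and the sample line (a variant of the one in the question with a negative first value and a shortened second description) are assumptions made for illustration only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class TwoStepDemo
{
    // Hypothetical helper: returns the numbers found outside quoted strings,
    // skipping the ID that follows VAL_.
    static string[] ExtractValues(string line)
    {
        // Step 1: drop every quoted string.
        string unquoted = Regex.Replace(line, @"""[^""]*""", string.Empty);

        // Step 2: collect numbers not directly preceded by VAL_ and whitespace.
        return Regex.Matches(unquoted, @"(?<!VAL_\s+)-?\b\d*\.?\d+\b")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    static void Main()
    {
        // Made-up variant of the line from the question.
        string line = "VAL_ 344 PedalPos -100.5 \"% (0.0 ... 100.0)\" 100.5 \"Invalid (100.5 ... 127.5)\";";

        // Prints: -100.5, 100.5
        Console.WriteLine(string.Join(", ", ExtractValues(line)));
    }
}
```

The variable-length lookbehind (?<!VAL_\s+) is supported by the .NET regex engine; many other engines reject it, so this sketch is specific to C#.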
Wiktor Stribiżew
  • I hope my explanation is good. And the question is by no means a duplicate. – Wiktor Stribiżew May 29 '15 at 08:17
  • @Rawling: Downvoting valid answers is BAD. I test my answers before posting. You do not like the question? Downvote the question. – Wiktor Stribiżew May 29 '15 at 08:22
  • I am sorry that some of the important details were only mentioned in the comments. Indeed, I didn't pay enough attention: I was focused on my values and missed that first integer after the VAL_. But this answer gives the expected result, and I thank stribizhev twice, once for his regex and once for his question about the 344 value. – Olaru Mircea May 29 '15 at 08:28
  • Late question: I can't find a way to get negative values as well. In this case, -100.5. – Olaru Mircea May 29 '15 at 09:01
  • I have added support for negative values by adding an optional minus sign to the regex: `@"(?<!VAL_\s+)-?\b\d*\.?\d+\b"`. – Wiktor Stribiżew May 29 '15 at 09:06
  • I had tried that -? everywhere except in front of that boundary. Thanks again. – Olaru Mircea May 29 '15 at 09:08

A more generic approach that uses a single expression to get the numbers you need as I understand it:

@"VAL_\s*\d+|""[^""]+""|(\d+(?:\.\d+)?)"

How it works: the expression first matches the parts you don't want without capturing them, and only the last alternative uses a capture group to get what you actually need. Here's a snippet showing how to use it:

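string text = "VAL_ 344 PedalPos 0.0 \"% (0.0 ... 100.0)\" 100.5 \"Invalid - Undefined (100.5 ... 127.5)\";";
// The first two alternatives consume the ID and the quoted strings; only the last one captures.
var re = new Regex(@"VAL_\s*\d+|""[^""]+""|(\d+(?:\.\d+)?)", RegexOptions.IgnoreCase);
var textmatches = re.Matches(text);
Console.WriteLine("Result:");
foreach (Match match in textmatches)
{
    // Group 1 did not participate in the matches that only skip unwanted parts.
    if (match.Groups[1].Success)
    {
        Console.WriteLine(match.Groups[1].Value);
    }
}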


VAL_\s*\d+ matches VAL_ followed by optional spaces and digits for the IDs,

""[^""]+"" gets all within double quotes,

(\d+(?:\.\d+)?) finally captures the numbers. I used a basic pattern, so if you have more complex numbers (negatives, scientific format, etc.), you'll have to change that part accordingly.
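As one possible way to change the last alternative, the sketch below adds an optional minus sign and an optional exponent to the capture group. The extended pattern and the sample line (with -0.5 and 1.5e3 as values) are assumptions for illustration, not something taken from the real data format.

```csharp
using System;
using System.Text.RegularExpressions;

class SignedNumberDemo
{
    static void Main()
    {
        // Made-up variant of the line from the question.
        string text = "VAL_ 344 PedalPos -0.5 \"% (0.0 ... 100.0)\" 1.5e3 \"Invalid - Undefined (100.5 ... 127.5)\";";

        // Same skip-then-capture idea; only the capture group is extended with
        // an optional leading minus and an optional exponent such as e3 or E-2.
        var re = new Regex(@"VAL_\s*\d+|""[^""]+""|(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)");

        foreach (Match match in re.Matches(text))
        {
            // Only matches produced by the last alternative fill group 1.
            if (match.Groups[1].Success)
            {
                Console.WriteLine(match.Groups[1].Value);
            }
        }
    }
}
```

For this sample it prints -0.5 and 1.5e3. The hyphen inside "Invalid - Undefined" is not picked up, because the whole quoted string is consumed by the second alternative before the capture group gets a chance.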

Jerry
  • This ought to be more efficient than using two regex and works even if what you have between parentheses contain more than 2 numbers separated by spaces. – Jerry May 29 '15 at 10:58