The following pattern matches a line that starts with 'v' followed by an arbitrary number of floats:

    const RegexOptions options = RegexOptions.Compiled | RegexOptions.Singleline | RegexOptions.CultureInvariant;

    var regex = new Regex(@"^\s*v((?:\s+)[-+]?\b\d*\.?\d+\b)+$", options);

    const string text = @"
v +0.5 +0.5 +0.5 0.0 1.0 1.0
v +0.5 -0.5 -0.5 1.0 0.0 1.0
v -0.5 +0.5 -0.5 1.0 1.0 0.0
v -0.5 -0.5 +0.5 0.0 0.0 0.0
";

    using var reader = new StringReader(text);

    for (var s = reader.ReadLine(); s != null; s = reader.ReadLine())
    {
        if (string.IsNullOrWhiteSpace(s))
            continue;

        var match = regex.Match(s);

        if (match.Success)
        {
            foreach (Capture capture in match.Groups[1].Captures)
            {
                Console.WriteLine($"'{capture.Value}'");
            }
        }
    }

It works as expected except that it includes the leading space before a number:

' +0.5'
' +0.5'
' +0.5'
' 0.0'
' 1.0'
' 1.0'
...

Question:

How can I ignore the leading space for each captured number?

aybe
  • Is your file structure always valid, or are there lines that won't match your desired pattern? I.e., is it just about getting all numbers out of a valid file, or do you need a validity check and want to ignore invalid lines? – derpirscher Nov 20 '21 at 13:23
  • There shouldn't be any wrong content, but a bit of self-protection is always nice to have. – aybe Nov 20 '21 at 13:26
  • So what happens if you encounter a line like `v 1.0 xy 1.0`? Completely ignore it? – derpirscher Nov 20 '21 at 13:26
  • Yes, that wouldn't be valid. It's actually a Wavefront OBJ file. – aybe Nov 20 '21 at 13:26

2 Answers

You could change the regex to match the whitespace chars instead of capturing.

The part (?:\s+) is the same as just \s+, and because each repetition of the pattern starts with one or more whitespace chars, you can omit the word boundary \b at the end.

Note that in C# \d can match more than [0-9]: by default it matches any Unicode decimal digit. Use [0-9] if only ASCII digits should match.
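
To illustrate the note about \d, here is a minimal sketch. The class and method names are made up for the example. It relies on U+0663 (ARABIC-INDIC DIGIT THREE) being a Unicode decimal digit, which \d matches by default in .NET, while [0-9] does not.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, only for demonstrating the difference between \d and [0-9].
public static class DigitClassDemo
{
    // True when the whole input is matched by \d+ under the given options.
    public static bool IsAllDigits(string input, RegexOptions options) =>
        Regex.IsMatch(input, @"^\d+$", options);

    // True when the whole input consists of ASCII digits only.
    public static bool IsAllAsciiDigits(string input) =>
        Regex.IsMatch(input, @"^[0-9]+$");
}
```

With RegexOptions.None, IsAllDigits("\u0663", ...) should return true, while IsAllAsciiDigits("\u0663") returns false. RegexOptions.ECMAScript also restricts \d to [0-9], but as far as I know it cannot be combined with Singleline or CultureInvariant, so writing [0-9] explicitly is probably the simpler choice with the options used in the question.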

^\s*v(?:\s+([-+]?\b\d*\.?\d+))+$

The line in C# would be:

var regex = new Regex(@"^\s*v(?:\s+([-+]?\b\d*\.?\d+))+$", options);

Output

'+0.5'
'+0.5'
'+0.5'
'0.0'
'1.0'
'1.0'
'+0.5'
'-0.5'
'-0.5'
'1.0'
'0.0'
'1.0'
'-0.5'
'+0.5'
'-0.5'
'1.0'
'1.0'
'0.0'
'-0.5'
'-0.5'
'+0.5'
'0.0'
'0.0'
'0.0'
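
Since the captures no longer contain the leading space, they can be handed to float.Parse as they are. The following is only a sketch of how that might look: VertexLineParser and TryParseVertexLine are names invented for this example, and the pattern uses [0-9] instead of \d (an assumption on my part) so that every capture is something float.Parse accepts.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper that turns one "v ..." line into an array of floats.
public static class VertexLineParser
{
    // Same structure as the pattern above; group 1 captures each number
    // without the whitespace in front of it.
    private static readonly Regex VertexLine = new Regex(
        @"^\s*v(?:\s+([-+]?\b[0-9]*\.?[0-9]+))+$",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    // Returns false (and an empty array) when the line is not a valid vertex line.
    public static bool TryParseVertexLine(string line, out float[] values)
    {
        var match = VertexLine.Match(line);
        if (!match.Success)
        {
            values = Array.Empty<float>();
            return false;
        }

        // One Capture per repetition of group 1, in the order they appear.
        var captures = match.Groups[1].Captures;
        values = new float[captures.Count];
        for (var i = 0; i < captures.Count; i++)
        {
            // Invariant culture so that '.' is always the decimal separator.
            values[i] = float.Parse(
                captures[i].Value, NumberStyles.Float, CultureInfo.InvariantCulture);
        }
        return true;
    }
}
```

A line such as `v 1.0 xy 1.0` fails the anchored match as a whole, so it is rejected rather than partially parsed.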
The fourth bird
  • So basically with your last edit, you removed the last `\b` because it was useless, correct? – aybe Nov 20 '21 at 13:24

You might be overcomplicating this. I suggest just using the following regex pattern:

[+-]?\d+(?:\.\d+)?

Your updated C# code:

var regex = new Regex(@"[+-]?\d+(?:\.\d+)?", options);
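
Because this pattern is not anchored, a single Match call would return only the first number on a line, so it presumably has to be used with Matches instead. A minimal sketch of that usage follows; NumberExtractor and ExtractNumbers are hypothetical names for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper that collects every number found on a line.
public static class NumberExtractor
{
    private static readonly Regex Number = new Regex(
        @"[+-]?\d+(?:\.\d+)?",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    // Returns each match as text, in the order it appears in the line.
    public static List<string> ExtractNumbers(string line)
    {
        var result = new List<string>();
        foreach (Match m in Number.Matches(line))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

Note the trade-off: this approach does not validate the line. For `v 1.0 xy 1.0` it still returns the two numbers instead of rejecting the line, which differs from the behavior asked for in the comments on the question.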
Tim Biegeleisen