I have the following string

[1] weight | width | depth | 5.0 cm | 6.0 mm^2 | 10.12 cm^3

From it I need to extract the name, value and unit of each entry, like this:

name = weight
value = 5.0
unit = cm

name = width
value = 6.0
unit = mm^2

name = depth
value = 10.12
unit = cm^3

I have a regex for each of the three parts, and each one works as expected on its own. I need to combine them into a single regex that returns the matches shown above. I tried simply concatenating them and also joining them with |, but neither worked. Here are the working individual regexes:

For Name : (?<name>\b\w+(?:[\w]\w+)+\b)
For Value : (?<![\^])(?<value>[+-]?[0-9]+(?:\.[0-9]+)?)(?!\S)
For Unit : \b[0-9]+(?:\.[0-9]+)?[^\S\r\n]+(?<unit>[^0-9\s]\S*)(?:[^\S\r\n]+\||$)

Can anyone help me with this? Thanks.

Artyom Vancyan
Aneesh Narayanan

3 Answers

4

If every name is separated from its value by the same number of pipes (three in the sample string), you can match the name with a capture group and capture its value and unit inside a lookahead:

(?<!\S)(?<name>\w+)(?=(?:[^|]*\|){3}\s*\b(?<value>[0-9]+(?:\.[0-9]+)?)\s+(?<unit>[^0-9\s]\S*))

anubhava
The fourth bird
  • To make this work for OP, I think we need to treat the `{3}` as `{n}`, where OP would first count the occurrences of '|' in the string and divide by 2 (rounding up to the next integer, e.g. to 1 in the case of a single pipe). – JvdV Jun 23 '22 at 07:28
  • @JvdV Yes if you have 2 occurrences of the pipe, then the n will be 2. If this is dynamic, you can use logic like you described, or use split and do afterprocessing. – The fourth bird Jun 23 '22 at 07:53
  • Yeah, looking at the samples OP provided (in question and comment) the latter (split) may also be a good alternative. – JvdV Jun 23 '22 at 08:16
  • Thanks for the answers. The format is always the same, but the number of groups may differ. The string in the question has 3 groups, but any group count should be supported, for example: [1] ABC | XYZ | 6.9 mm | 194 mm^3 – Aneesh Narayanan Jun 23 '22 at 08:38
  • @Thefourthbird Some more sample data: "[1] ABC | XYZ | 6.9 mm | 194 mm^3" and "[2] EFG | 6.9 mm" – Aneesh Narayanan Jun 23 '22 at 09:12
  • TheFourthBird, I've taken the liberty to upload an answer to showcase your pattern. If you decide to update your answer with this (or probably more streamlined code) I'll make sure to take mine down. – JvdV Jun 23 '22 at 09:34
  • @JvdV That is perfectly ok, I was just boarding an airplane and was offline for a while. – The fourth bird Jun 23 '22 at 16:23
  • Enjoy wherever you are! Not too much SO during holidays OK =) – JvdV Jun 23 '22 at 17:02
  • @JvdV Too late :-) I just got back – The fourth bird Jun 23 '22 at 17:02
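
The split-and-post-process alternative mentioned in the comments above could look roughly like the sketch below. It is only an illustration: the `SplitParser` class and its `Parse` method are hypothetical names, and it assumes the first half of the pipe-separated fields are names and the second half are "value unit" pairs in the same order.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitParser
{
    // Hypothetical helper (not from the thread): pairs the i-th name
    // with the i-th "value unit" field.
    public static List<(string Name, double Value, string Unit)> Parse(string line)
    {
        // Drop the leading "[n]" index, then split on pipes and trim each field.
        string body = Regex.Replace(line, @"^\s*\[\d+\]\s*", "");
        string[] fields = body.Split('|').Select(f => f.Trim()).ToArray();

        // Assumption: as many names as value/unit fields, names first.
        if (fields.Length % 2 != 0)
            throw new FormatException("Expected as many names as value/unit fields.");

        int n = fields.Length / 2;
        var result = new List<(string Name, double Value, string Unit)>();
        for (int i = 0; i < n; i++)
        {
            // Split "5.0 cm" into the number and everything after it.
            string[] parts = fields[n + i].Split(
                new[] { ' ', '\t' }, 2, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length != 2)
                throw new FormatException("Expected '<value> <unit>' but got: " + fields[n + i]);

            double value = double.Parse(parts[0], CultureInfo.InvariantCulture);
            result.Add((fields[i], value, parts[1].Trim()));
        }
        return result;
    }
}

public class SplitDemo
{
    public static void Main()
    {
        foreach (var item in SplitParser.Parse("[1] ABC | XYZ | 6.9 mm | 194 mm^3"))
            Console.WriteLine($"name = {item.Name}, value = {item.Value}, unit = {item.Unit}");
    }
}
```

Because the pair count is derived from the number of fields, the same code handles one, two or three groups without a hard-coded `n`.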
2

Just for reference on how you could use the pattern provided by @TheFourthBird

using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string s = "[1] weight | width | depth | 5.0 cm | 6.0 mm^2 | 10.12 cm^3";

        // Number of name/value pairs: k pairs give 2k fields separated by 2k - 1 pipes.
        int n = s.Split('|').Length / 2;

        // Each name is followed by exactly n pipes before its own value and unit.
        string pat = @"(?<!\S)(?<name>\w+)(?=(?:[^|]*\|){" + n + @"}\s*\b(?<value>[0-9]+(?:\.[0-9]+)?)\s+(?<unit>[^0-9\s]\S*))";

        var itemRegex = new Regex(pat, RegexOptions.Compiled);
        var items = itemRegex.Matches(s)
            .Cast<Match>()
            .Select(m => new
            {
                Name = m.Groups["name"].Value,
                // Parse with the invariant culture so "5.0" is read the same in every locale.
                Value = double.Parse(m.Groups["value"].Value, CultureInfo.InvariantCulture),
                Unit = m.Groups["unit"].Value,
            })
            .ToList();
        Console.WriteLine(string.Join("; ", items));
    }
}

Prints:

{ Name = weight, Value = 5, Unit = cm }; { Name = width, Value = 6, Unit = mm^2 }; { Name = depth, Value = 10.12, Unit = cm^3 }

Note: I am by no means a C# developer. I simply adapted code found here on SO to showcase how the answer given by TheFourthBird could work.

JvdV
  • Thanks for sharing the code. I already have logic for parsing the match groups (it is fixed and cannot be changed at the moment), and it reads the regex from a JSON file, so I can easily change the regex to support this string. Is it possible to modify the regex so that the value of 'n' does not have to be specified explicitly? – Aneesh Narayanan Jun 23 '22 at 10:54
  • @AneeshNarayanan, not possible through regex afaik. – JvdV Jun 23 '22 at 12:28
  • Hm, thanks for the update. I was thinking that, since the individual patterns match without specifying 'n', combining them in some way should work? – Aneesh Narayanan Jun 23 '22 at 12:44
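
One .NET-specific possibility for avoiding an explicit 'n' is to match the whole line once and read the repeated groups through `Group.Captures`. The sketch below is an assumption, not something proposed in the thread: the pattern and the `CaptureParser` name are made up for illustration, and because it yields a single match with several captures per group instead of one match per name, it may not fit parsing logic that expects separate matches.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureParser
{
    // Assumed pattern: "[n]", then one or more "name |" parts,
    // then one or more "value unit" parts separated by pipes.
    private static readonly Regex Line = new Regex(
        @"^\[\d+\]\s*(?:(?<name>\w+)\s*\|\s*)+" +
        @"(?:(?<value>[+-]?[0-9]+(?:\.[0-9]+)?)\s+(?<unit>[^0-9\s]\S*)\s*(?:\||$)\s*)+$");

    // Returns one "name value unit" string per pair, or an empty array
    // when the line does not match or the counts differ.
    public static string[] Parse(string line)
    {
        Match m = Line.Match(line);
        if (!m.Success) return new string[0];

        // .NET keeps every capture of a repeated group, in order.
        CaptureCollection names = m.Groups["name"].Captures;
        CaptureCollection values = m.Groups["value"].Captures;
        CaptureCollection units = m.Groups["unit"].Captures;

        // The pattern itself does not force equal counts, so check here.
        if (names.Count != values.Count) return new string[0];

        var result = new string[names.Count];
        for (int i = 0; i < names.Count; i++)
            result[i] = $"{names[i].Value} {values[i].Value} {units[i].Value}";
        return result;
    }
}

public class CaptureDemo
{
    public static void Main()
    {
        foreach (string item in CaptureParser.Parse("[2] EFG | 6.9 mm"))
            Console.WriteLine(item);
    }
}
```

Capture collections are specific to the .NET regex engine, so this does not carry over to flavors that keep only the last capture of a repeated group.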
1

Use this regex to capture the corresponding groups

\[\d+\]\s(\w+)\s\|\s(\w+)\s\|\s(\w+)\s\|\s(\S+)\s(\S+)\s\|\s(\S+)\s(\S+)\s\|\s(\S+)\s(\S+)

Then using substitution replace with

name = $1\nvalue = $4\nunit = $5\n\nname = $2\nvalue = $6\nunit = $7\n\nname = $3\nvalue = $8\nunit = $9

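
A minimal sketch of applying this substitution in C# with `Regex.Replace` might look like the following. The `FixedThreeFormatter` class name is hypothetical, and the pattern is hard-wired to exactly three name/value/unit groups, so a line with a different group count does not match and is returned unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FixedThreeFormatter
{
    // Pattern from the answer: three names followed by three "value unit" pairs.
    private const string Pattern =
        @"\[\d+\]\s(\w+)\s\|\s(\w+)\s\|\s(\w+)\s\|\s(\S+)\s(\S+)\s\|\s(\S+)\s(\S+)\s\|\s(\S+)\s(\S+)";

    // A regular (non-verbatim) string, so the C# compiler turns \n into newlines;
    // $1..$9 are expanded by the regex engine.
    private const string Replacement =
        "name = $1\nvalue = $4\nunit = $5\n\n" +
        "name = $2\nvalue = $6\nunit = $7\n\n" +
        "name = $3\nvalue = $8\nunit = $9";

    public static string Format(string line)
    {
        return Regex.Replace(line, Pattern, Replacement);
    }
}

public class FormatDemo
{
    public static void Main()
    {
        string s = "[1] weight | width | depth | 5.0 cm | 6.0 mm^2 | 10.12 cm^3";
        Console.WriteLine(FixedThreeFormatter.Format(s));
    }
}
```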

Artyom Vancyan