Take this data as an example:

ID: JK546|Guitar: 0|Piano: 1|Violin: 0|Expiry: Aug14,2021

I was wondering whether it's possible to create a regex that will return this set of matches:

ID: JK546|Guitar: 0|Expiry: Aug14,2021
ID: JK546|Piano: 1|Expiry: Aug14,2021
ID: JK546|Violin: 0|Expiry: Aug14,2021

I tried the following pattern:

ID: (?<id>\w+).*\|(?<instrument>\w+):\s(?<count>\d).*Expiry:\s(?<expiry>[\w\d]+)

but it only returned the match for the Violin instrument. I would highly appreciate your insights on this.

Antoine Dubuis
MK Span

4 Answers

1

I would not use a regular expression here, especially since the string ID: JK546|Guitar: 0|Expiry: Aug14,2021 does not appear verbatim in the string ID: JK546|Guitar: 0|Piano: 1|Violin: 0|Expiry: Aug14,2021. What you want is not strictly a match but more of a replacement, and there's no good way to get all replacements from all matches.

So, I'd just split the input string on |.

Then you want to compose a result string made up of the first field, one of the middle fields, and the last field. You'll get one result for each middle field that exists: if the input splits into N fields, you'll get N-2 results. For example, if it splits into 5 fields, you'll get 3 results, one for each of the "middle" fields.

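string input = "ID: JK546|Guitar: 0|Piano: 1|Violin: 0|Expiry: Aug14,2021";
string[] fields = input.Split('|');
// Skip the first (ID) and last (Expiry) fields; every field in between yields one result.
for (int i = 1; i < fields.Length - 1; ++i) {
    // Indexing avoids the System.Linq dependency of First()/Last().
    string result = string.Join("|", fields[0], fields[i], fields[fields.Length - 1]);
    Console.WriteLine(result);
}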

output:

ID: JK546|Guitar: 0|Expiry: Aug14,2021
ID: JK546|Piano: 1|Expiry: Aug14,2021
ID: JK546|Violin: 0|Expiry: Aug14,2021
Wyck
0

A single regular expression to return multiple matches on multiple calls?  I wonder whether that is possible.

I’m not familiar with how to do regex processing in C#, but this sed command will do what you want.  Perhaps you can understand how it works and adapt it to your needs:

sed -n ':loop; h; s/^\([^|]*|[^|]*\).*\(|.*\)$/\1\2/p; g; s/^\([^|]*\)|[^|]*\(|.*\)$/\1\2/; t loop'

For simplicity, let’s pretend that the input string is “A|B|C|D|E”.

What it does:

  • -n is the option to tell sed not to print anything automatically (but only print when told to, with a p command).
  • :loop is a label that serves as the target of a “goto”, so the script as a whole behaves like a while loop.
  • h saves the pattern space into the hold space.  In other words, make a copy of your string.
  • s/^\([^|]*|[^|]*\).*\(|.*\)$/\1\2/p captures the first two segments and the last one, and prints the result.  So “A|B|C|D|E” becomes “A|B|E” (i.e., your first desired output).
  • g restores the saved string from the hold space into the pattern space.  In other words, retrieve the copy of the string that you saved.
  • s/^\([^|]*\)|[^|]*\(|.*\)$/\1\2/ captures the first segment, skips the second, and then captures the rest.  So “A|B|C|D|E” becomes “A|C|D|E”.
  • t loop is the “goto” command.  It says to go back to the beginning of the loop if the most recent substitution succeeded.  In other words, this is the end of the loop, and the specification of the loop condition.

The second iteration of the loop changes “A|C|D|E” to “A|C|E” and prints it, then changes “A|C|D|E” to “A|D|E” and iterates.  The third iteration “changes” “A|D|E” to “A|D|E” and prints it.  (There is no actual change, because the .* in the middle of the regex matches the zero-length string between “A|D” and “|E”.)  The final substitution changes “A|D|E” to “A|E”; on the next pass neither substitution matches, so nothing more is printed and the loop ends.
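
Since the question is about C#, here is a rough sketch of how the same loop might be adapted using Regex.Replace. It is only one possible translation, not the answer's own code: the class name SedLoopSketch and the method name Expand are made up for illustration, and the two patterns are the sed expressions rewritten in .NET syntax (where a literal | has to be escaped outside a character class).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical names: SedLoopSketch and Expand are illustrative only.
public static class SedLoopSketch
{
    // Keeps the first two fields and the last one: "A|B|C|D|E" -> "A|B|E".
    private static readonly Regex KeepFirstMiddle =
        new Regex(@"^([^|]*\|[^|]*).*(\|.*)$");

    // Drops the second field: "A|B|C|D|E" -> "A|C|D|E".
    private static readonly Regex DropFirstMiddle =
        new Regex(@"^([^|]*)\|[^|]*(\|.*)$");

    public static List<string> Expand(string input)
    {
        var results = new List<string>();
        string current = input;
        while (true)
        {
            // Counterpart of "h; s/.../p; g": emit a reduced copy, leave current untouched.
            if (KeepFirstMiddle.IsMatch(current))
                results.Add(KeepFirstMiddle.Replace(current, "$1$2"));

            // Counterpart of the second "s" plus "t loop": stop once no middle field is left.
            if (!DropFirstMiddle.IsMatch(current))
                break;
            current = DropFirstMiddle.Replace(current, "$1$2");
        }
        return results;
    }

    public static void Main()
    {
        string input = "ID: JK546|Guitar: 0|Piano: 1|Violin: 0|Expiry: Aug14,2021";
        foreach (string line in Expand(input))
            Console.WriteLine(line);
    }
}
```

The local variable current plays the role of sed's hold space: the printed string is always built from a copy, so the field that was just reported can be dropped before the next pass.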

0

You can use the .NET Group.Captures property to get the values of Guitar, Piano and Violin.

(ID: \w+\|)(\w+: \d+\|)+(Expiry: \w+,\d+)

The pattern matches:

  • (ID: \w+\|) Capture group 1: matches ID: followed by 1+ word chars and |
  • (\w+: \d+\|)+ Capture group 2, repeated 1+ times: matches 1+ word chars, then : and a space, 1+ digits and |
  • (Expiry: \w+,\d+) Capture group 3: matches Expiry: followed by 1+ word chars, a comma and 1+ digits

See a .NET regex demo | C# demo

For example

var str = "ID: JK546|Guitar: 0|Piano: 1|Violin: 0|Expiry: Aug14,2021";
string pattern = @"(ID: \w+\|)(\w+: \d+\|)+(Expiry: \w+,\d+)";
Match m = Regex.Match(str, pattern);

// Group 2 is repeated, so each repetition is kept in its Captures collection.
foreach (Capture c in m.Groups[2].Captures) {
    Console.WriteLine(m.Groups[1].Value + c.Value + m.Groups[3].Value);
}

Output

ID: JK546|Guitar: 0|Expiry: Aug14,2021
ID: JK546|Piano: 1|Expiry: Aug14,2021
ID: JK546|Violin: 0|Expiry: Aug14,2021
The fourth bird
0

It should be possible with a lookbehind and a lookahead:

string foo = @"ID: JK546 | Guitar: 0 | Piano: 1 | Violin: 0 | Expiry: Aug14,2021";

// First look at "Guitar: 0", "Piano: 1" and "Violin: 0". Then look behind "(?<= )" and search for the ID. Then look ahead "(?= )" and search for Expiry.

string pattern = @"(\w+: \d)(?<=(ID: [A-Z0-9]+).*?)(?=.*?(Expiry: \S+))";

foreach (Match match in Regex.Matches(foo, pattern))
{
    ....                
}

Fortunately, C# is one of the few languages that can handle variable-length lookbehinds.
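
The loop body is left open in the snippet above. As a minimal sketch, and only as one assumed way to fill it in, the three numbered groups could be joined back together like this. The pattern is the one from this answer; the class name LookaroundSketch, the method name Expand, and the choice to join with | are assumptions made for illustration. The sample input is the string from the question, without spaces around the separators.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical names: LookaroundSketch and Expand are illustrative only.
public static class LookaroundSketch
{
    // Group 1: the instrument field; group 2: the ID (found by the lookbehind);
    // group 3: the expiry (found by the lookahead).
    private const string Pattern =
        @"(\w+: \d)(?<=(ID: [A-Z0-9]+).*?)(?=.*?(Expiry: \S+))";

    public static List<string> Expand(string input)
    {
        var results = new List<string>();
        // Declaring the loop variable as Match keeps this working on older
        // frameworks, where MatchCollection only enumerates as object.
        foreach (Match match in Regex.Matches(input, Pattern))
        {
            string instrument = match.Groups[1].Value;
            string id = match.Groups[2].Value;
            string expiry = match.Groups[3].Value;
            results.Add(string.Join("|", id, instrument, expiry));
        }
        return results;
    }

    public static void Main()
    {
        string input = "ID: JK546|Guitar: 0|Piano: 1|Violin: 0|Expiry: Aug14,2021";
        foreach (string line in Expand(input))
            Console.WriteLine(line);
    }
}
```

Because the ID and expiry are captured inside lookarounds, they are not consumed by the match, which is what lets every instrument field produce its own match against the same ID and expiry.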

Klamsi