
Thanks to Wiktor Stribiżew for answering my first question on extracting data from paired braces and for pointing out the duplicate question: Regex to get string between curly braces

What I would also like to do is extract data from a string between arbitrary pairs of multi-character delimiters, for example <= and =>. If the delimiters are nested, I only want the data inside the innermost pair. I have tried variations on the example Wiktor showed me but have only succeeded with single-character delimiters, so I am trying again with my original lookbehind/lookahead solution. It has the same problem: the lookahead correctly stops at the first closing delimiter, but the lookbehind only stops at the very first occurrence of the opening delimiter, not at the innermost one.

I have tried adding negative lookbehinds for additional braces and limiting the number of matches with {1} inside the lookbehind expression, but neither of these works. I have found various comments on the difficulty of implementing lookbehind for regex, and on the fact that not all regex implementations support all features. Is there any way to get the lookbehind to stop after the first match with the C# (.NET) Regex class?

The following code shows the regex I am using, what I am expecting, and what I am getting.

var reg = new Regex(@"(?<=<=).*?(?==>)");
var matched = reg.Matches("I want to get <=data=> between paired <=<=<=delimiters=>=>=>=>");
foreach(var m in matched)
{
   Console.WriteLine(m.ToString());
}

The result I expect is

data
delimiters

what I am getting is

data
<=<=delimiters
PaulF
  • Use `<=((?:(?!<=).)*?)=>` ([demo](http://regexstorm.net/tester?p=%3c%3d%28%28%3f%3a%28%3f!%3c%3d%29.%29*%3f%29%3d%3e&i=I+want+to+get+%3c%3ddata%3d%3e+between+paired+%3c%3d%3c%3d%3c%3ddelimiters%3d%3e%3d%3e%3d%3e%3d%3e)) – Wiktor Stribiżew Aug 20 '20 at 15:56
  • Unfortunately that retains the delimiters, returning "<=data=>" & "<=delimiters=>" - I just want "data" & "delimiters" - I could trim them after, but would prefer not to. – PaulF Aug 20 '20 at 15:59
  • You are using C#, right? So use it. `var results = Regex.Matches(text, regex).Cast<Match>().Select(x => x.Groups[1].Value).ToList();` – Wiktor Stribiżew Aug 20 '20 at 16:00
  • You capture them. Get Group 1 contents. That is not making it different. The main concept is the tempered greedy token, that is what you need. – Wiktor Stribiżew Aug 20 '20 at 16:01
  • Thanks - I didn't spot that the LINQ select was getting the Groups[1] values rather than the Groups[0] values. – PaulF Aug 20 '20 at 16:05
  • I just do not like the `(?<=<=)(?:(?!<=).)*?(?==>)` appearance. It looks rather misleading. – Wiktor Stribiżew Aug 20 '20 at 16:07
  • I agree those delimiters look odd - but "(?<=STX )(?:(?!STX ).)*?(?= ETX)" isn't that bad - it was the (?: ...) I was missing - again thanks for the help. – PaulF Aug 20 '20 at 16:15
  • You can learn more about it [here](https://stackoverflow.com/a/37343088/3832970). – Wiktor Stribiżew Aug 20 '20 at 16:17
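
For anyone landing here, the tempered greedy token suggested in the comments can be sketched as a small console program. This is only an illustration under some assumptions: the class name `InnermostDelimiters` and the helper `Extract` are hypothetical names that do not appear in the thread, and the delimiters are passed through `Regex.Escape` so that other delimiter strings are treated literally. The pattern itself is the one from the first comment, built from the two delimiters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class InnermostDelimiters
{
    // Hypothetical helper (not from the thread): returns the text inside
    // each innermost open/close delimiter pair.
    public static List<string> Extract(string text, string open, string close)
    {
        // Escape the delimiters so characters such as '(' or '*' are literal.
        string o = Regex.Escape(open);
        string c = Regex.Escape(close);

        // Tempered greedy token: consume one character at a time, but only
        // while an opening delimiter does NOT start at the current position.
        // Note: '.' does not match newlines unless RegexOptions.Singleline is set.
        string pattern = o + "((?:(?!" + o + ").)*?)" + c;

        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value) // group 1 excludes the delimiters
                    .ToList();
    }

    public static void Main()
    {
        string input = "I want to get <=data=> between paired <=<=<=delimiters=>=>=>=>";

        foreach (string value in Extract(input, "<=", "=>"))
        {
            Console.WriteLine(value);
        }
        // Prints:
        // data
        // delimiters
    }
}
```

The original lookbehind version fails because `.*?` is free to match further `<=` sequences once the lookbehind has succeeded at the outermost one. The `(?!<=)` check before each character is what forces the match to start after the last opening delimiter. The same token also works between lookarounds, as in the `(?<=<=)(?:(?!<=).)*?(?==>)` form mentioned above, if you prefer to read the whole match instead of group 1.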

0 Answers