
I have the following regex:

'[^']*'(*SKIP)(*F)|\b[_A-Za-z]\w*\b(?![('])

and it works as expected (it selects the variable names from an expression). But when I try to use it in .NET:

private string regex = @"'[^']*'(*SKIP)(*F)|\b[_A-Za-z]\w*\b(?![('])";
private string _expression = @"12+x1+455+'ggg+4+rrr+tt'+3";

var matches = Regex.Matches(_expression, regex);

it does not find anything. I suppose the problem is that (*SKIP) is not supported by the .NET Regex class.

Oleg Sh

2 Answers


In .NET and most other flavors, match and capture what you need and only match what you do not need:

'[^']*'|\b([_A-Za-z]\w*)\b(?![('])

See the regex demo

C# demo:

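using System.Linq;
using System.Text.RegularExpressions;

var regex = @"'[^']*'|\b([_A-Za-z]\w*)\b(?![('])";
var _expression = @"12+x1+455+'ggg+4+rrr+tt'+3";
var matches = Regex.Matches(_expression, regex)
       .Cast<Match>()
       .Where(m => m.Groups[1].Success) // skip quoted-literal matches, where group 1 did not participate
       .Select(m => m.Groups[1].Value)
       .ToList();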
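A small sketch that prints every match together with whether group 1 took part in it. It assumes .NET 6+ with C# top-level statements, and the variable names are only illustrative. It shows where the confusion in the comments below comes from: the quoted literal is still a match of the whole regex, but group 1 does not participate in it, so its `Value` is the empty string. Keeping only matches whose group 1 succeeded leaves just the identifiers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Same pattern as above: alternative 1 consumes quoted literals,
// alternative 2 captures identifiers into group 1.
var pattern = @"'[^']*'|\b([_A-Za-z]\w*)\b(?![('])";
var expression = @"12+x1+455+'ggg+4+rrr+tt'+3";

// For every match, record the whole match text, whether group 1
// participated, and group 1's value ("" when it did not participate).
var report = Regex.Matches(expression, pattern)
    .Cast<Match>()
    .Select(m => (Whole: m.Value, Captured: m.Groups[1].Success, Name: m.Groups[1].Value))
    .ToList();

foreach (var r in report)
    Console.WriteLine($"match={r.Whole,-16} group1.Success={r.Captured} group1.Value='{r.Name}'");

// Keep only the matches where group 1 actually captured an identifier.
List<string> identifiers = report.Where(r => r.Captured).Select(r => r.Name).ToList();
Console.WriteLine(string.Join(", ", identifiers));
```

The quoted literal has to be matched (not skipped) so that the regex engine moves past it; filtering on `Group.Success` is what keeps it out of the final list.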

Alternatively, use PCRE.NET.

Wiktor Stribiżew
  • How do I exclude 'ggg+4+rrr+tt' from the result? – Oleg Sh Dec 10 '16 at 20:02
  • Have you seen the demo? It is all there. `.Groups[1].Value` holds what you need. – Wiktor Stribiżew Dec 10 '16 at 20:02
  • I don't need 'ggg+4+rrr+tt' in the result at all :) – Oleg Sh Dec 10 '16 at 20:05
  • The *result* is what you get from what regex fetches. You need specific part of text that you *capture*. Else, explain what you are doing. Right now, that sounds like an XY problem. – Wiktor Stribiżew Dec 10 '16 at 20:08
  • Please compare https://www.regex101.com/r/GekTga/1 and https://www.regex101.com/r/FtwAkx/1 :) Do you see a difference? '...' is included in the result in the first and excluded in the second :) – Oleg Sh Dec 10 '16 at 20:16
  • I do not care about what regex testers say, the main point is what you get with both regex and *code*. Do you see http://ideone.com/Zyfsaf showing only `x1`? **There is no SKIP-FAIL equivalent in .NET regex flavor**. That is why I suggested using [PCRE.NET](https://www.nuget.org/packages/PCRE.NET). You may use ugly workarounds like `\b([_A-Za-z]\w*)\b(?![('])(?=(?:[^']*'[^']*')*[^']*$)` but I do not guarantee this won't cause catastrophic backtracking one day. – Wiktor Stribiżew Dec 10 '16 at 20:18

(*SKIP)(*F) are PCRE-specific backtracking control verbs; the .NET regex flavor has no equivalent.
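A quick way to check this is to hand the PCRE pattern straight to the .NET `Regex` constructor. This is a sketch assuming .NET 6+ and top-level statements. As far as I can tell, the .NET parser reads `(` as an ordinary group and then sees `*` with nothing to repeat, so it throws an `ArgumentException` (in newer versions a `RegexParseException`, which derives from it) instead of treating `(*SKIP)` as a verb. Depending on how the calling code handles exceptions, that can look like "nothing found".

```csharp
using System;
using System.Text.RegularExpressions;

// The PCRE pattern from the question, unchanged.
var pcrePattern = @"'[^']*'(*SKIP)(*F)|\b[_A-Za-z]\w*\b(?![('])";

bool rejected;
try
{
    // .NET has no (*VERB) syntax: "(" opens a group, then "*" quantifies nothing.
    _ = new Regex(pcrePattern);
    rejected = false;
}
catch (ArgumentException ex)
{
    rejected = true;
    Console.WriteLine("Rejected by .NET: " + ex.Message);
}

Console.WriteLine($"Pattern rejected: {rejected}");
```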

Alternatively, to match only outside single quotes, use a lookahead that requires an even (balanced) number of ' characters between the match and the end of the string:

\b[_A-Za-z]\w*\b(?![('])(?=(?:[^']*'[^']*')*[^']*$)

See the demo at regexstorm
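The lookahead can be used from C# unchanged. Below is a minimal sketch, assuming .NET 6+ top-level statements and that quotes in the input are always balanced. `Identifiers` is a hypothetical helper name. The second input is a made-up expression that exercises several quoted literals.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Identifier not followed by "(" or "'", and followed by an even number of
// single quotes up to the end of the input, i.e. it sits outside any '...'.
// Assumes every quoted literal is closed.
var pattern = @"\b[_A-Za-z]\w*\b(?![('])(?=(?:[^']*'[^']*')*[^']*$)";

// Hypothetical helper: here the whole match is the identifier, no groups needed.
string[] Identifiers(string input) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

var sample = Identifiers(@"12+x1+455+'ggg+4+rrr+tt'+3");
var mixed = Identifiers(@"a+'b+c'+d*'e'+f");

Console.WriteLine(string.Join(", ", sample));
Console.WriteLine(string.Join(", ", mixed));
```

Because the lookahead rescans the rest of the string for every candidate identifier, the cost grows with input length. That is the trade-off discussed in the comments below.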

There are quite a few similar answers already: @vks, @MarkusQ

bobble bubble
  • I am sorry, but this regex will slow down your program. It is *not* a good solution, and - more - it is not a good workaround either. – Wiktor Stribiżew Dec 10 '16 at 21:54
  • @WiktorStribiżew That is just an empty statement. Using this regex on input of similar length to the samples won't cause any issues at all (: Of course, if the input is large there will be limits. – bobble bubble Dec 10 '16 at 22:15
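If the lookahead workaround is used on input whose size you do not control, one hedge against the slowdown discussed above is .NET's per-regex match timeout. Neither answer proposes this. It is a sketch assuming .NET 6+ top-level statements, and both the 200 ms limit and the `SafeIdentifiers` helper are arbitrary illustrative choices.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The lookahead workaround, compiled with a match timeout.
// 200 ms is an arbitrary limit chosen for illustration.
var guarded = new Regex(
    @"\b[_A-Za-z]\w*\b(?![('])(?=(?:[^']*'[^']*')*[^']*$)",
    RegexOptions.None,
    TimeSpan.FromMilliseconds(200));

// Hypothetical helper: returns null instead of a partial result when
// matching takes longer than the timeout. MatchCollection is evaluated
// lazily, so the enumeration (and any timeout) happens inside the try.
string[]? SafeIdentifiers(string input)
{
    try
    {
        return guarded.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();
    }
    catch (RegexMatchTimeoutException)
    {
        return null;
    }
}

var result = SafeIdentifiers(@"12+x1+455+'ggg+4+rrr+tt'+3");
Console.WriteLine(result is null ? "timed out" : string.Join(", ", result));
```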