I want to match all the words in brackets, including the brackets themselves, that are not between single quotation marks (in .NET):

"[field1] = 'id1'" -> match "[field1]"
"[field1] = '[field2]'" -> match "[field1]"
"[ field1 ] = '[ field2 ]'" -> match "[ field1 ]"
"[ field1 ] = '[ field2 ] field3'" -> match "[ field1 ]"
"[field1] = ' [field2] ' And [field3] = '[field4] '' [field5]'" -> match "[field1]" and "[field3]"

Any suggestion would be greatly appreciated!

Szabolcs Antal

2 Answers

When you need to match things "which are not between" other things, negative lookarounds are a natural first choice:

(?<!' *)\[[^\]]*\](?! *')

The matched group is $0. As far as I know, .NET regexes allow variable-width expressions inside lookbehind (but be aware that this can be very expensive), so the regex above should do the job.

Many other regex flavors do not allow this, but there is a workaround that uses a possessive quantifier (*+):

(?<!') *+(\[[^\]]*\]) *+(?!')

The matched group is $1. This works in every flavor that supports both negative lookarounds and possessive quantifiers. For the meaning of *+, see my answer on that topic.

But maybe the computationally simplest expression is:

[^'] *+(\[[^\]]*\]) *+[^']

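The matched group is $1. Note that this version consumes one character on each side of the brackets, so it misses a bracket at the very start or the very end of the input.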


Update

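This is a kind of heuristic: a bracket is treated as outside a quoted string when an even number of single quotes lies between it and the start of the line: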

(?<=^[^'\n\r]*(('[^'\n\r]*){2})*)\[[^\]]*\]

Or, maybe faster, counting the quotes up to the end of the line instead:

\[[^\]]*\](?=[^'\n\r]*(('[^'\n\r]*){2})*$)

The matched group is $0. Use it with multiline mode (m flag).

It will work as long as single quotes are not allowed inside the bracketed fields.
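As a quick sanity check, here is a minimal sketch that runs the lookahead version against the examples from the question. It uses Python's `re` module rather than .NET, which is possible only because this variant needs no variable-width lookbehind. The function name `unquoted_brackets` is made up for illustration.

```python
import re

# Lookahead pattern copied from the answer above: a bracketed token is kept
# only if an even number of single quotes follows it on the same line.
PATTERN = re.compile(
    r"\[[^\]]*\](?=[^'\n\r]*(('[^'\n\r]*){2})*$)",
    re.MULTILINE,  # the "m flag": $ matches at the end of every line
)


def unquoted_brackets(text):
    """Return the bracketed tokens that the parity heuristic treats as unquoted."""
    # finditer + group(0) is used because findall would return the inner groups.
    return [m.group(0) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    samples = [
        "[field1] = 'id1'",
        "[field1] = '[field2]'",
        "[ field1 ] = '[ field2 ] field3'",
        "[field1] = ' [field2] ' And [field3] = '[field4] '' [field5]'",
    ]
    for sample in samples:
        print(sample, "->", unquoted_brackets(sample))
```

The doubled quote in the last sample does not break the heuristic, because it adds two quotes and so leaves the parity unchanged.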

logi-kal

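As @horcrux also pointed out, you can use negative lookbehind and lookahead. Since C# apparently supports unbounded repetition inside lookbehind, you could get away with:

"(?<!'\s*)" + "\[[^\[\]]+\]" + "(?!\s*')"

I have separated the three parts: the negative lookbehind, the actual match, and the negative lookahead. The quote is written literally in the lookahead, because $1 is not a backreference inside a pattern, and a group captured inside a negative lookbehind would not be available afterwards anyway.

Note also that since lookarounds are zero-width and do not become part of the match, you don't even need parentheses around the middle part; the whole match ($0) is the bracketed field.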

In Java, where lookbehind must have a bounded length, this would be something like:

"(?<!'\s{0,1000})" + "\[[^\[\]]+\]" + "(?!\s{0,1000}')"

but it's kind of hacky, as it only works for up to 1000 whitespace characters between the quote and the bracket.