3

Can anyone suggest a regex to match the underscore in the following examples:

test_test
test[_test
test_]

But NOT match this:

test[_]test

This is using the .NET regular expression library. I'm using this regex tester to check:

http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

Chad Birch
Moskie

5 Answers

5

Try this:

_[^\]]|[^[]_

It consists of an alternation of _[^\]] (underscore and not ]) and [^[]_ (not [ and underscore).

Or if you want to use look-around assertions to really match just the underscore and not surrounding characters:

_(?=[^\]])|_(?<=[^[]_)

This matches any underscore that is followed by a character other than ] ((?=[^\]]), positive look-ahead) or any underscore that is preceded by a character other than [ ((?<=[^[]_), positive look-behind). And this can be combined to:

_(?:(?=[^\]])|(?<=[^[]_))
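To make the difference between the two forms concrete, here is a minimal sketch using Python's `re` module rather than the .NET library the question asks about (that substitution is an assumption; these particular constructs behave the same way in both engines as far as I know). The names `CONSUMING`, `LOOKAROUND` and `matched_text` are made up for the illustration, and the `[` inside the character class is escaped only to keep Python quiet.

```python
import re

# The plain alternation: consumes the neighbouring character as well.
CONSUMING = re.compile(r"_[^\]]|[^\[]_")
# The combined look-around form: consumes only the underscore.
LOOKAROUND = re.compile(r"_(?:(?=[^\]])|(?<=[^\[]_))")


def matched_text(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for sample in ["test_test", "test[_test", "test_]", "test[_]test"]:
        print(sample,
              repr(matched_text(CONSUMING, sample)),
              repr(matched_text(LOOKAROUND, sample)))
```

Because both forms require some character next to the underscore, neither matches `_]Test` or `Test[_`, where the underscore sits at the very start or end of the string.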
Gumbo
2
_(?!\](?<=\[_\]))

If the underscore isn't followed by a closing bracket, the negative lookahead succeeds immediately. Otherwise, it does a lookbehind to find out if the underscore is also preceded by an opening bracket. You can replace the "_]" with dots to make it clear that you're only interested in the opening bracket this time:

_(?!\](?<=\[..))

You can do the lookbehind first if you want:

_(?<!\[_(?=\]))

The important thing is that the second lookaround has to be nested within the first one in order to achieve the "NOT (x AND y)" semantics.

Testing it in EditPad Pro, it matches the underscore in all but the last of these strings:

test_test
test[_test
test_]
_]Test
Test[_
test[_]test
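The same check can be reproduced outside EditPad Pro. This is a small sketch in Python's `re` module, which is an assumption on my part since the question targets .NET; the pattern is copied unchanged, and `SAMPLES`, `matches_underscore` and `results` are illustrative names only.

```python
import re

# Nested look-around: reject the underscore only when it is followed by "]"
# AND the three characters ending at that "]" are "[_]".
NESTED = re.compile(r"_(?!\](?<=\[_\]))")

SAMPLES = [
    "test_test",
    "test[_test",
    "test_]",
    "_]Test",
    "Test[_",
    "test[_]test",
]


def matches_underscore(text):
    """True if the pattern finds an acceptable underscore in text."""
    return NESTED.search(text) is not None


results = {sample: matches_underscore(sample) for sample in SAMPLES}

if __name__ == "__main__":
    for sample, ok in results.items():
        print(f"{sample!r}: {'match' if ok else 'no match'}")
```

Only `test[_]test` comes back as "no match", including for the two edge cases where the underscore is the first or last character.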

EDIT: here's an easier-to-read version:

(?<!\[)_|_(?!\])

What I like about the nested-lookaround version is that it doesn't do anything until it actually finds an underscore. Unless the regex engine is smart enough to optimize it away, this "(NOT x) OR (NOT y)" version will do a negative lookbehind at every single position.
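That the two forms accept exactly the same underscores follows from De Morgan's law, and it can be spot-checked by brute force. The sketch below uses Python's `re` module as a stand-in for .NET (an assumption) and compares match positions over every string of up to five characters drawn from a tiny alphabet; `first_disagreement` and `underscore_positions` are hypothetical helper names. It says nothing about the performance difference described above.

```python
import re
from itertools import product

NESTED = re.compile(r"_(?!\](?<=\[_\]))")   # NOT (followed by ] AND preceded by [)
READABLE = re.compile(r"(?<!\[)_|_(?!\])")  # (NOT preceded by [) OR (NOT followed by ])


def underscore_positions(pattern, text):
    """Start offsets of every match of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]


def first_disagreement(alphabet="[_]t", max_len=5):
    """Return the first string on which the two patterns differ, else None."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            text = "".join(chars)
            if underscore_positions(NESTED, text) != underscore_positions(READABLE, text):
                return text
    return None


if __name__ == "__main__":
    print(first_disagreement())
```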

Alan Moore
1

I don't know about .NET, but the regex would be composed of two parts: one matching an underscore preceded by any character other than an opening bracket, and the other matching an underscore followed by any character other than a closing bracket:

[^\[](_)|(_)[^\]]

Edit: Just noticed that you need to add the cases where the underscore is in the beginning or the end:

[^\[](_)|(_)[^\]]|^_|_$
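Since this pattern consumes a neighbouring character, the underscore itself has to be recovered from the capture groups, and the `^_` and `_$` branches set neither group. Here is a hedged sketch of one way to handle that, written with Python's `re` module instead of .NET (an assumption); `underscore_index` is a hypothetical helper, not part of any library.

```python
import re

PATTERN = re.compile(r"[^\[](_)|(_)[^\]]|^_|_$")


def underscore_index(text):
    """Index of the first acceptable underscore in text, or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Branches 1 and 2 capture the underscore in group 1 or group 2.
    for group in (1, 2):
        if m.group(group) is not None:
            return m.start(group)
    # Branches 3 and 4 (^_ and _$) match only the underscore itself.
    return m.start()


if __name__ == "__main__":
    for sample in ["test_test", "test[_test", "test_]", "_]Test", "Test[_", "test[_]test"]:
        print(sample, underscore_index(sample))
```

In this Python rendering the underscore in `test[_test` is found (index 5), which agrees with the second comment below.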
soulmerge
  • Didn't work for: test[_test (I'm using the regex tester here to test: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx) – Moskie Mar 24 '09 at 17:11
  • hi Moskie.. without any options checked.. this regex matches underscore in the "test[_test" string – neoneye Mar 24 '09 at 17:16
1

((?<!\[)_|_(?!]))

which uses negative lookahead/behind (rather than positive lookahead/behind and excluded characters).

codybartfast
0

Try

^.*(\[_[^\]])|([^\[]_\])|([^\[]_[^\]]).*$

EDIT: Now handles

test_test

Not tested, but read it as: any string of characters followed by either [_ and then any character but ], or any character but [ and then _].

Note, this might fail for cases like

_]Test
Test[_

I don't know if that's a problem for you?

Tested successfully with all your examples

Mark Pim