Okay, so I'm trying to use a regular expression to match instances of a character only if it hasn't been escaped (with a back-slash), and decided to use a negative look-behind like so:

(?<!\\)[*]

This succeeds and fails as expected with strings such as foo* and foo\* respectively.

However, it doesn't work for strings such as foo\\*, i.e. where the special character is preceded by a back-slash that escapes another back-slash (an escape sequence that is itself escaped).

Is it possible to use a negative look-behind (or some other technique) to skip special characters only if they are preceded by an odd number of back-slashes?
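
The behaviour described above can be reproduced with a short sketch. Python's `re` module is used here only for illustration (the question itself targets `NSRegularExpression`), and the helper name `has_unescaped_star` is made up for this example.

```python
import re

# The look-behind from the question: a '*' not immediately preceded by a back-slash.
NAIVE = re.compile(r"(?<!\\)[*]")

def has_unescaped_star(text):
    """Return True if the naive pattern finds a match anywhere in text."""
    return NAIVE.search(text) is not None

print(has_unescaped_star("foo*"))     # plain, unescaped star
print(has_unescaped_star(r"foo\*"))   # star escaped by one back-slash
print(has_unescaped_star(r"foo\\*"))  # escaped back-slash followed by a star
```

The third call reports no match even though the star is not escaped there, which is the case the question is about.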

David Faber
Haravikk
  • Which lang you're running? – Avinash Raj Jan 23 '15 at 16:09
  • [Duplicate](https://stackoverflow.com/questions/5937241/regular-expression-to-match-unescaped-special-characters-only). Also, you didn't tell us the language you're using, and you definitely didn't search for at least 20 seconds to find a solution. –  Jan 23 '15 at 16:11
  • @AvinashRaj in my case Swift (or more specifically I'm using `NSRegularExpression`), it doesn't seem to be missing any regexp features so I didn't think it needed mentioning. @rac you definitely didn't look for more than 20 seconds at the duplicate you posted, as it doesn't cover my exact case; i.e - its escape sequence can't be used to escape itself. – Haravikk Jan 24 '15 at 13:29
  • @rac It's not duplicate because the other post doesn't ask for this nor has the answer that solves the problem. – Tomáš Zato May 12 '16 at 12:50

2 Answers

I've found the following solution which works for NSRegularExpression but also works in every regexp implementation I've tried that supports negative look-behinds:

(?<!\\)(?:(\\\\)*)[*]

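In this case the non-capturing group after the look-behind consumes any pairs of back-slashes, effectively eliminating them, so the negative look-behind only has to reject a remaining unpaired back-slash (which is what is left over when the count is odd). Note that the consumed pairs of back-slashes become part of the match, so the special character is the last character of the match rather than the whole match.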
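
As a rough check of this pattern outside `NSRegularExpression`, here is a minimal sketch using Python's `re` module, which supports fixed-width negative look-behinds. The function name `star_positions_lookbehind` is hypothetical, and the index arithmetic assumes the star is always the last character of a match, as it is for this pattern.

```python
import re

# Pattern from this answer, unchanged.
PATTERN = re.compile(r"(?<!\\)(?:(\\\\)*)[*]")

def star_positions_lookbehind(text):
    """Return the index of each '*' preceded by an even number of back-slashes."""
    # A match may begin with consumed back-slash pairs, so the star itself
    # sits at the last matched index, i.e. m.end() - 1.
    return [m.end() - 1 for m in PATTERN.finditer(text)]

print(star_positions_lookbehind("foo*"))        # star with no back-slashes
print(star_positions_lookbehind(r"foo\*"))      # one back-slash: escaped
print(star_positions_lookbehind(r"foo\\*"))     # two back-slashes: not escaped
print(star_positions_lookbehind(r"foo\\\*"))    # three back-slashes: escaped
```

Taking `m.end() - 1` rather than `m.start()` is what compensates for the back-slash pairs being included in the match.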

Haravikk
A lookbehind cannot solve this problem. The only way is to match the escaped characters first, so that they are skipped, and then find the unescaped characters.

You can isolate the unescaped character from the result with a capture group:

(?:\\.)+|(\*)

or with the \K feature (PCRE/Perl/Ruby), which drops everything matched to its left from the result:

(?:\\.)*\K\*

or using backtracking control verbs (PCRE/Perl) to skip escaped characters:

(?:\\.)+(*SKIP)(*FAIL)|\*

The only case where you can use a lookbehind is with the .NET framework, which allows unlimited-length lookbehinds:

(?<!(?:[^\\]|\A)(?:\\\\)*\\)\*

or in a more limited way with Java:

(?<!(?:[^\\]|\A)(?:\\\\){0,1000}\\)\*
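
Of the variants above, the capture-group one is the most portable, so here is a minimal sketch of it using Python's `re` module (which has no `\K` or backtracking control verbs). The function name `star_positions_capture` is hypothetical; the pattern is the first one from this answer, unchanged.

```python
import re

# Either a run of escape sequences (back-slash plus one character) or a bare '*'.
# Only the second alternative fills group 1.
PATTERN = re.compile(r"(?:\\.)+|(\*)")

def star_positions_capture(text):
    """Return the index of each '*' that is not part of an escape sequence."""
    return [
        m.start(1)
        for m in PATTERN.finditer(text)
        if m.group(1) is not None  # skip matches that were escape sequences
    ]

print(star_positions_capture("foo*"))       # star with no back-slashes
print(star_positions_capture(r"foo\*"))     # escaped star
print(star_positions_capture(r"foo\\*"))    # escaped back-slash, then a star
print(star_positions_capture(r"a*b\*c\\*")) # mix of escaped and unescaped stars
```

Because escape sequences are consumed left to right, a star that follows an even number of back-slashes is reached only after all of those back-slashes have been paired off, so it falls through to the second alternative.
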
Casimir et Hippolyte
  • Hmm, actually I just tried `(?<!\\)(?:\\\\*)[*]` and this seems to work as expected by effectively eliminating pairs of back-slashes, which can only leave single back-slashes to precede the search pattern (or nothing or a literal). If there's no reason why this isn't valid, I may add it as my own answer. – Haravikk Jan 24 '15 at 13:26