Is there a complete list of regex escape sequences somewhere? I found this, but it was missing `\\` and `\e` for starters. So far I have come up with this pattern, which hopefully matches all the escape sequences:

    @"\\([bBdDfnreasStvwWnAZG\\]|x[A-Z0-9]{2}|u[A-Z0-9]{4}|\d{1,3}|k<\w+>)"
mpen
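To see what the question's pattern actually catches, here is a small C# sketch that runs it over a made-up regex source string (the `sample` value is hypothetical). It assumes .NET's `System.Text.RegularExpressions` and C# 9+ top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The pattern from the question, unchanged (C# verbatim string).
var escapePattern = new Regex(@"\\([bBdDfnreasStvwWnAZG\\]|x[A-Z0-9]{2}|u[A-Z0-9]{4}|\d{1,3}|k<\w+>)");

// Hypothetical regex source to scan; it mixes covered and uncovered escapes.
string sample = @"\d+\.\w\p{Lu}\x41\k<name>";

// Collect every escape sequence the pattern recognises, left to right.
string[] found = escapePattern.Matches(sample)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Expected: \d \w \x41 \k<name>   (\. and \p{Lu} are not recognised)
Console.WriteLine(string.Join(" ", found));
```

With this sample the pattern skips `\.` and `\p{Lu}`, which is the gap the answers below point out. Note also that the hex branches use `[A-Z0-9]`, so they reject lowercase hex such as `\x4f` while accepting non-hex letters such as `\xZZ`, and that `n` appears twice in the character class.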

3 Answers

Alternatively, if you only want to escape a string correctly, you could simply rely on `Regex.Escape()`, which does the necessary escaping for you.

Hint: there is also a `Regex.Unescape()`.
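A minimal C# sketch of the escape/unescape round trip described above; the input string is invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input full of regex metacharacters.
string literal = "1+1=2? (yes) $5.00";

// Escape() prefixes metacharacters such as + ? ( ) $ . with a backslash
// (white space is escaped too), so the result matches the text literally.
string escaped = Regex.Escape(literal);

// Unescape() undoes that escaping.
string roundTrip = Regex.Unescape(escaped);

Console.WriteLine(escaped);
Console.WriteLine(roundTrip == literal);                                // True
Console.WriteLine(Regex.IsMatch("total: 1+1=2? (yes) $5.00", escaped)); // True
```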

VVS
  • Actually, I'm trying to unescape it. – mpen Nov 25 '10 at 10:24
  • Didn't think there was an `Unescape` for some reason... Nevertheless, it won't unescape `\w` and `\k`, which I also need to do. However, this will considerably ease escaping everything else... thank you! God... I wish I had known about that 8 hours ago >.< – mpen Nov 25 '10 at 10:48
  • @Mark: Next time try "regex unescape" in your favorite search engine :-) – VVS Nov 25 '10 at 10:54
  • Yeah, yeah >.< Just didn't think it was possible because of things like `\w`... didn't realize they'd give me partial support. – mpen Nov 25 '10 at 11:07
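The "partial support" mentioned in these comments can be seen directly. In this sketch (all inputs are hypothetical), `Regex.Unescape` converts plain character escapes, but for `\w`, `\d` or `\k<name>` it throws an `ArgumentException` rather than leaving them untouched, so code that must preserve such constructs has to handle them separately before calling it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Plain character escapes are turned into the characters they denote.
string chars = Regex.Unescape(@"\x41\u0042\.\t");
Console.WriteLine(chars == "AB.\t"); // True

// Escapes that stand for a class or a construct rather than a single
// character (\w, \d, \k<name>) are not passed through: Unescape throws.
var rejected = new List<string>();
foreach (string input in new[] { @"\w+", @"\d", @"\k<name>" })
{
    try
    {
        Regex.Unescape(input);
        Console.WriteLine($"{input}: accepted");
    }
    catch (ArgumentException ex)
    {
        rejected.Add(input);
        Console.WriteLine($"{input}: {ex.Message}");
    }
}
```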

This MSDN page (Regular Expression Language Elements) is a good starting place, with this subpage specifically about escape sequences.

Jon Skeet
  • Ah... finally, a complete reference. However, it says "\ when followed by a character that is not recognized as an escaped character in this and other tables in this topic, matches that character.", but when I try to put `\y` in a regex, it says that's an unrecognized escape sequence. Why's that? – mpen Nov 25 '10 at 10:32
  • Backslash is a C# string escape character. For example, "\n" is a string containing only a newline character. However, in a regex the backslash also begins escape sequences. The unrecognised escape sequence error comes from "\y" not being a valid C# escape sequence. Using "\\y" ensures that no C# escape sequence is attempted when you initialise the string. – Gusdor Jan 07 '11 at 11:00
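Two separate checks are involved here: the C# compiler validates escapes in ordinary string literals, and the .NET regex parser validates escapes in the pattern at runtime. The sketch below (patterns and inputs are made up) shows that `"\\."` and `@"\."` are the same string, and that even a correctly compiled `@"\y"` is still rejected by the `Regex` constructor, because .NET throws for a backslash followed by a letter it does not recognise, whereas punctuation such as `\.` is accepted.

```csharp
using System;
using System.Text.RegularExpressions;

// In an ordinary C# literal, "\." or "\y" would not compile (error CS1009).
// Doubling the backslash, or using a verbatim @"" string, hands a real
// backslash to the regex engine instead.
string doubled = "\\.";
string verbatim = @"\.";
Console.WriteLine(doubled == verbatim);            // True: both are '\' followed by '.'
Console.WriteLine(Regex.IsMatch("a.b", verbatim)); // True: \. matches a literal dot

// Once the string compiles, the regex parser has its own check:
// a backslash before an unknown letter such as y is rejected at runtime.
bool yRejected = false;
try
{
    new Regex(@"\y");
}
catch (ArgumentException ex)
{
    yRejected = true;
    Console.WriteLine("Regex rejected \\y: " + ex.Message);
}
Console.WriteLine(yRejected); // True
```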

Don't forget the zillions of possible Unicode categories: `\p{Lu}`, `\P{Sm}`, etc.

There are too many of these for you to match individually, but I suppose you could use something along the lines of `\\[pP]\{[A-Za-z0-9 \-_]+?\}` (untested).
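A quick check of the suggested pattern against a made-up regex source, assuming .NET and a C# verbatim string. It only recognises the `\p{...}`/`\P{...}` syntax; it does not validate that the category or block name actually exists.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The suggested pattern for \p{...} / \P{...} escapes, as a verbatim string.
var categoryEscape = new Regex(@"\\[pP]\{[A-Za-z0-9 \-_]+?\}");

// Hypothetical regex source mixing category escapes with other syntax.
string source = @"^\p{Lu}\P{Sm}+\p{IsGreek}\d$";

string[] categories = categoryEscape.Matches(source)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Expected: \p{Lu}, \P{Sm}, \p{IsGreek}
Console.WriteLine(string.Join(", ", categories));
```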

And there's also the simpler stuff that's missing from your list: `\.`, `\+`, `\*`, `\?`, etc.

If you're simply trying to unescape an existing regex, then you could try `Regex.Unescape`. It's not perfect, but it's probably better than anything you or I could knock up in a short space of time.

LukeH
  • Yuck... not looking forward to handling this case. (Thanks) – mpen Nov 25 '10 at 10:29
  • Unfortunately not: in an ordinary C# string literal those cause "Unrecognized escape sequence" compile errors, so they need to be wrapped in a positive character group instead, e.g. `[.][+][*][?]`. – nickl- Nov 28 '22 at 21:50
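The compile error mentioned above comes from ordinary C# string literals, where `"\."` is not a valid escape. In the sketch below (patterns and inputs are made up for illustration), bracketed character classes avoid the backslash entirely, and doubling the backslash or using a verbatim `@"..."` string behaves identically at runtime.

```csharp
using System;
using System.Text.RegularExpressions;

// Three spellings of "a literal dot followed by a literal plus".
string[] patterns =
{
    "[.][+]", // character classes: no backslash at all
    "\\.\\+", // ordinary C# literal with doubled backslashes
    @"\.\+",  // verbatim C# literal: the backslash reaches the regex engine as-is
};

bool allAgree = true;
foreach (string p in patterns)
{
    bool matchesDot = Regex.IsMatch("1.+2", p);   // real dot: should match
    bool matchesOther = Regex.IsMatch("1x+2", p); // an unescaped . would match x here
    Console.WriteLine($"{p,-8} {matchesDot} {matchesOther}");
    allAgree &= matchesDot && !matchesOther;
}
Console.WriteLine(allAgree); // True
```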