0

I'm trying to build a .NET regex to match SQL Server constant strings... but not Unicode strings.

Here's a bit of SQL:

select * from SomeTable where SomeKey = 'abc''def' and AnotherField = n'another''value'

Note that within a string two single quotes escapes a single quote.

The regex should match 'abc''def' but not n'another''value'.

I have a regex now that manages to locate a string, but it also matches the Unicode string (starting just after the N):

'('{2})*([^']*)('{2})*([^']*)('{2})*'

Thanks!

user505765
  • 529
  • 1
  • 8
  • 15

1 Answers1

1

This pattern will do most of what you are looking to do:

(?<unicode>n)?'(?<value>(?:''|[^'])*)'

The upside is that it should accurately match any number of escaped quotes. (SomeKey = 'abc''''def''' will match abc''''def''.)

The downside is it also matches Unicode strings, although it captures the leading n to identify it as a Unicode string. When you process the regular expression, you can ignore matches where the match group "unicode" was successful.

The pattern creates the following groups for each match:

unicode: Success if the string is a Unicode string, fails to match if ASCII
value: the string value.  escaped single quotes remain escaped

If you are using .NET regular expressions, you could add (?(unicode)(?<-value>)) to the end of the pattern to suppress matching the value, although the pattern as a whole would still match.

Edit

Having thought about it some more, the following pattern should do exactly what you wanted; it will not match Unicode strings at all. The above approach might still be more readable, however.

(?:n'(?:''|[^'])*'[^']*)*(?<!n)'(?<value>(?:''|[^'])*)'
drf
  • 8,461
  • 32
  • 50
  • I decided to use your (?n)?'(?(?:''|[^'])*)' and a MatchEvaluator to handle the replace I was doing. Note that your second regex matches both ansi strings and unicode strings. Thanks! – user505765 Aug 19 '11 at 09:59