I have a VB.NET program that applies user-supplied PATTERN and REPLACEMENT arguments to a collection of input strings using Regex.Replace, but special characters in the REPLACEMENT argument are not interpreted.

Is there a way to make RegEx.Replace interpret special characters in the REPLACEMENT string like it does in the PATTERN string? For example, treat "\t" as a tab and "\xAE" or "\u00AE" as (R)?

On Linux, I get the correct output from sed:

echo Test XXX Replacement | sed 's/XXX/\xAE/'

gives "Test ® Replacement"

But in VB, the escape sequence is inserted as literal text:

Regex.Replace("Test XXX Replacement", "XXX", "\t")
Regex.Replace("Test XXX Replacement", "XXX", "\u00AE")

gives "Test \t Replacement" and "Test \u00AE Replacement", respectively.

I've found two somewhat related posts, but neither applies. My problem differs from "Escape Regex.replace() replacement string in VB.net" in that I actually want the special characters in my replacement strings to be interpreted.

It also differs from "Regex VB.Net Regex.Replace": the asker there had control of the replacement string and sidestepped my issue by using a VB constant instead of a regex escape sequence.

Are there any settings/options/utilities/methods that can make my (user supplied!) RegEx REPLACEMENT strings correctly handle special characters?

Robert Sheahan

2 Answers

Is there a way to make RegEx.Replace interpret special characters in the REPLACEMENT string like it does in the PATTERN string? For example, treat "\t" as a tab and "\xAE" or "\u00AE" as (R)?

You mean like the Regex.Unescape(String) Method?

If you can accept the limitations declared in the Remarks Section:

  • It reverses the transformation performed by the Escape method by removing the escape character ("\") from each character escaped by the method. These include the \, *, +, ?, |, {, [, (,), ^, $, ., #, and white space characters. In addition, the Unescape method unescapes the closing bracket (]) and closing brace (}) characters.
  • It replaces the hexadecimal values in verbatim string literals with the actual printable characters. For example, it replaces @"\x07" with "\a", or @"\x0A" with "\n". It converts to supported escape characters such as \a, \b, \e, \n, \r, \f, \t, \v, and alphanumeric characters.

For example, Regex.Unescape("\xAE\t\u00AE") returns the string "®" & vbTab & "®".
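
For user-supplied replacement strings, one way to apply this is to run the replacement through Regex.Unescape before passing it to Regex.Replace. The following is only a minimal sketch: the helper name ReplaceWithEscapes and the choice to fall back to the raw string on a bad escape are assumptions for illustration, not part of the .NET API.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module UnescapeReplacementDemo
    ' Hypothetical helper: expands character escapes in a user-supplied
    ' replacement string, then hands the result to Regex.Replace.
    Function ReplaceWithEscapes(input As String, pattern As String, replacement As String) As String
        Dim expanded As String
        Try
            ' Turns \t, \xAE, \u00AE and similar escapes into the actual characters.
            expanded = Regex.Unescape(replacement)
        Catch ex As ArgumentException
            ' Regex.Unescape throws on an unrecognized escape sequence.
            ' Assumption: keep the raw text rather than fail the whole run.
            expanded = replacement
        End Try
        ' Substitutions such as $1 are still interpreted by Regex.Replace.
        Return Regex.Replace(input, pattern, expanded)
    End Function

    Sub Main()
        Console.WriteLine(ReplaceWithEscapes("Test XXX Replacement", "XXX", "\t"))
        Console.WriteLine(ReplaceWithEscapes("Test XXX Replacement", "XXX", "\u00AE"))
        Console.WriteLine(ReplaceWithEscapes("Test XXX Replacement", "(X+)", "[$1]\xAE"))
    End Sub
End Module
```

One caveat with this approach: Regex.Unescape also strips the backslash from sequences such as \$, so a user who writes \$1 hoping for a literal dollar sign would end up with $1, which Regex.Replace then treats as a substitution. In a replacement pattern a literal dollar sign is written as $$.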

TnTinMn

VB.NET string literals have no backslash escape sequences, so "\t" in source code is just the two characters \ and t.

According to the docs for the Replace method:

Substitutions are the only regular expression language elements that are recognized in a replacement pattern. All other regular expression language elements, including character escapes, are allowed in regular expression patterns only and are not recognized in replacement patterns.
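
A small sketch of that distinction, using the strings from the question (the module and variable names are just for illustration): a substitution such as $1 is expanded in the replacement, while a character escape such as \t is copied through unchanged.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ReplacementPatternDemo
    Sub Main()
        ' $1 is a substitution, so it is replaced by the text of capture group 1.
        Dim substituted As String = Regex.Replace("Test XXX Replacement", "(XXX)", "<$1>")
        Console.WriteLine(substituted) ' Test <XXX> Replacement

        ' \t is a character escape; in a replacement pattern it stays
        ' as the two characters \ and t.
        Dim literal As String = Regex.Replace("Test XXX Replacement", "XXX", "\t")
        Console.WriteLine(literal = "Test \t Replacement") ' True
    End Sub
End Module
```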

The equivalent to your two lines of code would be:

Regex.Replace("Test XXX Replacement", "XXX", vbTab)
Regex.Replace("Test XXX Replacement", "XXX", ChrW(&H00AE))

You could also use string interpolation with the replacement string if you needed to embed a hex string or character in a longer replacement string:

Regex.Replace("Test XXX Replacement", "XXX", $"{vbTab} yyy {ChrW(&H00AE)}")

Be sure to import the Microsoft.VisualBasic namespace, if not already imported.

Chris Dunaway
  • They are user-supplied replacement strings (and patterns too); the examples I provided are simplified for reproducibility. The actual patterns will be much longer and contain an unknown number of special characters in arbitrary order, mixed with other text and match replacements, so it's not that simple :-( – Robert Sheahan Jul 16 '19 at 16:07
  • Thanks for the edits, it looks like some form of string interpolation is my only choice. Are there any other modules that provide or expose that, or am I going to have to reproduce the regexp pattern special character handling myself? I'm going to upvote but not mark as answered in the hopes that somebody can point me to an existing solution that can handle the arbitrary codes in the unicode substitutions, but this is definitely the right, if painful, path, thank you. – Robert Sheahan Jul 16 '19 at 16:40