Capture a regex match for replacement from lookup table

Question

I'm writing a language interpreter in PowerShell (the language is PILOT, for those who might be interested), and I've gotten to the point where I'm implementing variable replacement. A variable name consists of either a `$` or a `#`, followed by up to ten characters in the set `[A-Za-z0-9]`. However, if the variable name is prefixed by a `\`, it should not be replaced. As near as I can figure, the pattern I'm looking to match is `[^\\][\$#]\w{,10}`, but I'm not clear on how to store the result of the match in a PowerShell variable so that I can look up the variable name in a table and replace it with its value.

For example, if the PowerShell variable `$expr` contains the string `\#Foo has the value #Foo`, and `$vartable["#Foo"]` contains the value 5, I would need to capture `#Foo` (the second one only) in `$varname`, and then replace the captured `#Foo` with `$vartable[$varname]`: `$expr -replace "[^\\][\$#]\w{,10}",$vartable[$varname]` should yield `\#Foo has the value 5`.
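As a hedged illustration of why that `-replace` call cannot work as intended: PowerShell evaluates the replacement operand `$vartable[$varname]` once, before any matching happens, so every match receives the same value. The sketch swaps in the `(?<!\\)[$#]\w{1,10}\b` pattern from the comments below in place of the question's pattern, and the extra `#Bar` entry is a made-up example value.

```powershell
# The replacement operand is an ordinary expression: it is evaluated once,
# before -replace runs, so every match is replaced with that same value.
$vartable = @{ '#Foo' = 5; '#Bar' = 7 }   # '#Bar' is a hypothetical extra entry
$varname  = '#Foo'                        # has to be set before -replace runs
$result   = '#Foo and #Bar' -replace '(?<!\\)[$#]\w{1,10}\b', $vartable[$varname]
$result                                   # 5 and 5
```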

Have I correctly calculated the pattern, and how do I capture the match?

(I should note that I'm developing this with PowerShell 5.1, but expect it to be able to run in that version or anything later, including PSCore on non-Windows OSes.)
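For the "how do I capture the match?" part, a minimal sketch that assumes the simpler `(?<!\\)[$#]\w{1,10}\b` pattern from the comments below and the `$expr`/`$vartable` values from the example above: `-match` fills the automatic `$Matches` variable with the first match, while `[regex]::Matches` returns every match as an object exposing `.Value` and `.Index`.

```powershell
# Values from the question's example; the pattern is the comment thread's
# simple negative-lookbehind variant (an assumption, not the final answer).
$expr     = '\#Foo has the value #Foo'
$vartable = @{ '#Foo' = 5 }
$pattern  = '(?<!\\)[$#]\w{1,10}\b'

# First match only: a successful -match populates $Matches.
if ($expr -match $pattern) {
    $varname = $Matches[0]    # the unescaped '#Foo'
}

# Every match: each Match object carries the matched text and its position.
$found = foreach ($m in [regex]::Matches($expr, $pattern)) {
    [pscustomobject]@{
        Name  = $m.Value
        Index = $m.Index
        Value = $vartable[$m.Value]
    }
}
$found | Format-Table
```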

`[^\\]` is not a valid way to check whether a char is escaped, since the backslash itself may also be escaped. E.g. `\#Foo` has a var that must be matched. Is that the case here? If so, the valid way is to use `(?<=(?<!\\)(?:\\{2})*)`. — Wiktor Stribiżew, Dec 13 '18 at 19:35
Also, you cannot use code inside a string replacement pattern; use `[regex]::Replace($s,'(?<=(?<!\\)(?:\\{2})*)[$#]\w{1,10}\b', {param($match) $vartable[$match.Value] })` instead. — Wiktor Stribiżew, Dec 13 '18 at 19:39
@WiktorStribiżew - No, if the string contains `\#FOO`, I _don't_ want it replaced; the backslash more-or-less signals that I want the variable name as a literal to remain in the string - for example, `\#FOO is #FOO` should, if #FOO is in the variable table with the value 5, end up after processing as `\#FOO is 5`. — Jeff Zeitlin, Dec 13 '18 at 19:44
Then use `'(?<!\\)[$#]\w{1,10}\b'`. BTW, I meant `\\$Foo`; sorry for leaving out the second backslash. My point was that when `\` is used as an escape character, a literal backslash is usually written as a doubled backslash. — Wiktor Stribiżew, Dec 13 '18 at 19:44
@WiktorStribiżew - Do I not have to escape the `$`? I thought that was the EOL anchor? — Jeff Zeitlin, Dec 13 '18 at 20:11
If you use it inside a single-quoted literal, you certainly do not need to escape it inside a character class. In a double-quoted literal, you might want to escape `$` when it could be parsed as the start of a variable. — Wiktor Stribiżew, Dec 13 '18 at 20:18
In PowerShell Core v6.1.0 and newer, I'd recommend `$s -replace '(?<=(?<!\\)(?:\\{2})*)[$#]\w{1,10}\b', { $vartable[$_.Value] }`. If it works for you, please let me know. — Wiktor Stribiżew, Dec 13 '18 at 20:36
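The two lookbehinds proposed in these comments differ only when backslashes are doubled. A small sketch comparing them on `#Foo` preceded by zero to three backslashes (the test strings are illustrative, and whether PILOT treats `\\` as an escaped backslash is an assumption the thread leaves open):

```powershell
# Single-quoted strings keep '$' and '\' literal, so each '\' below is one character.
$simple = '(?<!\\)[$#]\w{1,10}\b'                  # "no backslash directly before"
$evenBs = '(?<=(?<!\\)(?:\\{2})*)[$#]\w{1,10}\b'   # "an even number of backslashes before"

$results = foreach ($s in '#Foo', '\#Foo', '\\#Foo', '\\\#Foo') {
    [pscustomobject]@{
        Input       = $s
        SimpleMatch = $s -match $simple
        EvenBsMatch = $s -match $evenBs
    }
}
$results | Format-Table
```

Both patterns leave `\#Foo` alone, as the question requires; they disagree only on `\\#Foo`, which the even-backslash version treats as a real variable reference.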

score 1 · Accepted Answer · edited Dec 14 '18 at 00:33


The `[^\\]` pattern is usually not a valid way to check whether a char is escaped, since the backslash may itself be escaped to denote a literal backslash char. For example, `\\#Foo` contains an unescaped variable that must be matched, assuming `\\` denotes a literal backslash. The valid way is to use the .NET-compliant lookbehind `(?<=(?<!\\)(?:\\{2})*)`, which matches a location immediately preceded by an even number of backslashes (possibly zero) that is not itself preceded by another backslash.
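The question also asks whether its own pattern `[^\\][\$#]\w{,10}` is right. Two further issues show up when it runs through .NET's regex engine, which `-match` and `-replace` use. First, .NET has no `{,n}` quantifier, so `{,10}` is read as literal text. Second, `[^\\]` has to consume a character, so a variable at the very start of the string is missed and the preceding character ends up inside the match (and would be replaced along with the name). A sketch checking both, with arbitrary sample strings:

```powershell
# The question's pattern, exactly as written.
$asked = '[^\\][\$#]\w{,10}'

# 1. '{,10}' is not a quantifier in .NET, so it must appear literally.
$r1 = '#Foo is #Foo' -match $asked    # False: the text has no literal '{,10}'
$r2 = 'x#F{,10}' -match $asked        # True: the literal '{,10}' is matched

# 2. With only the quantifier repaired, [^\\] still consumes a character.
$fixed = '[^\\][$#]\w{1,10}'
$r3 = '#Foo' -match $fixed            # False: nothing precedes '#'
$r4 = '=#Foo' -match $fixed           # True
$consumed = $Matches[0]               # '=#Foo', not just '#Foo'
```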

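Next, you cannot use code inside a string replacement pattern: the replacement operand of `-replace` is evaluated once, before any matching takes place, so `$vartable[$varname]` cannot refer to the current match. You may either use a callback with `[regex]::Replace` or, starting with PowerShell Core v6.1, use a script block as the replacement argument for `-replace`: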

```powershell
[regex]::Replace($s,'(?<=(?<!\\)(?:\\{2})*)[$#]\w{1,10}\b', {param($match) $vartable[$match.Value] })
```

or (PowerShell Core v6.1+):

```powershell
$s -replace '(?<=(?<!\\)(?:\\{2})*)[$#]\w{1,10}\b', { $vartable[$_.Value] }
```
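Putting the pieces together for Windows PowerShell 5.1 and later, here is one possible shape for the substitution step. `Expand-PilotVariable` is a hypothetical helper name, and returning unknown names unchanged is an assumption: the one-liners above return `$null` for a missing key, which removes the name from the output.

```powershell
# Hypothetical helper; works on Windows PowerShell 5.1 and PowerShell 7+
# because it uses [regex]::Replace rather than -replace with a script block.
function Expand-PilotVariable {
    param(
        [string]    $Text,
        [hashtable] $VarTable
    )
    $pattern = '(?<=(?<!\\)(?:\\{2})*)[$#]\w{1,10}\b'
    # PowerShell converts the script block to a MatchEvaluator delegate;
    # its output becomes the replacement text for each match.
    [regex]::Replace($Text, $pattern, {
        param($match)
        if ($VarTable.ContainsKey($match.Value)) {
            [string]($VarTable[$match.Value])
        } else {
            $match.Value    # assumption: leave unknown names untouched
        }
    })
}

$vartable = @{ '#Foo' = 5; '$Bar' = 'hello' }
$out = Expand-PilotVariable -Text '\#Foo has the value #Foo, $Bar' -VarTable $vartable
$out    # \#Foo has the value 5, hello
```

Because of the trailing `\b`, a name longer than ten characters, such as `#ABCDEFGHIJK`, is not matched at all rather than being cut off after ten characters. Note also that `\w` accepts `_` and non-ASCII letters, so it is looser than the question's `[A-Za-z0-9]`.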

answered Dec 13 '18 at 23:18 by Wiktor Stribiżew · edited Dec 14 '18 at 00:33 by mklement0

See the edit to the question; I'm doing this on 5.1, and expect it to run on that or later. – Jeff Zeitlin Dec 14 '18 at 12:51
@JeffZeitlin Use `[regex]::Replace($s,'(?<=(?<!\\)(?:\\{2})*)[$#]\w{1,10}\b', {param($match) $vartable[$match.Value] })` – Wiktor Stribiżew Dec 14 '18 at 13:06
In a test string of `\\\$foo $foo \#foo #foo`, this seems to delete all occurrences of `$foo`, and replace the 'unescaped' `#foo` with the value. It does the same regardless of the order of the `$foo`s and `#foo`s. – Jeff Zeitlin Dec 14 '18 at 14:13
Wait... It seems to make a difference if `$s` is defined with `'` or `"`. – Jeff Zeitlin Dec 14 '18 at 14:16
@JeffZeitlin In a `\\\$foo $foo \#foo #foo` *string*, the second and fourth words are matched. See [demo](http://regexstorm.net/tester?p=%28%3f%3c%3d%28%3f%3c!%5c%5c%29%28%3f%3a%5c%5c%7b2%7d%29*%29%5b%24%23%5d%5cw%7b1%2c10%7d%5cb&i=%5c%5c%5c%24foo+%24foo+%5c%23foo+%23foo&r=). You must use single quoted literals to avoid string interpolation. – Wiktor Stribiżew Dec 14 '18 at 14:16
The `[RegEx]::Replace...` technique works for me, with a slight modification due to an issue not related to the regex _per se_, only with the way I have to format the value in `$vartable[]`. Thank you! – Jeff Zeitlin Dec 14 '18 at 18:48
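The difference between `'` and `"` noted in these comments comes from PowerShell string expansion rather than from the regex: in a double-quoted string, backslash is not an escape character, so `$foo` is expanded (to nothing, if no such variable exists) before the regex ever runs. A sketch, assuming `$foo` is undefined and strict mode is off:

```powershell
Remove-Variable -Name foo -ErrorAction SilentlyContinue   # make sure $foo is undefined

$double = "\\\$foo $foo \#foo #foo"   # both $foo references expand to empty strings
$single = '\\\$foo $foo \#foo #foo'   # kept verbatim

# Run the answer's pattern on the verbatim text and list what it matches.
$pattern = '(?<=(?<!\\)(?:\\{2})*)[$#]\w{1,10}\b'
$hits = [regex]::Matches($single, $pattern) | ForEach-Object { $_.Value }

$double   # \\\  \#foo #foo
$hits     # $foo and #foo: the second and fourth words
```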
