How to use RegEx for these cases?

Question

Given a string, I need to identify the field after a $ that may or may not be surrounded by { }:

$verb = verb
${verb}age = verb

$$ acts as an escape, and I need to account for that as well, since it may precede the delimiting $.

What I have so far is:

import re

reg = r'\$([_a-zA-Z0-9]*)'
s = '$who likes $what'
print(re.findall(reg, s))
# ['who', 'what']

But I cannot devise the expression for the optional bracing. I tried:

reg = r'\$({?[_a-zA-Z0-9]*}?)'

But that picks up values such as:

${who
$who}

What would be the appropriate expression to be able to account for the optional bracing?

Update:

When it comes to preceding $, the following would be invalid strings:

$$verb = invalid
$${verb} = invalid

But these would be valid:

$$$verb = $verb
$$${verb} = $verb

This is because a $$ is replaced with a single $ afterwards.

The fourth bird · Accepted Answer · 2020-10-27T12:23:45.800

2

If the opening { should match up with the closing }, you could use two capturing groups with an alternation; the value will then be in either group 1 or group 2.

If the $ should not be preceded by another $, you could add a negative lookbehind, (?<!\$)\$, which asserts that there is no dollar sign directly to the left.

\$(?:{([_a-zA-Z0-9]+)}|([_a-zA-Z0-9]+)\b)

Regex demo
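As a rough sketch of how the two-group approach might be used from Python: the helper name field_names is made up for illustration, and the word boundary is applied only to the unbraced form so that a braced field followed by a space or the end of the string still matches.

```python
import re

# Braced field -> group 1, unbraced field -> group 2.
# The \b is attached to the unbraced alternative only.
FIELD = re.compile(r"\$(?:{([_a-zA-Z0-9]+)}|([_a-zA-Z0-9]+)\b)")

def field_names(text):
    """Hypothetical helper: return field names in order of appearance."""
    # Exactly one of the two groups participates in each match;
    # the other is None, so `or` picks the one that captured.
    return [m.group(1) or m.group(2) for m in FIELD.finditer(text)]

print(field_names("$who likes ${what}ever and ${who} too"))
```

With re.findall the same pattern returns (group1, group2) tuples, with an empty string for the group that did not participate, so a comprehension such as [a or b for a, b in FIELD.findall(text)] should flatten the result in the same way.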

Or, to get only the values, you could use an alternation with lookarounds:

(?<=\$)[_a-zA-Z0-9]+\b|(?<=\${)[_a-zA-Z0-9]+(?=})

Regex demo

import re

regex = r"(?<=\$)[_a-zA-Z0-9]+\b|(?<=\${)[_a-zA-Z0-9]+(?=})"
test_str = ("$verb = verb\n"
            "${verb}age = verb")

print(re.findall(regex, test_str))

Output

['verb', 'verb']

EDIT

For the updated question, using capturing groups for example, you can match either a single dollar sign or three or more, while asserting that what precedes is not a dollar sign.

(?<!\$)(?:\$(?:\${2,})?)(?:{([_a-zA-Z0-9]+)}|([_a-zA-Z0-9]+))

Regex demo
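The pattern above accepts any run of three or more dollar signs, including four. If every $$ pair is meant to collapse to a literal $ (an assumption drawn from the question's update, not something the answer states), then only an odd-length run would introduce a field. A minimal sketch under that assumption, with the hypothetical names ESCAPED_FIELD and escaped_field_names:

```python
import re

# Assumption: "$$" pairs are escapes, so a field starts only after
# an odd-length run of "$" (zero or more pairs, then one more "$").
ESCAPED_FIELD = re.compile(
    r"(?<!\$)(?:\$\$)*\$(?:{([_a-zA-Z0-9]+)}|([_a-zA-Z0-9]+))"
)

def escaped_field_names(text):
    """Hypothetical helper: flatten findall's (braced, bare) tuples."""
    # findall yields '' for the group that did not participate.
    return [braced or bare for braced, bare in ESCAPED_FIELD.findall(text)]

for sample in ["$verb", "$$verb", "$$$verb", "$$${verb}", "$${verb}", "$$$$verb"]:
    print(sample, escaped_field_names(sample))
```

The lookbehind forces the match to begin at the start of a run of dollar signs, so the pattern cannot skip an odd leading $ and start in the middle of the run.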

edited Oct 27 '20 at 12:23

answered Oct 26 '20 at 23:29


So the first one is ideal, however for the case where there are preceding `$`, please see my update. – Oct 27 '20 at 12:15
I posted this comment and then updated the post, did not anticipate you being immediately responsive! Thanks! – Oct 27 '20 at 12:17
@pasta_sauce Like this? `(?<!\$)(?:\$(?:\$\$)?)(?:{([_a-zA-Z0-9]+)}|([_a-zA-Z0-9]+))` https://regex101.com/r/lUZjvS/1 – The fourth bird Oct 27 '20 at 12:19
@pasta_sauce I have added an update to the answer using a capturing groups example. – The fourth bird Oct 27 '20 at 12:24
Yes. Because of my lacking RegEx knowledge, is this "flattenable"? It currently returns (in Python) `[(Group1, Group2), (Group1, Group2), ...]`. I assume this is a result from the "capture groups"? Does the other implementation you proposed (alternation with lookaround) solve this? – Oct 27 '20 at 12:29
@pasta_sauce that is correct. There are ways to flatten the tuples and remove the empty entries. https://stackoverflow.com/questions/10632839/transform-list-of-tuples-into-a-flat-list-or-a-matrix – The fourth bird Oct 27 '20 at 12:33
@pasta_sauce There is a longer pattern with lookarounds but it is not that efficient https://regex101.com/r/FtIrok/1 – The fourth bird Oct 27 '20 at 12:52

score -1 · Answer 2 · answered Oct 26 '20 at 23:28


You can get the second set of matches with something like:

reg2 = r'\$(?:{)([_a-zA-Z0-9]+)(?:})'

Which makes the bracing mandatory but not captured...

Mike Guelfi

