Python regex string expansion

Question

Suppose I have the following string:

trend  = '(A|B|C)_STRING'

I want to expand this to:

A_STRING
B_STRING
C_STRING

The OR condition can be anywhere in the string, e.g. STRING_(A|B)_STRING_(C|D)

would expand to

STRING_A_STRING_C
STRING_B_STRING_C
STRING_A_STRING_D
STRING_B_STRING_D

I also want to cover the case of an empty conditional:

(|A_)STRING would expand to:

A_STRING
STRING

Here's what I've tried so far:

def expandOr(trend):
    parenBegin = trend.index('(') + 1
    parenEnd = trend.index(')')
    orExpression = trend[parenBegin:parenEnd]
    originalTrend = trend[0:parenBegin - 1]
    expandedOrList = []

    for oe in orExpression.split("|"):
        expandedOrList.append(originalTrend + oe)
    return expandedOrList

But this is obviously not working.

Is there any easy way to do this using regex?

You realize you're discarding everything after the closing parenthesis, right? Do you not see a way to fix that? — jwodder, Nov 19 '13 at 01:17
Not sure what you mean. The code works for the case where the parentheses come at the end of the string, e.g. `STRING_(A|B)` — Mark Kennedy, Nov 19 '13 at 01:33
Right, the code works there because there's nothing after the parentheses to discard, but if you input `FOO_(A|B)_BAR`, you get `FOO_A` and `FOO_B`, with the `_BAR` being discarded. Do you not realize that this is what's wrong with your code? Do you not see how you forgot to handle the substring after the `)`? — jwodder, Nov 19 '13 at 01:38
More answers to this question here: http://stackoverflow.com/questions/492716/reversing-a-regular-expression-in-python — PaulMcG, Nov 19 '13 at 02:52
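Following jwodder's hint, here is a minimal sketch of how the question's own index-based approach could be fixed: keep the text after `)` and expand it recursively, so every group is handled rather than just the first. `expand_or` is a hypothetical name, and, like the original attempt, it assumes groups are never nested.

```python
def expand_or(trend):
    """Expand every (X|Y|...) group in trend, assuming groups are not nested."""
    if '(' not in trend:
        return [trend]  # no group left: the string expands to itself
    paren_begin = trend.index('(') + 1
    paren_end = trend.index(')', paren_begin)
    prefix = trend[:paren_begin - 1]
    or_expression = trend[paren_begin:paren_end]
    suffix = trend[paren_end + 1:]  # the part the original code discarded
    expanded = []
    for tail in expand_or(suffix):  # recursively expand the later groups
        for oe in or_expression.split('|'):  # '' is a valid (empty) alternative
            expanded.append(prefix + oe + tail)
    return expanded


for line in expand_or('STRING_(A|B)_STRING_(C|D)'):
    print(line)
```

For `STRING_(A|B)_STRING_(C|D)` this returns the four strings in the same order as listed in the question, and `(|A_)STRING` gives `STRING` and `A_STRING`.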

score 6 · Accepted Answer · answered Nov 19 '13 at 02:30

Here's a pretty clean way. You'll have fun figuring out how it works :-)

def expander(s):
    import re
    from itertools import product
    pat = r"\(([^)]*)\)"
    pieces = re.split(pat, s)
    pieces = [piece.split("|") for piece in pieces]
    for p in product(*pieces):
        yield "".join(p)

Then:

for s in ('(A|B|C)_STRING',
          '(|A_)STRING',
          'STRING_(A|B)_STRING_(C|D)'):
    print(s, "->")
    for t in expander(s):
        print("   ", t)

displays:

(A|B|C)_STRING ->
    A_STRING
    B_STRING
    C_STRING
(|A_)STRING ->
    STRING
    A_STRING
STRING_(A|B)_STRING_(C|D) ->
    STRING_A_STRING_C
    STRING_A_STRING_D
    STRING_B_STRING_C
    STRING_B_STRING_D
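If you'd rather not work it out yourself, printing the intermediate values of `expander` shows what happens. This sketch just repeats its steps at top level on one input. The key detail is that the pattern contains a capturing group, so `re.split` returns the captured group contents interleaved with the literal text.

```python
import re
from itertools import product

pat = r"\(([^)]*)\)"  # same pattern as expander(); the inner (...) captures

s = 'STRING_(A|B)_STRING_(C|D)'
pieces = re.split(pat, s)
# Literal segments and captured group contents alternate in the result.
print(pieces)
# ['STRING_', 'A|B', '_STRING_', 'C|D', '']

choices = [piece.split("|") for piece in pieces]
# Literal pieces become one-element lists; groups become their alternatives.
print(choices)
# [['STRING_'], ['A', 'B'], ['_STRING_'], ['C', 'D'], ['']]

# An empty alternative survives split(), which is why (|A_) works:
print('|A_'.split('|'))
# ['', 'A_']

# product() picks one entry from each list; joining gives one expansion.
print(["".join(p) for p in product(*choices)])
# ['STRING_A_STRING_C', 'STRING_A_STRING_D', 'STRING_B_STRING_C', 'STRING_B_STRING_D']
```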

Tim Peters

Try `print " ".join(expander('(A|B|C)_STR|ING'))` to find the error in the code. – user1346466 Mar 11 '16 at 17:00
The code I posted implicitly assumes that parentheses and vertical bars are metacharacters, used only to express the alternation patterns the OP was interested in. That leads to the simple code shown. If you want to make other assumptions, that's fine, but then you should spell them out in a new answer of your own. To me, they would complicate the code in ways that merely obscure the real points. – Tim Peters Mar 11 '16 at 17:43
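If you do want a `|` outside parentheses to be treated as a literal character (the case user1346466's example probes), one possible tweak is to split only the captured group contents, which `re.split` places at the odd indices of its result. This is a sketch under that assumption, not Tim Peters' code; `expander_literal_bars` is a hypothetical name, and nested or unbalanced parentheses are still not handled.

```python
import re
from itertools import product


def expander_literal_bars(s):
    pat = r"\(([^)]*)\)"
    pieces = re.split(pat, s)
    # Even indices are literal text; odd indices are captured group contents.
    # Only the group contents are split on '|'.
    choices = [piece.split("|") if i % 2 else [piece]
               for i, piece in enumerate(pieces)]
    for p in product(*choices):
        yield "".join(p)


print(list(expander_literal_bars('(A|B|C)_STR|ING')))
```

With this change, `'(A|B|C)_STR|ING'` expands to `A_STR|ING`, `B_STR|ING` and `C_STR|ING`, while the inputs from the question expand exactly as before.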

score 4 · Answer 2 · answered Nov 19 '13 at 02:27

>>> import exrex
>>> trend  = '(A|B|C)_STRING'
>>> trend2 = 'STRING_(A|B)_STRING_(C|D)'

>>> list(exrex.generate(trend))
[u'A_STRING', u'B_STRING', u'C_STRING']

>>> list(exrex.generate(trend2))
[u'STRING_A_STRING_C', u'STRING_A_STRING_D', u'STRING_B_STRING_C', u'STRING_B_STRING_D']

Seçkin Savaşçı

3,446
2
23
39

Thanks for taking the time to write this out. This requires an external module – Mark Kennedy Nov 19 '13 at 17:43

score 2 · Answer 3 · answered Nov 19 '13 at 02:07

I would do this to extract the groups:

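def extract_groups(trend):
    # positions of every '(' and ')' (assumes groups are not nested)
    l_parens = [i for i, c in enumerate(trend) if c == '(']
    r_parens = [i for i, c in enumerate(trend) if c == ')']
    assert len(l_parens) == len(r_parens)
    # pair the n-th '(' with the n-th ')' and split the contents on '|'
    return [trend[l+1:r].split('|') for l, r in zip(l_parens, r_parens)]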
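The same groups can also be pulled out with `re.findall`, under the same no-nesting assumption. This is an alternative sketch, not the answerer's code, and `extract_groups_re` is a hypothetical name. One difference: unlike the `assert` above, `findall` silently ignores an unmatched parenthesis instead of failing.

```python
import re


def extract_groups_re(trend):
    # findall returns the captured text of each (...) group, left to right
    return [group.split('|') for group in re.findall(r'\(([^)]*)\)', trend)]


print(extract_groups_re('STRING_(A|B)_STRING_(C|D)'))
# [['A', 'B'], ['C', 'D']]
```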

And then you can evaluate the product of those extracted groups using itertools.product:

expr = 'STRING_(A|B)_STRING_(C|D)'
from itertools import product
list(product(*extract_groups(expr)))
# [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]

Now it's just a question of splicing those back onto your original expression. I'll use re for that :)

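# Python 3.3+ (uses "yield from")
import re
from itertools import product

def _gen(it):
    yield from it

p = re.compile(r'\(.*?\)')

for tup in product(*extract_groups(expr)):
    gen = _gen(tup)
    # each (...) match is replaced by the next alternative from tup, in order
    print(p.sub(lambda x: next(gen), expr))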

STRING_A_STRING_C
STRING_A_STRING_D
STRING_B_STRING_C
STRING_B_STRING_D

There's probably a more readable way to get re.sub to sequentially substitute things from an iterable, but this is what came off the top of my head.
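One arguably more readable option is to create a fresh iterator with `iter()` for each substitution, which does the same job as the `_gen` helper. This is a sketch under the same non-nested assumption; `fill_groups` and `expand` are hypothetical helper names. When `re.sub` is given a function, the string it returns is inserted literally, so alternatives containing backslashes are safe.

```python
import re
from itertools import product

GROUP = re.compile(r'\(([^)]*)\)')


def fill_groups(template, choices):
    """Replace each (...) group in template, left to right, with the next item of choices."""
    it = iter(choices)
    return GROUP.sub(lambda match: next(it), template)


def expand(template):
    groups = [g.split('|') for g in GROUP.findall(template)]
    return [fill_groups(template, combo) for combo in product(*groups)]


for line in expand('STRING_(A|B)_STRING_(C|D)'):
    print(line)
```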

Thanks for taking the time to write this out – Mark Kennedy Nov 19 '13 at 17:42

score 2 · Answer 4 · answered May 08 '21 at 20:33

It is easy to achieve with the sre_yield module:

>>> import sre_yield
>>> trend  = '(A|B|C)_STRING'
>>> strings = list(sre_yield.AllStrings(trend))
>>> print(strings)
['A_STRING', 'B_STRING', 'C_STRING']

The goal of sre_yield is to efficiently generate all values that can match a given regular expression, or count possible matches efficiently... It does this by walking the tree as constructed by sre_parse (the same parser used internally by the re module), and constructing chained/repeating iterators as appropriate. Depending on your input pattern, there may be duplicate results; these are cases that sre_parse did not optimize.
