
Given a "dictionary" composed of entries that are valid regexes, e.g.:

CARS?|(AUTO|BIG)?TRUCK|VEHICLE|(CRUISE|CONTAINER)? SHIP|AUTOMOTIVE

In Python, how can I put every entry ("dictionary value") on its own line? I can't simply split on |, because a single entry such as (AUTO|BIG)?TRUCK contains that same character, so a plain split would break the entry apart.

I am not just trying to match these characters; I am also trying to replace them.

X33
  • What have you tried so far (i.e. post your code) and where is your current attempt breaking? – Tony Tuttle Apr 27 '18 at 22:43
  • Possible duplicate of [regex, extract string NOT between two brackets](https://stackoverflow.com/questions/19414193/regex-extract-string-not-between-two-brackets) – user3483203 Apr 27 '18 at 23:55
  • Just replace the `{` and `}` in that question with `\(` and `\)` and it will work for you. – user3483203 Apr 27 '18 at 23:55

1 Answer


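You have at least two options here. One uses the newer (*SKIP)(*FAIL) mechanism, which requires the third-party regex module (the standard re module does not support these verbs). The other uses a replacement function that first swaps each top-level | for a placeholder and then splits on that placeholder: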

import regex as re

expressions = r'''CARS?|(AUTO|BIG)?TRUCK|VEHICLE|(CRUISE|CONTAINER)? SHIP|AUTOMOTIVE'''

# first alternative: (*SKIP)(*FAIL) discards any parenthesized group, so only the | outside parentheses is split on
rx = re.compile(r'\([^()]*\)(*SKIP)(*FAIL)|\|')
parts = "\n".join(rx.split(expressions))
print(parts)

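# second alternative: a function that replaces each | outside parentheses with a placeholder
# (the placeholder 'SUPERMAN' must not occur anywhere in the expressions)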
rx = re.compile(r'\([^()]*\)|(\|)')

def replacer(match):
    if match.group(1):
        return 'SUPERMAN'
    else:
        return match.group(0)

expressions = rx.sub(replacer, expressions)
parts = "\n".join(expressions.split('SUPERMAN'))
print(parts)

Both will yield:

CARS?
(AUTO|BIG)?TRUCK
VEHICLE
(CRUISE|CONTAINER)? SHIP
AUTOMOTIVE
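
Note that the `\([^()]*\)` part of both patterns only covers parentheses that are not nested. If entries could contain nested groups or an escaped `\|`, one possible alternative (a different technique from the two above, not a regex at all) is to walk the string and track parenthesis depth. The following is a minimal sketch using only the standard library; the function name `split_top_level` is made up for illustration, and it assumes that no | appears inside a character class such as `[|]`:

```python
def split_top_level(pattern, sep="|"):
    """Split a regex alternation on separators that sit outside any parentheses."""
    parts, current, depth = [], [], 0
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Keep an escaped character (e.g. \| or \() together with its backslash.
            current.append(pattern[i:i + 2])
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            # Top-level separator: close the current entry and start a new one.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


expressions = r"CARS?|(AUTO|BIG)?TRUCK|VEHICLE|(CRUISE|CONTAINER)? SHIP|AUTOMOTIVE"
print("\n".join(split_top_level(expressions)))
```

For the example input this prints the same five lines as above, and because it returns a list, each entry can be processed or replaced individually before joining.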
Jan