
I have a file with a list of names in strings like:

(John|Mary|Bob)(Anderson|Brooks|Cook)

I'm trying to use regular expressions to pull the data out in strings like:

John Anderson John Brooks John Cook Mary Anderson Mary Brooks Mary Cook Bob Anderson Bob Brooks Bob Cook

I'm fairly new to regex, so any help would be appreciated. Thanks.

3 Answers


That's not something you can do with a regex alone. Regex engines match text; they can't compute a Cartesian product of the alternatives. You can, of course, use a regex to get started. In Python, I'd do:

>>> import itertools
>>> import re
>>> s = "(John|Mary|Bob)(Anderson|Brooks|Cook)"
>>> names = [name.split("|") for name in re.findall(r"\(([^()]*)\)", s)]
>>> names
[['John', 'Mary', 'Bob'], ['Anderson', 'Brooks', 'Cook']]
>>> [" ".join(item) for item in itertools.product(*names)]
['John Anderson', 'John Brooks', 'John Cook', 'Mary Anderson', 'Mary Brooks', 
 'Mary Cook', 'Bob Anderson', 'Bob Brooks', 'Bob Cook']
Tim Pietzcker

It looks like your source file is in regular expression form already, so your problem is basically just generating strings matching that regular expression.

Look at this question for some suggestions: Reversing a regular expression in Python
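
For the restricted form in the question (literal text plus flat `(a|b|c)` groups), generating the matching strings can be sketched as below. This is only an illustration, not a general regex reverser: the helper name `expand` is made up here, and nesting, quantifiers, and escapes are assumed not to occur.

```python
import itertools
import re


def expand(pattern):
    """Yield every string matched by a pattern built only from literal
    text and flat (a|b|c) groups. Hypothetical helper: nested groups,
    quantifiers, and escapes are not handled."""
    # Splitting on a capturing group keeps the group contents in the
    # result: even indices are literal text, odd indices are group bodies.
    parts = re.split(r"\(([^()]*)\)", pattern)
    choices = [
        part.split("|") if i % 2 else [part]
        for i, part in enumerate(parts)
    ]
    # One pick from each slot, in order, gives one matching string.
    for combo in itertools.product(*choices):
        yield "".join(combo)


print(list(expand("(John|Mary|Bob) (Anderson|Brooks|Cook)")))
```

Note that the pattern in the question has no space between the two groups, so a literal expansion of it gives strings like `JohnAnderson`; the space in the call above is added for readability.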

couchand

A regex alone cannot quite accomplish this. In general, a regex engine matches only one specific place in your input (such as the first possible match), or nothing at all, so you need a loop that iterates until all the input is consumed or the pattern no longer matches.

The loop can be either explicit (like a while(true){}) or implicit, as in Tim's example. You didn't say which language or tools you are using, so it's difficult to be specific; regex support varies. In Tim's example, the looping is provided implicitly by the split() and findall() methods. Perl's split() provides an implicit loop, too.
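
To make the explicit/implicit distinction concrete, here is a rough Python sketch (assuming Python, since that is what Tim's example uses). The variable names are only for illustration. Both halves collect the same group contents; the first drives the regex engine by hand, one match at a time.

```python
import re

s = "(John|Mary|Bob)(Anderson|Brooks|Cook)"
group = re.compile(r"\(([^()]*)\)")

# Explicit loop: search from the current position and stop once the
# pattern no longer matches.
explicit = []
pos = 0
while True:
    m = group.search(s, pos)
    if m is None:
        break
    explicit.append(m.group(1))
    pos = m.end()  # resume right after the previous match

# Implicit loop: findall() performs the same iteration internally.
implicit = group.findall(s)

print(explicit)
print(explicit == implicit)
```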

Andy70109