
I am using the exrex package to generate every string that a regex can match (its expansions). However, I have several regexes and want a single set of all their expansions, without duplicates. So, given:

from exrex import generate

my_regexs=('a|b','a|c')
expansions=map(generate,my_regexs)

Perhaps I don't even need map or the intermediate variable expansions for this; I'm not sure. Now, how do I get a sorted list from these?

# Create a set from all of the expansions (e.g., let's store in myset, for clarity)
#     in order to merge duplicates
myset=... # Results in myset containing {'a','c','b'} - hash order
sorted_list=sorted(myset) # Finally, we get ['a','b','c']

Thanks for any help with this. I bet there is a simple one-liner with a comprehension that can do it.

Note: We are dealing with a map object that yields multiple generators (i.e., an iterable of generators, not a list of lists!)

Update: I thought I made the inputs and outputs clear:

Input: ('a|b','a|c') # Two regexes, which together expand to: ['a','b','a','c']
Output: ['a','b','c'] # Eliminating duplicates, we get the output presented
Michael Goldshteyn
  • Can you give an example input and output? The question is not clear to me at the moment. – Jared Goguen May 25 '18 at 18:30
  • Possible duplicate of [Making a flat list out of list of lists in Python](https://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python) – Jared Goguen May 25 '18 at 18:42
  • `unique = set(perm for subproduct in expansions for perm in subproduct)` – Jared Goguen May 25 '18 at 18:43
  • Regarding your edit about generators: the important thing is that generators and lists are both iterables, so any solution that works on iterables will work on both lists and generators. – Jared Goguen May 25 '18 at 18:47
  • @Jared, well, then I am going about this wrong, because I cannot combine the multiple generators successfully (from the map operation) to come up with a single merged list. The uniq/sort is trivial. I.e., I cannot get from the result of map to a single set of strings. Try the code as presented and see if you can get the output from the input using `exrex.generate` - it seems to be non-trivial. – Michael Goldshteyn May 25 '18 at 18:49
  • Did you try the line I wrote above? It should work fine with your `expansions = map(generate, my_regexs)`. – Jared Goguen May 25 '18 at 18:52
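
The point made in the comments, that the same flattening works on lists and on generators, can be sketched as follows. So that the snippet runs without exrex installed, it uses `expand_alternatives`, a hypothetical stand-in for `exrex.generate` that only handles literal alternatives separated by `|`; `flatten_unique` is likewise just an illustrative name. It also shows one caveat that may explain failed attempts: in Python 3 a `map` object can be consumed only once.

```python
def expand_alternatives(pattern):
    """Hypothetical stand-in for exrex.generate (assumption: only
    literal alternatives separated by '|' are supported)."""
    yield from pattern.split('|')


def flatten_unique(iterables):
    """Flatten an iterable of iterables, drop duplicates, and sort."""
    return sorted(set(item for inner in iterables for item in inner))


my_regexs = ('a|b', 'a|c')

# The same function works on a list of lists ...
from_lists = flatten_unique([['a', 'b'], ['a', 'c']])

# ... and on a map object that yields generators.
expansions = map(expand_alternatives, my_regexs)
from_generators = flatten_unique(expansions)

# In Python 3 the map object is now exhausted, so a second pass is empty.
second_pass = flatten_unique(expansions)

print(from_lists, from_generators, second_pass)
```

If the expansions are needed more than once, rebuilding the `map` (or storing the results in a list) avoids the empty second pass.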

2 Answers

from exrex import generate

rgxs = (r'a|b', r'a|c')
expansions = sorted(set(e for r in rgxs for e in generate(r)))

print(expansions)   # [u'a', u'b', u'c']
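
As a rough illustration of how the nested comprehension reads, here is the same logic unrolled into explicit loops. To keep the snippet runnable without exrex, `expand_alternatives` is a hypothetical stand-in for `generate` that only splits literal alternatives on `|`. The last line reproduces the pitfall mentioned in the comments: iterating over the pattern strings themselves, rather than over their expansions, yields single characters, including `|`.

```python
def expand_alternatives(pattern):
    """Hypothetical stand-in for exrex.generate (assumption: only
    literal alternatives separated by '|' are supported)."""
    yield from pattern.split('|')


rgxs = (r'a|b', r'a|c')

# Explicit loops: the outer loop comes first, exactly as in the comprehension.
unique = set()
for r in rgxs:
    for e in expand_alternatives(r):
        unique.add(e)
unrolled = sorted(unique)

# Equivalent comprehension: the for clauses read left to right, outer to inner.
comprehension = sorted(set(e for r in rgxs for e in expand_alternatives(r)))

# Pitfall: omitting the call iterates over the characters of each pattern.
characters = sorted(set(e for r in rgxs for e in r))

print(unrolled, comprehension, characters)
```

Each name appears once where it is bound (`for r in rgxs`) and once where it is used, which is why `r` and `e` both show up twice.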
FMc
  • Maybe change `expansions` to `subset` and `my_regexes` to `expansions`? – Jared Goguen May 25 '18 at 18:40
  • @FMc, that does not produce the correct result. It produces: `['a', 'b', 'c', '|']`, the problem is that a (slightly modified) subset of your answer is incorrect: `list(e for expansions in my_regexs for e in expansions)` results in: `['a', '|', 'b', 'a', '|', 'c']`, which is incorrect. – Michael Goldshteyn May 25 '18 at 18:40
  • @Jared, not sure I follow, because the terms are used multiple times. If you'd like to post a correct answer that eliminates `expansions` altogether, you have my vote. – Michael Goldshteyn May 25 '18 at 18:42
  • @MichaelGoldshteyn See my comment on the question itself – Jared Goguen May 25 '18 at 18:43
  • @MichaelGoldshteyn Fixed (I had forgotten to use `generate()`). – FMc May 25 '18 at 19:11
  • @FMc, OK, that works and avoids `map`! Now, same question to you that I asked Jared: Must `e` and/or `r` appear twice in the construction of the `set`? – Michael Goldshteyn May 25 '18 at 19:14
  • @MichaelGoldshteyn I guess I'm not following what you mean by "appear twice". A Python comprehension is roughly a shorthand for-loop that returns a collection. In nearly every for-loop, a variable necessarily appears twice: once in the loop declaration (eg `for x in xs`), and at least once where you use it (eg `print(x)`). – FMc May 25 '18 at 19:28

The other answer covers the nested comprehension case, so I am updating this answer to use itertools.chain.from_iterable.

from exrex import generate
from itertools import chain
flatten = chain.from_iterable

regexes = ('a|b', 'a|c')

ordered_unique = sorted(set(flatten(map(generate, regexes))))
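
To make the intermediate step visible, the following sketch materializes what `chain.from_iterable` produces before `set` removes duplicates. It assumes a hypothetical stand-in, `expand_alternatives`, in place of `exrex.generate` so that it runs without the package; the stand-in only splits literal alternatives on `|`.

```python
from itertools import chain


def expand_alternatives(pattern):
    """Hypothetical stand-in for exrex.generate (assumption: only
    literal alternatives separated by '|' are supported)."""
    yield from pattern.split('|')


regexes = ('a|b', 'a|c')

# chain.from_iterable lazily concatenates the generators produced by map.
flat = list(chain.from_iterable(map(expand_alternatives, regexes)))

# set() drops the duplicate 'a'; sorted() returns an ordered list.
ordered_unique = sorted(set(flat))

print(flat)            # ['a', 'b', 'a', 'c']
print(ordered_unique)  # ['a', 'b', 'c']
```

Building `flat` as a list is only for inspection; passing the chain straight to `set` avoids holding the duplicates in memory.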
Jared Goguen
  • Well, it works, but do I understand it? No! Merging `expansions` and `unique`, we get: `ordered=sorted(set(perm for subproduct in map(generate, regexes) for perm in subproduct))` Is it possible not to reuse `perm` and/or `subproduct` here, or must they both appear twice? – Michael Goldshteyn May 25 '18 at 19:02
  • Ahh, I was reading the two `for` clauses from right to left for some reason instead of left to right. Now it makes sense, thanks. I did try `chain.from_iterable`, unsuccessfully, when I attempted this myself. – Michael Goldshteyn May 25 '18 at 19:25
  • @MichaelGoldshteyn Now you might want to look at the top two answers of the duplicate I linked, to see why I marked this as a duplicate. – Jared Goguen May 25 '18 at 19:27