Create all possible combinations with multiple variants from a list

Question

OK, so the problem is as follows:

Let's say I have a list like this: [12R,102A,102L,250L]. What I want is a list of all possible combinations, but with only one variant per number. For the example above, the output I would like is:

[12R,102A,250L]
[12R,102L,250L]

My actual problem is a lot more complex, with many more sites. Thanks for your help.

Edit: after reading some comments, I guess this is slightly unclear. I have 3 unique numbers here, [12, 102, 250], and for some numbers I have different variants, for example [102A, 102L]. What I need is a way to combine the different positions [12, 102, 250] with all possible variants at each one, just like the lists I presented above. Those are the only valid solutions: [12R] is not valid, and neither is [12R,102A,102L,250L]. So far I have done this with nested loops, but I have a LOT of variation within these numbers, so I can't really do that anymore.

I'll edit this again: it seems there is still some confusion, so let me extend the point I made before. What I am dealing with here is DNA. 12R means the 12th position in the sequence was changed to an R. So the solution [12R,102A,250L] means that the amino acid at position 12 is R, at position 102 is A, and at position 250 is L.

This is why a solution like [102L, 102R, 250L] is not usable: the same position cannot be occupied by 2 different amino acids.
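To make the rule concrete, here is a minimal sketch of a validity check, assuming every entry is a number followed by a letter. The helper names `position` and `is_valid` are made up for illustration and are not part of any answer below.

```python
import re

def position(item):
    # Hypothetical helper: '102A' -> 102 (assumes the entry starts with digits).
    return int(re.match(r"\d+", item).group())

def is_valid(combo, source):
    """True if combo uses entries from source and covers each position exactly once."""
    wanted = {position(x) for x in source}
    seen = [position(x) for x in combo]
    return (
        all(x in source for x in combo)   # only known variants
        and len(seen) == len(set(seen))   # no position used twice
        and set(seen) == wanted           # no position left out
    )

source = ["12R", "102A", "102L", "250L"]
print(is_valid(["12R", "102A", "250L"], source))          # True
print(is_valid(["12R"], source))                          # False
print(is_valid(["12R", "102A", "102L", "250L"], source))  # False
```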

Thank you.

Is it possible to have e.g. `[102A,12R,102L,250L]`, or is it given that the same numbers are always next to each other? – Grzegorz Skibinski, May 31 '20 at 13:55
No, valid outputs are combinations with one variant at each position, like the lists I presented. – david, May 31 '20 at 14:05

Laurent B. · Answer 1 · 2020-05-31T15:55:53.107

0

This works with ["10A","100B","12C","100R"] (case 1) and ['12R','102A','102L','250L'] (case 2):

import itertools as it

liste = ['12R','102A','102L','250L']

comb = []
# try every choice of 3 entries out of the 4 (both sizes are hard-coded for these examples)
for e in it.combinations(range(4), 3):
    picked = [liste[i] for i in e]
    # the position number is the entry without its trailing letter
    e1, e2, e3 = (p[:-1] for p in picked)
    # keep the triple only if all three positions differ
    if e1 != e2 and e2 != e3 and e3 != e1:
        comb.append(picked)
print(comb)
# case 1 : [['10A', '100B', '12C'], ['10A', '12C', '100R']]
# case 2 : [['12R', '102A', '250L'], ['12R', '102L', '250L']]
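The snippet above is fixed to picking 3 entries out of 4. A rough sketch of the same filter idea for any list might look like the following, assuming each entry starts with digits; `filter_combinations` is a hypothetical name. It still tests every subset of the right size, so it is unlikely to scale to hundreds of variants.

```python
import itertools as it
import re

def filter_combinations(items):
    # Position prefix of an entry, e.g. '102A' -> '102'.
    def pos(s):
        return re.match(r"\d+", s).group()

    # One entry per distinct position, so the subset size is the position count.
    k = len({pos(s) for s in items})
    return [
        list(c)
        for c in it.combinations(items, k)
        if len({pos(s) for s in c}) == k  # all positions in the subset differ
    ]

print(filter_combinations(['12R', '102A', '102L', '250L']))
# [['12R', '102A', '250L'], ['12R', '102L', '250L']]
print(filter_combinations(["10A", "100B", "12C", "100R"]))
# [['10A', '100B', '12C'], ['10A', '12C', '100R']]
```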

edited May 31 '20 at 15:55

answered May 31 '20 at 14:00

Laurent B.

Can you guys please read the question before commenting? This is not what I need. Is the example output I presented not clear? – david May 31 '20 at 14:02
I think "however for only one combination/number" is unclear. – Nico Müller May 31 '20 at 14:02
I think so too; I believed you wanted all the possible combinations. – Laurent B. May 31 '20 at 14:03
The output I gave is the output I would want, so at each unique number I get all possible combinations. I guess a way to solve this is with nested loops, but I have hundreds of different variations, so that doesn't work. – david May 31 '20 at 14:06
Is it what you are looking for, before I optimize the code? – Laurent B. May 31 '20 at 14:08
['102A', '102L', '250L']: again, no. It can't be the same position (102) multiple times. The only valid solutions are the 2 lists I posted in my original question. ['12R', '102A', '102L'] is wrong also. – david May 31 '20 at 14:10

score 0 · Answer 2 · edited Jun 04 '20 at 10:03

Try this:

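from itertools import groupby
import re

def __genComb(arr, res=None):
    # res holds the variants chosen so far, one per group, in group order.
    # A None default avoids sharing one mutable list between separate calls.
    res = [] if res is None else res
    for i in range(len(res), len(arr)):
        el = arr[i]
        if len(el[1]) == 1:
            # only one variant at this position: take it (new list, no in-place mutation)
            res = res + el[1]
        else:
            # several variants: branch once per variant, then stop at this level
            for el_2 in el[1]:
                yield from __genComb(arr, res + [el_2])
            break
    if len(res) == len(arr):
        yield res

def genComb(arr):
    # group entries by numeric prefix; groupby needs equal prefixes to be adjacent, hence sorted()
    res = [(k, list(v)) for k, v in groupby(sorted(arr), key=lambda x: re.match(r"(\d*)", x).group(1))]
    yield from __genComb(res)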

Sample output (using the input you provided):

test=["12R","102A","102L","250L"]

for el in genComb(test):
    print(el)

# returns:

['102A', '12R', '250L']
['102L', '12R', '250L']

edited Jun 04 '20 at 10:03

halfer

answered May 31 '20 at 14:16

Grzegorz Skibinski

Hello, most of these are not valid, sorry. The only acceptable solutions are the 2 lists I posted above. I tried to edit the OP to make the question more clear. – david May 31 '20 at 14:20
See now - I added one element to your input – Grzegorz Skibinski May 31 '20 at 14:40
(it mimics your desired output, when we take the input, which you provided) – Grzegorz Skibinski May 31 '20 at 14:42
Ok, tweaked my answer - so you have same input, as in your question ;) – Grzegorz Skibinski May 31 '20 at 14:44
Hey, when I use a different list, for example test = ["10A","100B","12C","100R"], I get back ['12R', '100B', '12C', '100R']. It also doesn't work when I try it with my actual list, which is a lot larger. – david May 31 '20 at 15:01
Ok - that's what I was asking about in the comment to the main question - order doesn't matter for the output then ? – Grzegorz Skibinski May 31 '20 at 15:05
I.e. `["10A","100B","12C","100R"]` should produce same as `["10A","100B","100R","12C"]` ? – Grzegorz Skibinski May 31 '20 at 15:06
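On the ordering question: `itertools.groupby` only merges items that are next to each other, so the result depends on input order unless the list is sorted by the same key first. A small sketch of the difference, with `prefix` as an illustrative helper name:

```python
from itertools import groupby
import re

def prefix(item):
    # Numeric prefix of an entry, e.g. '100B' -> '100'.
    return re.match(r"\d+", item).group()

items = ["10A", "100B", "12C", "100R"]

# Without sorting, the two '100' entries are not adjacent and land in separate groups.
unsorted_groups = [(k, list(g)) for k, g in groupby(items, key=prefix)]
print(unsorted_groups)
# [('10', ['10A']), ('100', ['100B']), ('12', ['12C']), ('100', ['100R'])]

# Sorting by the grouping key first puts equal prefixes side by side.
sorted_groups = [(k, list(g)) for k, g in groupby(sorted(items, key=prefix), key=prefix)]
print(sorted_groups)
# [('10', ['10A']), ('100', ['100B', '100R']), ('12', ['12C'])]
```

With the sort in place, `["10A","100B","12C","100R"]` and `["10A","100B","100R","12C"]` end up with the same groups.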

score 0 · Answer 3 · answered May 31 '20 at 15:41

You can use a recursive generator function:

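from itertools import groupby as gb
import re

def combos(d, c=None):
    c = [] if c is None else c
    if not d:
        # every position has been assigned one variant
        yield c
    else:
        # pick each variant of the first remaining position, then recurse on the rest
        for a, b in d[0]:
            yield from combos(d[1:], c + [a + b])

d = ['12R', '102A', '102L', '250L']
# split each entry into [position, letter], e.g. '102A' -> ['102', 'A']
vals = [re.findall(r'^\d+|\w+$', i) for i in d]
# sort by position (as a string) so groupby sees equal positions next to each other
new_d = [list(b) for _, b in gb(sorted(vals, key=lambda x: x[0]), key=lambda x: x[0])]
print(list(combos(new_d)))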

Output:

[['102A', '12R', '250L'], ['102L', '12R', '250L']]
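For comparison, the same result can probably be reached without recursion via `itertools.product`, which takes one element from each group. This is a sketch, not the answer's method; `position` and `combos_product` are illustrative names, and sorting numerically here is a choice that changes the output order relative to the string sort above.

```python
from itertools import groupby, product
import re

def position(item):
    # Numeric position of an entry, e.g. '102A' -> 102.
    return int(re.match(r"\d+", item).group())

def combos_product(items):
    # Sort numerically so equal positions are adjacent, then group them.
    ordered = sorted(items, key=position)
    groups = [list(g) for _, g in groupby(ordered, key=position)]
    # product picks exactly one variant from every group.
    return [list(c) for c in product(*groups)]

print(combos_product(['12R', '102A', '102L', '250L']))
# [['12R', '102A', '250L'], ['12R', '102L', '250L']]
```

`product` returns a lazy iterator, so for very large inputs the list comprehension can be dropped and the combinations consumed one at a time.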

score 0 · Answer 4 · answered May 31 '20 at 16:39

import re

def get_grouped_options(variants):
    # map each position number to the list of letters seen at that position
    options = {}
    for option in variants:
        m = re.match(r'(\d+)([A-Z])$', option)
        if m:
            position = int(m.group(1))
            acid = m.group(2)
        else:
            continue  # skip entries that are not <number><letter>
        if position not in options:
            options[position] = []
        options[position].append(acid)
    return options


def yield_all_combos(options):
    n = len(options)
    if n == 0:
        return  # nothing to combine
    positions = list(options.keys())
    # indices[i] selects the current letter for positions[i]; it advances like an odometer
    indices = [0] * n
    while True:
        yield ["{}{}".format(position, options[position][indices[i]])
               for i, position in enumerate(positions)]
        j = 0
        indices[j] += 1
        while indices[j] == len(options[positions[j]]):
            # carry
            indices[j] = 0
            j += 1
            if j == n:
                # overflow
                return
            indices[j] += 1


variants = ['12R', '102A', '102L', '250L']

options = get_grouped_options(variants)

for combo in yield_all_combos(options):
    print("[{}]".format(",".join(combo)))

Gives:

[12R,102A,250L]
[12R,102L,250L]
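The odometer above visits combinations in a fixed order, so the k-th one can also be computed directly by decoding k as a mixed-radix number. A sketch under the assumption that `options` has the same shape as the dictionary built above; `nth_combo` is a hypothetical helper and `math.prod` needs Python 3.8 or later.

```python
import math

def nth_combo(options, k):
    """Return the k-th combination, in the same order the odometer would yield it."""
    total = math.prod(len(acids) for acids in options.values())
    if not 0 <= k < total:
        raise IndexError(k)
    combo = []
    for position, acids in options.items():
        # The first position is the fastest-moving digit, as in the odometer.
        k, digit = divmod(k, len(acids))
        combo.append("{}{}".format(position, acids[digit]))
    return combo

options = {12: ["R"], 102: ["A", "L"], 250: ["L"]}
print(nth_combo(options, 0))  # ['12R', '102A', '250L']
print(nth_combo(options, 1))  # ['12R', '102L', '250L']
```

This could be handy for sampling a few combinations when the full set is too large to enumerate.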

ti7 · Answer 5 · 2020-10-12T19:12:20.007

I believe this is what you're looking for!

This works by

1. generating a collection of all the postfixes each prefix can have
2. finding the total number of combinations (multiply the lengths of the sublists together)
3. rotating through the postfixes: the read index into each postfix list is derived from both that list's position in the collection and the absolute result index (the known location in the final results)

import collections
import functools
import operator
import re

# initial input
starting_values = ["12R","102A","102L","250L"]

d = collections.defaultdict(list)  # use a set if duplicates are possible
for value in starting_values:
    numeric, postfix = re.match(r"(\d+)(.*)", value).groups()
    d[numeric].append(postfix)  # .* matches ""; consider (postfix or "_") to give value a size

# d is now a dictionary of lists where each key is the prefix
# and each value is a list of possible postfixes


# each set of postfixes multiplies the total combinations by its length
total_combinations = functools.reduce(
    operator.mul,
    (len(sublist) for sublist in d.values())
)

results = collections.defaultdict(list)
for results_pos in range(total_combinations):
    for index, (prefix, postfix_set) in enumerate(d.items()):
        results[results_pos].append(
            "{}{}".format(  # recombine the values
                prefix,     # numeric prefix
                postfix_set[(results_pos + index) % len(postfix_set)]
            ))

# results is now a dictionary mapping { result index: unique list }

displaying

# set the width of the index column by the longest result index
# aligning the intermediate columns would need per-column widths, which is beyond the scope of the question
col_width = max(len(str(k)) for k in results)
for k, v in results.items():
    print("{:<{w}}: {}".format(k, v, w=col_width))


0: ['12R', '102L', '250L']
1: ['12R', '102A', '250L']

with a more advanced input

["12R","102A","102L","250L","1234","1234A","1234C"]

0: ['12R', '102L', '250L', '1234']
1: ['12R', '102A', '250L', '1234A']
2: ['12R', '102L', '250L', '1234C']
3: ['12R', '102A', '250L', '1234']
4: ['12R', '102L', '250L', '1234A']
5: ['12R', '102A', '250L', '1234C']

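You can confirm the values are indeed unique with a set. Note that this rotating index only reaches every combination when the group sizes are pairwise coprime (here 1, 2, 1 and 3); if two positions share a factor, for example two variants each, some results repeat and others are missed.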

final = set(",".join(x) for x in results.values())
for f in final:
    print(f)

12R,102L,250L,1234
12R,102A,250L,1234A
12R,102L,250L,1234C
12R,102A,250L,1234
12R,102L,250L,1234A
12R,102A,250L,1234C
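To check coverage rather than just uniqueness, the rotating-index results can be compared against `itertools.product`. The sketch below re-implements the indexing rule from the code above in a compact form; the function names and the extra variant `12K` are made up for illustration.

```python
from itertools import product

def rotation_combos(groups):
    # Same indexing rule as above: entry (pos + i) % len(group) for group i.
    total = 1
    for group in groups:
        total *= len(group)
    return {
        tuple(group[(pos + i) % len(group)] for i, group in enumerate(groups))
        for pos in range(total)
    }

def all_combos(groups):
    # Reference result: one entry from every group, every possible way.
    return set(product(*groups))

coprime_sizes = [["12R"], ["102A", "102L"], ["250L"], ["1234", "1234A", "1234C"]]
shared_factor = [["12R", "12K"], ["102A", "102L"]]  # '12K' is a hypothetical variant

print(rotation_combos(coprime_sizes) == all_combos(coprime_sizes))          # True
print(len(rotation_combos(shared_factor)), len(all_combos(shared_factor)))  # 2 4
```

With group sizes 2 and 2 the rotation yields only 2 of the 4 combinations, so for inputs like that `itertools.product` looks like the safer choice.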

notes

In CPython, the re module caches compiled patterns after their first use.
The list-member multiplier is taken from "How can I multiply all items in a list together with Python?"
