
OK, so the problem is as follows:

Let's say I have a list like this: [12R,102A,102L,250L]. What I want is a list of all possible combinations, however for only one combination/number. So for the example above, the output I would like is:

[12R,102A,250L]
[12R,102L,250L]

My actual problem is a lot more complex, with many more sites. Thanks for your help.

Edit: after reading some comments, I guess this is slightly unclear. I have 3 unique numbers here, [12, 102, 250], and for some numbers I have different variations, for example [102A, 102L]. What I need is a way to combine the different positions [12, 102, 250] with all possible variations within them, just like the lists I presented above. Those are the only valid solutions: [12R] is not one, and neither is [12R,102A,102L,250L]. So far I have done this with nested loops, but I have a LOT of variation within these numbers, so I can't really do that anymore.

I'll edit this again: it seems there is still some confusion, so I'll expand on the point I made before. What I am dealing with here is DNA. 12R means the 12th position in the sequence was changed to an R, so the solution [12R,102A,250L] means that the amino acid at position 12 is R, at 102 is A, and at 250 is L.

This is why a solution like [102L, 102R, 250L] is not usable: the same position cannot be occupied by 2 different amino acids.

Thank you

david



This works with ["10A","100B","12C","100R"] (case 1) and ['12R','102A','102L','250L'] (case 2):

import itertools as it

liste = ['12R','102A','102L','250L']

def position(entry):
    # everything except the trailing amino-acid letter, e.g. '102' for '102A'
    return entry[:-1]

# pick one entry per distinct position (instead of hard-coding 4 entries / 3 picks)
n_positions = len({position(e) for e in liste})

comb = []
for e in it.combinations(liste, n_positions):
    # keep only the picks in which no position is repeated
    if len({position(x) for x in e}) == n_positions:
        comb.append(list(e))
print(comb)
# case 1 : [['10A', '100B', '12C'], ['10A', '12C', '100R']]
# case 2 : [['12R', '102A', '250L'], ['12R', '102L', '250L']]
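Filtering itertools.combinations like this is correct, but it examines C(N, k) candidate tuples (N entries, k distinct positions), while only the product of the per-position group sizes survives the filter. A small sketch for comparing the two counts before committing to an approach; it assumes, like the code above, that each entry ends in exactly one letter, and the 20-position input is invented purely for illustration (math.comb and math.prod need Python 3.8+):

```python
import math
from collections import Counter

def candidate_counts(mutations):
    """Return (tuples the combinations filter examines, lists that pass it)."""
    # assumption: the position is everything except the single trailing letter
    per_position = Counter(m[:-1] for m in mutations)
    k = len(per_position)
    examined = math.comb(len(mutations), k)
    valid = math.prod(per_position.values())
    return examined, valid

print(candidate_counts(['12R', '102A', '102L', '250L']))  # (4, 2)

# made-up input: 20 positions with 3 variants each
many = [f"{pos}{aa}" for pos in range(1, 21) for aa in "ACD"]
print(candidate_counts(many))  # examined is far larger than the 3**20 valid lists
```

Since the valid count is a product of group sizes, it also tells you up front how many lists any of these generators will produce for the real data.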

Laurent B.
  • Can you guys please read the question before commenting? This is not what I need. Is the example output I presented not clear? – david May 31 '20 at 14:02
  • I think "however for only one combination/number" is unclear. – Nico Müller May 31 '20 at 14:02
  • I think so; I believed you wanted all the possible combinations – Laurent B. May 31 '20 at 14:03
  • The output I gave is the output I want: at each unique number I get all possible combinations. I guess one way to solve this is with nested loops, but I have hundreds of different variations, so that doesn't work – david May 31 '20 at 14:06
  • Is this what you are looking for, before I optimize the code? – Laurent B. May 31 '20 at 14:08
  • ['102A', '102L', '250L'], again no. It can't be the same position (102) multiple times. The only valid solutions are the 2 lists I posted in my original question. ['12R', '102A', '102L'] is wrong also – david May 31 '20 at 14:10

Try this:

from itertools import groupby
import re

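def position(s):
    # numeric prefix of an entry, e.g. "102" for "102A"
    return re.match(r"(\d*)", s).group(1)

def __genComb(arr, res=None):
    # res=None instead of res=[]: a list default would be shared between calls
    res = [] if res is None else res
    for i in range(len(res), len(arr)):
        el = arr[i]
        if len(el[1]) == 1:
            res = res + el[1]  # only one variant here: take it and move on
        else:
            for el_2 in el[1]:  # several variants: branch once per variant
                yield from __genComb(arr, res + [el_2])
            break
    if len(res) == len(arr):
        yield res

def genComb(arr):
    # sort on the grouping key so every position forms one contiguous run
    res = [(k, list(v)) for k, v in groupby(sorted(arr, key=position), key=position)]
    yield from __genComb(res)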

Sample output (using the input you provided):

test=["12R","102A","102L","250L"]

for el in genComb(test):
    print(el)

# returns:

['102A', '12R', '250L']
['102L', '12R', '250L']
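genComb sorts its input before grouping because itertools.groupby only merges consecutive items with equal keys. A short demonstration, using a hypothetical position helper that extracts the leading digits and an invented extra variant '12S':

```python
import re
from itertools import groupby

def position(mutation):
    # hypothetical helper: leading digits, e.g. "102" for "102A"
    return re.match(r"\d+", mutation).group(0)

data = ['12R', '102A', '12S', '102L']  # '12S' is an invented extra variant

unsorted_keys = [k for k, _ in groupby(data, key=position)]
sorted_keys = [k for k, _ in groupby(sorted(data, key=position), key=position)]

print(unsorted_keys)  # ['12', '102', '12', '102']  (each position split in two)
print(sorted_keys)    # ['102', '12']
```

The string keys also sort '102' before '12', which is why 102A comes first in the output above; sorting with key=lambda m: int(position(m)) instead would give numeric order.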
halfer
Grzegorz Skibinski

You can use a recursive generator function:

from itertools import groupby as gb
import re

def combos(d, c = []):
    # d: remaining groups (one per position); c: the combination built so far
    if not d:
        yield c
    else:
        for a, b in d[0]:
            yield from combos(d[1:], c + [a+b])

d = ['12R', '102A', '102L', '250L']
vals = [re.findall(r'^\d+|\w+$', i) for i in d]  # '102A' -> ['102', 'A']
new_d = [list(b) for _, b in gb(sorted(vals, key=lambda x:x[0]), key=lambda x:x[0])]
print(list(combos(new_d)))

Output:

[['102A', '12R', '250L'], ['102L', '12R', '250L']]
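The recursive combos generator takes one item from each group in turn, which is exactly what itertools.product does. A non-recursive sketch of the same idea; combos_product is a hypothetical name, and it assumes every entry is digits followed by one or more letters:

```python
import re
from itertools import groupby, product

def combos_product(mutations):
    """Non-recursive counterpart of combos(): one pick per position group."""
    # split '102A' into ('102', 'A'); assumes digits followed by letters
    vals = [re.match(r"(\d+)([A-Za-z]+)$", m).groups() for m in mutations]
    vals.sort(key=lambda v: v[0])
    groups = [[num + aa for num, aa in grp]
              for _, grp in groupby(vals, key=lambda v: v[0])]
    # product(*groups) yields one tuple per combination, last group varying fastest
    return [list(p) for p in product(*groups)]

print(combos_product(['12R', '102A', '102L', '250L']))
# [['102A', '12R', '250L'], ['102L', '12R', '250L']]
```

Because it does not recurse once per position, this version is not bounded by CPython's default recursion limit of 1000 frames, which the recursive generator would run into with roughly a thousand sites.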
Ajax1234
import re

def get_grouped_options(mutations):
    options = {}
    for option in mutations:
        # position digits followed by a single uppercase residue letter
        m = re.match(r'(\d+)([A-Z])$', option)
        if not m:
            continue  # entries in any other format are silently skipped
        position = int(m.group(1))
        acid = m.group(2)
        options.setdefault(position, []).append(acid)
    return options


def yield_all_combos(options):
    n = len(options)
    positions = list(options.keys())
    if n == 0:
        yield []  # no positions: only the empty combination
        return
    indices = [0] * n  # one "odometer wheel" per position
    while True:
        yield ["{}{}".format(position, options[position][indices[i]])
               for i, position in enumerate(positions)]
        j = 0
        indices[j] += 1
        while indices[j] == len(options[positions[j]]):
            # carry
            indices[j] = 0
            j += 1
            if j == n:
                # overflow
                return
            indices[j] += 1


mutations = ['12R', '102A', '102L', '250L']

options = get_grouped_options(mutations)

for combo in yield_all_combos(options):
    print("[{}]".format(",".join(combo)))

Gives:

[12R,102A,250L]
[12R,102L,250L]
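The carry loop in yield_all_combos is an odometer in which the first position changes fastest. itertools.product covers the same combinations with the last position changing fastest, and feeding it the positions in reverse restores the odometer's order. A sketch comparing the two, assuming the {position: [acids]} shape that get_grouped_options returns; the 12K variant is hypothetical, added only so the ordering difference is visible:

```python
import itertools

# hypothetical input in the {position: [acids]} shape built by get_grouped_options
options = {12: ['R', 'K'], 102: ['A', 'L'], 250: ['L']}

def product_combos(options):
    """Yield one list per combination; the LAST position varies fastest."""
    positions = list(options)
    for acids in itertools.product(*(options[p] for p in positions)):
        yield [f"{p}{a}" for p, a in zip(positions, acids)]

def odometer_order_combos(options):
    """Same combinations, but the FIRST position varies fastest (odometer order)."""
    positions = list(options)[::-1]  # feed product the positions back to front
    for acids in itertools.product(*(options[p] for p in positions)):
        # undo the reversal so each result lists positions in the original order
        yield [f"{p}{a}" for p, a in zip(positions, acids)][::-1]

for combo in product_combos(options):
    print("[{}]".format(",".join(combo)))
# [12R,102A,250L]
# [12R,102L,250L]
# [12K,102A,250L]
# [12K,102L,250L]
```

Both functions are lazy generators, so like the odometer they never hold more than one combination in memory at a time.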
alani

I believe this is what you're looking for!

This works by

  • generating a collection of all the postfixes each prefix can have
  • finding the total number of combinations (multiply the lengths of the sublists together)
  • treating each result index as a mixed-radix number with one digit per prefix (that digit ranges over the prefix's postfixes), so every index decodes to a different combination
import collections
import functools
import operator
import re

# initial input
starting_values = ["12R","102A","102L","250L"]

d = collections.defaultdict(list)  # use a set if duplicates are possible
for value in starting_values:
    numeric, postfix = re.match(r"(\d+)(.*)", value).groups()
    d[numeric].append(postfix)  # .* matches ""; consider (postfix or "_") to give value a size

# d is now a dictionary of lists where each key is the prefix
# and each value is a list of possible postfixes


# each set of postfixes multiplies the total combinations by its length
total_combinations = functools.reduce(
    operator.mul,
    (len(sublist) for sublist in d.values())
)

results = collections.defaultdict(list)
for results_pos in range(total_combinations):
    remaining = results_pos
    for prefix, postfix_set in d.items():
        # peel off this prefix's digit; it selects which postfix to use
        remaining, digit = divmod(remaining, len(postfix_set))
        results[results_pos].append(
            "{}{}".format(  # recombine the values
                prefix,     # numeric prefix
                postfix_set[digit]
            ))

# results is now a dictionary mapping { result index: unique list }

displaying

# set the width of the index column by the longest result index
# need a collection for intermediate cols, but beyond scope of Q
col_width = max(len(str(k)) for k in results)
for k, v in results.items():
    print("{:<{w}}: {}".format(k, v, w=col_width))


0: ['12R', '102A', '250L']
1: ['12R', '102L', '250L']

with a more advanced input

["12R","102A","102L","250L","1234","1234A","1234C"]

0: ['12R', '102A', '250L', '1234']
1: ['12R', '102L', '250L', '1234']
2: ['12R', '102A', '250L', '1234A']
3: ['12R', '102L', '250L', '1234A']
4: ['12R', '102A', '250L', '1234C']
5: ['12R', '102L', '250L', '1234C']

You can confirm that the values are unique, and that none are missing, by putting them in a set and comparing its size with total_combinations

final = set(",".join(x) for x in results.values())
print(len(final) == total_combinations)
for f in sorted(final):
    print(f)

True
12R,102A,250L,1234
12R,102A,250L,1234A
12R,102A,250L,1234C
12R,102L,250L,1234
12R,102L,250L,1234A
12R,102L,250L,1234C
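A set only shows that the results are distinct; it does not show that none are missing or that each list uses every position exactly once. A slightly stronger check, sketched under the assumption that every entry starts with its position digits (check_combinations is a hypothetical name):

```python
import math
import re
from collections import Counter

def position(mutation):
    # leading digits of an entry such as "102A" or "1234"
    return re.match(r"\d+", mutation).group(0)

def check_combinations(mutations, results):
    """True if results holds every one-variant-per-position pick exactly once."""
    pool = set(mutations)
    group_sizes = Counter(position(m) for m in pool)
    positions = sorted(group_sizes)
    for r in results:
        # each result must use known entries and cover every position once
        if not set(r) <= pool or sorted(position(m) for m in r) != positions:
            return False
    distinct = {frozenset(r) for r in results}
    # distinct, valid, and as many as the product of group sizes => complete
    return len(distinct) == len(results) == math.prod(group_sizes.values())

mutations = ['12R', '102A', '102L', '250L']
good = [['12R', '102A', '250L'], ['12R', '102L', '250L']]
bad = [['12R', '102A', '250L'], ['12R', '102A', '250L']]  # repeats one, misses one
print(check_combinations(mutations, good))  # True
print(check_combinations(mutations, bad))   # False
```

Because every result is checked to be a valid pick, matching the product of the group sizes with no duplicates means no combination can be missing.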


ti7