-1

I am trying to find specific combinations of items in a list. The list is made up of x groups that repeat y times. In this example x and y = 3, but in practice could be much bigger in size. I want to find every combination of groups and y, but without duplicating an x value for a given combination. I think it’s easier to just show an example of what I want.

Here is an example.

A = ['ST1_0.245', 'ST1_0.29', 'ST1_0.335', 'ST2_0.245', 'ST2_0.29', 'ST2_0.335', 'ST3_0.245', 'ST3_0.29', 'ST3_0.335']

So three groups, ST1, ST2, and ST3 – each having 3 iterations, 0.245, 0.290, and 0.335.

I want to find the following combinations.

('ST1_0.245', 'ST2_0.245', 'ST3_0.245')
('ST1_0.245', 'ST2_0.245', 'ST3_0.29')
('ST1_0.245', 'ST2_0.245', 'ST3_0.335')
('ST1_0.245', 'ST2_0.29', 'ST3_0.245')
('ST1_0.245', 'ST2_0.29', 'ST3_0.29')
('ST1_0.245', 'ST2_0.29', 'ST3_0.335')
('ST1_0.245', 'ST2_0.335', 'ST3_0.245')
('ST1_0.245', 'ST2_0.335', 'ST3_0.29')
('ST1_0.245', 'ST2_0.335', 'ST3_0.335')
('ST1_0.29', 'ST2_0.245', 'ST3_0.245')
('ST1_0.29', 'ST2_0.245', 'ST3_0.29')
('ST1_0.29', 'ST2_0.245', 'ST3_0.335')
('ST1_0.29', 'ST2_0.29', 'ST3_0.245')
('ST1_0.29', 'ST2_0.29', 'ST3_0.29')
('ST1_0.29', 'ST2_0.29', 'ST3_0.335')
('ST1_0.29', 'ST2_0.335', 'ST3_0.245')
('ST1_0.29', 'ST2_0.335', 'ST3_0.29')
('ST1_0.29', 'ST2_0.335', 'ST3_0.335')
('ST1_0.335', 'ST2_0.245', 'ST3_0.245')
('ST1_0.335', 'ST2_0.245', 'ST3_0.29')
('ST1_0.335', 'ST2_0.245', 'ST3_0.335')
('ST1_0.335', 'ST2_0.29', 'ST3_0.245')
('ST1_0.335', 'ST2_0.29', 'ST3_0.29')
('ST1_0.335', 'ST2_0.29', 'ST3_0.335')
('ST1_0.335', 'ST2_0.335', 'ST3_0.245')
('ST1_0.335', 'ST2_0.335', 'ST3_0.29')
('ST1_0.335', 'ST2_0.335', 'ST3_0.335')

Note that ST1, ST2, and ST3 are only in each combination once.

Here is code that I got to work at least for small cases.

import itertools
import numpy as np

comb = []
gr_list=['ST1','ST2','ST3']
for itr in itertools.combinations(A, len(gr_list)):
    # pdb.set_trace()
    for n in np.arange(len(gr_list)):
        if sum(itr[n].split('_')[0] in s for s in itr) > 1:
            break
    
    if n == len(gr_list)-1:
        comb.append(itr)

This works for a few small examples I tested, but when I tried larger values, I was getting more results than I thought, but that could be a mistake in my attempts to calculate how many are expected. But either way, it takes far too long. Is there a faster way to do this?

I do have both the values separately to begin with. Which as I write this I am feeling like thats a better way to approach it, but I am not sure how to do that either.

casducks
  • 13
  • 2
  • "I was getting more results than I thought" - in your example, you have 3 choices for the first value times 3 choices for the second times 3 choices for the last one, that is 3x3x3 = 27. So yes, the size of the output will quickly become very large. What do you intend to do with such an output? – Thierry Lathuille Nov 04 '21 at 18:39

2 Answers2

0

You can use itertools.product for this, which will produce an iterator rather than a list (which will generally be more efficient if you're iterating through rather than producing the whole collection). You're going to end up with the product of the lengths of the different categories as the number of elements in the iterator.

asthasr
  • 9,125
  • 1
  • 29
  • 43
0

Create the groups as required and then use itertools.product on the groups:

A = ['ST1_0.245', 'ST1_0.29', 'ST1_0.335', 
     'ST2_0.245', 'ST2_0.29', 'ST2_0.335', 
     'ST3_0.245', 'ST3_0.29', 'ST3_0.335']

prefixes = set(s.split("_")[0] for s in A)
groups = [[a for a in A if a.split("_")[0]==p] for p in prefixes]

>>> list(itertools.product(*groups))

[('ST2_0.245', 'ST3_0.245', 'ST1_0.245'),
 ('ST2_0.245', 'ST3_0.245', 'ST1_0.29'),
 ('ST2_0.245', 'ST3_0.245', 'ST1_0.335'),
 ('ST2_0.245', 'ST3_0.29', 'ST1_0.245'),
 ('ST2_0.245', 'ST3_0.29', 'ST1_0.29'),
 ('ST2_0.245', 'ST3_0.29', 'ST1_0.335'),
 ('ST2_0.245', 'ST3_0.335', 'ST1_0.245'),
 ('ST2_0.245', 'ST3_0.335', 'ST1_0.29'),
 ('ST2_0.245', 'ST3_0.335', 'ST1_0.335'),
 ('ST2_0.29', 'ST3_0.245', 'ST1_0.245'),
 ('ST2_0.29', 'ST3_0.245', 'ST1_0.29'),
 ('ST2_0.29', 'ST3_0.245', 'ST1_0.335'),
 ('ST2_0.29', 'ST3_0.29', 'ST1_0.245'),
 ('ST2_0.29', 'ST3_0.29', 'ST1_0.29'),
 ('ST2_0.29', 'ST3_0.29', 'ST1_0.335'),
 ('ST2_0.29', 'ST3_0.335', 'ST1_0.245'),
 ('ST2_0.29', 'ST3_0.335', 'ST1_0.29'),
 ('ST2_0.29', 'ST3_0.335', 'ST1_0.335'),
 ('ST2_0.335', 'ST3_0.245', 'ST1_0.245'),
 ('ST2_0.335', 'ST3_0.245', 'ST1_0.29'),
 ('ST2_0.335', 'ST3_0.245', 'ST1_0.335'),
 ('ST2_0.335', 'ST3_0.29', 'ST1_0.245'),
 ('ST2_0.335', 'ST3_0.29', 'ST1_0.29'),
 ('ST2_0.335', 'ST3_0.29', 'ST1_0.335'),
 ('ST2_0.335', 'ST3_0.335', 'ST1_0.245'),
 ('ST2_0.335', 'ST3_0.335', 'ST1_0.29'),
 ('ST2_0.335', 'ST3_0.335', 'ST1_0.335')]
not_speshal
  • 22,093
  • 2
  • 15
  • 30
  • Thank you! I was getting closer after I posted this but I don't think I would have got it without help. Can you explain what the *groups is doing? – casducks Nov 04 '21 at 18:58
  • Lookup [iterable unpacking](https://www.python.org/dev/peps/pep-3132/). There are many questions on SO that address this. [Here's](https://stackoverflow.com/questions/50950690/python-unpacking-operator) one. – not_speshal Nov 04 '21 at 19:00