Find all combinations of values in a dataframe column and optimize based on values in a different column

Question

I have a dataframe with the following structure:

|code|score|slots|
|a   |50   |0.25 |
|b   |66   |0.33 |
|c   |20   |0.20 |
|d   |28   |0.20 |
|e   |23   |0.33 |

I need to find all combinations of 'slots' that add up to no more than a given value, say 1.0, then select the combination with the highest combined 'score', and finally output the 'code' values of that optimal combination. Thanks to this answer I was able to do the first part, but the association with the original dataframe is lost, and I do not know how to retrieve the corresponding 'score' and 'code' values:

import itertools
comb = [seq for i in range(len(df['slots']), 0, -1) for seq in itertools.combinations(df['slots'], i)
          if sum(seq) <= 1]

I was thinking about using dictionaries, but my dataframe will contain identical keys. How can I get a list of keys and values corresponding to the selected combinations?

score 0 · Answer 1 · answered Jan 07 '21 at 13:40

I've partially solved this problem by first producing unique combinations of 'codes':

import itertools
import pandas as pd

data = [['a', 50, 0.25], ['b', 66, 0.33], ['c', 20, 0.2], ['d', 28, 0.2], ['e', 23, 0.33]]
df = pd.DataFrame(data, columns=['code', 'points', 'slots'])

# Every non-empty combination of codes, largest first
combinations = [seq for i in range(len(df['code']), 0, -1)
                for seq in itertools.combinations(df['code'], i)]

# Collect one row per combination (DataFrame.append was removed in pandas 2.0)
rows = []
for combination in combinations:
    dfA = df[df['code'].isin(combination)]
    rows.append({'combination': combination,
                 'slots': dfA['slots'].sum(),
                 'points': dfA['points'].sum()})
dfComb = pd.DataFrame(rows, columns=['combination', 'slots', 'points'])

# Keep the combinations whose slots add up to no more than 1.0
dfTrim = dfComb[dfComb['slots'] <= 1.0]
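
The remaining step, picking the highest-scoring combination and reading back its codes, could be sketched as below. This is only one possible approach: it combines row positions instead of values, so the link to the dataframe survives even when codes or slot values repeat. The column name `score` follows the question, while `best_combination` and `limit` are hypothetical names introduced for illustration.

```python
import itertools
import pandas as pd

df = pd.DataFrame({'code': ['a', 'b', 'c', 'd', 'e'],
                   'score': [50, 66, 20, 28, 23],
                   'slots': [0.25, 0.33, 0.20, 0.20, 0.33]})

def best_combination(df, limit=1.0):
    """Return (codes, total_score, total_slots) for the best feasible
    combination, or None if no combination fits within `limit`."""
    best = None
    for size in range(1, len(df) + 1):
        # Combine row positions, not values, so duplicates stay distinct
        for idx in itertools.combinations(range(len(df)), size):
            rows = df.iloc[list(idx)]
            total_slots = rows['slots'].sum()
            # Small tolerance guards against floating-point round-off
            if total_slots > limit + 1e-9:
                continue
            total_score = rows['score'].sum()
            if best is None or total_score > best[1]:
                best = (list(rows['code']), int(total_score), float(total_slots))
    return best

codes, total_score, total_slots = best_combination(df)
print(codes, total_score)  # ['a', 'b', 'c', 'd'] 164
```

Like the loop above, this still visits every one of the 2^n - 1 non-empty combinations, so it is only practical for small dataframes.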

However, since the complexity of this problem grows exponentially with the size of the dataframe, I will probably try to use numpy instead of pandas and see if that speeds things up at least a bit.
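
A NumPy version of this idea might look roughly like the sketch below. It is an illustration, not a measured speed-up: it encodes every non-empty subset as a row of a 0/1 matrix and obtains all slot and score totals with two matrix products. The array names and the `limit` variable are assumptions made for this example, and the mask still has 2^n - 1 rows, so the exponential growth remains.

```python
import numpy as np

codes = np.array(['a', 'b', 'c', 'd', 'e'])
score = np.array([50, 66, 20, 28, 23])
slots = np.array([0.25, 0.33, 0.20, 0.20, 0.33])
limit = 1.0

n = len(codes)
# Each subset is an integer 1 .. 2**n - 1; bit j set means row j is included
subset_ids = np.arange(1, 2 ** n)
mask = (subset_ids[:, None] >> np.arange(n)) & 1  # shape (2**n - 1, n), values 0/1

# Totals for every subset at once
total_slots = mask @ slots
total_score = mask @ score

# Keep subsets within the limit (small tolerance for float round-off);
# np.argmax raises if no subset is feasible
feasible = total_slots <= limit + 1e-9
best = np.flatnonzero(feasible)[np.argmax(total_score[feasible])]
best_codes = codes[mask[best].astype(bool)].tolist()
print(best_codes, total_score[best])  # ['a', 'b', 'c', 'd'] 164
```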
