Finding combinations that meet thresholds for subgroups

Question

I need to find all the combinations of rows where multiple conditions are met. I tried to use the powerset recipe from itertools and the answer here by adding multiple conditions but can't seem to get the conditions to work properly. The code I've come up with is:

def powerset(iterable):
"powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
s = list(iterable)
return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

df_groups = pd.concat(
                  [data.reindex(l).assign(Group = n) for n, l in 
                   enumerate(powerset(data.index)) ])
                   if ((data.loc[l, 'Account'] == 'COS').any() & (data.loc[l,'Amount'].sum() >= 100)
                        & (data.loc[l,'Account'] == 'Rev').any() & (data.loc[l, 'Amount'].sum() >= 150)
                        & (data.loc[l,'Account'] == 'Inv').any() and (data.loc[l, 'Amount'].sum() >= 60)))] )

What I'm trying to do above is find only those combinations where the the following thresholds are met/exceeded:

Account     Amount
COS         150 
Rev         100 
Inv         60

Sample data:

Entity  Account Amount  Location
A10      Rev    60       A
B01      Rev    90       B
C11      Rev    80       C
B01      COS    90       B
C11      COS    80       C
A10      Inv    60       A

Apologies in advance for the poor question writing etiquette, its the first time I haven't been able to find an answer on Stackoverflow and have had to ask a question.
Also, aware that this will get very slow as len(data) increases so any suggestions on that end are also greatly appreciated.

Gonçalo Peres · Answer 1 · 2022-09-13T22:09:06.867

Let's start by creating the dataframe that OP mentions in the question

df = pd.DataFrame({'Entity': ['A10', 'B01', 'C11', 'B01', 'C11', 'A10'],
                     'Account': ['Rev', 'Rev', 'Rev', 'COS', 'COS', 'Inv'],
                        'Amount': [60, 90, 80, 90, 80, 60],
                        'Location': ['A', 'B', 'C', 'B', 'C', 'A']})

[Out]:
  Entity Account  Amount Location
0    A10     Rev      60        A
1    B01     Rev      90        B
2    C11     Rev      80        C
3    B01     COS      90        B
4    C11     COS      80        C
5    A10     Inv      60        A

Then, in order to achieve OP's goal of filtering based on specific constraints, one can do this using a one-liner with pandas.concat and pandas.DataFrame.query, as follows

df_new = pd.concat([df[df['Account'] == 'Rev'].query('Amount <= 100'), df[df['Account'] == 'COS'].query('Amount <= 150'),  df[df['Account'] == 'Inv'].query('Amount <= 60')])

[Out]:
  Entity Account  Amount Location
0    A10     Rev      60        A
1    B01     Rev      90        B
2    C11     Rev      80        C
3    B01     COS      90        B
4    C11     COS      80        C
5    A10     Inv      60        A

As the sample dataframe doesn't allow us to get a clear picture if it is working or not, let us create a new random dataframe for testing purposes.

import numpy as np

df = pd.DataFrame({'Entity': np.random.choice(['A10', 'B01', 'C11', 'B01', 'C11', 'A10'], 1000),
                        'Account': np.random.choice(['Rev', 'COS', 'Inv'], 1000),
                        'Amount': np.random.randint(0, 1000, 1000),
                        'Location': np.random.choice(['A', 'B', 'C'], 1000)})

[Out]:
  Entity Account  Amount Location
0    B01     Rev     497        A
1    B01     Rev      52        C
2    B01     Rev      42        C
3    B01     Rev     285        B
4    A10     COS     714        B
5    A10     Rev     288        B
6    B01     Rev     396        B
7    A10     Inv     277        B
8    C11     Inv     435        C
9    C11     COS     228        C

If one runs the one-liner on that newly created dataframe, one gets the following

df_new = pd.concat([df[df['Account'] == 'Rev'].query('Amount <= 100'), df[df['Account'] == 'COS'].query('Amount <= 150'),  df[df['Account'] == 'Inv'].query('Amount <= 60')])


[Out]:
    Entity Account  Amount Location
1      B01     Rev      52        C
2      B01     Rev      42        C
21     B01     Rev       1        A
31     C11     Rev      38        A
47     A10     Rev      83        C
60     B01     Rev      41        C
156    B01     Rev      81        C
197    C11     Rev      61        C
206    C11     Rev      90        A
224    C11     Rev      23        B

which, from the sample we are seeing, it does satisfy the requirements.

There are additional ways to solve this.

Another example is using pandas.DataFrame.apply and a lambda function as follows

df_new = df[df.apply(lambda x: x['Amount'] <= 100 if x['Account'] == 'Rev' else x['Amount'] <= 150 if x['Account'] == 'COS' else x['Amount'] <= 60, axis=1)]

Finding combinations that meet thresholds for subgroups

1 Answers1