Select One of each in a pandas grouping

Question

I'm trying to create all of the possible combinations of pairings of players to assign into 4 person golf teams based on handicap type A, B, C, or D.

I've tried various itertools methods such as combinations and permutations but can't figure out the right approach.

from itertools import combinations, product, permutations
g = player_df.groupby(by = 'hcp_ABCD')
teams_listoflists = [group[1].index for group in g]
teams_combo_ndx = [player for player in permutations(teams_listoflists, 4)]

Here is my pandas table:

        handicap      name hcp_ABCD
0         24   Player1        D
1         21   Player2        D
2          8   Player3        B
3         14   Player4        C
4         20   Player5        D
5         13   Player6        C
6         -1   Player7        A
7          5   Player8        A
8          8   Player9        B
9          6  Player10        B
10        20  Player11        D
11        15  Player12        C
12         0  Player13        A
13        12  Player14        C
14         0  Player15        A
15        10  Player16        B

i would like the output to be all combinations (without duplicates) of player combinations (teams) such that each team has a type A, B, C, and D on each. This output can be a similar table as above grouped by "options."

Edit: Am adding this output example for clarity.

                       A Player     B Player     C Player   D Player
    option 1  team1    Player7      Player3      Player4    Player1
              team2    Player8      Player9      Player6    Player2
              team3    Player13     Player10     Player12   Player5
              team4    Player15     Player16     Player14   Player11

    option 2  team1    Player7      Player16     Player4    Player1
              team2    Player8      Player3      Player6    Player2
              team3    Player13     Player9      Player12   Player5
              team4    Player15     Player10     Player14   Player11

    ...


                       A Player     B Player     C Player   D Player
    option n  team1    Player7      Player3      Player4    Player11
              team2    Player8      Player9      Player6    Player1
              team3    Player13     Player10     Player12   Player2
              team4    Player15     Player16     Player14   Player5

The point of the above is that I'm trying to find a generator that cycles through all combinations of player in each handicap group so that the combination of options of teams is clear.

Edit #2 I've determined that this code produces a combination of all of the potential team combinations:

g = df.groupby(by = 'hcp_ABCD')
combinations = [list(group[1].index) for group in g]

This creates a list of lists with the A Players in list[0], B Players in list[1], etc.

And this gets an indexer for all possible combinations of teams:

from itertools import product
options = [option for option in product(*combinations)]

But, how to assign these out into the "options" (see above example) and ensure no duplication is what I'm stuck on.

Edit #3 A simpler version (way to think about this problems) is to use the following sets:

A = ['A1', 'A2', 'A3', 'A4']
B = ['B1', 'B2', 'B3', 'B4']
C = ['C1', 'C2', 'C3', 'C4']
D=  ['D1', 'D2', 'D3', 'D4']

This essentially does what the groupby does above (grouping by hcp_ABCD) but names each "A Player", "B Player", etc.

possible_combinations of teams:

team_combinations = [team for team in product(A, B, C, D)]

then the next trick is to assign these onto combinations of 4 teams with no duplication of players.

Am now thinking that itertools.product(g) may be the key to this. this creates an exhaustive list of ABCD teams. But, it doesn't assign them into options without player duplicates like the above. — leeprevost, Apr 16 '19 at 20:18
@PMende, no it isn't same. I'm not trying to sample. I'm trying to get exhuaustive list. The question you point to is getting 30% sampling. — leeprevost, Apr 16 '19 at 20:20
Ah. I see what you mean. In this case, break each of the handicap types into sub dataframes (by handicap type), then do a cross join of each of those tables onto each other (you'll have to add a dummy column of the same value in each to do this). — PMende, Apr 16 '19 at 20:28
@Pmende, -- thinking about your idea here. Maybe iterate through itertools.product(*g), add the dummy column for "option x" and then cross join the tables? Then, I can do a multi-index from there? — leeprevost, Apr 16 '19 at 20:38

jose_bacoy · Accepted Answer · 2019-04-16T20:16:10.983

Thanks for clarifying about the expected result. Here is the my answer which I tested. It may not be the exact format of your expected result but I leave it to you to fix it.

import pandas as pd
def is_duplicate_team(team, group):
    '''check if an option already exists'''
    return any(group == t for t in team)
def is_player_exists(group, arr):
    '''check if a player exists in a group'''
    return any(x in g for g in group for x in arr)

df = [         (24   ,'Player1','D'),
         (21   ,'Player2','D'),
          (8   ,'Player3','B'),
         (14   ,'Player4','C'),
         (20   ,'Player5','D'),
         (13   ,'Player6','C'),
         (-1   ,'Player7','A'),
          (5   ,'Player8','A'),
          (8   ,'Player9','B'),
          (6  ,'Player10','B'),
        (20  ,'Player11','D'),
        (15  ,'Player12','C'),
         (0  ,'Player13','A'),
        (12  ,'Player14','C'),
         (0  ,'Player15','A'),
        (10  ,'Player16','B')]
df = pd.DataFrame(df, columns=['handicap', 'name', 'hcp_ABCD'])
from itertools import product
grouped = df.groupby('hcp_ABCD')['name'].apply(list).reset_index()
df_name = [n for n in grouped.name]
df_comb = [p for p in product(*df_name)]

# below code will get all combinations of groups and for a team having all players
teams=[]
for i in df_comb[:-1]:
    group=[i] 
    for j in df_comb[1:]: 
        if not is_player_exists(group, j):
            group.append(j)
        if len(group) == 4:
            if not is_duplicate_team(teams, group):
                teams.append(group)
            continue

# below code will print the output similar to what you expected
i=0
for t in teams:
    i+=1
    print('option: ', str(i) )
    for p in t:
        print(p)

I think this does include an exhaustive option for all team configurations. But it doesn't assign the teams into 4 teams such that each combination of team options is shown. Am going to update my question to be more clear on this point. — leeprevost, Apr 16 '19 at 19:19
Well dang, you are right. This does produce a result. Was hoping to get this into a multi-index dataframe but I agree, this does seem to do it. — leeprevost, Apr 16 '19 at 21:08
I do however, think that @Pmende is getting to maybe a slightly more elegant (ie. less code) way to do this. — leeprevost, Apr 16 '19 at 21:09
I had to think about this for a while using a simplified approach using 4 sets of groups to select from A = (A1, A2, A3, A4), B = (B1, B2, B3, B4) ... D = (D1, D2, D3, D4). I used the [p for p in product(A, B, C, D)] to get the team combos and then run it through your combinations for loop. I wonder if there is a simpler way using itertools.combinations([A,B,C,D],4) but it does seem to work. am checking this as correct answer. — leeprevost, Apr 17 '19 at 14:44

PMende · Answer 2 · 2019-04-16T21:52:53.133

I made a suggestion in the comments. Here is an implementation:

import pandas as pd
from functools import reduce

data = [
    (24,'Player1','D'),
    (21,'Player2','D'),
    (8,'Player3','B'),
    (8,'Player4','B'),
    (14,'Player5','C'),
    (13,'Player6','C'),
    (-1,'Player7','A'),
    (5,'Player8','A')
]
df = pd.DataFrame(
    data,
    columns=['handicap', 'name', 'hcp_ABCD']
)

dfs = [
    grp_df.drop(columns="hcp_ABCD")
          .rename(columns={"name": f"player_{hndcp}",
                           "handicap": f"handicap_{hndcp}"})
    for hndcp, grp_df in df.assign(key=1)
                           .groupby("hcp_ABCD")
]
result = reduce(
    lambda left, right: left.merge(right, how="outer", on="key"),
    dfs
).drop(columns="key")
print(result)

Output:

    handicap_A player_A  handicap_B player_B  handicap_C player_C  handicap_D player_D
0           -1  Player7           8  Player3          14  Player5          24  Player1
1           -1  Player7           8  Player3          14  Player5          21  Player2
2           -1  Player7           8  Player3          13  Player6          24  Player1
3           -1  Player7           8  Player3          13  Player6          21  Player2
4           -1  Player7           8  Player4          14  Player5          24  Player1
5           -1  Player7           8  Player4          14  Player5          21  Player2
6           -1  Player7           8  Player4          13  Player6          24  Player1
7           -1  Player7           8  Player4          13  Player6          21  Player2
8            5  Player8           8  Player3          14  Player5          24  Player1
9            5  Player8           8  Player3          14  Player5          21  Player2
10           5  Player8           8  Player3          13  Player6          24  Player1
11           5  Player8           8  Player3          13  Player6          21  Player2
12           5  Player8           8  Player4          14  Player5          24  Player1
13           5  Player8           8  Player4          14  Player5          21  Player2
14           5  Player8           8  Player4          13  Player6          24  Player1
15           5  Player8           8  Player4          13  Player6          21  Player2

Thanks for clarifying @PMende. But, if I get this right, your dfs dataframe includes 4 dataframes in it of 4 players. It would seem my options list above example would be as long as 4! (factorial). If each team has 4 players and each team has 1 from each handicap group, you would have a potential for a lot more combinations that what your dfs shows. am not sure total combos but somthing like 4 option for B player, 4 options for C player, and 4 options of D player for each of A player. — leeprevost, Apr 16 '19 at 21:27
@leeprevost I only showed the first 5 rows, because there are 256 of them. — PMende, Apr 16 '19 at 21:47
@leeprevost I've edited the sample data and shown a full result of 16 combinations for each of the handicap types having 2 players. — PMende, Apr 16 '19 at 21:54
I've checked this and it works. But, I'm not sure I could play back exactly HOW it works. I think I like the anonymous better as I can follow it. but this does work. — leeprevost, Apr 18 '19 at 19:16

Rich Andrews · Answer 3 · 2019-04-17T15:04:53.327

The following approach is uses a cartesian product and then groups twice to distribute the players into teams with a set of unique handicaps.

import pandas as pd
from pandas.compat import StringIO

print(pd.__version__)
pd.options.display.max_rows = 664

csvdata = StringIO("""handicap,name,hcp_ABCD
24,Player1,D
21,Player2,D
8,Player3,B
14,Player4,C
20,Player5,D
13,Player6,C
-1,Player7,A
5,Player8,A
8,Player9,B
6,Player10,B
20,Player11,D
15,Player12,C
0,Player13,A
12,Player14,C
0,Player15,A
10,Player16,B""")

df=pd.read_csv(csvdata)

# Generate all possible groups
# https://stackoverflow.com/questions/53699012/performant-cartesian-product-cross-join-with-pandas
def cartesian_product(left, right):
    return (left.assign(key=1).merge(right.assign(key=1), on='key').drop('key', 1))

def distribute_players(x):
    x['distribute'] = range(0, 4)
    return x

df = cartesian_product(df, df.copy())
df = df.groupby(['name_x', 'hcp_ABCD_y']).apply(distribute_players)
df['team'] = df.groupby(['name_x', 'distribute']).ngroup()
print(df[['handicap_y','name_y','hcp_ABCD_y','team']].sort_values(['team']))

     handicap_y    name_y hcp_ABCD_y  team
0            24   Player1          D     0
2             8   Player3          B     0
3            14   Player4          C     0
6            -1   Player7          A     0
1            21   Player2          D     1
5            13   Player6          C     1
7             5   Player8          A     1
8             8   Player9          B     1

Am not looking to sample. Looking for an exhaustive combination of options of team configurations such that each team is 'balanced' meaning 1 A, 1B, 1C, and 1D player on each. — leeprevost, Apr 16 '19 at 19:16
Updated to reflect producing an exhaustive combination of balanced teams, using a concise approach including accurate output format. — Rich Andrews, Apr 17 '19 at 15:07
this definitely works. I really like the two groupby statements in their simplicity and efficiency. What is throwing me a bit is the cartesian product. Wouldn't it be simpler with just another groupby? 1) g = sample_df.groupby('hcp_ABCD'), 2) combo_teams = {group[0]: list(group[1].index) for group in g}, 3) cp = [option for option in product(*combo_teams.values())]. — leeprevost, Apr 18 '19 at 19:48

Select One of each in a pandas grouping

3 Answers3