Suppose I have a dataframe like this:
player teammates
0 A [C,F]
1 C [A,F]
2 B [B]
3 D [H,J,K]
4 H [J,K]
5 Q [D]
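Rows 3, 4 and 5 are the challenging cases: their teammates lists are incomplete (for example, Q lists D, but D does not list Q), so team membership has to be inferred transitively. If the teammates column contained the entire team for each player, the problem would be trivial.
The expected output is a list of all teams, like so: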
[[A,C,F], [B], [D,H,J,K,Q]]
The first step could be to consolidate both columns into one via
df.apply(lambda row: list(set([row['player']] + row['teammates'])), axis=1)
which gives:
0 [A,C,F]
1 [A,C,F]
2 [B]
3 [D,H,J,K]
4 [H,J,K]
5 [Q,D]
However, checking every pair of rows for common elements and then merging them further seems very inefficient. Is there an efficient way to get the desired output?
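One direction that avoids pairwise comparison is to treat each player/teammate link as an edge in a graph and collect the connected components, for instance with a union-find (disjoint-set) structure. The following is only a minimal sketch under that assumption: it works on plain (player, teammates) pairs rather than on the dataframe itself, and the names group_teams, find and union are made up for illustration.

```python
def group_teams(rows):
    """Group players into teams.

    rows is an iterable of (player, teammates) pairs; the teammates
    lists may be incomplete, so teams are merged transitively.
    """
    parent = {}  # maps each name to its parent in the union-find forest

    def find(x):
        # Register unseen names as their own root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for player, teammates in rows:
        find(player)  # make sure players without teammates still appear
        for mate in teammates:
            union(player, mate)

    # Collect the members of each component under its root.
    teams = {}
    for name in parent:
        teams.setdefault(find(name), []).append(name)
    return sorted(sorted(team) for team in teams.values())


# Same data as the dataframe above, written as plain pairs.
rows = [
    ("A", ["C", "F"]),
    ("C", ["A", "F"]),
    ("B", ["B"]),
    ("D", ["H", "J", "K"]),
    ("H", ["J", "K"]),
    ("Q", ["D"]),
]
teams = group_teams(rows)
print(teams)  # [['A', 'C', 'F'], ['B'], ['D', 'H', 'J', 'K', 'Q']]
```

With the dataframe, the pairs could presumably be produced with zip(df['player'], df['teammates']). Each link is processed once, so no row is ever compared against every other row.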