I have a set of experiment days and subjects (anonymised subset below) in a dataframe. How do I generate all the pairwise comparisons per day in a new dataframe, where subjects also play the role of experimenter?

Input:

Day Subject
Monday Alpha
Monday Bravo
Monday Charlie
Wednesday Delta
Wednesday Echo
Wednesday Foxtrot
Wednesday Golf
Wednesday Hotel

Expected Output:

Day Subject Experimenter
Monday Alpha Bravo
Monday Alpha Charlie
Monday Bravo Charlie
Wednesday Delta Echo
Wednesday Delta Foxtrot
Wednesday Delta Golf
Wednesday Delta Hotel
Wednesday Echo Foxtrot
Wednesday Echo Golf
Wednesday Echo Hotel
Wednesday Foxtrot Golf
Wednesday Foxtrot Hotel
Wednesday Golf Hotel

So far, I am only able to generate the full set of combinations across all subjects, but not grouped by day:

import numpy as np
import pandas as pd
import itertools as it

df = pd.DataFrame({'Day': ['Monday', 'Monday', 'Monday', 'Wednesday', 'Wednesday', 'Wednesday', 'Wednesday', 'Wednesday'],
                    'Subject': ['Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo', 'Foxtrot', 'Golf', 'Hotel']})

pair_order_list = it.combinations(df['Subject'], 2)
pairs = list(pair_order_list)

Actual Output:

[('Alpha', 'Bravo'), ('Alpha', 'Charlie'), ('Alpha', 'Delta'),...]

Any advice would be welcome.

matekus

1 Answer

The following code appears to generate the expected output:

from itertools import combinations

# Build (day, (subject, experimenter)) rows, pairing subjects only within each day.
# See https://stackoverflow.com/questions/72811105/how-can-i-search-for-sub-groups-of-dataframe-that-contains-specific-pairs-of-dat
L = [(i, tuple(y)) for i, x in df.groupby('Day')['Subject'] for y in combinations(x, 2)]
df_2 = pd.DataFrame(L, columns=['Day','SubjExp'])

# Split the tuple column into separate 'Subject' and 'Experimenter' columns.
# See https://stackoverflow.com/questions/29550414/how-can-i-split-a-column-of-tuples-in-a-pandas-dataframe
df_2[['Subject', 'Experimenter']] = pd.DataFrame(df_2['SubjExp'].tolist(), index=df_2.index)
df_2 = df_2.drop('SubjExp', axis=1)
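
A possibly simpler variant, offered as a sketch rather than a definitive solution: unpack each pair directly inside the comprehension, so the intermediate tuple column is never created. The names rows and pairs_by_day are my own choices for illustration, and sort=False is an assumption that the days should stay in their order of first appearance rather than being sorted alphabetically.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    'Day': ['Monday'] * 3 + ['Wednesday'] * 5,
    'Subject': ['Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo', 'Foxtrot', 'Golf', 'Hotel'],
})

# Iterating over the grouped 'Subject' column yields (day, Series of subjects).
# combinations(subjects, 2) then pairs subjects only within that day.
rows = [
    (day, subject, experimenter)
    for day, subjects in df.groupby('Day', sort=False)['Subject']
    for subject, experimenter in combinations(subjects, 2)
]

# Build the final frame in one step, with the three expected columns.
pairs_by_day = pd.DataFrame(rows, columns=['Day', 'Subject', 'Experimenter'])
print(pairs_by_day)
```

With the sample data this gives 13 rows: 3 pairs for Monday (3 choose 2) and 10 for Wednesday (5 choose 2), matching the expected output in the question.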

matekus