Counting same entries per group in python

Question

I have a dataframe in the following form:

group base height weight size
0      A     10     5     M
0      A     20     5     M
1      A     10     10    S
2      A      5      5    L

How can I get a matrix, which counts the similiar entries by group? The output looks like this:

compare  base height weight size
0,1        3/3  2/3   2/3   2/3
0,2        3/3  0/3   3/3   2/3
1,2        2/2  0/2   0/2   0/2

You might solicit a better answer if you post your code as well :) — Jaap Joris Vens, Aug 19 '20 at 13:30
Only solution I have in mind so far is to work column per column and group by group, that would fill probably one page here, that would lead to some confusion. — PV8, Aug 19 '20 at 13:33
shape is: (117764, 39), but i probably want to compare only some columns out of it — PV8, Aug 19 '20 at 13:50

score 1 · Accepted Answer · answered Mar 21 '21 at 08:08

at the core of the solution most frequent values
use itertools.combinations to get valid pairs of groups
compare most frequent to each row in combination of groups. sum() truth matrix to find number of matches
rest is formating

df = pd.read_csv(io.StringIO("""group base height weight size
0      A     10     5     M
0      A     20     5     M
1      A     10     10    S
2      A      5      5    L"""), sep="\s+")

# columns we're working with
cols = [c for c in df.columns if c!= "group"]

# iterate over combinations of groups
dfx = pd.DataFrame()
for gp in itertools.combinations(df.group.unique(), 2):
    dfg = df.loc[df.group.isin(gp),cols]
    dfx = pd.concat([dfx, 
                     (dfg == dfg.value_counts().index[0])
                     .sum().to_frame().T.assign(gs=len(dfg), compare=",".join(str(e) for e in gp))
                    ])
# rebase 1 as 0 for comparisons
dfx = dfx.reset_index(drop=True).replace(1,0).astype(str)
# format as required
dfx.loc[:,cols] = dfx[cols].apply(lambda x: x+" / " +dfx["gs"])

dfx.drop(columns="gs")

	base	height	weight	size	compare
0	3 / 3	2 / 3	2 / 3	2 / 3	0,1
1	3 / 3	0 / 3	3 / 3	0 / 3	0,2
2	2 / 2	0 / 2	0 / 2	0 / 2	1,2

Counting same entries per group in python

1 Answers1