Count all pairwise in pandas dataframe

Question

I'm looking for a simple way in pandas to count all pairwise combination between 2 columns. For example:

df

Col1    Col2   Freq
 A        B     1
 B        A     1
 A        C     1
 C        B     1
 C        A     1

Expected output:

Col1    Col2   Freq
 A        B     2
 A        C     2
 C        B     1

thanks

`df[['Col1', 'Col2']] = np.sort(df[['Col1', 'Col2']], axis=1); df.groupby(['Col1', 'Col2']).size()` — user3483203, Nov 20 '19 at 19:10
It's `sum` instead of `size`: `df.groupby(['Col1', 'Col2']).sum().reset_index()`. Post it as an answer @user3483203 ? — Erfan, Nov 20 '19 at 19:14
It's a clear dup of like 100 other posts. It's more a question of which one is the best to link (i.e. the one that clearly avoids a slow solution) — ALollz, Nov 20 '19 at 19:16
[This question](https://stackoverflow.com/questions/58592606/find-symmetric-pairs-quickly-in-numpy/58592764#58592764) replaces `first()` with `sum()`. — Quang Hoang, Nov 20 '19 at 19:23
I think the frozenset answer should be undeleted, it really was a good answer — oppressionslayer, Nov 20 '19 at 20:00

piRSquared · Answer 1 · 2020-02-10T15:25:04.080

1

`np.sort` and `groupby`

The main trick is to sort each row when considering just the columns 'Col1' and 'Col2'. I use some unpacking trickery to get the first column of the sorted thing into x and the second into y.

I could've placed these into there own columns but that isn't necessary as groupby allows to group by external arrays. So I do exactly that.

x, y = np.sort(df[['Col1', 'Col2']].to_numpy(), axis=1).T
df.Freq.groupby([x, y]).sum()

A  B    2
   C    2
B  C    1
Name: Freq, dtype: int64

edited Feb 10 '20 at 15:25

answered Nov 20 '19 at 19:53

piRSquared

285,575
57
475
624

Nice solution that seems to work, but can you explain it? The "groupby([x, y])" in particular really eludes me... – Tickon Feb 10 '20 at 15:20

score 1 · Answer 2 · answered Nov 20 '19 at 21:05

You can try:

df2=(df
  .groupby(
       df[["Col1", "Col2"]]
       .apply(lambda x: str(sorted(x)), axis=1)
       )
  .agg({"Col1": "first", "Col2": "first", "Freq": "sum"})
  .reset_index(drop=True)
)

Output:

  Col1 Col2  Freq
0    A    B     2
1    A    C     2
2    C    B     1

score 0 · Answer 3 · answered Nov 20 '19 at 19:30

While this is clearly a dup of others, I'm not sure if the following approach is mentioned somewhere:

s = (df.groupby(['Col1','Col2'])
       .Freq.sum()
       .unstack(-1, fill_value=0)
    )

np.triu(s) + np.triu(s.T,1)

Output:

array([[0, 2, 2],
       [0, 0, 1],
       [0, 0, 0]], dtype=int64)

Count all pairwise in pandas dataframe

3 Answers3

np.sort and groupby

`np.sort` and `groupby`