Aggregate symmetric pairs if they exist in Pandas

Question

Given a dataframe

data = {
        "col1": ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'], 
        "col2": ['B', 'C', 'A', 'C', 'D', 'B', 'E', 'A',],
        "count": [3, 7, 12, 2, 8, 2, 5, 9]
        }
df = pd.DataFrame(data=data)

col1 col2 count
 A    B     3
 A    C     7
 B    A     12
 B    C     2
 C    D     8
 C    B     2
 D    E     5
 D    A     9

I want to aggregate all the rows that have symmetric pairs while maintaining the rows that don't. Therefore, the output would be a new dataframe

col1 col2 count
 A    B     15
 A    C     7
 B    C     4
 C    D     8
 D    E     5
 D    A     9

I have looked at Aggregate symmetric pairs pandas and Find symmetric pairs quickly in numpy but neither of them particularly help me.

@QuangHoang When I do that, I get an error, so it doesn't work. — DrakeMurdoch, Feb 24 '20 at 21:06

score 3 · Accepted Answer · answered Feb 24 '20 at 21:08

Actually, the given linked used apply, which generally is slow. Here's a solution with np.sort:

s = np.sort(df[['col1','col2']], axis=1)
df.groupby([s[:,0],s[:,1]])['count'].sum().reset_index()

Output:

  level_0 level_1  count
0       A       B     15
1       A       C      7
2       A       D      9
3       B       C      4
4       C       D      8
5       D       E      5

score 2 · Answer 2 · answered Feb 24 '20 at 21:05

Some small modifications to the answer you linked does what you want:

df1 = df[['col1', 'col2']].apply(sorted, 1, result_type = "expand").rename(columns = {0:'col1', 1:'col2'})

df.groupby([df1.col1, df1.col2]).sum().reset_index()

  col1 col2  count
0    A    B     15
1    A    C      7
2    A    D      9
3    B    C      4
4    C    D      8
5    D    E      5

Aggregate symmetric pairs if they exist in Pandas

2 Answers2