Pandas - Creating Difference Matrix from Data Frame

Question

I'm trying to create a matrix to show the differences between the rows in a Pandas data frame.

import pandas as pd

data = {'Country':['GB','JP','US'],'Values':[20.2,-10.5,5.7]}
df = pd.DataFrame(data)

I would like this:

  Country  Values
0      GB    20.2
1      JP   -10.5
2      US     5.7

To become something like this (differences going vertically):

  Country     GB     JP     US
0      GB    0.0  -30.7   14.5
1      JP   30.7    0.0   16.2
2      US   14.5  -16.2    0.0

Is this achievable with built-in function or would I need to build a loop to get the desired output? Thanks for your help!

ayhan · Accepted Answer · 2017-09-17T17:27:03.037

14

This is a standard use case for numpy's broadcasting:

df['Values'].values - df['Values'].values[:, None]
Out: 
array([[  0. , -30.7, -14.5],
       [ 30.7,   0. ,  16.2],
       [ 14.5, -16.2,   0. ]])

We access the underlying numpy array with the values attribute and [:, None] introduces a new axis so the result is two dimensional.

You can concat this with your original Series:

arr = df['Values'].values - df['Values'].values[:, None]
pd.concat((df['Country'], pd.DataFrame(arr, columns=df['Country'])), axis=1)
Out: 
  Country    GB    JP    US
0      GB   0.0 -30.7 -14.5
1      JP  30.7   0.0  16.2
2      US  14.5 -16.2   0.0

The array can also be generated with the following, thanks to @Divakar:

arr = np.subtract.outer(*[df.Values]*2).T

Here we are calling .outer on the subtract ufunc and it applies it to all pair of its inputs.

edited Sep 17 '17 at 17:27

answered Sep 17 '17 at 17:10

ayhan

70,170
20
182
203

5

Just for variety or funsake : `np.subtract.outer(*[df.Values]*2)`. – Divakar Sep 17 '17 at 17:13
Hmm seems like the order won't render same output as desired. So, negating it would be one way as noted by jezrael. – Divakar Sep 17 '17 at 17:22
1

So, just transpose it afterwards : `np.subtract.outer(*[df.Values]*2).T` to avoid having the same answer :) – Divakar Sep 17 '17 at 17:24
@Divakar - transpose here is same as negating? – jezrael Sep 17 '17 at 17:25
@Divakar I put a minus sign after your comment. That would also work right? – ayhan Sep 17 '17 at 17:25
1

Transpose works because its pairwise subtractions. Minus would work too. But transpose might be .. even better (not tested), as its just a view. – Divakar Sep 17 '17 at 17:26
Needed something similar, and trying the Numpy for kicks, but get `NotImplementedError`: `if method == "outer":` ... `# GH#27198`; Looks like something broke. – xtian Dec 25 '22 at 01:54

score 3 · Answer 2 · answered Sep 17 '17 at 17:17

I try improve Divakar comment:

a = np.column_stack([df['Country'], np.subtract.outer(*[-df.Values]*2)])

df = pd.DataFrame(a, columns=['Country'] + df['Country'].tolist())
print (df)
  Country    GB    JP    US
0      GB     0 -30.7 -14.5
1      JP  30.7     0  16.2
2      US  14.5 -16.2     0

BENY · Answer 3 · 2017-09-17T17:36:12.017

Option 1

from itertools import product
import pandas as pd
DF=pd.DataFrame(list(product(df.Country, df.Country)), columns=['l1', 'l2'])
df=df.set_index('Country')
DF['v1']=DF.l1.map(df['Values'])
DF['v2']=DF.l2.map(df['Values'])
DF['DIFF']=DF['v2']-DF['v1']
DF.pivot(index='l1', columns='l2', values='DIFF').fillna(0).rename_axis(None).rename_axis(None,1)
Out[94]: 
      GB    JP    US
GB   0.0 -30.7 -14.5
JP  30.7   0.0  16.2
US  14.5 -16.2   0.0

Option 2 using apply

A=df['Values'].apply(lambda x : df['Values']-x)
A.columns=df.Country
A['Country']=df.Country


A
Out[124]: 
Country    GB    JP    US Country
0         0.0 -30.7 -14.5      GB
1        30.7   0.0  16.2      JP
2        14.5 -16.2   0.0      US

Pandas - Creating Difference Matrix from Data Frame

3 Answers3

Linked

Related