58

Given a square pandas DataFrame of the following form:

   a  b  c
a  1 .5 .3
b .5  1 .4
c .3 .4  1

How can the upper triangle be melted to get a matrix of the following form

 Row     Column    Value
  a        a       1
  a        b       .5 
  a        c       .3
  b        b       1
  b        c       .4
  c        c       1 

#Note the combination a,b is only listed once.  There is no b,a listing     

I'm more interested in an idiomatic pandas solution, a custom indexer would be easy enough to write by hand...

Thank you in advance for your consideration and response.

Ramón J Romero y Vigil
  • 17,373
  • 7
  • 77
  • 125

3 Answers3

78

First I convert lower values of df to NaN by where and numpy.triu and then stack, reset_index and set column names:

import numpy as np

print df
     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0

print np.triu(np.ones(df.shape)).astype(np.bool)
[[ True  True  True]
 [False  True  True]
 [False False  True]]

df = df.where(np.triu(np.ones(df.shape)).astype(np.bool))
print df
    a    b    c
a   1  0.5  0.3
b NaN  1.0  0.4
c NaN  NaN  1.0

df = df.stack().reset_index()
df.columns = ['Row','Column','Value']
print df

  Row Column  Value
0   a      a    1.0
1   a      b    0.5
2   a      c    0.3
3   b      b    1.0
4   b      c    0.4
5   c      c    1.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 3
    I the only thing to watch out for is if you have any `NaN` values that you want to preserve in the upper triangle (`stack` will drop them all). You might have to explicitly construct the multi-index and then reindex if that is the case. – Alex Riley Dec 22 '15 at 16:47
  • @jezrael how would you go back from the last `df` to the triangular one? I've built the triangular matrix, converted to long, processed, and now I want to have it back to triangular, but some `NA`s go to the upper triangular – Sos May 08 '18 at 13:11
  • @Sosi - I think need pivot like `df = df.pivot('Row', 'Column', 'Value')` – jezrael May 08 '18 at 13:17
  • @jezrael thank you. however, this still yields a non-triangular matrix. Maybe I will create a new thread and ping you – Sos May 08 '18 at 13:20
  • @Sosi - Yes, because for me it working nice, so problem is something else. – jezrael May 08 '18 at 13:21
  • @jezrael I'll be honest: I feel dumb. With the working example, it was working fine, but not with my real case. However, I noticed now it was a matter of index/column order... sorry for having disturbed you. I will deleted my latest comments – Sos May 08 '18 at 13:29
  • 1
    @Sosi - No problem :) `pivot` always sorting data :) – jezrael May 08 '18 at 13:29
  • @AlexRiley using unstack instead seems to preserve the NaNs – Emanuel Jan 08 '20 at 13:00
  • What if I directly stack and pick first half rows ? – Divyanshu Srivastava Oct 12 '22 at 13:43
14

Building from solution by @jezrael, boolean indexing would be a more explicit approach:

import numpy
from pandas import DataFrame

df = DataFrame({'a':[1,.5,.3],'b':[.5,1,.4],'c':[.3,.4,1]},index=list('abc'))
print df,'\n'
keep = np.triu(np.ones(df.shape)).astype('bool').reshape(df.size)
print df.stack()[keep]

output:

     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0 

a  a    1.0
   b    0.5
   c    0.3
b  b    1.0
   c    0.4
c  c    1.0
dtype: float64
Matthew Davis
  • 497
  • 4
  • 12
  • 3
    Very useful in combination with [pandas.DataFrame.corr](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.corr.html) – Shadi Mar 21 '18 at 06:39
2

Also buildin on solution by @jezrael, here's a version adding a function to do the inverse operation (from xy to matrix), usefull in my case to work with covariance / correlation matrices.

def matrix_to_xy(df, columns=None, reset_index=False):
    bool_index = np.triu(np.ones(df.shape)).astype(bool)
    xy = (
        df.where(bool_index).stack().reset_index()
        if reset_index
        else df.where(bool_index).stack()
    )
    if reset_index:
        xy.columns = columns or ["row", "col", "val"]
    return xy


def xy_to_matrix(xy):
    df = xy.pivot(*xy.columns).fillna(0)
    df_vals = df.to_numpy()
    df = pd.DataFrame(
        np.triu(df_vals, 1) + df_vals.T, index=df.index, columns=df.index
    )
    return df
df = pd.DataFrame(
    {"a": [1, 0.5, 0.3], "b": [0.5, 1, 0.4], "c": [0.3, 0.4, 1]},
    index=list("abc"),
)
print(df)
xy = matrix_to_xy(df, reset_index=True)
print(xy)
mx = xy_to_matrix(xy)
print(mx)

output:

     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0

  row col  val
0   a   a  1.0
1   a   b  0.5
2   a   c  0.3
3   b   b  1.0
4   b   c  0.4
5   c   c  1.0

row    a    b    c
row
a    1.0  0.5  0.3
b    0.5  1.0  0.4
c    0.3  0.4  1.0
bravhek
  • 155
  • 5