
I have a square matrix as a DataFrame in pandas. It should be symmetric, and nearly is, except for a few missing values that I filled with 0. I want to use the fact that it should be symmetric to fill in the missing values: for each pair, take whichever of df.loc[x,y] and df.loc[y,x] has the larger absolute value (keeping its sign). I.e.:

import pandas as pd
df = pd.DataFrame({'a': {'a': 1, 'b': 1, 'c': 1}, 'b': {'a': 0, 'b': 1, 'c': -1}, 'c': {'a': 1, 'b': 0, 'c': 1}})

>>> df
   a  b  c
a  1  0  1
b  1  1  0
c  1 -1  1

should become:

>>> df
   a  b  c
a  1  1  1
b  1  1 -1
c  1 -1  1
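The rule described above can be expressed without any per-element function. A minimal sketch, assuming "larger absolute value wins" is the intended rule and using `DataFrame.where` in place of `applymap` (this is not from the original thread):

```python
import pandas as pd

# Matrix from the question: (a, b) and (b, c) are the zeros that were filled in
df = pd.DataFrame({'a': {'a': 1, 'b': 1, 'c': 1},
                   'b': {'a': 0, 'b': 1, 'c': -1},
                   'c': {'a': 1, 'b': 0, 'c': 1}})

# df.T holds the mirrored element df.loc[y, x] at position (x, y), aligned by label.
# Keep df where its magnitude is at least the mirror's; otherwise take the mirror.
result = df.where(df.abs() >= df.T.abs(), df.T)
print(result)
```

Because the comparison keeps the original element on ties, pairs that are already symmetric are left unchanged.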

At first I thought of using a simple applymap with a function like this:

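def maxSymmetric(element):
    # Does not work: applymap passes only the scalar value, so element.row and
    # element.column do not exist. The mirror of (row, column) is df.loc[column, row].
    if abs(element) > abs(df.loc[element.column, element.row]):
        return element
    else:
        return df.loc[element.column, element.row]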

But applymap passes only each element's value to the function, not its row and column labels, so there is no way to look up the mirrored element from inside it.
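One way around that limitation, sketched here as an assumption rather than something from the thread, is to loop over the row and column labels yourself and pass them in explicitly. `max_symmetric` is a hypothetical helper; since it does Python-level work for every cell, it is only practical for small matrices.

```python
import pandas as pd

df = pd.DataFrame({'a': {'a': 1, 'b': 1, 'c': 1},
                   'b': {'a': 0, 'b': 1, 'c': -1},
                   'c': {'a': 1, 'b': 0, 'c': 1}})


def max_symmetric(frame, row, col):
    """Return whichever of frame.loc[row, col] and its mirror frame.loc[col, row]
    has the larger absolute value (ties keep the original element)."""
    element = frame.loc[row, col]
    mirror = frame.loc[col, row]
    return element if abs(element) >= abs(mirror) else mirror


# Build the result cell by cell, supplying the labels that applymap cannot provide
result = pd.DataFrame(
    [[max_symmetric(df, r, c) for c in df.columns] for r in df.index],
    index=df.index, columns=df.columns)
print(result)
```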

So then I tried making a multilevel dataframe of the original matrix and its transpose:

>>> pd.concat([df, df.T], axis=0, keys=['o', 't'])
     a  b  c
o a  1  0  1
  b  1  1  0
  c  1 -1  1
t a  1  1  1
  b  0  1 -1
  c  1  0  1

Now, for each position, I want to pick the correct element (the nonzero one, if available) from either 'o' or 't', using a function similar to the one above. But I'm not very experienced with MultiIndexes, and I can't figure out how to use applymap here, or whether I should be using something else.
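With the concatenated frame, applymap is not needed: select each half by its outer key and compare the two halves elementwise. A minimal sketch, assuming the larger-absolute-value rule from the top of the question; the names `stacked`, `o`, and `t` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': {'a': 1, 'b': 1, 'c': 1},
                   'b': {'a': 0, 'b': 1, 'c': -1},
                   'c': {'a': 1, 'b': 0, 'c': 1}})
stacked = pd.concat([df, df.T], axis=0, keys=['o', 't'])

# Selecting an outer key drops that level, giving back plain a/b/c frames
o = stacked.loc['o']
t = stacked.loc['t']

# Take 'o' unless 't' has a larger magnitude at the same position
result = o.where(o.abs() >= t.abs(), t)
print(result)
```

`stacked.xs('o')` would select the same half; `.loc` with just the outer key is used here because it is the more common spelling.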

Suggestions?

andbeonetraveler

1 Answer


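I think you can first convert df to a NumPy array, add its transpose (which fills each 0 with its mirrored value), subtract the diagonal (which would otherwise be counted twice), and finally rebuild the DataFrame with the constructor: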

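import numpy as np
import pandas as pd

a = df.values  # 2-D NumPy array holding the matrix
# a + a.T fills each 0 with its mirror; the diagonal appears twice, so subtract it once
print(pd.DataFrame(data=a + a.T - np.diag(a.diagonal()),
                   columns=df.columns,
                   index=df.index))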

   a  b  c
a  1  1  2
b  1  1 -1
c  2 -1  1

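The NumPy version assumes that in every mirrored pair at least one value is 0. Since the question was edited so that (a, c) and (c, a) are both 1, those entries come out doubled as 2.

EDIT by comment, subtracting only the values that are already symmetric: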

print(df + df.T - df[df == df.T].fillna(0))
     a    b    c
a  1.0  1.0  1.0
b  1.0  1.0 -1.0
c  1.0 -1.0  1.0
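The float output comes from `df[df==df.T]`, which puts NaN at the non-matching positions before `fillna(0)`. A variant (not from the thread) that uses `DataFrame.where` with 0 as the replacement never creates NaN, so it should keep the integer dtype. Like the version above, it assumes that in each mismatched pair one side is 0.

```python
import pandas as pd

df = pd.DataFrame({'a': {'a': 1, 'b': 1, 'c': 1},
                   'b': {'a': 0, 'b': 1, 'c': -1},
                   'c': {'a': 1, 'b': 0, 'c': 1}})

# Keep the values that already match their mirror and put 0 everywhere else,
# without passing through NaN (so the frame stays integer)
symmetric_part = df.where(df == df.T, 0)

# df + df.T counts the symmetric values twice; subtract them once
result = df + df.T - symmetric_part
print(result)
print(result.dtypes)
```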
jezrael
  • Sorry, I should have clarified that most of the values are symmetric already (I edited the post so that (a,c) covers this case). But, based on your suggestion, I think this will work: df + df.T - df[df==df.T].fillna(0) If you want to edit your answer, I'll accept it :) – andbeonetraveler Jun 21 '16 at 21:55
  • I added your suggestion, but the output is a little different. Is it OK? – jezrael Jun 21 '16 at 21:59
  • Yeah as far as I'm concerned that does what I want--basically the same idea just without converting to numpy first. Can always change dtypes manually if that's a problem. Thanks! I edited it though, to reflect the changes I made to the original post. – andbeonetraveler Jun 21 '16 at 22:04