Python Pandas: DataFrame modification with diagnal value = 0

Question

I have a Pandas Dataframe question. I have a df with index=column. It looks like below.

df:
       DNA      Cat2                                   
       Item     A   B   C   D   E   F   F   H   I   J  .......
DNA   Item
Cat2  A         812 62  174 0   4   46  46  7   2   15 
      B         62  427 27  0   0   12  61  2   4   11 
      C         174 27  174 0   0   13  22  5   2   4  
      D         0   0   0   0   0   0   0   0   0   0  
      E         4   0   0   0   130 10  57  33  4   5  
      F         46  12  13  0   10  187 4   5   0   0 
      ......

Another words, df=df.transpose(). All I want to do is find pandas (or numpy for df.values())function to delete index=column values. My ideal output would be below.

df:
       DNA      Cat2                                   
       Item     A   B   C   D   E   F   F   H   I   J  .......
DNA   Item
Cat2  A         0   62  174 0   4   46  46  7   2   15 
      B         62  0   27  0   0   12  61  2   4   11 
      C         174 27  0   0   0   13  22  5   2   4  
      D         0   0   0   0   0   0   0   0   0   0  
      E         4   0   0   0   0   10  57  33  4   5  
      F         46  12  13  0   10  0   4   5   0   0 
      ......

Is there a python function that makes this step very fast? I tried for loop with df.iloc[i,i]=0 but since my dataset is ver big, it takes long time to finish. Thanks in advance!

Don't think so. You can also try the answers there. Other people will mark it as duplicate they think it is appropriate. — Tai, Jan 11 '18 at 06:34

piRSquared · Accepted Answer · 2018-01-11T06:26:03.360

Setup

np.random.seed([3,1415])
i = pd.MultiIndex.from_product(
    [['Cat2'], list('ABCDEFGHIJ')],
    names=['DNA', 'Item']
)

a = np.random.randint(5, size=(10, 10))
df = pd.DataFrame(a + a.T + 1, i, i)

df

DNA       Cat2                           
Item         A  B  C  D  E  F  G  H  I  J
DNA  Item                                
Cat2 A       1  6  6  7  7  7  4  4  8  2
     B       6  1  3  6  1  6  6  4  8  5
     C       6  3  9  8  9  6  7  8  4  9
     D       7  6  8  1  6  9  4  5  4  3
     E       7  1  9  6  9  7  3  7  2  6
     F       7  6  6  9  7  9  3  4  6  6
     G       4  6  7  4  3  3  9  4  5  5
     H       4  4  8  5  7  4  4  5  4  5
     I       8  8  4  4  2  6  5  4  9  7
     J       2  5  9  3  6  6  5  5  7  3

Option 1
Simplest way is to multiply by 1 less the identity

df * (1 - np.eye(len(df), dtype=int))

DNA       Cat2                           
Item         A  B  C  D  E  F  G  H  I  J
DNA  Item                                
Cat2 A       0  6  6  7  7  7  4  4  8  2
     B       6  0  3  6  1  6  6  4  8  5
     C       6  3  0  8  9  6  7  8  4  9
     D       7  6  8  0  6  9  4  5  4  3
     E       7  1  9  6  0  7  3  7  2  6
     F       7  6  6  9  7  0  3  4  6  6
     G       4  6  7  4  3  3  0  4  5  5
     H       4  4  8  5  7  4  4  0  4  5
     I       8  8  4  4  2  6  5  4  0  7
     J       2  5  9  3  6  6  5  5  7  0

Option 2
However, we can also use pd.DataFrame.mask with np.eye. Masking is nice because it doesn't have to be numeric and it will still work.

df.mask(np.eye(len(df), dtype=bool), 0)

DNA       Cat2                           
Item         A  B  C  D  E  F  G  H  I  J
DNA  Item                                
Cat2 A       0  6  6  7  7  7  4  4  8  2
     B       6  0  3  6  1  6  6  4  8  5
     C       6  3  0  8  9  6  7  8  4  9
     D       7  6  8  0  6  9  4  5  4  3
     E       7  1  9  6  0  7  3  7  2  6
     F       7  6  6  9  7  0  3  4  6  6
     G       4  6  7  4  3  3  0  4  5  5
     H       4  4  8  5  7  4  4  0  4  5
     I       8  8  4  4  2  6  5  4  0  7
     J       2  5  9  3  6  6  5  5  7  0

Option 3
In the event the columns and indices are not identical, OR the are out of order. We can use equality to tell us where to mask.

d = df.iloc[::-1]
d.mask(d.index == d.columns.values[:, None], 0)

DNA       Cat2                           
Item         A  B  C  D  E  F  G  H  I  J
DNA  Item                                
Cat2 J       2  5  9  3  6  6  5  5  7  0
     I       8  8  4  4  2  6  5  4  0  7
     H       4  4  8  5  7  4  4  0  4  5
     G       4  6  7  4  3  3  0  4  5  5
     F       7  6  6  9  7  0  3  4  6  6
     E       7  1  9  6  0  7  3  7  2  6
     D       7  6  8  0  6  9  4  5  4  3
     C       6  3  0  8  9  6  7  8  4  9
     B       6  0  3  6  1  6  6  4  8  5
     A       0  6  6  7  7  7  4  4  8  2

I understand all the diagnal values are multiply by zero, but would other values be effected by multiply? — EJ Kang, Jan 11 '18 at 06:14
Wow. thanks for all the options. How would you rank as speed wise? — EJ Kang, Jan 11 '18 at 06:35
Not sure. None of these were intended to be super-duper fast. But there isn't anything really slowing them down. I suspect they are all similar. Let me know if you need faster. If so, I'll post a tailored solution for speed. — piRSquared, Jan 11 '18 at 06:38

Python Pandas: DataFrame modification with diagnal value = 0

1 Answers1