2

df:

         0         1         2 
0 0.0481948 0.1054251 0.1153076 
1 0.0407258 0.0890868 0.0974378 
2 0.0172071 0.0376403 0.0411687
etc.

I would like to remove every value whose row label and column label are equal (i.e. the diagonal of the dataframe), so my expected output would be something like:

         0         1         2 
0 NaN       0.1054251 0.1153076 
1 0.0407258 NaN       0.0974378 
2 0.0172071 0.0376403 NaN
etc.

As shown, the values at (0,0), (1,1), (2,2) and so on have been replaced with NaN.

I thought of looping through the index as follows:

for (idx, row) in df.iterrows():
    if (row.index) == ???

But I don't know how to carry on, or whether it's even the right approach.

Enigmatic

3 Answers

4

You can set the diagonal:

In [11]: df.iloc[[np.arange(len(df))] * 2] = np.nan

In [12]: df
Out[12]:
          0         1         2
0       NaN  0.105425  0.115308
1  0.040726       NaN  0.097438
2  0.017207  0.037640       NaN
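
The list-of-arrays indexer above may not be accepted by newer pandas/NumPy releases. As a rough sketch of the same idea, a boolean mask that is True on the positional diagonal can be passed to `DataFrame.mask`. The helper name `blank_diagonal` is made up for illustration and is not a pandas function; the data is the sample from the question.

```python
import numpy as np
import pandas as pd


def blank_diagonal(df):
    """Return a copy of df with its positional diagonal set to NaN.

    Hypothetical helper for illustration; not part of pandas.
    """
    # np.eye gives an identity matrix; as booleans it is True only
    # where the row position equals the column position.
    on_diagonal = np.eye(len(df), len(df.columns), dtype=bool)
    # mask() replaces entries where the condition is True (NaN by default).
    return df.mask(on_diagonal)


df = pd.DataFrame(
    [[0.0481948, 0.1054251, 0.1153076],
     [0.0407258, 0.0890868, 0.0974378],
     [0.0172071, 0.0376403, 0.0411687]]
)
result = blank_diagonal(df)
print(result)
```

Unlike the in-place assignment, this returns a new frame and leaves `df` untouched.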
Andy Hayden
2

@AndyHayden's answer is really cool and taught me something. However, it relies on iloc, on the frame being square, and on the index and columns being in the same order.

I generalized the concept here.

Consider the data frame df:

df = pd.DataFrame(1, list('abcd'), list('xcya'))

df

   x  c  y  a
a  1  1  1  1
b  1  1  1  1
c  1  1  1  1
d  1  1  1  1

Then we use numpy broadcasting and np.where to perform the same fancy index assignment:

ij = np.where(df.index.values[:, None] == df.columns.values)

df.iloc[list(map(list, ij))] = 0

df

   x  c  y  a
a  1  1  1  0
b  1  1  1  1
c  1  0  1  1
d  1  1  1  1
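
The same label comparison can also be fed straight to `DataFrame.mask`, which sidesteps passing a list of lists to `iloc` (something newer pandas versions may reject). This is only a sketch on the example frame above; `same_label` is just an illustrative variable name.

```python
import numpy as np
import pandas as pd

# Same example frame: index and columns share only the labels 'a' and 'c'.
df = pd.DataFrame(1, index=list("abcd"), columns=list("xcya"))

# Broadcasting a column vector of row labels against the column labels
# yields a boolean matrix that is True where the two labels match.
same_label = df.index.to_numpy()[:, None] == df.columns.to_numpy()

# Replace the matching cells with 0 and leave everything else alone.
result = df.mask(same_label, 0)
print(result)
```

Because the comparison is on labels rather than positions, it does not need the frame to be square or the labels to be in the same order.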
piRSquared
0

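Here n is the number of rows/columns (the frame is assumed to be square):

# pair row i with column i; a plain list of arrays is no longer treated as a tuple index by recent NumPy
df.values[np.arange(n), np.arange(n)] = np.nan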

or

np.fill_diagonal(df.values, np.nan)

see https://stackoverflow.com/a/24475214/
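
One caveat: whether `df.values` is a writable view of the frame's data depends on the dtypes and the pandas version, so an in-place edit may not always reach `df`. A cautious sketch, assuming an all-numeric frame, is to work on an explicit copy and rebuild the frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0.0481948, 0.1054251, 0.1153076],
     [0.0407258, 0.0890868, 0.0974378],
     [0.0172071, 0.0376403, 0.0411687]]
)

# Take an explicit float copy so the edit never depends on df.values
# being a view of the frame's underlying data.
values = df.to_numpy(dtype=float, copy=True)

# fill_diagonal modifies the array in place and returns None.
np.fill_diagonal(values, np.nan)

# Rebuild a frame with the original labels.
result = pd.DataFrame(values, index=df.index, columns=df.columns)
print(result)
```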

ehacinom