
The following command replaces every value in the matching rows with NaN.

ndf.iloc[np.where(ndf.path3=='sys_bck_20190101.tar.gz')] = np.nan

What I really need is to replace the value in a single column, path4, only where it matches column path3. This does not work:

ndf.iloc[np.where(ndf.path3==ndf.path4), ndf.path3] = np.nan

Update:

There is a pandas method, "fillna", that can be used with axis='columns'. Is there a similar method that writes "NA" values to the duplicate columns?

I can do this, but it does not look Pythonic:

ndf.loc[ndf.path1==ndf.path2, 'path1'] = np.nan
ndf.loc[ndf.path2==ndf.path3, 'path2'] = np.nan
ndf.loc[ndf.path3==ndf.path4, 'path3'] = np.nan
ndf.loc[ndf.path4==ndf.filename, 'path4'] = np.nan

Update 2:

Let me explain the issue:

Assuming this dataframe:

import numpy as np
import pandas as pd

ndf = pd.DataFrame({
    'path1': [4, 5, 4, 5, 5, 4],
    'path2': [4, 5, 4, 5, 5, 4],
    'path3': list('abcdef'),
    'path4': list('aaabef'),
    'col': list('aaabef'),
})

The expected result:

   path1  path2 path3 path4 col
0    NaN    4.0   NaN   NaN   a
1    NaN    5.0     b   NaN   a
2    NaN    4.0     c   NaN   a
3    NaN    5.0     d   NaN   b
4    NaN    5.0   NaN   NaN   e
5    NaN    4.0   NaN   NaN   f

As you can see, this is the reverse of fillna: a value is blanked out when it equals the value in the next column. I guess there is no easy way to do this in pandas. I have already mentioned the commands I can use; I would like to know if there is a better way to achieve this.

shantanuo

1 Answer


Loop over each pair of adjacent columns and set the left-hand column to NaN wherever the two are equal:

for c1, c2 in zip(ndf.columns, ndf.columns[1:]):
    ndf.loc[ndf[c1]==ndf[c2], c1] = np.nan

print (ndf)
   path1  path2 path3 path4 col
0    NaN    4.0   NaN   NaN   a
1    NaN    5.0     b   NaN   a
2    NaN    4.0     c   NaN   a
3    NaN    5.0     d   NaN   b
4    NaN    5.0   NaN   NaN   e
5    NaN    4.0   NaN   NaN   f
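
If you would rather not modify ndf in place, the same comparison can be done for all adjacent column pairs at once. This is only a sketch of one possible alternative, not something the loop above requires: it assumes the frame is small enough to copy into an object array, and the names vals, dup and result are made up for illustration. It builds a boolean array that is True where a cell equals its right-hand neighbour and passes that to DataFrame.mask, which returns a new frame.

```python
import numpy as np
import pandas as pd

ndf = pd.DataFrame({
    'path1': [4, 5, 4, 5, 5, 4],
    'path2': [4, 5, 4, 5, 5, 4],
    'path3': list('abcdef'),
    'path4': list('aaabef'),
    'col': list('aaabef'),
})

# Copy the values into an object array so that mixed int/str columns
# can be compared element by element.
vals = ndf.to_numpy(dtype=object)

# dup[i, j] is True when column j equals column j + 1 in row i.
# The last column has no right-hand neighbour, so it stays False.
dup = np.zeros(vals.shape, dtype=bool)
dup[:, :-1] = vals[:, :-1] == vals[:, 1:]

# mask() returns a new DataFrame with NaN wherever dup is True;
# ndf itself is left unchanged.
result = ndf.mask(dup)
print(result)
```

Every comparison here uses the original values, which matches the loop because the loop only ever overwrites a column after it has been compared with the column to its left.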
jezrael