The following command will replace all values for matching row to None.
ndf.iloc[np.where(ndf.path3=='sys_bck_20190101.tar.gz')] = np.nan
What I really need to do is to replace the value of a single column called path4 if it matches with column path3. This does not work:
ndf.iloc[np.where(ndf.path3==ndf.path4), ndf.path3] = np.nan
Update:
There is a pandas method "fillna" that can be used with axis = 'columns'. Is there a similar method to write "NA" values to the duplcate columns?
I can do this, but it does not look like pythonic.
ndf.loc[ndf.path1==ndf.path2, 'path1'] = np.nan
ndf.loc[ndf.path2==ndf.path3, 'path2'] = np.nan
ndf.loc[ndf.path3==ndf.path4, 'path3'] = np.nan
ndf.loc[ndf.path4==ndf.filename, 'path4'] = np.nan
Update 2
Let me explain the issue:
Assuming this dataframe:
ndf = pd.DataFrame({
'path1':[4,5,4,5,5,4],
'path2':[4,5,4,5,5,4],
'path3':list('abcdef'),
'path4':list('aaabef'),
'col':list('aaabef')
})
The expected results :
0 NaN 4.0 NaN NaN a
1 NaN 5.0 b NaN a
2 NaN 4.0 c NaN a
3 NaN 5.0 d NaN b
4 NaN 5.0 NaN NaN e
5 NaN 4.0 NaN NaN f
As you can see this is reverse of fillna. And I guess there is no easy way to do this in pandas. I have already mentioned the commands I can use. I will like to know if there is a better way to achieve this.