I want to return unique rows from multiple columns in a df
. The issue is I want to include the same set of values if they don't appear in the previous row. This is a little hard to explain so I'll display it;
df = pd.DataFrame({
'Time' : ['2019-08-02 09:50:10.1','2019-08-02 09:50:10.2','2019-08-02 09:50:10.3','2019-08-02 09:50:10.4','2019-08-02 09:50:10.5','2019-08-02 09:50:10.6','2019-08-02 09:50:10.7','2019-08-02 09:50:10.8','2019-08-02 09:50:10.9','2019-08-02 09:50:11.0'],
'Code1' : ['A','A','B','B','C','C','A','A','B','B'],
'Code2' : ['B','B','A','A','B','B','B','B','A','A'],
'Code3' : [np.nan,np.nan,'C','C','A','A','C','C','C','C'],
})
df = df[df.iloc[:, 1:].shift().ne(df.iloc[:, 1:]).any(axis=1)].reset_index(drop = True)
Intended Output:
Time Code1 Code2 Code3
0 2019-08-02 09:50:10.1 A B NaN
1 2019-08-02 09:50:10.3 B A C
2 2019-08-02 09:50:10.5 C B A
3 2019-08-02 09:50:10.7 A B C
4 2019-08-02 09:50:10.9 B A C