0

I have a problem. I want to remove every row whose customerId and fromDate both match an earlier row. For example, rows 1 and 4 are the same, so row 4 should be removed. But how can I find the rows that are duplicates?

Dataframe

   customerId    fromDate
0           1  2021-02-22
1           1  2021-03-18
2           1  2021-03-22
3           1        None
4           1  2021-03-18
5           3  2021-02-22
6           3  2021-02-22

Code

import pandas as pd


d = {'customerId': [1, 1, 1, 1, 1, 3, 3],
     'fromDate': ['2021-02-22', '2021-03-18', '2021-03-22', None, '2021-03-18', '2021-02-22', '2021-02-22']
    }
df = pd.DataFrame(data=d)
print(df)

What I want

   customerId    fromDate
0           1  2021-02-22
1           1  2021-03-18
2           1  2021-03-22
3           1        None
5           3  2021-02-22

# Removed
# 4           1  2021-03-18
# 6           3  2021-02-22
Test

2 Answers

1

You can use:

df.drop_duplicates()

This drops all duplicate rows, keeping the first occurrence of each, and returns a new DataFrame. See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html
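To see which rows count as duplicates before dropping them, something like the sketch below may help. It rebuilds the DataFrame from the question and passes `subset` explicitly; that is an assumption on my part to make the intent clear, since with only these two columns the default (all columns) gives the same result.

```python
import pandas as pd

df = pd.DataFrame({
    "customerId": [1, 1, 1, 1, 1, 3, 3],
    "fromDate": ["2021-02-22", "2021-03-18", "2021-03-22", None,
                 "2021-03-18", "2021-02-22", "2021-02-22"],
})

# Boolean mask: True for every row that repeats an earlier
# (customerId, fromDate) pair; the first occurrence stays False.
is_dup = df.duplicated(subset=["customerId", "fromDate"])
print(df[is_dup])  # the rows that would be removed

# Keep the first occurrence of each pair; the original index labels are kept.
result = df.drop_duplicates(subset=["customerId", "fromDate"])
print(result)
```

Here the mask flags rows 4 and 6, and `result` keeps index labels 0, 1, 2, 3 and 5, which matches the desired output in the question.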

0

IIUC, you can use drop_duplicates to remove the duplicates:

df.drop_duplicates(inplace=True)
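For comparison, here is a small sketch of the in-place call next to plain reassignment. The three-row DataFrame is made up for illustration, and the note about the return value reflects the pandas 1.x/2.x documentation as far as I know.

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 1 are identical.
df = pd.DataFrame({
    "customerId": [1, 1, 3],
    "fromDate": ["2021-02-22", "2021-02-22", "2021-02-22"],
})

# Reassignment style: drop_duplicates returns a new DataFrame
# and leaves df itself unchanged.
deduped = df.drop_duplicates()
rows_before = len(df)  # still 3 at this point

# In-place style: df is modified directly. The documented return value
# is None (pandas 1.x/2.x), so do not assign the result back to df.
df.drop_duplicates(inplace=True)
rows_after = len(df)
```

Both styles end up with the same rows; the reassignment form is just harder to misuse, because writing `df = df.drop_duplicates(inplace=True)` would replace `df` with the call's return value.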
ArchAngelPwn
  • See the [second answer here](https://stackoverflow.com/questions/43893457/understanding-inplace-true-in-pandas): using `inplace=True` is considered harmful. – Umar.H May 24 '22 at 14:52
  • So if I understood the post correctly, the answer should simply be `df = df.drop_duplicates()`. – ArchAngelPwn May 24 '22 at 15:03
  • Exactly. I'm sure `inplace` will go away in the future, or at least come with a bigger warning sign. – Umar.H May 24 '22 at 15:20
  • That's great info, thank you for letting me know. I'll stop using it in my code going forward! – ArchAngelPwn May 24 '22 at 15:42