
I have a data set from which I used to remove duplicates in Excel. It has three columns: mobile_number, first_order_date, and partner_name. My logic is to sort the users by first_order_date in ascending order and then remove duplicates on mobile_number. The data has now grown to 1.4M rows, so I'm doing the same in Pandas. The goal is to get unique users, each tagged with the partner through which they placed their first order. The output order (ascending or descending) doesn't matter.
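One thing worth checking before either approach: the sort only reflects "first order" if first_order_date is a real datetime column. If it arrives as text (common after an Excel export), sorting compares characters, not dates. A minimal sketch, assuming hypothetical day-first strings (dd-mm-yyyy); the helper name first_partner and the sample values are illustrative only:

```python
import pandas as pd


def first_partner(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep each mobile_number's earliest row (sort ascending, then dedupe)."""
    return (frame.sort_values("first_order_date", kind="stable")
                 .drop_duplicates(subset="mobile_number", keep="first"))


# Hypothetical sample; dates are strings, as they may be after an Excel export.
raw = pd.DataFrame({
    "mobile_number": ["9000000001", "9000000002", "9000000001"],
    "first_order_date": ["05-03-2021", "12-01-2021", "20-02-2021"],
    "partner_name": ["PartnerA", "PartnerB", "PartnerC"],
})

# Sorting the raw strings: "05-03-2021" comes first even though it is the latest date.
as_text = first_partner(raw)

# Assumption: day-first format. Adjust `format` to match the real data.
parsed = raw.assign(
    first_order_date=pd.to_datetime(raw["first_order_date"], format="%d-%m-%Y")
)
as_dates = first_partner(parsed)

print(as_text.set_index("mobile_number")["partner_name"].to_dict())
print(as_dates.set_index("mobile_number")["partner_name"].to_dict())
```

With text dates, customer 9000000001 is attributed to PartnerA; after parsing, the genuinely earliest order (20 Feb, PartnerC) wins.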

Which method replicates the Excel process above in Pandas? This one:

df = df.sort_values(by=['first_order_date'], ascending=True)
df = df.drop_duplicates(subset=['mobile_number'], keep='first')
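A caveat on this first method: if one customer has two rows with the same first_order_date, which row survives depends on the order of the ties after sorting. The default `kind='quicksort'` in `sort_values` does not guarantee that order; `kind='stable'` keeps tied rows in their original order, so `keep='first'` picks the row that appeared first in the input. A small sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data: customer 9000000001 ordered twice on the same day
# through different partners.
df = pd.DataFrame({
    "mobile_number": ["9000000001", "9000000001", "9000000002"],
    "first_order_date": pd.to_datetime(["2021-01-05", "2021-01-05", "2021-01-03"]),
    "partner_name": ["PartnerA", "PartnerB", "PartnerC"],
})

# A stable sort preserves the input order among equal dates, so the
# deduplication below deterministically keeps the earlier input row.
deduped = (
    df.sort_values(by="first_order_date", ascending=True, kind="stable")
      .drop_duplicates(subset="mobile_number", keep="first")
)
print(deduped)
```

Here PartnerA is kept for 9000000001 because it precedes PartnerB in the input.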

Or this one:

df = df.sort_values('first_order_date').groupby(['mobile_number']).head(1)
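For what it's worth, after the same sort both snippets select the same rows: `groupby(...).head(1)` takes the first row of each mobile_number group and returns those rows in the sorted frame's order with their original index, which is exactly what `drop_duplicates(keep='first')` keeps. A quick check on hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "mobile_number": ["111", "222", "111", "333", "222"],
    "first_order_date": pd.to_datetime(
        ["2021-03-01", "2021-01-15", "2021-02-10", "2021-04-01", "2021-01-20"]
    ),
    "partner_name": ["A", "B", "C", "D", "E"],
})

sorted_df = df.sort_values("first_order_date", kind="stable")

# Approach 1: drop every later occurrence of a mobile_number.
via_drop = sorted_df.drop_duplicates(subset="mobile_number", keep="first")

# Approach 2: keep the first row of each mobile_number group.
via_groupby = sorted_df.groupby("mobile_number").head(1)

print(via_drop.equals(via_groupby))  # True: same rows, same order, same index
```

So the choice between them is mostly readability; `drop_duplicates` states the intent (one row per mobile number) most directly.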

I have doubts about the groupby approach above, but my senior suggested it.
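If groupby is used at all, a variant that skips sorting the whole frame is to look up, per mobile_number, the index label of the earliest date with `idxmin` and select those rows with `.loc`. This is an alternative not mentioned in the question; whether it is faster than sort plus dedupe on 1.4M rows would need to be measured. On tied dates `idxmin` returns the first occurrence, matching the stable-sort behaviour. Sample data is hypothetical:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "mobile_number": ["111", "222", "111", "333", "222"],
    "first_order_date": pd.to_datetime(
        ["2021-03-01", "2021-01-15", "2021-02-10", "2021-04-01", "2021-01-20"]
    ),
    "partner_name": ["A", "B", "C", "D", "E"],
})

# For each mobile_number, the index label of its earliest first_order_date.
first_idx = df.groupby("mobile_number")["first_order_date"].idxmin()

# Select those rows; the result is ordered by mobile_number (groupby sorts keys).
first_orders = df.loc[first_idx]
print(first_orders)
```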


Please help!!

Prateek
