sort values and remove duplicates in pandas

Asked Dec 13 '22 at 10:10

Active Dec 13 '22 at 10:12

Viewed 16 times

I have a data set that I used to deduplicate in Excel. It has three columns: mobile_number, first_order_date, and partner_name. My logic is to sort the users by first_order_date ascending and then remove duplicates on mobile_number. The data has now grown to 1.4M rows, so I'm doing the same in Pandas. The goal is to get unique users along with the partner through which each of them placed their first order. The output can be in any order (asc, desc).

Which method should I use to replicate in Pandas the process I followed in Excel? This:

df = df.sort_values(by=['first_order_date'], ascending=True)
df = df.drop_duplicates(subset=['mobile_number'], keep='first')

Or this:

df = df.sort_values('first_order_date').groupby(['mobile_number']).head(1)

I have doubts about the groupby in the second version, but my senior suggested it.
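One way to settle the doubt is to run both versions on a small frame and compare the results. The sketch below uses made-up sample data (the numbers, dates, and partner names are hypothetical; only the column names come from the question) and assumes each mobile_number has distinct first_order_date values, so there are no ties to break.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "mobile_number": [111, 222, 111, 333, 222],
    "first_order_date": pd.to_datetime(
        ["2022-03-01", "2022-01-15", "2022-01-10", "2022-02-20", "2022-02-01"]
    ),
    "partner_name": ["A", "B", "C", "A", "C"],
})

sorted_df = df.sort_values("first_order_date")

# Version 1: keep the first row seen for each mobile_number.
via_drop = sorted_df.drop_duplicates(subset="mobile_number", keep="first")

# Version 2: take the first row of each mobile_number group.
via_head = sorted_df.groupby("mobile_number").head(1)

# Compare the two results row for row.
same = via_drop.equals(via_head)
print(same)
```

On this sample both versions keep the same rows in the same order. If two rows for the same mobile_number share a first_order_date, which one survives depends on the sort order of the tied rows, in either version.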

"Adding a few lines of code here as this question is being closed due to some stack-overflow duplicity error."

Please help!!

edited Dec 13 '22 at 10:12

Prateek

Sorting is expensive; the most efficient approach would be to use `groupby.idxmin`. See the duplicate. – mozway Dec 13 '22 at 10:11
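A minimal sketch of what the `groupby.idxmin` suggestion could look like, avoiding the full sort. The sample data is hypothetical, and the sketch assumes the frame has a unique index and no missing first_order_date values; it has not been timed against the sort-based versions on 1.4M rows.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "mobile_number": [111, 222, 111, 333, 222],
    "first_order_date": pd.to_datetime(
        ["2022-03-01", "2022-01-15", "2022-01-10", "2022-02-20", "2022-02-01"]
    ),
    "partner_name": ["A", "B", "C", "A", "C"],
})

# Index label of the earliest first_order_date within each mobile_number group.
first_idx = df.groupby("mobile_number")["first_order_date"].idxmin()

# Select those rows; the result has one row per mobile_number.
first_orders = df.loc[first_idx]
print(first_orders)
```

The result comes out ordered by mobile_number (the group keys) rather than by date, which is fine here since the output order does not matter.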
