Pandas cannot compute isin with a duplicate axis

Question

My dataframe is something like this:

             userid           codeassigned         timestamp
15           553938              M1           1499371200000
15390        527638              M2           1599731200000
15389        521638              M2           1399901200000
15388        521638              M3           1439841200000
15387        553938              M4           1499521200000

I have taken a subset of this dataframe (the row with the latest timestamp for each user) by doing:

df = df.sort_values('timestamp', ascending=False)
mask = df.duplicated('userid')
subset_df = df[~mask]
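A self-contained sketch of the subsetting step, rebuilding the question's sample frame. The values and index labels are copied from the table above; the column dtypes are an assumption. It keeps the latest row per `userid`:

```python
import pandas as pd

# Sample data copied from the question (index labels included)
df = pd.DataFrame(
    {
        "userid": [553938, 527638, 521638, 521638, 553938],
        "codeassigned": ["M1", "M2", "M2", "M3", "M4"],
        "timestamp": [1499371200000, 1599731200000, 1399901200000,
                      1439841200000, 1499521200000],
    },
    index=[15, 15390, 15389, 15388, 15387],
)

# Newest rows first, then flag every row whose userid was already seen
df = df.sort_values("timestamp", ascending=False)
mask = df.duplicated("userid")

# Keep only the first (i.e. latest) row per userid
subset_df = df[~mask]
print(subset_df)
```

`duplicated` marks every occurrence after the first, so sorting newest-first is what makes `~mask` keep the latest row per user.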

Now I want all rows from the main dataframe whose (userid, timestamp) pair appears in subset_df (there can be multiple rows with the same [userid, timestamp] but a different codeassigned). For that I'm doing:

subset_df[['userid', 'timestamp']].isin(df)

However, I'm getting this error:

ValueError: cannot compute isin with a duplicate axis.

Any idea what I'm doing wrong?

I also hit this error. Checking that neither dataframe has duplicate column names fixed it for me. — Pooja Sonkar, Apr 22 '21 at 19:03
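A minimal sketch of the situation in the comment above, using hypothetical frames `left` and `values` (not from the question) and assuming a reasonably recent pandas: when the DataFrame passed to `isin` repeats a column label, pandas raises the same error, and dropping the repeated label avoids it.

```python
import pandas as pd

# Hypothetical frames: `values` has the column label "timestamp" twice
left = pd.DataFrame({"userid": [1, 2], "timestamp": [10, 20]})
values = pd.DataFrame([[1, 10, 99], [2, 20, 99]],
                      columns=["userid", "timestamp", "timestamp"])

error_message = None
try:
    left.isin(values)
except ValueError as exc:
    # Duplicate labels in the argument's columns trigger the error
    error_message = str(exc)
print(error_message)

# Keep only the first occurrence of each column label, then retry
values_dedup = values.loc[:, ~values.columns.duplicated()]
result = left.isin(values_dedup)
print(result)
```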

jezrael · Accepted Answer · 2019-02-05T06:35:36.173


You need merge, which performs an inner join by default, with the filtered subset:

subset_df = df.loc[~mask, ['userid', 'timestamp']]

df = subset_df.merge(df)

Or, if subset_df was built with df[~mask] as in the question (so it still has all columns):

df = subset_df[['userid', 'timestamp']].merge(df)
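A runnable sketch of this answer on the question's sample data. The row with index label 20 (code `M5`) is a hypothetical addition that repeats user 527638's latest `(userid, timestamp)` pair, to show that the inner join keeps every matching row:

```python
import pandas as pd

# Question's sample data plus one hypothetical row (index 20) sharing
# userid 527638's latest timestamp but with a different code
df = pd.DataFrame(
    {
        "userid": [553938, 527638, 521638, 521638, 553938, 527638],
        "codeassigned": ["M1", "M2", "M2", "M3", "M4", "M5"],
        "timestamp": [1499371200000, 1599731200000, 1399901200000,
                      1439841200000, 1499521200000, 1599731200000],
    },
    index=[15, 15390, 15389, 15388, 15387, 20],
)

df = df.sort_values("timestamp", ascending=False)
mask = df.duplicated("userid")

# First form: keep only the key columns of the latest row per user
subset_df = df.loc[~mask, ["userid", "timestamp"]]

# merge() joins on the shared columns (userid, timestamp) with how="inner",
# so every row of df whose pair appears in subset_df is returned
result = subset_df.merge(df)
print(result)

# Second form: subset with all columns, select the keys at merge time
result_alt = df[~mask][["userid", "timestamp"]].merge(df)
```

`merge` returns a fresh default `RangeIndex`, so the original index labels of `df` are not carried over; calling `df.reset_index()` before merging keeps them as a column.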

answered Feb 05 '19 at 06:26 by jezrael, edited Feb 05 '19 at 06:35

Cool! But can you please shed some light on why isin is not working in this case? – Saurabh Verma Feb 05 '19 at 06:37

@SaurabhVerma - yes, the main problem is that [`DataFrame.isin`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.isin.html) only tests values against another DataFrame with the same index and column names; here the index is different, so you get the error. – jezrael Feb 05 '19 at 06:41
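A small sketch of this behaviour with hypothetical frames `left`, `other` and `other_dup`. When given a DataFrame, `isin` compares cell by cell at matching row and column labels rather than testing row membership, and it raises the "duplicate axis" error when the argument's index (or columns) contain repeated labels. That is presumably the case for the question's full `df`, since the sample shown has a unique index:

```python
import pandas as pd

# Hypothetical frames to show how DataFrame.isin(DataFrame) behaves
left = pd.DataFrame({"userid": [1, 2], "timestamp": [10, 20]}, index=[0, 1])

# Same values, but stored under different index labels
other = pd.DataFrame({"userid": [1, 2], "timestamp": [10, 20]}, index=[5, 6])

# Cells are compared only where row AND column labels match; no row label
# of `left` exists in `other`, so every cell comes back False
aligned = left.isin(other)
print(aligned)

# An argument with a repeated index label cannot be aligned at all
other_dup = pd.DataFrame({"userid": [1, 2], "timestamp": [10, 20]}, index=[0, 0])
dup_error = None
try:
    left.isin(other_dup)
except ValueError as exc:
    dup_error = str(exc)
print(dup_error)
```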

I had a similar problem: the way I thought of `x.isin(y)` was wrong. I expected it to mean "Is x in y?", but in fact you should think of it as "Is y in x?" Maybe this was only unclear to me, but it explains why you can flip the arguments of `DataFrame.isin` and see this particular error come and go. – Todd Vanyo Mar 01 '19 at 15:48
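A sketch of both points, assuming a reasonably recent pandas. The first part (hypothetical frames `left` and `dup`) shows that only the argument is checked for duplicate labels, so swapping the frames changes whether the error appears. The second part is an alternative not taken from this thread: it asks "is this row's (userid, timestamp) pair in subset_df?" directly by comparing the pairs as `pd.MultiIndex` tuples, which also keeps `df`'s original index. The row labelled 20 is hypothetical:

```python
import pandas as pd

# Hypothetical frames: `left` has unique labels, `dup` repeats index label 0
left = pd.DataFrame({"userid": [1, 2], "timestamp": [10, 20]}, index=[0, 1])
dup = pd.DataFrame({"userid": [1, 2], "timestamp": [10, 20]}, index=[0, 0])

try:
    left.isin(dup)           # argument has a duplicate axis -> ValueError
    first_order_ok = True
except ValueError:
    first_order_ok = False

# Argument is unique -> runs, but it is still a label-aligned comparison
flipped = dup.isin(left)
print(flipped)

# Row membership on the question's data plus one hypothetical row (index 20)
df = pd.DataFrame(
    {
        "userid": [553938, 527638, 521638, 521638, 553938, 527638],
        "codeassigned": ["M1", "M2", "M2", "M3", "M4", "M5"],
        "timestamp": [1499371200000, 1599731200000, 1399901200000,
                      1439841200000, 1499521200000, 1599731200000],
    },
    index=[15, 15390, 15389, 15388, 15387, 20],
)
df = df.sort_values("timestamp", ascending=False)
subset_df = df[~df.duplicated("userid")]

# Turn each row's (userid, timestamp) into a tuple and test membership
pairs = pd.MultiIndex.from_frame(df[["userid", "timestamp"]])
wanted = pd.MultiIndex.from_frame(subset_df[["userid", "timestamp"]])
rows = df[pairs.isin(wanted)]  # boolean mask keeps df's original index
print(rows)
```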
