I'd like to find duplicated rows in a Pandas dataframe. When I use df.duplicated() it returns the following error:

TypeError: unhashable type: 'list'

To resolve this error, I tried the following:

df2 = df[df.applymap(lambda x: x[0] if isinstance(x, list) else x).duplicated()]

However, I receive a new but similar error: "TypeError: unhashable type: 'dict'"

Does anyone know how I can use applymap with a lambda that checks two conditions, i.e. isinstance(x, list) OR isinstance(x, dict)?

UPDATE: Here is a sample of the data (the first few rows of the df): [image not included]

Thank you!

mOna
  • Are you sure that df is a pandas dataframe type? check for type(df). I normally use df.loc[df.duplicated(subset=[X,Y,Z]), ]. I do not have much experience with applymap, but with apply, I also use axis, apply(lambda x: f(x), axis=1). – Rockbar Mar 02 '22 at 06:58
  • Not what you're asking, but you could use `.applymap` to transform your list into tuples, a hashable type that will allow you to use `.duplicated()` – aaossa Mar 02 '22 at 06:59
  • @Rockbar: yes it is a dataframe (returns pandas.core.frame.DataFrame). It's a twitter data retrieved in json format (nested json) – mOna Mar 02 '22 at 07:00
  • Can you provide a sample of the data? – rehaqds Mar 02 '22 at 07:43
  • @rehaqds: sure. I updated my post. – mOna Mar 02 '22 at 18:11
  • Does this answer your question? [How to overcome TypeError: unhashable type: 'list'](https://stackoverflow.com/questions/13675296/how-to-overcome-typeerror-unhashable-type-list) – Ulrich Eckhardt Mar 02 '22 at 18:32
  • @UlrichEckhardt not really, because I'd like to know how to use two conditions with `applymap lambda` (to include both `list` and `dict` types in the condition) – mOna Mar 02 '22 at 23:30
  • The problem is probably the dict in the list. I would write a function to tackle the "referenced_tweets" first. Since you know the structure there, you can spare the if else question, and write a specific unwrapper. – Rockbar Mar 03 '22 at 11:11

1 Answer

After the applymap, some cells still contain a dictionary. The issue is that the duplicated method uses hashing to compare values, and in Python dictionaries (and lists) are not hashable.
Strings are hashable, though, so you could add .astype(str):

df2 = df[df.applymap(lambda x: x[0] if isinstance(x, list) else x).astype(str).duplicated()]
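
If you would rather not compare string representations, another option is to convert the unhashable cells into tuples, as suggested in the comments. The sketch below is only an illustration: the helper name `to_hashable` and the sample data are made up (loosely modelled on nested tweet JSON), and it assumes dictionary keys are strings so they can be sorted. It also shows one way to check for both `list` and `dict`; in a lambda, `isinstance(x, (list, dict))` accepts a tuple of types in the same way.

```python
import pandas as pd


def to_hashable(value):
    """Recursively turn lists and dicts into tuples so the value can be hashed."""
    if isinstance(value, dict):
        # Sort by key so that key order does not affect equality
        # (assumes the keys are mutually comparable, e.g. all strings).
        return tuple(sorted((k, to_hashable(v)) for k, v in value.items()))
    if isinstance(value, list):
        return tuple(to_hashable(v) for v in value)
    return value


# Hypothetical sample data, not the data from the question.
df = pd.DataFrame(
    {
        "id": [1, 2, 1],
        "referenced_tweets": [
            [{"type": "retweeted", "id": "10"}],
            [{"type": "quoted", "id": "11"}],
            [{"type": "retweeted", "id": "10"}],
        ],
        "entities": [{"urls": ["a"]}, {"urls": ["b"]}, {"urls": ["a"]}],
    }
)

# Map the helper over every cell, column by column.
hashable = df.apply(lambda col: col.map(to_hashable))

# Boolean mask of rows that repeat an earlier row.
mask = hashable.duplicated()

# Select the duplicated rows from the original, unmodified frame.
duplicates = df[mask]
```

Here the third row repeats the first, so only the third row is selected. Unlike taking `x[0]`, this keeps every element of each list in the comparison.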
rehaqds