Questions tagged [drop-duplicates]

Questions related to removing (or dropping) unwanted duplicate values.

A duplicate is any recurrence of an item in a collection. This can be as simple as two identical strings in a list of strings, or as involved as multiple complex objects that compare as equal to one another.

This tag may pertain to questions about removing unwanted duplicates.
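As a minimal Python sketch of the two cases in the description, the function below removes repeated strings while keeping first-occurrence order, and treats objects as duplicates when a chosen key matches. The `dedupe` helper, the `Item` class and its `sku` field are hypothetical, for illustration only.

```python
from dataclasses import dataclass


def dedupe(items, key=lambda x: x):
    """Return items in original order, keeping only the first item for each key."""
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:  # first time this key appears
            seen.add(k)
            result.append(item)
    return result


@dataclass
class Item:  # hypothetical record type
    sku: str
    name: str


words = dedupe(["apple", "pear", "apple", "fig", "pear"])
# Two Items with the same sku are treated as the same object.
items = dedupe([Item("A1", "bolt"), Item("B2", "nut"), Item("A1", "bolt (spare)")],
               key=lambda i: i.sku)
```

Using a key function rather than relying on `==` lets the caller decide what "the same object" means without changing the class itself.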

144 questions
252 votes, 8 answers

Drop all duplicate rows across multiple columns in Python Pandas

The pandas drop_duplicates function is great for "uniquifying" a dataframe. I would like to drop all rows which are duplicates across a subset of columns. Is this possible?
       A  B  C
  0  foo  0  A
  1  foo  1  A
  2  foo  1  B
  3  bar  1  A
As an…
Jamie Bull
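A minimal sketch using the frame shown in the question. Passing `keep=False` to `drop_duplicates` removes every row of a duplicate group rather than keeping one. The choice of `["A", "C"]` as the subset is an assumption, since the excerpt is cut off before the intended columns are named.

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"],
                   "B": [0, 1, 1, 1],
                   "C": ["A", "A", "B", "A"]})

# keep=False drops all members of each duplicate group, not just the repeats.
# The subset ["A", "C"] is an assumed choice of columns.
result = df.drop_duplicates(subset=["A", "C"], keep=False)
```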
51 votes, 4 answers

Pandas drop_duplicates method not working on dataframe containing lists

I am trying to use the drop_duplicates method on my dataframe, but I am getting an error:
TypeError: unhashable type: 'list'
The code I am using:
df = db.drop_duplicates()
My DB is huge and contains strings, floats, dates,…
SLack A
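One common workaround, sketched here on a hypothetical two-column frame: duplicate detection hashes each value, and lists are unhashable, so build a temporary copy with lists converted to tuples, run `duplicated()` on that copy, and use the resulting mask to filter the original frame. The `to_hashable` helper and the column names are illustrative assumptions; nested lists inside lists would still need deeper conversion.

```python
import pandas as pd

# Hypothetical frame with a list-valued column.
df = pd.DataFrame({"name": ["a", "a", "b"], "tags": [[1, 2], [1, 2], [3]]})


def to_hashable(value):
    """Convert a list (unhashable) to a tuple; leave other values unchanged."""
    return tuple(value) if isinstance(value, list) else value


# Hashable copy used only for the duplicate check; the original lists are kept.
key_frame = df.apply(lambda col: col.map(to_hashable))
deduped = df[~key_frame.duplicated()]
```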
12 votes, 2 answers

Keeping the last N duplicates in pandas

Given a dataframe:
>>> import pandas as pd
>>> lol = [['a', 1, 1], ['b', 1, 2], ['c', 1, 4], ['c', 2, 9], ['b', 2, 10], ['x', 2, 5], ['d', 2, 3], ['e', 3, 5], ['d', 2, 10], ['a', 3, 5]]
>>> df = pd.DataFrame(lol)
>>> df.rename(columns={0:'value',…
alvas
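A minimal sketch of one approach, on hypothetical data rather than the question's truncated example: `GroupBy.tail(N)` returns the last N rows of each group and keeps the original index and row order. The column `score` and the value of `N` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data; "value" defines which rows count as duplicates.
df = pd.DataFrame({"value": ["a", "b", "a", "a", "b", "c", "a"],
                   "score": [1, 2, 3, 4, 5, 6, 7]})

N = 2
# Keep the last N rows of each "value" group, in their original order.
last_n = df.groupby("value").tail(N)
```

With `N = 1` this gives the same rows as `drop_duplicates(subset="value", keep="last")`.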
9 votes, 4 answers

Pandas - Opposite of drop duplicates, keep first

I'm familiar with how to drop duplicate rows using the keep parameter (first, last, or False). Nothing too complicated there, and there are plenty of examples (i.e. here). However, what I'm looking for is a way to find the duplicates, but…
chitown88
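Since the excerpt is truncated, this sketch assumes the goal is to see the rows that `drop_duplicates` would discard. `duplicated(keep="first")` marks exactly those rows, and `duplicated(keep=False)` marks every member of a duplicate group. The `key` column and its values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"key": ["x", "y", "x", "z", "x", "y"]})  # hypothetical data

# Rows that drop_duplicates(keep="first") would remove: each repeat after the first.
removed = df[df.duplicated(keep="first")]

# All rows belonging to a duplicate group, first occurrence included.
all_dups = df[df.duplicated(keep=False)]
```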
7 votes, 2 answers

How to drop duplicate data with different column names in pandas?

I have a DataFrame with columns that contain duplicate data under different names:
In[1]: df
Out[1]:
   X1   X2   Y1   Y2
  0.0  0.0  6.0  6.0
  3.0  3.0  7.1  7.1
  7.6  7.6  1.2  1.2
I know .drop(columns = ) exists, but is there a more efficient way to…
ahnnni
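A minimal sketch with the values from the question: transposing makes each column a row, so `duplicated()` compares whole columns, and the inverted mask selects the first column of each identical set. Note that transposing a large frame is not free; this is a convenience, not necessarily the fastest option.

```python
import pandas as pd

# The values shown in the question.
df = pd.DataFrame({"X1": [0.0, 3.0, 7.6], "X2": [0.0, 3.0, 7.6],
                   "Y1": [6.0, 7.1, 1.2], "Y2": [6.0, 7.1, 1.2]})

# df.T has one row per original column; duplicated() flags repeated columns.
result = df.loc[:, ~df.T.duplicated()]
```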
7 votes, 4 answers

Drop duplicate list elements in column of lists

This is my dataframe:
pd.DataFrame({'A':[1, 3, 3, 4, 5, 3, 3], 'B':[0, 2, 3, 4, 5, 6, 7], 'C':[[1,4,4,4], [1,4,4,4], [3,4,4,5], [3,4,4,5], [4,4,2,1], [1,2,3,4,], [7,8,9,1]]})
I want to get set/drop duplicate values of…
matan
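Assuming the intent is to remove repeated elements inside each list while keeping their order (the excerpt is cut off), a sketch with the question's data: `dict.fromkeys` keeps the first occurrence of each element because dicts preserve insertion order. The new column name `C_unique` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 3, 4, 5, 3, 3],
                   "B": [0, 2, 3, 4, 5, 6, 7],
                   "C": [[1, 4, 4, 4], [1, 4, 4, 4], [3, 4, 4, 5], [3, 4, 4, 5],
                         [4, 4, 2, 1], [1, 2, 3, 4], [7, 8, 9, 1]]})

# Order-preserving de-duplication of each list (set() would lose the order).
df["C_unique"] = df["C"].map(lambda values: list(dict.fromkeys(values)))
```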
6 votes, 3 answers

Pandas drop_duplicates. Keep first AND last. Is it possible?

I have this dataframe and I need to drop all duplicates, but I need to keep the first AND last values. For example:
1 0
2 0
3 0
4 0
output:
1 0
4 0
I tried df.column.drop_duplicates(keep=("first","last")) but it doesn't…
bitmover
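`keep` accepts only a single value, so one workaround is to combine two masks: a row survives if it is the first occurrence of its value or the last occurrence. The sketch below uses the question's example, assumed to be an index of 1 to 4 with every value equal to 0. For a DataFrame, the same masks can be built with `df.duplicated(subset=..., keep=...)`.

```python
import pandas as pd

# The question's example: index 1..4, all values 0.
s = pd.Series([0, 0, 0, 0], index=[1, 2, 3, 4])

is_first = ~s.duplicated(keep="first")  # True only for the first occurrence
is_last = ~s.duplicated(keep="last")    # True only for the last occurrence
result = s[is_first | is_last]
```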
6 votes, 2 answers

Is there any faster alternative to col.drop_duplicates()?

I am trying to remove duplicate data in my dataframe (csv) and get a separate csv showing the unique answers of each column. The problem is that my code has been running for a day (22 hours, to be exact). I'm open to other suggestions. My data…
AOJ keygen
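The excerpt does not show the slow code, so this is only a sketch of one vectorized approach, on hypothetical data: `Series.unique()` is hash-based and returns values in order of first appearance. Wrapping each result in a `Series` lets columns with different numbers of unique values be combined, with shorter columns padded with NaN. The column names and output path are assumptions.

```python
import pandas as pd

# Hypothetical survey-style data; real data would come from pd.read_csv.
df = pd.DataFrame({"q1": ["yes", "no", "yes", "no"],
                   "q2": ["red", "red", "green", "blue"]})

# One column of unique answers per original column, NaN-padded to equal length.
uniques = pd.DataFrame({col: pd.Series(df[col].unique()) for col in df.columns})
# uniques.to_csv("unique_answers.csv", index=False)  # hypothetical output path
```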
5 votes, 2 answers

Drop duplicate if the value in another column is null - Pandas

What I have:
df
Name  |Vehicle
Dave  |Car
Mark  |Bike
Steve |Car
Dave  |
Steve |
I want to drop duplicates from the Name column, but only if the corresponding value in the Vehicle column is null. I know I can use df.drop_duplicates(subset=['Name']) with…
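A sketch using the question's data, assuming blank vehicles are stored as missing values (None/NaN). A row is dropped only when its Name occurs more than once and its Vehicle is missing.

```python
import pandas as pd

# The question's data; blank vehicles assumed to be missing values.
df = pd.DataFrame({"Name": ["Dave", "Mark", "Steve", "Dave", "Steve"],
                   "Vehicle": ["Car", "Bike", "Car", None, None]})

# Drop a row only if its Name is duplicated AND its Vehicle is null.
to_drop = df["Name"].duplicated(keep=False) & df["Vehicle"].isna()
result = df[~to_drop]
```

Two caveats: if every row for a name has a null vehicle, all of them are dropped, and empty strings would first need converting to missing values (for example with `df["Vehicle"].replace("", pd.NA)`).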
4 votes, 2 answers

Check if pandas row is unique, when order is not considered

I wondered if there is a way to check for, and then drop, certain rows which are not unique. My data frame looks something like this:
   ID1  ID2  weight
0    2    4     0.5
1    3    7     0.8
2    4    2     0.5
3    7    3     0.8
4    8    2     0.5
5    3    8  …
msa
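A sketch using the first five rows shown in the question (row 5 is truncated): sorting each `(ID1, ID2)` pair makes `(4, 2)` and `(2, 4)` identical, so an ordinary `duplicated()` check then ignores order. It assumes only the two ID columns define uniqueness; add `weight` to the key if it should count too. The same idea fits the "Drop unordered duplicates across separate columns" question.

```python
import numpy as np
import pandas as pd

# The first five rows from the question.
df = pd.DataFrame({"ID1": [2, 3, 4, 7, 8],
                   "ID2": [4, 7, 2, 3, 2],
                   "weight": [0.5, 0.8, 0.5, 0.8, 0.5]})

# Sort each pair row-wise so the order within a pair no longer matters.
pairs = pd.DataFrame(np.sort(df[["ID1", "ID2"]].to_numpy(), axis=1), index=df.index)
result = df[~pairs.duplicated()]
```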
3 votes, 3 answers

How to find the list of columns with the same values in a dataframe in Python

I am trying to find the list of columns in a data frame that contain the same values. There is a package in R, whichAreInDouble; I am trying to implement that in Python.
df =
a b c d e f g h i
1 2 3 4 1 2 3 4 5
2 3 4 5 2 3 4 5 6
3 4 5 6 3 4 5 6 7
it…
Vivek Sthanam
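A minimal pure-pandas sketch with the question's values: each column's full contents become a dictionary key, so columns with identical values land in the same list. It assumes the columns contain no NaN, since separate NaN objects may not compare equal inside tuples.

```python
import pandas as pd

# The values shown in the question: columns a..i, three rows.
df = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4], "c": [3, 4, 5],
                   "d": [4, 5, 6], "e": [1, 2, 3], "f": [2, 3, 4],
                   "g": [3, 4, 5], "h": [4, 5, 6], "i": [5, 6, 7]})

# Group column names by their contents (assumes no NaN values).
groups = {}
for col in df.columns:
    groups.setdefault(tuple(df[col]), []).append(col)

same_value_columns = [cols for cols in groups.values() if len(cols) > 1]
```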
3 votes, 2 answers

Drop unordered duplicates across separate columns

I am trying to return a df where duplicate values have been removed. I have tried to use drop_duplicates(), but the values in the columns which have been subset aren't ordered. That is, the values are duplicates, but they aren't in the same order. For…
jonboy
  • 415
  • 4
  • 14
  • 45
2 votes, 2 answers

Removing Duplicates Based on Other Cell Value

Forgive me if this is a thick question: I have a table of training completions, e.g.
User  Training Course  Status
1     Course 1         Complete
1     Course 1         Complete
1     Course 1         Incomplete
1     Course 2         Complete
1     Course 3         Incomplete
My source…
G P
2 votes, 3 answers

polars equivalent of pandas groupby.apply(drop_duplicates)

I am new to polars and I wonder what the polars equivalent of pandas groupby.apply(drop_duplicates) is. Here is the code snippet I need to translate:
import pandas as pd
GROUP = list('123231232121212321')
OPERATION =…
2 votes, 2 answers

Pandas drop duplicates based on one group and keep the last value

I have a dataframe:
import pandas as pd
data = pd.DataFrame({"col1": ["a", "a", "a", "a", "a", "a"], "col2": [0,0,0,1,1, 1], "col3": [1,2,3,4,5, 6]})
data
  col1  col2  col3
0    a     0     1
1    a     0 …
Ailurophile
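A sketch using the question's data, assuming the "group" is the combination of `col1` and `col2` and the wanted row is the last one in each group. `drop_duplicates` with that subset and `keep="last"` keeps one row per group.

```python
import pandas as pd

# The question's data.
data = pd.DataFrame({"col1": ["a", "a", "a", "a", "a", "a"],
                     "col2": [0, 0, 0, 1, 1, 1],
                     "col3": [1, 2, 3, 4, 5, 6]})

# Treat (col1, col2) as the group key and keep the last row of each group.
result = data.drop_duplicates(subset=["col1", "col2"], keep="last")
```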