Pandas drop_duplicates. Keep first AND last. Is it possible?

Question

I have this dataframe and I need to drop all duplicates but I need to keep first AND last values

For example:

output:

1     0

4     0

I tried df.column.drop_duplicates(keep=("first","last")) but it doesn't word, it returns

ValueError: keep must be either "first", "last" or False

Does anyone know any turn around for this?

Thanks

kait · Accepted Answer · 2020-07-04T14:30:42.233

13

You could use the panda's concat function to create a dataframe with both the first and last values.

pd.concat([
    df['X'].drop_duplicates(keep='first'),
    df['X'].drop_duplicates(keep='last'),
])

edited Jul 04 '20 at 14:30

answered Jul 03 '20 at 19:56

kait

1,327
9
13

score 3 · Answer 2 · answered Jul 03 '20 at 20:29

you can't drop both first and last... so trick is too concat data frames of first and last.

When you concat one has to handle creating duplicate of non-duplicates. So only concat unique indexes in 2nd Dataframe. (not sure if Merge/Join would work better?)

import pandas as pd

d = {1:0,2:0,10:1, 3:0,4:0}

df = pd.DataFrame.from_dict(d, orient='index', columns=['cnt'])
print(df)

    cnt
1     0
2     0
10    1
3     0
4     0

Then do this:

d1 = df.drop_duplicates(keep=("first"))
d2 = df.drop_duplicates(keep=("last"))
d3 = pd.concat([d1,d2.loc[set(d2.index) - set(d1.index)]])
d3
Out[60]:
cnt
1   0
10  1
4   0

score 2 · Answer 3 · answered Jul 03 '20 at 19:56

Use a groupby on your column named column, then reindex. If you ever want to check for duplicate values in more than one column, you can extend the columns you include in your groupby.

df = pd.DataFrame({'column':[0,0,0,0]})

Input:

df.groupby('column', as_index=False).apply(lambda x: x if len(x)==1 else x.iloc[[0, -1]]).reset_index(level=0, drop=True)

Output:

   column
0       0
3       0

Pandas drop_duplicates. Keep first AND last. Is it possible?

3 Answers3

Linked

Related