6

I have this dataframe and I need to drop all duplicates but I need to keep first AND last values

For example:

1      0

2     0

3     0

4     0

output:

1     0

4     0

I tried df.column.drop_duplicates(keep=("first","last")) but it doesn't word, it returns

ValueError: keep must be either "first", "last" or False

Does anyone know any turn around for this?

Thanks

bitmover
  • 97
  • 2
  • 8

3 Answers3

13

You could use the panda's concat function to create a dataframe with both the first and last values.

pd.concat([
    df['X'].drop_duplicates(keep='first'),
    df['X'].drop_duplicates(keep='last'),
])
kait
  • 1,327
  • 9
  • 13
3

you can't drop both first and last... so trick is too concat data frames of first and last.

When you concat one has to handle creating duplicate of non-duplicates. So only concat unique indexes in 2nd Dataframe. (not sure if Merge/Join would work better?)

import pandas as pd

d = {1:0,2:0,10:1, 3:0,4:0}

df = pd.DataFrame.from_dict(d, orient='index', columns=['cnt'])
print(df)

    cnt
1     0
2     0
10    1
3     0
4     0

Then do this:

d1 = df.drop_duplicates(keep=("first"))
d2 = df.drop_duplicates(keep=("last"))
d3 = pd.concat([d1,d2.loc[set(d2.index) - set(d1.index)]])
d3
Out[60]:
cnt
1   0
10  1
4   0
frankr6591
  • 1,211
  • 1
  • 8
  • 14
2

Use a groupby on your column named column, then reindex. If you ever want to check for duplicate values in more than one column, you can extend the columns you include in your groupby.

df = pd.DataFrame({'column':[0,0,0,0]})

Input:

   column
0       0
1       0
2       0
3       0

df.groupby('column', as_index=False).apply(lambda x: x if len(x)==1 else x.iloc[[0, -1]]).reset_index(level=0, drop=True)

Output:

   column
0       0
3       0
Derek O
  • 16,770
  • 4
  • 24
  • 43