Get rid of repeated rows in pandas

Question

I have some csv data from a counting experiment, in which I am given a measurement time and the number of counts between that time and the previous measurement time. For some reason, whenever I have counts (sometimes I have none), that row gets repeated the same number of times as the number of counts. Here is a basic example:

time counts
t1 0
t2 1
t3 0
t4 3
t4 3
t4 3
t5 0

So t4 is repeated 3 times because it has 3 counts associated with it, and this happens for any number of counts (except zero, in which case the row appears just once). There are more columns in my data, but only these two matter. Is there a fast way to remove these redundant rows so that each count appears only once, i.e.:

time counts
t1 0
t2 1
t3 0
t4 3
t5 0

Thank you!
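Before deleting anything, it can be worth confirming that the repetition follows the pattern described above: a time stamp with n > 0 counts appears n times, and one with zero counts appears once. The following is a minimal sketch that builds the sample data inline; with the real CSV you would load it with pd.read_csv instead. The variable names are illustrative, not part of the original question.

```python
import pandas as pd

# Sample data from the question; the real CSV has more columns.
df = pd.DataFrame({'time': ['t1', 't2', 't3', 't4', 't4', 't4', 't5'],
                   'counts': [0, 1, 0, 3, 3, 3, 0]})

# Number of rows sharing each time stamp (sort=False keeps first-appearance order).
rows_per_time = df.groupby('time', sort=False).size()

# The counts value recorded for each time stamp.
counts_per_time = df.groupby('time', sort=False)['counts'].first()

# Expected repetitions: `counts` rows, except that zero counts still give one row.
expected = counts_per_time.clip(lower=1)

# True if every time stamp is repeated exactly as described.
pattern_holds = bool((rows_per_time == expected).all())
print(pattern_holds)
```

If this prints False, some rows are repeated for another reason, and dropping duplicates blindly could hide that.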

score 0 · Answer 1 · answered Nov 11 '19 at 04:22

Use DataFrame.drop_duplicates:

import pandas as pd

df = pd.DataFrame({'time': ['t1', 't2', 't3', 't4', 't4', 't4', 't5'],
                   'counts': [0, 1, 0, 3, 3, 3, 0]})

print(df)

print(df.drop_duplicates())
  time  counts
0   t1       0
1   t2       1
2   t3       0
3   t4       3
6   t5       0
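Note that drop_duplicates() with no arguments treats two rows as duplicates only when every column matches. Since the real data has more columns, the repeated rows might differ in one of them; in that case you can restrict the comparison to the columns that matter with `subset`. A sketch, assuming a hypothetical extra column called `extra` whose values differ between the repeated t4 rows:

```python
import pandas as pd

# Question's data plus a hypothetical 'extra' column that differs between the
# repeated t4 rows, so a whole-row comparison no longer sees them as equal.
df = pd.DataFrame({'time': ['t1', 't2', 't3', 't4', 't4', 't4', 't5'],
                   'counts': [0, 1, 0, 3, 3, 3, 0],
                   'extra': [10, 11, 12, 13, 14, 15, 16]})

# Comparing all columns: all three t4 rows survive.
all_columns = df.drop_duplicates()

# Comparing only 'time' and 'counts': the first t4 row is kept.
# ignore_index=True (pandas >= 1.0) renumbers the result 0..n-1.
deduped = df.drop_duplicates(subset=['time', 'counts'], ignore_index=True)
print(deduped)
```

Without ignore_index=True the original index labels are kept, which is why the output above shows 6 for t5.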

score 0 · Answer 2 · answered Nov 11 '19 at 05:11

You can also drop duplicates based on a single column only, where 'Column' is the name of that column:

df = df.drop_duplicates('Column', keep='first')

This deletes every row whose value in that column has already appeared and keeps only the first occurrence. For this data:
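The `keep` parameter decides which of the repeated rows survives. A small sketch on the question's sample data comparing the three accepted values:

```python
import pandas as pd

df = pd.DataFrame({'time': ['t1', 't2', 't3', 't4', 't4', 't4', 't5'],
                   'counts': [0, 1, 0, 3, 3, 3, 0]})

# keep='first' (the default): retain the first row of each repeated 'time'.
first = df.drop_duplicates('time', keep='first')

# keep='last': retain the last row of each repeated 'time' instead.
last = df.drop_duplicates('time', keep='last')

# keep=False: drop every row whose 'time' occurs more than once.
unique_only = df.drop_duplicates('time', keep=False)

print(first.index.tolist())        # [0, 1, 2, 3, 6]
print(last.index.tolist())         # [0, 1, 2, 5, 6]
print(unique_only.index.tolist())  # [0, 1, 2, 6]
```

Here keep=False is not what the question asks for, because it removes t4 entirely; 'first' and 'last' give the same values and differ only in which index label is kept.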

df = df.drop_duplicates('time', keep='first')

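You can also sort your data first, in ascending or descending order, to control which of the duplicate rows is kept. This matters when repeated rows differ in some other column; for example, sorting by counts in descending order keeps the row with the highest count for each time: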

df = df.sort_values(by=['counts'], ascending=False, na_position='last')
df = df.drop_duplicates('time',keep='first')
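One side effect of sorting first is that the result comes back ordered by counts rather than by time. A sketch, assuming you want the original row order back, adds sort_index() after dropping duplicates:

```python
import pandas as pd

df = pd.DataFrame({'time': ['t1', 't2', 't3', 't4', 't4', 't4', 't5'],
                   'counts': [0, 1, 0, 3, 3, 3, 0]})

# Sort so the highest-count row for each 'time' comes first.
ordered = df.sort_values(by='counts', ascending=False)

# Keep that row per 'time', then restore the original order via the index labels.
result = ordered.drop_duplicates('time', keep='first').sort_index()
print(result)
```

For the sample data the counts of a repeated time are all identical, so the sort does not change which values are kept; it only makes a difference when the repeated rows are not exact copies.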
