This isn't a duplicate. I am not trying to drop rows based on the index.

I have a dataframe as shown below:

import pandas as pd

df = pd.DataFrame({
    'subject_id': [1,1,1,1,1,1,1,2,2,2,2,2],
    'time_1': ['2173-04-03 12:35:00', '2173-04-03 12:50:00', '2173-04-05 12:59:00',
               '2173-05-04 13:14:00', '2173-05-05 13:37:00', '2173-07-06 13:39:00',
               '2173-07-08 11:30:00', '2173-04-08 16:00:00', '2173-04-09 22:00:00',
               '2173-04-11 04:00:00', '2173-04-13 04:30:00', '2173-04-14 08:00:00'],
    'val': [5,2,3,1,1,6,5,5,8,3,4,6]})
df['time_1'] = pd.to_datetime(df['time_1'])
df['day'] = df['time_1'].dt.day

I would like to drop all records for a subject_id if that subject_id has 5 or fewer rows (count <= 5).

This is what I tried:

df1 = df.groupby(['subject_id']).size().reset_index(name='counter')
df1[df1['counter']>5] # this gives the valid subject_id: only subject_id = 1 has a count of more than 5

Now, using this subject_id, I have to get the rows of the base dataframe for that subject_id.

There might be an elegant way to do this.

The expected output is the matching rows of the base dataframe: here, the 7 rows for subject_id = 1.

The Great
  • Possible duplicate of [How to drop a list of rows from Pandas dataframe?](https://stackoverflow.com/questions/14661701/how-to-drop-a-list-of-rows-from-pandas-dataframe) – Kostas Charitidis Oct 08 '19 at 09:51
  • Possible duplicate of [Select Pandas rows based on list index](https://stackoverflow.com/questions/19155718/select-pandas-rows-based-on-list-index) – PV8 Oct 08 '19 at 09:53
  • No, it may not work using `index` and it's not a duplicate. I have a condition and when I apply that condition, the indices get changed – The Great Oct 08 '19 at 09:58

1 Answer

Use:

df[df.groupby('subject_id')['subject_id'].transform('size')>5]

Output:

   subject_id              time_1  val  day
0           1 2173-04-03 12:35:00    5    3
1           1 2173-04-03 12:50:00    2    3
2           1 2173-04-05 12:59:00    3    5
3           1 2173-05-04 13:14:00    1    4
4           1 2173-05-05 13:37:00    1    5
5           1 2173-07-06 13:39:00    6    6
6           1 2173-07-08 11:30:00    5    8
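
As a rough illustration of why this works, the sketch below builds the same kind of mask step by step on a small made-up frame (`demo` and its values are hypothetical, not the question's data). It also shows two other ways that should give the same rows: finishing the `size()` approach from the question with `isin`, and using `GroupBy.filter`. Treat it as a sketch under those assumptions rather than the only way to do it.

```python
import pandas as pd

# Hypothetical demo data (not the question's frame):
# subject 1 has 6 rows, subject 2 has 2 rows.
demo = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1, 2, 2],
    'val': [5, 2, 3, 1, 1, 6, 5, 8],
})

# transform('size') returns a Series aligned with demo's index,
# where each row holds the size of the group it belongs to.
group_sizes = demo.groupby('subject_id')['subject_id'].transform('size')
kept_transform = demo[group_sizes > 5]

# Alternative 1: compute the per-group counts as in the question,
# then select the base rows whose subject_id is among the valid ids.
counts = demo.groupby('subject_id').size()
valid_ids = counts[counts > 5].index
kept_isin = demo[demo['subject_id'].isin(valid_ids)]

# Alternative 2: GroupBy.filter keeps every row of the groups
# for which the function returns True.
kept_filter = demo.groupby('subject_id').filter(lambda g: len(g) > 5)

print(kept_transform)
```

The `transform` version avoids a Python-level function call per group, which tends to matter when there are many groups; `filter` with a lambda is arguably easier to read.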
ansev