
I have a DataFrame `df` with 'Date', 'Time' and 'X' columns. I would like to delete all days that have fewer than 388 observations.

I tried the following:

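```python
# Per-date count of non-null 'X' values (only aux.index, the list of dates, is used below)
aux = df.groupby('Date')['X'].count()
for i in aux.index:
    # Boolean mask over every row of the current df: scans the whole frame each iteration
    idx = df['Date'] == i
    # idx.sum() counts all rows for date i, including rows where 'X' is missing
    if idx.sum() < 388:
        # Rebuild df without date i (a full copy for every dropped day)
        df = df[~idx]
```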

But it is super slow. Is there a faster way to do it?
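A minimal vectorized sketch along the lines of the `transform('count')` suggestion in the comments: `groupby(...).transform('count')` returns a Series aligned to `df`'s own index that holds each date's count on every row of that date, so a single boolean mask selects the rows to keep in one pass instead of rescanning and copying the frame once per date. The function name `drop_sparse_days`, the `min_obs` parameter and the toy data are hypothetical; with the real data you would pass `min_obs=388`.

```python
import pandas as pd


def drop_sparse_days(df: pd.DataFrame, min_obs: int = 388) -> pd.DataFrame:
    """Hypothetical helper: keep only rows whose 'Date' has >= min_obs non-null 'X' values."""
    # transform('count') broadcasts each date's count back onto all of its rows,
    # giving a Series on df's index that can be compared directly.
    counts = df.groupby('Date')['X'].transform('count')
    return df[counts >= min_obs]


# Hypothetical toy data: 2020-01-01 has 3 observations, 2020-01-02 has 1.
df = pd.DataFrame({
    'Date': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-02'],
    'Time': ['09:00', '09:01', '09:02', '09:00'],
    'X': [1.0, 2.0, 3.0, 4.0],
})

result = drop_sparse_days(df, min_obs=2)
print(result)
```

One design choice to check: `'count'` counts only non-null 'X' values, whereas the loop's `idx.sum()` counts every row for a date. If missing 'X' values should still count as observations, `transform('size')` matches the loop's behaviour instead.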

  • Might be worth giving `df2 = df[df.groupby('Date')['X'].transform('count') >= 388]` a try? – Jon Clements Oct 24 '20 at 12:11
  • @JonClements not an exact dupe but should suffice - can you close it? – Umar.H Oct 24 '20 at 12:36
  • Does this answer your question? [Filtering pandas dataframe with multiple Boolean columns](https://stackoverflow.com/questions/46207530/filtering-pandas-dataframe-with-multiple-boolean-columns) – Umar.H Oct 24 '20 at 12:37
  • @Manakin why is Jon's solution not exact? Moreover, the link you provided does not answer my question, since the indexes of df and aux are different. – Ninja Warrior Oct 24 '20 at 13:26
  • `aux` is just a subset of your dataframe which you obtain via looping. Since you need to subset your dataframe with a boolean mask, I think the answer suffices. – Umar.H Oct 24 '20 at 15:03
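On the index mismatch raised in the comments: `aux` is indexed by date (one entry per day), so it cannot be used directly as a row mask for `df`. Mapping it onto `df['Date']` looks up each row's date and yields a Series on `df`'s own index, which can then be compared to the threshold. The sketch below uses hypothetical toy data with `min_obs = 2` standing in for 388, and also shows `groupby(...).filter(...)` as a swapped-in alternative that keeps or drops whole groups.

```python
import pandas as pd

# Hypothetical toy data; None in 'X' becomes NaN and is not counted by count().
df = pd.DataFrame({
    'Date': ['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-03', '2020-01-03'],
    'X': [1.0, None, 3.0, 4.0, 5.0],
})
min_obs = 2  # stands in for 388

# aux is indexed by Date (one entry per day), not by df's row index.
aux = df.groupby('Date')['X'].count()

# map() looks up each row's date in aux, producing a Series aligned with df.
kept_map = df[df['Date'].map(aux) >= min_obs]

# filter() keeps every row of the groups for which the function returns True;
# it calls the Python function once per date, so it is usually slower than a mask.
kept_filter = df.groupby('Date').filter(lambda g: g['X'].count() >= min_obs)

print(kept_map)
```

Only 2020-01-03 has two non-null 'X' values here, so both approaches keep rows 3 and 4.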
