I have a dataframe as follows:
import pandas as pd
df = pd.DataFrame({"user_id": ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
                   "value": [20, 17, 15, 10, 8, 18, 18, 17, 13, 10]})
Note that the dataframe is sorted by user_id, and within each user_id by value in descending order.
For each user_id, I would like to remove the 2nd and 4th rows, so that the output looks like:
df = pd.DataFrame({"user_id": ['a', 'a', 'a', 'b', 'b', 'b'],
                   "value": [20, 15, 8, 18, 17, 10]})
Inspired by the question "drop first and last row from within each group", I tried the following:
def drop_rows(dataframe):
    pos = [1, 3]
    return dataframe.drop(dataframe.index[pos], inplace=True)

df.groupby('user_id').apply(drop_rows)
But I got this error: "index 2 is out of bounds for axis 0 with size 0"
Could someone explain why this doesn't work and how I should proceed instead? Also, since the dataset is quite large, an efficient approach would be helpful. Thanks a lot.
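
For reference, here is a minimal sketch of one possible vectorized approach, not necessarily the best one. It assumes the rows are already in the desired order within each user_id. It avoids apply entirely: `drop(..., inplace=True)` returns None, and as far as I understand the pandas documentation, functions passed to `groupby(...).apply` that mutate the group they receive are not supported. The names `pos_in_group` and `result` are my own, used for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"user_id": ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
                   "value": [20, 17, 15, 10, 8, 18, 18, 17, 13, 10]})

# 0-based position of each row within its user_id group (0, 1, 2, ... per group)
pos_in_group = df.groupby("user_id").cumcount()

# Keep every row whose within-group position is not 1 or 3,
# i.e. drop the 2nd and 4th row of each group
result = df[~pos_in_group.isin([1, 3])].reset_index(drop=True)

print(result)
```

Since `cumcount` and `isin` both operate on whole columns, this should scale better than calling a Python function once per group, though I have not benchmarked it on a large dataset.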