I have a table in pandas:
import pandas as pd

df = pd.DataFrame({
    'LeafID': [1,1,2,1,3,3,1,6,3,5,1],
    'pidx': [10,10,300,10,30,40,20,10,20,45,20],
    'pidy': [20,20,400,20,15,20,12,43,54,112,23],
    'count': [10,20,30,40,80,10,20,50,30,10,70],
    'score': [10,10,10,22,22,3,4,5,9,0,1]
})
    LeafID  count  pidx  pidy  score
0        1     10    10    20     10
1        1     20    10    20     10
2        2     30   300   400     10
3        1     40    10    20     22
4        3     80    30    15     22
5        3     10    40    20      3
6        1     20    20    12      4
7        6     50    10    43      5
8        3     30    20    54      9
9        5     10    45   112      0
10       1     70    20    23      1
I want to group by pidx and keep only the rows whose pidx value occurs more than 2 times. That is, keep the rows where pidx is 10 or 20.
I tried df.groupby('pidx').count(), but it didn't help. For the rows that remain, I also need to compute final_score = 0.4*count + 0.6*score.
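To illustrate why count() alone does not get there, here is a minimal sketch. It assumes the data from the printed table (so row 8 has pidx 20); the names per_group, occurrences, mask and kept are only illustrative. count() aggregates to one row per distinct pidx, whereas transform('size') broadcasts each group's size back onto the original rows, which can then be used as a boolean mask.

```python
import pandas as pd

# Data as shown in the printed table (assumption: row 8 has pidx 20).
df = pd.DataFrame({
    'LeafID': [1, 1, 2, 1, 3, 3, 1, 6, 3, 5, 1],
    'pidx':   [10, 10, 300, 10, 30, 40, 20, 10, 20, 45, 20],
    'pidy':   [20, 20, 400, 20, 15, 20, 12, 43, 54, 112, 23],
    'count':  [10, 20, 30, 40, 80, 10, 20, 50, 30, 10, 70],
    'score':  [10, 10, 10, 22, 22, 3, 4, 5, 9, 0, 1],
})

# count() aggregates: the result has one row per distinct pidx,
# so the original rows are no longer available for filtering.
per_group = df.groupby('pidx').count()

# transform('size') keeps the original shape: every row receives
# the number of rows in its own pidx group.
occurrences = df.groupby('pidx')['pidx'].transform('size')

# Boolean mask for rows whose pidx occurs more than 2 times.
mask = occurrences > 2
kept = df[mask]
print(kept)
```

Under these assumptions, kept should contain only the rows with pidx 10 or 20, still in their original order.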
Desired output is:
LeafID  count  pidx  pidy  final_score
     1     10    10    20
     1     20    10    20
     1     40    10    20
     6     50    10    43
     1     20    20    12
     3     30    20    54
     1     70    20    23
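A possible end-to-end sketch for producing this output, again assuming the values from the printed table (row 8 with pidx 20). It uses groupby(...).filter(...) to drop the small groups, then adds final_score with the formula from the question. The sort by pidx and the column order are assumptions made only to mirror the layout of the desired output; result is an illustrative name.

```python
import pandas as pd

# Data as shown in the printed table (assumption: row 8 has pidx 20).
df = pd.DataFrame({
    'LeafID': [1, 1, 2, 1, 3, 3, 1, 6, 3, 5, 1],
    'pidx':   [10, 10, 300, 10, 30, 40, 20, 10, 20, 45, 20],
    'pidy':   [20, 20, 400, 20, 15, 20, 12, 43, 54, 112, 23],
    'count':  [10, 20, 30, 40, 80, 10, 20, 50, 30, 10, 70],
    'score':  [10, 10, 10, 22, 22, 3, 4, 5, 9, 0, 1],
})

# Keep only the pidx groups with more than 2 rows.
# filter() returns the original rows of the surviving groups.
result = df.groupby('pidx').filter(lambda g: len(g) > 2).copy()

# Weighted score from the question: 0.4*count + 0.6*score.
result['final_score'] = 0.4 * result['count'] + 0.6 * result['score']

# Order by pidx to mirror the desired output; mergesort is stable,
# so rows keep their original relative order within each pidx.
result = result.sort_values('pidx', kind='mergesort')
result = result[['LeafID', 'count', 'pidx', 'pidy', 'final_score']]
print(result.to_string(index=False))
```

If the original row order is preferred instead, the sort_values line can simply be dropped.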