Suppose I have the folowing pandas dataframe:
df = pd.DataFrame({"#":['a','a','a','a','b','b','b','c','c','d','d'],
"$":[8,5,7,20,4,6,10,8,8,9,9],
"C":[0,1,2,3,1,0,2,1,0,1,0]})
# $ C
0 a 8 0
1 a 5 1
2 a 7 2
3 a 20 3
4 b 4 1
5 b 6 0
6 b 10 2
7 c 8 1
8 c 8 0
9 d 9 1
10 d 9 0
I am looking to drop the rows where values from '#' only have a 0 or 1 correspondent in column C, in this case, delete the row with 0 and keep the one with 1. So in my case, only values c and d from column # have only 0 and 1 in column C. Therefore I would have to delete the rows where c and d have 0 in c and keep the 1. I cannot do the same for a and b because they mapped to other values than 0 and 1 in C. It has to be 0 and 1 only.
I have accidentaly stumbled over this working solution by trying out multiple things.
df.groupby(['#','$'], as_index=False)['C'].agg({'C':'max'})
This option however,I cannot use because in my real dataset I have multiple columns that contain null values and grouping will not work in my case. Therefore, I am looking for a different solution.
The solution dataframe will look like this:
# $ C
0 a 5 1
1 a 7 2
2 a 8 0
3 a 20 3
4 b 4 1
5 b 6 0
6 b 10 2
7 c 8 1
8 d 9 1