I want to compute the mode over a dataframe that I first filter with a mask. To illustrate the problem, here is a sample of the data:
ID,MASK,VALUE
1,[2,3],4
2,[4,1],2
3,[],2
4,[2],3
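For reproducibility, the sample can be built roughly like this. It is a minimal sketch that assumes MASK holds Python lists of ID values (the column could also be stored as strings, in which case it would need parsing first):

```python
import pandas as pd

# Sample data from the table above.
# Assumption: each MASK entry is a Python list of IDs (possibly empty).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "MASK": [[2, 3], [4, 1], [], [2]],
    "VALUE": [4, 2, 2, 3],
})
```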
The result that I would like to obtain is the following:
ID,MASK,VALUE,VALUE_M
1,[2,3],4,2
2,[4,1],2,3
3,[],2,-1
4,[2],3,2
When the mode cannot be determined (several values are tied for most frequent), I would like to get the lowest of them. When MASK is empty, the value should be -1.
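To make the rule concrete, here is a small sketch of what I mean for a single list of values. The helper name `lowest_mode` is hypothetical and only for illustration, and it assumes the values are integers:

```python
import pandas as pd

def lowest_mode(values, default=-1):
    """Hypothetical helper: most frequent value, lowest one on ties.

    Returns `default` when there are no values (empty MASK).
    Assumes integer values.
    """
    values = list(values)
    if not values:
        return default
    # Series.mode() returns every value tied for most frequent;
    # taking the minimum picks the lowest of them.
    return int(pd.Series(values).mode().min())

print(lowest_mode([2, 2]))  # row 1: VALUEs of IDs 2 and 3
print(lowest_mode([3, 4]))  # row 2: VALUEs of IDs 4 and 1, a tie
print(lowest_mode([]))      # row 3: empty MASK
```

This should give 2, 3 and -1, matching the first three rows of the expected output.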
The code that I am using now is the following:
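for index, row in df.iterrows():
    mask = row['MASK']
    if len(mask) > 0:
        # VALUEs of the rows whose ID is listed in this row's mask
        selected = df.loc[df['ID'].isin(mask), 'VALUE']
        # most frequent value; ties are not guaranteed to resolve to the lowest value
        df.loc[index, 'VALUE_M'] = selected.value_counts().index[0]
    else:
        df.loc[index, 'VALUE_M'] = -1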
As you can see, I am looping over each row, which is strongly discouraged in pandas, especially when there are many rows (which is my case). I am looking for a more efficient way to obtain the result.
Any ideas?