Given
import numpy as np
import pandas as pd
df = pd.DataFrame({'feature': [0.5, 0.1, 0.3, 0.2, 0.6, 0.4, 0.3], 'label': [0, 1, 2, 2, 1, 2, 0]})
I would like to apply the following rule: for each row with feature
greater than 0.2, the label changes to 2 with probability 0.6, drawn independently per row; otherwise the label remains unchanged.
My solution was:
mask = df.feature > 0.2  # rows eligible for relabelling
df.loc[mask, 'label'] = [
    np.random.choice(x, p=(0.6, 0.4))  # x is (2, current label)
    for x in zip(np.full(mask.sum(), fill_value=2), df.loc[mask, 'label'])]
Is there a simpler, vectorised way to do this?
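
For reference, one possible vectorised form of the rule is sketched below. It draws a single uniform random number per row and combines it with the feature condition in one boolean mask. The names `rng`, `original` and `flip` and the seed value are assumptions introduced for illustration only, and the sketch assumes NumPy's `default_rng` generator is acceptable in place of `np.random.choice`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'feature': [0.5, 0.1, 0.3, 0.2, 0.6, 0.4, 0.3],
                   'label': [0, 1, 2, 2, 1, 2, 0]})
original = df['label'].copy()  # kept only to compare before and after

rng = np.random.default_rng(0)  # the seed is an assumption, for reproducibility

# One uniform draw in [0, 1) per row; a row is relabelled only when
# its feature exceeds 0.2 AND its draw falls below 0.6.
flip = (df['feature'] > 0.2) & (rng.random(len(df)) < 0.6)
df.loc[flip, 'label'] = 2
```

Rows with feature at or below 0.2 are never selected by `flip`, so their labels stay as they were; each remaining row is relabelled with probability 0.6, independently of the others.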