Generating data:
import numpy as np
import pandas as pd

np.random.seed(42)  # seed NumPy's generator, which is the one used below
date_rng = pd.date_range(start='1/1/2018', end='1/08/2018', freq='H')
df = pd.DataFrame(np.random.randint(0, 10, size=(len(date_rng), 3)),
                  columns=['data1', 'data2', 'data3'],
                  index=date_rng)
# Randomly replace about 35% of the cells with NaN
mask = np.random.choice([1, 0], df.shape, p=[.35, .65]).astype(bool)
df[mask] = np.nan
I want to do the following: calculate the 5% quantile of each column, then compare every cell in that column with it. Any cell that is smaller than the quantile should be set to the 5% quantile of its column.
I have read these questions:
Pandas DataFrame: replace all values in a column, based on condition
Replacing values greater than a number in pandas dataframe
and came up with this attempt:
df[df < df.quantile(q=0.05, axis=0)] = df.quantile(q=0.05, axis=0)
but it does not work, I think because I am assigning a whole Series (one quantile per column) where a single value per cell is expected. How can I solve this problem? Thank you.
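One possible approach, offered as a sketch rather than a definitive answer: the comparison `df < df.quantile(...)` already lines the quantiles up with the columns, so the part that seems to need help is the assignment, where pandas is not told which axis the Series belongs to. Both `DataFrame.clip` and `DataFrame.mask` accept an `axis` argument for exactly that. The example below rebuilds data shaped like the one above, but with random floats instead of integers, a different seeding style, and the `'60min'` frequency alias; those are assumptions made for this illustration, as are the names `q05`, `clipped` and `masked`.

```python
import numpy as np
import pandas as pd

# Illustrative data: hourly index, three columns, about 35% of cells set to NaN.
# Floats, the seed and the '60min' alias are choices made for this sketch.
rng = np.random.default_rng(42)
date_rng = pd.date_range(start="2018-01-01", end="2018-01-08", freq="60min")
df = pd.DataFrame(
    rng.random(size=(len(date_rng), 3)) * 10,
    columns=["data1", "data2", "data3"],
    index=date_rng,
)
nan_mask = rng.choice([True, False], size=df.shape, p=[0.35, 0.65])
df = df.mask(nan_mask)  # cells where nan_mask is True become NaN

# One 5% quantile per column; NaN cells are ignored by quantile().
q05 = df.quantile(q=0.05, axis=0)

# Option 1: raise every cell below its column's quantile up to that quantile.
# axis=1 aligns the Series index (column names) with the frame's columns.
clipped = df.clip(lower=q05, axis=1)

# Option 2: the same replacement written with mask(), again aligned by columns.
masked = df.mask(df.lt(q05, axis=1), q05, axis=1)
```

Both variants return a new frame and leave the NaN cells as NaN; assigning the result back (`df = df.clip(lower=q05, axis=1)`) would mimic the in-place replacement attempted above.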