I have the following pandas DataFrame:
import pandas as pd
import numpy as np
pd_df = pd.DataFrame({'Qu1': ['apple', 'potato', 'cheese', 'banana', 'cheese', 'banana', 'cheese', 'potato', 'egg'],
'Qu2': ['sausage', 'banana', 'apple', 'apple', 'apple', np.nan, 'banana', 'banana', 'banana'],
'Qu3': ['apple', 'potato', 'sausage', 'cheese', 'cheese', 'potato', 'cheese', 'potato', 'egg']})
I'd like to apply where() to only two columns, Qu1 and Qu2, and keep the rest unchanged (see the original Stack Overflow question), so I created pd1:
pd1 = pd_df.where(pd_df.apply(lambda x: x.map(x.value_counts()))>=2,
"other")[['Qu1', 'Qu2']]
Then I added the rest of pd_df, pd_df['Qu3'], to pd1:
pd1['Qu3'] = pd_df['Qu3']
pd_df = []
My question is: I originally wanted to run where() on only part of the DataFrame and keep the rest of the columns as is. Could the code above be dangerous for a large dataset? Can I harm the original data this way? If so, what is the best way to do it?
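For reference, here is a minimal sketch of the kind of alternative I have in mind: build the mask from the two columns only and write the result into a copy, so pd_df itself is never touched. The names cols, counts, result and original are only illustrative, and the sketch relies on where() returning a new object by default (inplace=False). I'm not sure whether the extra copy is acceptable for a large dataset.

```python
import numpy as np
import pandas as pd

pd_df = pd.DataFrame({
    'Qu1': ['apple', 'potato', 'cheese', 'banana', 'cheese', 'banana', 'cheese', 'potato', 'egg'],
    'Qu2': ['sausage', 'banana', 'apple', 'apple', 'apple', np.nan, 'banana', 'banana', 'banana'],
    'Qu3': ['apple', 'potato', 'sausage', 'cheese', 'cheese', 'potato', 'cheese', 'potato', 'egg'],
})

# Snapshot used only to verify afterwards that pd_df was not modified.
original = pd_df.copy()

# Illustrative name: the columns that where() should be applied to.
cols = ['Qu1', 'Qu2']

# Per-column frequency of each value; NaN maps to NaN, so it fails the >= 2 test.
counts = pd_df[cols].apply(lambda s: s.map(s.value_counts()))

# Work on a copy and overwrite only the selected columns.
result = pd_df.copy()
result[cols] = pd_df[cols].where(counts >= 2, "other")

print(result)
print(pd_df.equals(original))  # checks that the source frame is unchanged
```

Here Qu3 is carried over by the copy, so there is no need to re-attach it afterwards or to reassign pd_df.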
Thanks a lot!