
I am trying to replace every instance of 2 in each column, except the first one, with the value 0. The code I tried (at the bottom) results in an error message. The best approach I could think of was .where(), but I could also see something like duplicated() with keep='first' working. Note that I want to operate on all columns at once without naming each one, since the real DataFrame is much bigger. For example, in the first column, the 2 at the bottom (2020-08) should become a 0.

original output:

import pandas as pd

df = pd.DataFrame({'year_month': ['2018-02', '2018-03', '2018-04', '2018-05', '2018-06', '2018-07', '2018-08'],
                   'adoption_1': [0, 0, 1, 1, 1, 2, 2],
                   'adoption_2': [0, 0, 0, 1, 2, 2, 2],
                   'adoption_3': [0, 1, 1, 1, 1, 2, 2]})

df = df.set_index('year_month')

desired output:

df = pd.DataFrame({'year_month': ['2018-02', '2018-03', '2018-04', '2018-05', '2018-06', '2018-07', '2018-08'],
                   'adoption_1': [0, 0, 1, 1, 1, 2, 0],
                   'adoption_2': [0, 0, 0, 1, 2, 0, 0],
                   'adoption_3': [0, 1, 1, 1, 1, 2, 0]})

df = df.set_index('year_month')

df[df.where((df.shift(2) == 1) & (df.shift(1) == 2))] = 0
Deke Marquardt
  • If you need assistance formatting a small sample of your DataFrame as a copyable piece of code for SO see [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888). – Henry Ecker Oct 28 '21 at 22:08
  • What about `df.column.apply(lambda x: transform_(x) if condition_is_matched(x) else x )`? – L F Oct 28 '21 at 22:19
  • Sorry, could you flesh out the task a little more than just condition_is_matched(x)? Thank you! – Deke Marquardt Oct 28 '21 at 22:39
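One way to flesh out the apply idea is sketched below. An element-wise lambda cannot know whether a given 2 is the first one, so this version applies a function to each whole column instead, and uses `duplicated(keep='first')` (the "duplicates" idea from the question) to flag every repeat after the first. It is a minimal sketch, assuming year_month is the index; the helper name `zero_later_twos` is hypothetical, and the seventh month (2018-08) is an assumption because the posted sample lists only six months.

```python
import pandas as pd

# Sample data from the question; '2018-08' is an assumed seventh month.
df = pd.DataFrame({
    'year_month': ['2018-02', '2018-03', '2018-04', '2018-05',
                   '2018-06', '2018-07', '2018-08'],
    'adoption_1': [0, 0, 1, 1, 1, 2, 2],
    'adoption_2': [0, 0, 0, 1, 2, 2, 2],
    'adoption_3': [0, 1, 1, 1, 1, 2, 2],
}).set_index('year_month')


def zero_later_twos(col: pd.Series) -> pd.Series:
    """Hypothetical helper: set every 2 after the first one in a column to 0."""
    # duplicated(keep='first') is True for each repeat of a value after its
    # first appearance, so combined with col.eq(2) it flags only the later 2s.
    later_two = col.eq(2) & col.duplicated(keep='first')
    # mask() replaces values where the condition is True with 0.
    return col.mask(later_two, 0)


# DataFrame.apply calls the function once per column (axis=0 by default),
# so no column has to be named explicitly.
result = df.apply(zero_later_twos)
print(result)
```

Because the function receives a whole column, it works the same no matter how many adoption columns the real DataFrame has.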

1 Answer


This is not a very good solution, and there is probably an easier way to do it, but it works :)

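import pandas as pd

df = pd.DataFrame({'year_month': [2018, 2018, 2018, 2018, 2018, 2018, 2018],
                   'adoption_1': [0, 0, 1, 1, 1, 2, 2],
                   'adoption_2': [0, 0, 0, 1, 2, 2, 2],
                   'adoption_3': [0, 1, 1, 1, 1, 2, 2]})

for col in df.columns:
    first = False  # has the first 2 in this column been seen yet?
    for index, row in df.iterrows():
        if row[col] == 2 and not first:
            first = True  # keep the first 2 as it is
        elif row[col] == 2 and first:
            # iterrows() may return a copy, so assigning to row is not
            # guaranteed to change df; write through df.at instead
            df.at[index, col] = 0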
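A vectorized alternative to the loop, closer to the .where() idea in the question: a running count of 2s down each column identifies every 2 after the first, and `DataFrame.where` keeps the other cells and fills those with 0. This is a minimal sketch, assuming year_month is set as the index so every remaining column is an adoption column; the seventh month (2018-08) is an assumption because the question's sample lists only six months.

```python
import pandas as pd

# Sample data from the question; '2018-08' is an assumed seventh month.
df = pd.DataFrame({
    'year_month': ['2018-02', '2018-03', '2018-04', '2018-05',
                   '2018-06', '2018-07', '2018-08'],
    'adoption_1': [0, 0, 1, 1, 1, 2, 2],
    'adoption_2': [0, 0, 0, 1, 2, 2, 2],
    'adoption_3': [0, 1, 1, 1, 1, 2, 2],
}).set_index('year_month')

# True wherever a cell equals 2.
is_two = df.eq(2)
# Running count of 2s down each column; a count above 1 means "not the first 2".
later_two = is_two & (is_two.astype(int).cumsum() > 1)
# where() keeps values where the condition is True and replaces the rest with 0.
result = df.where(~later_two, 0)
print(result)
```

Every operation here works column by column across the whole frame, so it avoids the per-row Python loop of iterrows(), which matters for a much bigger DataFrame.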