
I have a problem in pandas: I apply many changes to my data, but in the end I don't know which change produced the final value in a column.

For example, I adjust volumes as shown below, and I run many checks like this one:

# Last check 
for i in range(5):
    df_gp.tail(1).loc[ (df_gp['volume']<df_gp['volume'].shift(1)) | (df_gp['volume']<0.4),['new_volume']  ] = df_gp['new_volume']*1.1

I want to update not only the 'new_volume' column but also the 'commentary' column whenever the conditions are fulfilled.

Is it possible to add this somewhere, so that 'commentary' is updated at the same time as 'new_volume'?

Thanks!

HeadOverFeet
  • Perhaps you could add some data to make your example reproducible? https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Evgeny Jun 13 '18 at 11:27

1 Answer


Yes, it is possible with assign, but in my opinion that is less readable. It is better to update each column separately, using a boolean mask cached in a variable:

import pandas as pd

df_gp = pd.DataFrame({'volume':[.1,.3,.5,.7,.1,.7],
                     'new_volume':[5,3,6,9,2,4],
                     'commentary':list('aaabbb')})

print (df_gp)
   volume  new_volume commentary
0     0.1           5          a
1     0.3           3          a
2     0.5           6          a
3     0.7           9          b
4     0.1           2          b
5     0.7           4          b

#create boolean mask and assign to variable for reuse
m = (df_gp['volume']<df_gp['volume'].shift(1)) | (df_gp['volume']<0.4)

#update both columns with assign on the filtered rows, then assign back only those rows and columns
c = ['commentary','new_volume']
df_gp.loc[m, c] = df_gp.loc[m, c].assign(new_volume=df_gp['new_volume']*1.1,
                                         commentary='updated')
print (df_gp)
   volume  new_volume commentary
0     0.1         5.5    updated
1     0.3         3.3    updated
2     0.5         6.0          a
3     0.7         9.0          b
4     0.1         2.2    updated
5     0.7         4.0          b

#alternative (run on the original df_gp): multiply the filtered column by a scalar
df_gp.loc[m, 'new_volume'] *= 1.1
#set a new value in the filtered column
df_gp.loc[m, 'commentary'] = 'updated'
print (df_gp)
   volume  new_volume commentary
0     0.1         5.5    updated
1     0.3         3.3    updated
2     0.5         6.0          a
3     0.7         9.0          b
4     0.1         2.2    updated
5     0.7         4.0          b
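
Since the question mentions running many checks and wanting to know which one changed a value, the same mask pattern can be wrapped in a small function that appends a note instead of overwriting it. This is only a sketch under assumptions: the helper name scale_with_note, the note texts, and the '; ' separator are made up for illustration, and 'new_volume' is created as float here so that scaling does not change the column dtype.

```python
import pandas as pd

df_gp = pd.DataFrame({'volume': [.1, .3, .5, .7, .1, .7],
                      'new_volume': [5.0, 3.0, 6.0, 9.0, 2.0, 4.0],  # floats on purpose
                      'commentary': [''] * 6})                       # start with empty notes

def scale_with_note(df, mask, factor, note):
    """Hypothetical helper: scale new_volume and record the note on the same rows."""
    df.loc[mask, 'new_volume'] *= factor
    old = df.loc[mask, 'commentary']
    # keep empty notes as they are, otherwise add a separator before the new note
    df.loc[mask, 'commentary'] = old.where(old == '', old + '; ') + note
    return df

# one mask per check, each cached in its own variable
m_drop = df_gp['volume'] < df_gp['volume'].shift(1)
m_low = df_gp['volume'] < 0.4

scale_with_note(df_gp, m_drop, 1.1, 'drop vs previous row')
scale_with_note(df_gp, m_low, 1.1, 'volume below 0.4')
print(df_gp)
```

Rows that match more than one check (row 4 in this sample) end up with every matching note in 'commentary', so the history of changes stays visible.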
jezrael
  • Awesome, great! Could you please explain the first syntax a bit? Thanks! – HeadOverFeet Jun 13 '18 at 13:08
  • @HeadOverFeet - Sure, give me a sec. – jezrael Jun 13 '18 at 13:08
  • By the way, when using assign you don't need the lambda; you can also use `df_gp.loc[m, c] = df_gp.loc[m, c].assign(new_volume=df_gp['new_volume']*1.1, commentary='updated')`, which makes it a bit more readable. But I also think that using assign is not the best option here. – Quickbeam2k1 Jun 14 '18 at 07:25
  • @Quickbeam2k1 - Yes, I think the same: it is better to set each column separately. I tried to explain that in the first sentence of the answer; maybe it could be improved further, but I have no idea how :) – jezrael Jun 14 '18 at 07:30