0

I have a pandas data-frame:

         df
                  Case  Type  Incr_1_3  Incr_2_3 Incr_3_3  Saving_1_3 Saving_2_3 Saving_3_3
              0    HI    PC1   0.046      2.388   0.388       0.254     2.539      2.539                   
              1    HI    PC2   0.000      0.024   0.024       0.026     0.054      0.054                
              2    LO    PC1   0.000      0.024   0.024       0.026     0.054      0.054
              3    LO    PC2   0.030      0.054   0.074       0.126     0.754      0.036
              4    RF    PC1   0.030      0.054   0.074       0.126     0.754      0.036
              4    RF    PC2   0.030      0.054   0.074       0.126     0.754      0.036

I want to calculate the difference between "Saving" and "Incr" columns. The code for each new column looks like this:

                df['D_1_3']=df['Saving_1_3']-df['Incr_1_3']
                df['D_2_3']=df['Saving_2_3']-df['Incr_2_3']
                df['D_2_3']=df['Saving_3_3']-df['Incr_3_3']

I can do it manually, however, I can't figure out a more efficient way to perform the calculation since my actually dataframe has more columns.

I tried following method:

          saving_list=['Saving_1_3','Saving_2_3','Saving_3_3']
          cost_list=['Incr_1_3','Incr_2_3','Incr_3_3']

          df[['A','B','C']]=df[saving_list]-df[cost_list]

It gives me the following error message:

          "ValueError: Columns must be same length as key"

Thank you for your help!

Spring
  • 193
  • 11

2 Answers2

1

This may help you.

result = df[saving_list].values - df[cost_list].values
df['A'], df['B'], df['C'] = result[:, 0], result[:, 1], result[:, 2]

Or this:

df = pd.concat([df, pd.DataFrame(df[saving_list].values - df[cost_list].values, columns=['A', 'B', 'C'])], axis=1)
talatccan
  • 743
  • 5
  • 19
1

Use DataFrame.values to subtract between numpy matrix

inc_cols=df.columns[df.columns.str.contains('Inc')]
saving_cols=df.columns[df.columns.str.contains('Saving')]
new_cols=inc_cols.str.replace('Incr','D')
new_df = df.assign(**dict(zip(new_cols, 
                              (df[saving_cols].values-df[inc_cols].values).T)))
print(new_df)

  Case Type  Incr_1_3  Incr_2_3  Incr_3_3  Saving_1_3  Saving_2_3  Saving_3_3  \
0   HI  PC1     0.046     2.388     0.388       0.254       2.539       2.539   
1   HI  PC2     0.000     0.024     0.024       0.026       0.054       0.054   
2   LO  PC1     0.000     0.024     0.024       0.026       0.054       0.054   
3   LO  PC2     0.030     0.054     0.074       0.126       0.754       0.036   
4   RF  PC1     0.030     0.054     0.074       0.126       0.754       0.036   
4   RF  PC2     0.030     0.054     0.074       0.126       0.754       0.036   

   D_1_3  D_2_3  D_3_3  
0  0.208  0.151  2.151  
1  0.026  0.030  0.030  
2  0.026  0.030  0.030  
3  0.096  0.700 -0.038  
4  0.096  0.700 -0.038  
4  0.096  0.700 -0.038  

You can look at this to undestand **

ansev
  • 30,322
  • 5
  • 17
  • 31