Dynamically differencing columns in a pandas dataframe using similar column names

Question

The following are the first few columns of a data frame, and I want to calculate V1_x - V1_y, V2_x - V2_y, V3_x - V3_y, etc. The names of the columns in each pair differ only in the last character (either x or y).

import pandas as pd

data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Address': ['xx', 'yy', 'zz','ww'], 'V1_x': [20, 21, 19, 18], 'V2_x': [233, 142, 643, 254], 'V3_x': [343, 543, 254, 543], 'V1_y': [20, 21, 19, 18], 'V2_y': [233, 142, 643, 254], 'V3_y': [343, 543, 254, 543]}

df = pd.DataFrame(data)

df
     Name Address  V1_x  V2_x  V3_x  V1_y  V2_y  V3_y
0     Tom      xx    20   233   343    20   233   343
1  Joseph      yy    21   142   543    21   142   543
2   Krish      zz    19   643   254    19   643   254
3    John      ww    18   254   543    18   254   543

I currently do the calculation by manually defining the column names:

new_df = pd.DataFrame()
new_df['Name'] = df['Name']
new_df['Address'] = df['Address']
new_df['Col1'] = df['V1_x']-df['V1_y']
new_df['Col2'] = df['V2_x']-df['V2_y']
new_df['Col3'] = df['V3_x']-df['V3_y']

Is there an approach I can use to check whether column names differ only in the last character and, if so, difference them?

Does this answer your question? [Generating column names with string and number](https://stackoverflow.com/questions/41503845/generating-column-names-with-string-and-number) — Michael Delgado, Mar 02 '22 at 03:22

Scott Boston · Accepted Answer · 2022-03-02T12:14:02.693

Try creating a MultiIndex header using .str.split, then reshape the dataframe, use pd.DataFrame.eval for the calculation, and reshape back to the original form with the additional columns. Lastly, flatten the MultiIndex header using a list comprehension with f-string formatting:

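# Move the identifier columns into the index so only the V*_x / V*_y columns remain
dfi = df.set_index(['Name', 'Address'])

# Split 'V1_x' into ('V1', 'x') to get a two-level column header
dfi.columns = dfi.columns.str.split('_', expand=True)

# Stack the V1/V2/V3 level into rows, compute x - y, then unstack it back into columns
dfs = dfi.stack(0).eval('diff=x-y').unstack()
# Flatten the ('x', 'V1') style header back to 'V1_x', 'V1_diff', etc.
dfs.columns = [f'{j}_{i}' for i, j in dfs.columns]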

dfs

Output:

                V1_x  V2_x  V3_x  V1_y  V2_y  V3_y  V1_diff  V2_diff  V3_diff
Name   Address                                                               
John   ww         18   254   543    18   254   543        0        0        0
Joseph yy         21   142   543    21   142   543        0        0        0
Krish  zz         19   643   254    19   643   254        0        0        0
Tom    xx         20   233   343    20   233   343        0        0        0
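
If the pairs always differ only in a trailing `_x` / `_y`, a more direct alternative is to collect the stems that have both suffixes and subtract pair by pair. This is a minimal sketch, not the method used above; the `stems` variable and the `<stem>_diff` output names are assumptions chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Joseph', 'Krish', 'John'],
    'Address': ['xx', 'yy', 'zz', 'ww'],
    'V1_x': [20, 21, 19, 18],
    'V2_x': [233, 142, 643, 254],
    'V3_x': [343, 543, 254, 543],
    'V1_y': [20, 21, 19, 18],
    'V2_y': [233, 142, 643, 254],
    'V3_y': [343, 543, 254, 543],
})

# Stems (e.g. 'V1') for which both an '_x' and a '_y' column exist.
stems = [c[:-2] for c in df.columns
         if c.endswith('_x') and c[:-2] + '_y' in df.columns]

# Assumed output layout: the identifier columns plus one '<stem>_diff' per pair.
new_df = df[['Name', 'Address']].copy()
for stem in stems:
    new_df[f'{stem}_diff'] = df[f'{stem}_x'] - df[f'{stem}_y']
```

Columns without a matching partner are simply skipped, and the original row order is kept because nothing is stacked or unstacked.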
