-1

I’ve 2 data frames which have one common column ‘X’ having all the unique values. I want to subtract each column of data frame 1 to that of data frame 2 which have similar names. dataframe 1

X bar A bar B
A1 48.0 1235
A2 53.0 4567

dataframe 2

X par A par B
A2 74.0 8342
A1 63.0 6531

Desired result:

X bar A par A Difference Status
A1 48.0 63.0 15.0 Ok
A2 53.0 74.0 21.0 Ok

To be noted that Difference here refers to barA - parA I.e subtraction. and this kind of data frames are required for each column There are 20 columns in my data frames. The status column gets not ok when value is less Than 0.

I’m thinking of writing a for loop to perform this but unable to start. If anyone can help in this. Ps I’m working on python.

  • Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – BigBen Feb 07 '22 at 16:31
  • @BigBen. It's a partial answer. – Corralien Feb 07 '22 at 16:31
  • How are your columns named? There is a common prefix or suffix to match columns pairwise? – Corralien Feb 07 '22 at 16:33
  • @Corralien first, this is too broad of a question. Second, clearly OP doesn't know about `merge`. Third, there's not enough detail to propose a solution, which is why you (rightly) asked for more. – BigBen Feb 07 '22 at 16:34
  • @Corralien Yes columns has common suffix in case of both dataframes for eg I need to subtract col1 of df1 with col1 of df2 then both the columns have a common suffix. – user18143261 Feb 07 '22 at 16:46
  • Did you read the linked thread on how to use `merge`? That's a *lot* of help, not just a bit :-) – BigBen Feb 07 '22 at 16:47

2 Answers2

0

Yes you can merge on column "X" to create a single dataframe with all values, then iterate through the columns and append calculated dataframes to a list.

Something like:

df = pd.merge(left=df1, right=df2, on='X', how='inner')
output_frames = []
letters = [i[-1] for i in df.columns if not i == 'X'] #Get letters in your columns 
letters = list(set(letters)) #Get list of letters with single element
for letter in letters:
    newdf = df.loc[:, ['X'] + [i for i in df.columns if i.endswith(letter)]]
    newdf['Difference'] = newdf[f'par {letter}'] - newdf[f'bar {letter}']
    newdf['Status'] = newdf.apply(lambda x: 'OK' if x.Difference >= 0 else 'NOT OK', axis=1)
    output_frames.append(newdf)

Then view output:

for df in newdf:
    print(df)

Or better still, merge the outputs:

df = pd.concat(output_frames, axis=1)  
AlecZ
  • 546
  • 5
  • 9
0

Your expected output and logic is a bit ambiguous, I'll assume here that you want to subtract the column of df2 that has the same trailing letter as the column in df1. Then compute the status if all differences are positive:

df3 = df1.merge(df2, on='X').set_index('X')

(df3.groupby(df3.columns.map(lambda x: x.split()[1]), axis=1)
    .apply(lambda d: d.diff(axis=1).iloc[:,-1])
    .add_prefix('diff_')
    .assign(Status=lambda d: np.where(d.ge(0).all(1), 'OK', 'Not OK'))
)

output:

     diff_A  diff_B Status
X                         
A1     15.0  5296.0     OK
A2     21.0  3775.0     OK
mozway
  • 194,879
  • 13
  • 39
  • 75