I am newbie to python and pandas. looking for help in subtraction between two different df column Sample data
df1 = pd.DataFrame({'label1': ['lab1', 'lab2', 'lab3', 'lab4'],
'compare1': [10, 20, 30, 40],
'col3': [100, 50, -30, -50]})
df2 = pd.DataFrame({'lable1': ['lab1', 'lab2', 'lab4', 'lab5'],
'compare1': [80,10, 40, 50],
'col3': [200, 60, 30, 50]})
df_diff = pd.DataFrame({'label': ['lab1', 'lab2', 'lab4', 'lab3', 'lab5'],
'df1_compare1': [10, 20, 40, 30, 0],
'df2_compare1': [80,10, 40, 0, 50],
'compare': [-70, 10, 0, 30, 50]})
Column name and count are same for both dfs.
Need help in subtraction (or other maths operation) between df1.compare1 and df2.compare2 if label1 and label2 has the same value.
for example, lab1, lab2 and lab 4(though lab 4 on different row) are having same label value. any rows which are not same in both df should show up at the bottom of the df with original value while other column data shows as 0.
Actual data are in CSV/excel MB in size, if possible looking for performance effective method
Does the non-null object has impact on performance?? This is the column type on label.