-1

I have two different dataframes and need to use python in order to show changes in Quantity, Price, Local Value, Local Exposure

The two data sets

what type of syntax do i need to create a third dataframe that purely shows the differences in values for each row ?

I have seen things online about merging the dataframes together first, but im new to this so im unsure

  • 1
    Does this answer your question? [Find difference between two data frames](https://stackoverflow.com/questions/48647534/find-difference-between-two-data-frames) – Alex W Apr 04 '23 at 14:17

3 Answers3

0

I'm not sure to understand what kind of input you have and result you want but you could subtract your two dataframe.

import pandas as pd

# Creating two sample dataframes
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 3, 3], 'B': [4, 5, 7]})

# Subtracting one dataframe from the other to get the differences
diff_df = df2 - df1

# Displaying the differences dataframe
print(diff_df)
# Output
   A  B
0  0  0
1  1  0
2  0  1
Basile Lamarque
  • 161
  • 1
  • 7
0

To use df subtraction, the df columns must be numbers.

import pandas as pd

columns = ['Instrument','Position','Quantity', 'Value']
numerics = ['Quantity', 'Value']

data = [['a','cash',1000, 12],
        ['b','cash',2000, 15]]
df1 = pd.DataFrame(data, columns=columns)
print(df1)

data = [['a','cash',1500, 101],
        ['b','cash',4500, 101]]
df2 = pd.DataFrame(data, columns=columns)
print(df2)

# Compute the difference of the numeric columns
diffs = df2[numerics] - df1[numerics]
print(diffs)

# Plug-in the diffs back into a copy of df1
dfout = df1.copy()
dfout[numerics] = diffs[numerics]
print(dfout)

Output:

  Instrument Position  Quantity  Value
0          a     cash      1000     12
1          b     cash      2000     15
  Instrument Position  Quantity  Value
0          a     cash      1500    101
1          b     cash      4500    101
   Quantity  Value
0       500     89
1      2500     86
  Instrument Position  Quantity  Value
0          a     cash       500     89
1          b     cash      2500     86
C. Pappy
  • 739
  • 4
  • 13
0

To create a third dataframe that shows the differences in values between the two original dataframes, you can follow these steps:

Merge the two dataframes together using a common key column. If the two dataframes have a column with the same name and data type, you can use that column as the key column. For example:

merged_df = pd.merge(df1, df2, on='Product Code')

This will create a new dataframe merged_df that contains all the columns from both df1 and df2.

Calculate the differences between the columns that you are interested in. For example, if you want to calculate the difference between the "Quantity" column in df1 and df2, you can subtract the "Quantity" column in df2 from the "Quantity" column in df1.

merged_df['Quantity Diff'] = merged_df['Quantity_x'] - merged_df['Quantity_y']

This will create a new column "Quantity Diff" in merged_df that contains the differences between the "Quantity" columns in df1 and df2.

Repeat step 2 for the other columns that you are interested in. For example:

merged_df['Price Diff'] = merged_df['Price_x'] - merged_df['Price_y']
merged_df['Local Value Diff'] = merged_df['Local Value_x'] - merged_df['Local Value_y']
merged_df['Local Exposure Diff'] = merged_df['Local Exposure_x'] - merged_df['Local Exposure_y']

This will create new columns in merged_df that contain the differences between the corresponding columns in df1 and df2.

Create a new dataframe that contains only the columns that you are interested in. For example:

diff_df = merged_df[['Product Code', 'Quantity Diff', 'Price Diff', 'Local Value Diff', 'Local Exposure Diff']]

This will create a new dataframe diff_df that contains only the columns "Product Code", "Quantity Diff", "Price Diff", "Local Value Diff", and "Local Exposure Diff".

Optionally, you can sort the dataframe by the "Product Code" column to make it easier to read. For example:

diff_df = diff_df.sort_values('Product Code')

This will sort the rows in diff_df by the "Product Code" column in ascending order.