I have two dataframes with identical columns but different number of rows and differences in some row cells. RIGHT is an update of LEFT. The only thing they have in common is that the values in the LEFT column of A, may also occur in RIGHT A unchanged (as a whole identical row with them), or with differences in the row in which they occur, or the rows with them may be missing from RIGHT, or RIGHT will have brand new rows with new values of A. So A is a key that may or may not occur in both tables.
left = pd.DataFrame(
{
"A": ["A0", "A1", "A2", "A3"],
"B": ["B0", "B1", "B2", "B3"],
"C": ["K0", "K1", "K0", "K1"],
}
)
right = pd.DataFrame(
{
"A": ["A0", "A42", "A2", "A3"],
"B": ["B0", "B1", "B2", "B333"],
"C": ["K222", "K0", "K0", "K1"],
}
)
I have tried something like that (There is more than 3 columns in dataframes, but only changes in A,B,C matter):
data = pd.merge(left[['A', 'B', 'C']],
right [['A', 'B', 'C']],
indicator=True,
how='right')
and the result is almost ok - I've got all differences in rows, but I Would like also to receive new rows and which rows of left, are missing in right.
I would be very grateful if you could suggest a solution and at the same time help me to create a new column in which there will be information whether the change is a change in row or a new column or a deleted column.