0

I have two dataframes, df1 & df2. They have the same column names and usually the same column values. I need to isolate the rows where the value in the column was changed, and add a column to signify what the previous value was.

In the example below, the number of actions for Nancy changes between df1 and df2. I want to create a new dataframe with just that row, even though a new row for Mary was added to df2.

df1:

Name Action Number of Actions
Stacy Action1 32
Nancy Action2 67
Emily Action3 89
Abby Action2 9

df2:

Name Action Number of Actions
Stacy Action1 32
Nancy Action2 75
Emily Action3 89
Abby Action2 9
Mary Action1 43

Expected Output (as a dataframe):

Name Action Number of Actions Previous Value
Nancy Action2 75 67
melnikas
  • 3
  • 2
  • 2
    Welcome to [Stack Overflow.](https://stackoverflow.com/ "Stack Overflow") This is not a code-writing or tutoring service. We can help solve specific, technical problems, not open-ended requests for code or advice. Please edit your question to show what you have tried so far, and what specific problem you need help with. See the [How To Ask a Good Question](https://stackoverflow.com/help/how-to-ask "How To Ask a Good Question") page for details on how to best help us help you. – itprorh66 May 18 '22 at 18:17

1 Answers1

1

You can try pandas.DataFrame.merge

df = (df2.merge(df1, on=['Name', 'Action'], how='left')
      .rename(columns={'Number of Actions_x': 'Number of Actions', 'Number of Actions_y': 'Previous Value'})
      .loc[lambda df: df['Number of Actions'].ne(df['Previous Value'])]
      .dropna()
      )
print(df)

    Name   Action  Number of Actions  Previous Value
1  Nancy  Action2                 75              67
Ynjxsjmh
  • 28,441
  • 6
  • 34
  • 52
  • 1
    Nice! For the selection step, you could also do ``.query('`Number of Actions` != `Previous Value`')`` instead. – wjandrea May 18 '22 at 18:38