I have 2 dataframes structured in the same way as follows:
df1 = pd.read_csv("Main_Database.csv")
# df1 Columns: ..., Timestamp, Name, Query, Website, Status,...
df2 = pd.read_csv("New_Raw_Results.csv")
# df2 COlumns: ..., Timestamp, Name, Query, Website, Status,...
Both dataframes can have exactly the same columns.
My Main_database.csv
keeps track of all records, my new_raw_results
is a list of new results that come in every week. I would like to process changes in my main_database
based on 3 scenarios:
A) IF Query AND Website in DF2 found in DF1,
--> write in DF1 column "Last Seen", using Timestamp from Df2
--> Overwrite Status to "STILL ACTIVE"
B) IF Query AND Website in DF2 not found in DF1,
--> append entire df2.row to df1
--> Overwrite Status to "NET NEW"
C) IF Query AND Website in DF1 not found in DF2,
--> Overwrite Status to "EXPIRED"
I've tried using a combination of merges and joins, but I'm stuck here. For example, if I isolate in a new dataframe the result of an inner join between these 2 tables, I'm not sure how to use it to take action on my main database. I'm trying to fit all these conditions under one function, so I can use this function to process new entries.
How would you structure this function? What would be the most concise way to approach this problem?