I am trying to fill empty columns in one Pandas dataframe with columns from another Pandas dataframe, based on certain conditions.
First table is audit_records_df
and it looks like this:
id | audit_type | audit_date | maliciousness_level | suspicious_counts |
---|---|---|---|---|
123456 | Unknown | 2/5/21 | NaN | NaN |
123456 | Cleared | 2/6/21 | NaN | NaN |
123456 | Terminated | 2/8/21 | NaN | NaN |
345678 | Terminated | 2/5/21 | NaN | NaN |
Second table is spam_profile_most_recent_notes
:
id | audit_type | audit_date | maliciousness_level | suspicious_counts | ire_1 | ire_2 |
---|---|---|---|---|---|---|
123456 | Unknown | 2/5/21 | high | 3 | 222 | 222 |
345678 | Terminated | 2/5/21 | high | 6 | 222 | 222 |
Note that unlike audit_records_df
table (where an id
could have multiple rows), in spam_profile_most_recent_notes
table each id
will only have 1 row.
I am trying to fill columns maliciousness_level
and suspicious_counts
in audit_records_df
table with values from columns of the same name from spam_profile_most_recent_notes
table (we need to ignore columns ire_1
and ire_2
), based on the following criteria:
- For rows in
audit_records_df
table whereid
matchesid
inspam_profile_most_recent_notes
, fill themaliciousness_level
andsuspicious_counts
inaudit_records_df
table with corresponding values fromspam_profile_most_recent_notes
table in the rows whereid
matches.
After the filling, the audit_records_df
table should look like this:
id | audit_type | audit_date | maliciousness_level | suspicious_counts |
---|---|---|---|---|
123456 | Unknown | 2/5/21 | high | 3 |
123456 | Cleared | 2/6/21 | high | 3 |
123456 | Terminated | 2/8/21 | high | 3 |
345678 | Terminated | 2/5/21 | high | 6 |
I've seen some slightly similar questions like this one: Conditionally fill column with value from another DataFrame based on row match in Pandas. However, all I've seen were regarding filling in value in 1 column, unlike my use case where I have to fill in values for multiple columns.
Any advice would be greatly appreciated.