Fill multiple columns with value from another dataframe

Question

I am trying to fill empty columns in one Pandas dataframe with columns from another Pandas dataframe, based on certain conditions.

First table is audit_records_df and it looks like this:

id	audit_type	audit_date	maliciousness_level	suspicious_counts
123456	Unknown	2/5/21	NaN	NaN
123456	Cleared	2/6/21	NaN	NaN
123456	Terminated	2/8/21	NaN	NaN
345678	Terminated	2/5/21	NaN	NaN

Second table is spam_profile_most_recent_notes:

id	audit_type	audit_date	maliciousness_level	suspicious_counts	ire_1	ire_2
123456	Unknown	2/5/21	high	3	222	222
345678	Terminated	2/5/21	high	6	222	222

Note that unlike audit_records_df table (where an id could have multiple rows), in spam_profile_most_recent_notes table each id will only have 1 row.

I am trying to fill columns maliciousness_level and suspicious_counts in audit_records_df table with values from columns of the same name from spam_profile_most_recent_notes table (we need to ignore columns ire_1 and ire_2), based on the following criteria:

For rows in audit_records_df table where id matches id in spam_profile_most_recent_notes, fill the maliciousness_level and suspicious_counts in audit_records_df table with corresponding values from spam_profile_most_recent_notes table in the rows where id matches.

After the filling, the audit_records_df table should look like this:

id	audit_type	audit_date	maliciousness_level	suspicious_counts
123456	Unknown	2/5/21	high	3
123456	Cleared	2/6/21	high	3
123456	Terminated	2/8/21	high	3
345678	Terminated	2/5/21	high	6

I've seen some slightly similar questions like this one: Conditionally fill column with value from another DataFrame based on row match in Pandas. However, all I've seen were regarding filling in value in 1 column, unlike my use case where I have to fill in values for multiple columns.

Any advice would be greatly appreciated.

have u checked https://stackoverflow.com/questions/49062176/fill-nan-values-from-another-dataframe-with-different-shape — Epsi95, Feb 11 '21 at 03:50

Safuan Mazlan · Accepted Answer · 2021-02-11T04:14:48.777

I think you can solve this by:

Drop the malicious_level and suspicious_counts columns from the audit_record_df table.
Keep only id, malicious_level and suspicious_counts columns from the spam_profile_most_recent_notes
Perform a left inner join on both tables using id

audit_record_df.drop(['maliciousness_level', 'suspicious_counts'], axis=1, inplace=True)
print(audit_record_df)

       id  audit_type audit_date
0  123456     Unknown     2/5/21
1  123456     Cleared     2/6/21
2  123456  Terminated     2/8/21
3  345678  Terminated     2/5/21

spam_profile_most_recent_notes = spam_profile_most_recent_notes[['id', 'maliciousness_level', 'suspicious_counts']]
print(spam_profile_most_recent_notes)

       id  audit_type audit_date
0  123456     Unknown     2/5/21
1  123456     Cleared     2/6/21
2  123456  Terminated     2/8/21
3  345678  Terminated     2/5/21

new_df = audit_record_df.merge(spam_profile_most_recent_notes, on='id', how='left')
print(new_df)

       id  audit_type audit_date maliciousness_level  suspicious_counts
0  123456     Unknown     2/5/21                high                  3
1  123456     Cleared     2/6/21                high                  3
2  123456  Terminated     2/8/21                high                  3
3  345678  Terminated     2/5/21                high                  6

Thanks @Safuan. That is one smart approach that I should've thought about, essentially adjusting the columns to make merging work. — Stanleyrr, Feb 11 '21 at 04:21

Fill multiple columns with value from another dataframe

1 Answers1