Create a third column with column values of dataframe A by comparing two columns of dataframe A and B

Question

I have a dataframe A which contains mapping of "reason_name" column to Yes, no, unfinished via "Yes/No/Unfinished" column.

I need to compare the "reason_name" column of dataframe A with that of dataframe B and then create a third column "donation_status" in dataframe B which will contain the corresponding row value of "Yes/No/Unfinished" column based on comparison

I want something like the screenshot below but make the rows sorted on ascending order of date and then ascending order of time:

I am trying the following code:

aggregated_call_logs_df["donation_status"] = aggregated_call_logs_df.reason_name.map(mapping_reason_name_donation_status_df)

However, I get the following error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

NOTE: I filled the nan values to empty string.

Rabinzel · Answer 1 · 2022-05-05T11:52:34.690

0

This should work:

dfB.merge(dfA[['reason_name', 'Yes/No/Unfinished']], on='reason_name', how='left').sort_values(by=['date','time'])

Because date and time are strings you can do it like this:

UPDATE I found a better solution:

you can do it all in once without changing the date and time column:

(1) # I think this one is better

dfB.merge(dfA[['reason_name', 'Yes/No/Unfinished']], on='reason_name', how='left').sort_values(by=['date','time'], key=pd.to_datetime)

(2)

df = dfB.merge(dfA[['reason_name', 'Yes/No/Unfinished']], on='reason_name', how='left').reset_index(drop=True)

mask= pd.to_datetime(df['date'] + " " + df['time'], format="%Y-%m-%d %H:%M").sort_values().index

df.loc[mask]

Doing it with a mask doesn't change your dataframe at all. you just convert date and time as datetime, sort them and returning the index. Then you sort your df by that order.

edited May 05 '22 at 11:52

answered Apr 30 '22 at 09:24

Rabinzel

7,757
3
10
30

the sorting based on date and then time does not work. However, I think it has merged correctly based on mapping as far as I can see. I have 172 million + rows – Shihab Ullah May 04 '22 at 07:27
Are these columns formatted as datetime? I think if these are only strings the sorting won't work – Rabinzel May 04 '22 at 09:03
these are only strings. How can I make them date time => sort => reconvert to string? – Shihab Ullah May 05 '22 at 10:38
is it important that date and time is in two columns or can they also be joined together ? – Rabinzel May 05 '22 at 11:19
They can be joined together for sorting purpose and in the final df we can remove that "joined_date_time" column maybe? – Shihab Ullah May 05 '22 at 11:39
I thought of that too, found a different way (see my updated answer). Let me know if it's ok performance wise – Rabinzel May 05 '22 at 11:41
updated my answer again. you can specify a `key` in `sort_values` and it does the job. I tested it on a test df and it worked. – Rabinzel May 05 '22 at 11:53
I get the following error: TypeError: sort_values() got an unexpected keyword argument 'key' – Shihab Ullah May 05 '22 at 12:58
[see answer](https://stackoverflow.com/a/67689015/15521392) i guessyou don't have a current version of pandas – Rabinzel May 05 '22 at 13:02
I have pandas 1.0.5. Also the second solution you posted creates dataframe with missing values. – Shihab Ullah May 05 '22 at 13:13
I guess you have to provide your data then. All I can say is that it works on my side with everything I tried here – Rabinzel May 05 '22 at 13:26

Create a third column with column values of dataframe A by comparing two columns of dataframe A and B

1 Answers1