-1

I have a dataframe A which contains mapping of "reason_name" column to Yes, no, unfinished via "Yes/No/Unfinished" column. This screenshot has the current dataframe A and B picture contains a basic idea of it

I need to compare the "reason_name" column of dataframe A with that of dataframe B and then create a third column "donation_status" in dataframe B which will contain the corresponding row value of "Yes/No/Unfinished" column based on comparison

I want something like the screenshot below but make the rows sorted on ascending order of date and then ascending order of time:

enter image description here

I am trying the following code:

aggregated_call_logs_df["donation_status"] = aggregated_call_logs_df.reason_name.map(mapping_reason_name_donation_status_df)

However, I get the following error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

NOTE: I filled the nan values to empty string.

E_net4
  • 27,810
  • 13
  • 101
  • 139

1 Answers1

0

This should work:

dfB.merge(dfA[['reason_name', 'Yes/No/Unfinished']], on='reason_name', how='left').sort_values(by=['date','time'])

Because date and time are strings you can do it like this:

UPDATE I found a better solution:

you can do it all in once without changing the date and time column:

(1) # I think this one is better

dfB.merge(dfA[['reason_name', 'Yes/No/Unfinished']], on='reason_name', how='left').sort_values(by=['date','time'], key=pd.to_datetime)

(2)

df = dfB.merge(dfA[['reason_name', 'Yes/No/Unfinished']], on='reason_name', how='left').reset_index(drop=True)

mask= pd.to_datetime(df['date'] + " " + df['time'], format="%Y-%m-%d %H:%M").sort_values().index

df.loc[mask]

Doing it with a mask doesn't change your dataframe at all. you just convert date and time as datetime, sort them and returning the index. Then you sort your df by that order.

Rabinzel
  • 7,757
  • 3
  • 10
  • 30
  • the sorting based on date and then time does not work. However, I think it has merged correctly based on mapping as far as I can see. I have 172 million + rows – Shihab Ullah May 04 '22 at 07:27
  • Are these columns formatted as datetime? I think if these are only strings the sorting won't work – Rabinzel May 04 '22 at 09:03
  • these are only strings. How can I make them date time => sort => reconvert to string? – Shihab Ullah May 05 '22 at 10:38
  • is it important that date and time is in two columns or can they also be joined together ? – Rabinzel May 05 '22 at 11:19
  • They can be joined together for sorting purpose and in the final df we can remove that "joined_date_time" column maybe? – Shihab Ullah May 05 '22 at 11:39
  • I thought of that too, found a different way (see my updated answer). Let me know if it's ok performance wise – Rabinzel May 05 '22 at 11:41
  • updated my answer again. you can specify a `key` in `sort_values` and it does the job. I tested it on a test df and it worked. – Rabinzel May 05 '22 at 11:53
  • I get the following error: TypeError: sort_values() got an unexpected keyword argument 'key' – Shihab Ullah May 05 '22 at 12:58
  • [see answer](https://stackoverflow.com/a/67689015/15521392) i guessyou don't have a current version of pandas – Rabinzel May 05 '22 at 13:02
  • I have pandas 1.0.5. Also the second solution you posted creates dataframe with missing values. – Shihab Ullah May 05 '22 at 13:13
  • I guess you have to provide your data then. All I can say is that it works on my side with everything I tried here – Rabinzel May 05 '22 at 13:26