I have a dataframe that looks like this:
df1 = pd.DataFrame({
"Business_Process_Activity" : ["SendingReportToManager", "SendingReportToManager", "SendingReportToManager", "SendingReportToManager", "SendingReportToManager", "PreparingAndSendingAgenda", "PreparingAndSendingAgenda"],
"Case":[1,1,2,2,2,3,4],
"Application":["MicrosoftWord", "MicrosoftOutlook", "MicrosoftWord", "MicrosoftOutlook", "MicrosoftOutlook", "MicrosoftWord", "MicrosoftWord"],
"Activity_of_the_User":["SavingADocument", "SendingAnEmail", "SavingADocument", "SendingAnEmail", "SendingAnEmail", "SavingADocument", "SavingADocument"],
"Receiver_email_root":["None", "idatta91 adarandall larryjacob", "None", "idatta91 larryjacob"," vanessaHudgens prithakaur", "None", "None"],
"Receiever_email_domains":["None", "gmail yahoo", "None", "gmail", "gmail yahoo", "None", "None"],
"Receiver_email_count_Catg":["None", "Few", "None", "Double", "Double", "None", "None"],
"Subject":["None","Activity Report", "None", "Project Progress Report", "Project Progress Report 2", "None", "None"]
})
I want to merge the rows of the dataframe according to the column Case
. So, if the number in the Case
column of two or more rows is the same then the strings of the other columns of those rows are concatenated into one row.
Also the the values in the column Business_Process_Activity
is the same for the cases of the same number. For that column, I do not want to concatenate the Business_Process_Activity
values but keep only one of them since that column needs to be categorical. I want the final dataframe to look like this:
df2 = pd.DataFrame({"Case":[1,2,3,4],
"Business_Process_Activity" : ["SendingReportToManager", "SendingReportToManager", "PreparingAndSendingAgenda", "PreparingAndSendingAgenda"],
"Application":["MicrosoftWord MicrosoftOutlook", "MicrosoftWord MicrosoftOutlook MicrosoftOutlook", "MicrosoftWord", "MicrosoftWord"],
"Activity_of_the_User":["SavingADocument SendingAnEmail","SavingADocument SendingAnEmail SendingAnEmail", "SavingADocument", "SavingADocument"],
"Receiver_email_root":["idatta91 adarandall larryjacob", "idatta91 larryjacob vanessaHudgens prithakaur", "None", "None"],
"Receiever_email_domains":["gmail yahoo","gmail gmail yahoo", "None", "None"],
"Receiver_email_count_Catg":["Few", "Double Double", "None", "None"],
"Subject":["Activity Report", "Project Progress Report Project Progress Report 2", "None", "None"]
})
If strings are merged with a "None" column, the "None" string should be deleted since that value is not empty anymore. The duplicate number of the case columns should be deleted as the rows are merged into one.
How do I do this? Thanks in advance!