I received data in a csv. from a report pulled from Salesforce. Each row represents a transaction and the person that worked on the transaction. There are multiple cases where more than 1 person worked on the same transaction so in those cases there are duplicate rows that show the date, time stamp, location, office, etc. except for the person's name.
I want each row to represent one transaction and in the case where multiple people worked on the same transaction, a new column would be created to capture the other person's name.
This is what the data looks like:
Trans_ID | Name
================
1 | Dom Toretto
2 | Brian Oconnor
3 | Letty Garcia
3 | Mia Toretto
4 | Brian Oconnor
4 | Dom Toretto
4 | Letty Garcia
4 | Jesse
This is what I want:
Trans_ID | Name | Name2 | Name3 | Name4
===============================================================
1 | Dom Toretto | Letty Garcia | |
2 | Brian Oconnor | | |
3 | Letty Garcia | Mia Toretto | |
4 | Brian Oconnor | Dom Toretto | Letty Garcia | Jesse
I want to do this in R, but if it is easier in SQL or Excel, that would work too.
Thank you!
EDIT So I tried using case when and lag in r to do this but now I have a new problem of deleting the rows that are no longer necessary. This may also be the wrong approach. Any ideas?
df<- data.frame(trans_id = c(1,2,3,3,4,4,4,4),
Name = c("Dom Toretto","Brian Oconnor","Letty Garcia","Mia Toretto","Brian Oconnor","Dom Toretto","Letty Garcia","Jesse"))
df %>%
mutate(
Name2 = case_when(
trans_id == lag(trans_id) ~ lag(Name)
),
Name3 = case_when(
trans_id == lag(trans_id, 2) ~ lag(Name, 2)
),
Name4 = case_when(
trans_id == lag(trans_id, 3) ~ lag(Name, 3)
))