I have a dataframe such as below, that is already sorted:
id desc
1 GAG
1 GAG
1 AP
1 RT
1 RT
2 ABC
2 ABC
2 DEF
2 GHI
2 ABC
From this, I want to generate a second dataframe containing the ID in one column followed by a second column showing the order that "DESC" items appear for each ID. I also do not want duplicates IF they appear in order.
Desired output:
id ORDER
1 GAG-AP-RT
2 ABC-DEF-GHI-ABC
If the same DESC appears twice for the same ID, but not next to each other, I would like it to be included.
I have tried iterating through each row in R, generating a temporary dataframe containing only entries linked to a single ID, however I have been unable to include each DESC once in the order they appear.
For my solution using R, I have "premade" an output table called "record", which contains one row for each ID. This has allowed me to get part way to a solution:
for (i in 1:length(record$id)) {
+ tmp <- input[input$id == record$id[i],]
+ record$ORDER[i] <- paste(tmp$desc, collapse = "-")
+ }
This resulted in:
id ORDER
1 GAG-GAG-AP-RT-RT
2 ABC-ABC-DEF-GHI-ABC
Any help would be much appreciated!
UPDATE
I think it is nearly there, I now have:
for (i in 1:length(record$id)) {
+ tmp <- input[input$id == record$id[i],]
+ record$ORDER[i] <- paste(unique(tmp$desc), collapse = "-")
+ }
However this has removed occurrences of the same string in desc that are separated by a different string. I need to remove occurrence of the same string next to each other, but not when separated by a different one.