0

My current table is like this :

ID       Patient_id       Date         Drug         `n_distinct(Drug)`
                                   
1        A123             2014-01-01   5FU           1
2        A123             2014-01-02   fluorouracil  1
3        A123             2014-01-02   oxaliplatin   1
4        A125             2014-01-02   paracetamol   1
5        A126             2014-01-02   pantocid      1
6        B456             2014-01-02   thyronorm     1
7        B456             2015-06-01   fluorouracil  1
8        B456             2015-06-01   oxaliplatin   1

Basically i want to have three columns , Patient id , Date, All the Drugs that patient has consumed on that day in one row.

However, i want the table output to be like this :

ID       Patient_id       Date         Drugs
                     
1        A123             2014-01-01   5FU
2        A123             2014-01-02   fluorouracil,oxaliplatin
3        A125             2014-01-02   paracetamol
5        A126             2014-01-02   pantocid
6        B456             2014-01-02   thyronorm
7        B456             2015-06-01   fluorouracil, oxaliplatin

Please suggest.

Player1
  • 2,878
  • 2
  • 26
  • 38
Love Tyagi
  • 37
  • 5

2 Answers2

0

We can do this with dplyr

require(dplyr)

df %>%
  group_by(Patient_id, Date) %>%
  summarise(Drug = paste0(Drug, collapse = ","))

  Patient_id       Date                     Drug
      <fctr>     <fctr>                    <chr>
1       A123 2014-01-01                      5FU
2       A123 2014-01-02 fluorouracil,oxaliplatin
3       A125 2014-01-02              paracetamol
4       A126 2014-01-02                 pantocid
5       B456 2014-01-02                thyronorm
6       B456 2015-06-01 fluorouracil,oxaliplatin

Data

df = structure(list(ID = 1:8, Patient_id = structure(c(1L, 1L, 1L, 
2L, 3L, 4L, 4L, 4L), .Label = c("A123", "A125", "A126", "B456"
), class = "factor"), Date = structure(c(1L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L), .Label = c("2014-01-01", "2014-01-02", "2015-06-01"
), class = "factor"), Drug = structure(c(1L, 2L, 3L, 5L, 4L, 
6L, 2L, 3L), .Label = c("5FU", "fluorouracil", "oxaliplatin", 
"pantocid", "paracetamol", "thyronorm"), class = "factor"), X.n_distinct.Drug.. = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("ID", "Patient_id", 
"Date", "Drug", "X.n_distinct.Drug.."), class = "data.frame", row.names = c(NA, 
-8L))
Vlo
  • 3,168
  • 13
  • 27
0

Here is an option using aggregate from base R

aggregate(Drug~., df[2:4], toString)
#   Patient_id       Date                      Drug
#1       A123 2014-01-01                       5FU
#2       A123 2014-01-02 fluorouracil, oxaliplatin
#3       A125 2014-01-02               paracetamol
#4       A126 2014-01-02                  pantocid
#5       B456 2014-01-02                 thyronorm
#6       B456 2015-06-01 fluorouracil, oxaliplatin
akrun
  • 874,273
  • 37
  • 540
  • 662