0

I am very new to R software and appreciate if you can provide some suggestions to combine variables (antibiotic) within common variable (antibiotic_date).

My original data looks like this (3X3 table);

  • id: 1
  • antibiotic: a, b, c
  • antibiotic_date: 2018-01-20, 2018-01-20, 2018-03-04

Is it possible to transform the above date to (3X table);

  • id: 1
  • antibiotic: a b, c
  • antibiotic_date: 2018-01-20, 2018-03-04

Thank you very much for your help.

LuLuGaGa
  • 13,089
  • 6
  • 49
  • 57
zinc54574
  • 1
  • 1

1 Answers1

0

Looks like you have

df
#   id antibiotic antibiotic_date
# 1  1          a      2018-01-20
# 2  1          b      2018-01-20
# 3  1          c      2018-03-04

Use unique in aggregate.

(res1 <- aggregate(. ~ antibiotic_date, df, unique))
#   antibiotic_date id antibiotic
# 1      2018-01-20  1       a, b
# 2      2018-03-04  1          c

Where

str(res1)
# 'data.frame': 2 obs. of  3 variables:
# $ antibiotic_date: chr  "2018-01-20" "2018-03-04"
# $ id             : chr  "1" "1"
# $ antibiotic     :List of 2
# ..$ : chr  "a" "b"
# ..$ : chr "c"

If you need a string rather than a vector of length > 1 make it toString,

(res2 <- aggregate(. ~ antibiotic_date, df, \(x) toString(unique(x))))
#   antibiotic_date id antibiotic
# 1      2018-01-20  1       a, b
# 2      2018-03-04  1          c

where:

str(res2)
# 'data.frame': 2 obs. of  3 variables:
# $ antibiotic_date: chr  "2018-01-20" "2018-03-04"
# $ id             : chr  "1" "1"
# $ antibiotic     : chr  "a, b" "c"

Or paste,

(res3 <- aggregate(. ~ antibiotic_date, df, \(x) paste(unique(x), collapse=' ')))
#   antibiotic_date id antibiotic
# 1      2018-01-20  1        a b
# 2      2018-03-04  1          c

where:

str(res3)
# 'data.frame': 2 obs. of  3 variables:
# $ antibiotic_date: chr  "2018-01-20" "2018-03-04"
# $ id             : chr  "1" "1"
# $ antibiotic     : chr  "a b" "c"

You can also wrap a sort around it, if needed, e.g. sort(toString(unique(.))).


Data:

df <- structure(list(id = c(1, 1, 1), antibiotic = c("a", "b", "c"), 
    antibiotic_date = c("2018-01-20", "2018-01-20", "2018-03-04"
    )), class = "data.frame", row.names = c(NA, -3L))
jay.sf
  • 60,139
  • 8
  • 53
  • 110
  • Thanks very much. it worked. However, combined antibiotic (a,b) on 2018-01-20 was shown as c("a", "b') on my dataframe, and when i tried str_match(antibiotic, "a"), it showed error message, "argument is not an atomic vector; coercing". I think this error message was due to having variable as c("a", "b") rather than a,b. Do you have any suggestion for this? Thanks – zinc54574 Oct 05 '22 at 00:02
  • I am having some other problems. Firstly, there are different pt_id with same antibiotic_dates. When I used "aggregate(. ~ antibiotic_date, df, unique))", it combined both pt_id and antibiotic_date. Is there a way that I can combine antibiotics that has same antibiotic_date within same pt_id? Secondly, there is an error message when i ran "aggregate(. ~ antibiotic_date, df, \(x) toString(unique(x))))", "unexpected input in "aggregate (.~antibiotic_date, df, \". What am i doing wrong? Thanks again. – zinc54574 Oct 12 '22 at 22:56
  • @zinc54574 To your second issue: Update your R or use `function(x)` instead of the new shorthand `\(x)`. To your first issue: I just guessed your data, you definitely have to make a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) now. Thanks. – jay.sf Oct 13 '22 at 00:24