0

I have a dataset with 20 variables and 250K rows. I would like to add a new variable, "NumAdms", holding the number of rows (one per "OT_entry") for each individual "patient_id". I have constructed a dummy example:

 library(dplyr)

  reproeg <- names(c("patient_id", "OT_entry", "Other_1", "Other_2",
                   "Other_3"))
  reproeg$patient_id <- c(123, 123, 453, 289, 123)
  reproeg$OT_entry <- c("01/01/2012 09:30:00", "20/01/2012 08:20:00", 
                      "02/01/2012 09:40:00", "10/01/2012 11:00:00",
                      "10/02/2012 09:40:00")
  reproeg$Other_1 <- c("xy", "xy", "xy", "zh", "xy")
  reproeg$Other_2 <- c(22.3, 33.1, 22.1, 33.5, 44.2)
  reproeg$Other_3 <- c(TRUE, FALSE, FALSE, TRUE, FALSE)

  reproeg %>%
    group_by(patient_id) %>%
    mutate(NumAdms, length(OT_entry))

I get the following error message:

Error in UseMethod("group_by_") : 
  no applicable method for 'group_by_' applied to an object of class "list"
  • So what is your question? Where exactly are you getting stuck? What exactly is the desired output? – MrFlick Nov 20 '18 at 14:26
  • Sorry! I've just added the error message that I get..."Error in UseMethod("group_by_") : no applicable method for 'group_by_' applied to an object of class "list"" – Caroline K Nov 20 '18 at 14:31

2 Answers

3

These days, you can also go for:

library(dplyr)

data.frame(reproeg) %>%
     add_count(patient_id) %>%
     rename(NumAdms = n)

add_count(patient_id) is a condensed equivalent of group_by(patient_id) followed by mutate(n = n()). The count column is named n by default, hence the rename(). One advantage is that the result is not left grouped, so you don't need to ungroup() later on.

Output:

# A tibble: 5 x 6
  patient_id OT_entry            Other_1 Other_2 Other_3 NumAdms
       <dbl> <fct>               <fct>     <dbl> <lgl>     <int>
1        123 01/01/2012 09:30:00 xy         22.3 TRUE          3
2        123 20/01/2012 08:20:00 xy         33.1 FALSE         3
3        453 02/01/2012 09:40:00 xy         22.1 FALSE         1
4        289 10/01/2012 11:00:00 zh         33.5 TRUE          1
5        123 10/02/2012 09:40:00 xy         44.2 FALSE         3
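
If your dplyr is recent enough (I'm assuming 0.8.0 or later, where add_count gained a name argument), the rename() step can probably be dropped. A minimal sketch, rebuilding only two of the columns from the question directly as a data frame:

```r
library(dplyr)

# Subset of the dummy data from the question, built directly as a data frame
reproeg <- data.frame(
  patient_id = c(123, 123, 453, 289, 123),
  OT_entry   = c("01/01/2012 09:30:00", "20/01/2012 08:20:00",
                 "02/01/2012 09:40:00", "10/01/2012 11:00:00",
                 "10/02/2012 09:40:00"),
  stringsAsFactors = FALSE
)

# `name` sets the count column name directly (assumes dplyr >= 0.8.0)
result <- reproeg %>%
  add_count(patient_id, name = "NumAdms")

result
```

The NumAdms column should match the output above (3, 3, 1, 1, 3).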
arg0naut91
2

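You never defined reproeg as a data.frame, so that's the first issue: names() of an unnamed vector returns NULL, and assigning elements with $ onto NULL builds a plain list, which group_by() has no method for.
Second, mutate works with tag=value pairs, so it needs NumAdms = ... rather than NumAdms, ....
Third, you're not counting the length of OT_entry but the number of cases in each group, which is better done within mutate using n().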
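
To see where the "list" in the error message comes from, here is a small base R sketch of the first point. It reuses the construction from the question with fewer columns and values; nothing beyond base R is assumed:

```r
# names() of an unnamed character vector is NULL
reproeg <- names(c("patient_id", "OT_entry"))
is.null(reproeg)

# Assigning with $ onto NULL creates a plain list, not a data frame
reproeg$patient_id <- c(123, 123, 453)
class(reproeg)
is.data.frame(reproeg)
```

Wrapping the list in data.frame(), or building the data frame directly, gives group_by() something it can work with.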

So your code should be:

data.frame(reproeg) %>%
     group_by(patient_id) %>%
     mutate(NumAdms = n())

# A tibble: 5 x 6
# Groups:   patient_id [3]
  patient_id OT_entry            Other_1 Other_2 Other_3 NumAdms
       <dbl> <fct>               <fct>     <dbl> <lgl>     <int>
1        123 01/01/2012 09:30:00 xy         22.3 TRUE          3
2        123 20/01/2012 08:20:00 xy         33.1 FALSE         3
3        453 02/01/2012 09:40:00 xy         22.1 FALSE         1
4        289 10/01/2012 11:00:00 zh         33.5 TRUE          1
5        123 10/02/2012 09:40:00 xy         44.2 FALSE         3
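
Note the "# Groups:" line: the result is still grouped by patient_id. If later steps should not run per patient, you may want to add ungroup(). A sketch of the whole thing, assuming the dummy data is built as a data frame from the start (with_counts is just an illustrative name):

```r
library(dplyr)

# Dummy data from the question, defined as a data frame up front
reproeg <- data.frame(
  patient_id = c(123, 123, 453, 289, 123),
  OT_entry   = c("01/01/2012 09:30:00", "20/01/2012 08:20:00",
                 "02/01/2012 09:40:00", "10/01/2012 11:00:00",
                 "10/02/2012 09:40:00"),
  Other_1    = c("xy", "xy", "xy", "zh", "xy"),
  Other_2    = c(22.3, 33.1, 22.1, 33.5, 44.2),
  Other_3    = c(TRUE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

with_counts <- reproeg %>%
  group_by(patient_id) %>%
  mutate(NumAdms = n()) %>%  # n() = number of rows in the current group
  ungroup()                  # drop the grouping for downstream steps

with_counts
```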
Dave2e
iod