I'm having a problem I can't figure out. I want to compute the mean, SD, and N per group for several variables. My data looks like this:
dataSet <- data.frame(study_id=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
Timepoint=c(1,6,12,18,1,6,12,18,1,6,12,18,1,6,12,18),
Secretor=c(0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1),
Gene1=runif(16, min=0, max=100),
Gene2=runif(16, min=0, max=100),
Gene3=runif(16, min=0, max=100),
Gene4=runif(16, min=0, max=100))
Then I group it:
library(tidyverse)
grouped_dataSet <- dataSet %>%
group_by(Secretor, Timepoint)
When I run the following line of code, I get what I want:
summarise(grouped_dataSet, mean = mean(Gene1, na.rm=T), sd = sd(Gene1, na.rm=T), n = n())
Output:
# A tibble: 8 x 5
# Groups:   Secretor [2]
  Secretor Timepoint  mean    sd     n
     <dbl>     <dbl> <dbl> <dbl> <int>
1        0         1  21.8 18.6      2
2        0         6  34.8 33.2      2
3        0        12  43.1  4.34     2
4        0        18  72.6 38.0      2
5        1         1  13.3 15.3      2
6        1         6  41.2 22.8      2
7        1        12  44.9 25.7      2
8        1        18  37.0  8.49     2
However, when I put this same call inside a function (which I then intend to map over many columns with the purrr package from the tidyverse), it doesn't work: the mean and sd columns come back as NA, and only the n column is correct:
summary_function <- function(x) {
summary <- summarise(grouped_dataSet, mean = mean(x, na.rm=T), sd = sd(x, na.rm=T), n = n())
return(summary)
}
summary_function("Gene1")
Output:
# A tibble: 8 x 5
# Groups:   Secretor [2]
  Secretor Timepoint  mean    sd     n
     <dbl>     <dbl> <dbl> <dbl> <int>
1        0         1    NA    NA     2
2        0         6    NA    NA     2
3        0        12    NA    NA     2
4        0        18    NA    NA     2
5        1         1    NA    NA     2
6        1         6    NA    NA     2
7        1        12    NA    NA     2
8        1        18    NA    NA     2
This is the warning I get:
In var(if (is.vector(x) || is.factor(x)) x else as.double(x), ... :
NAs introduced by coercion
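A minimal check of what the function body probably sees, assuming that inside summarise() the name `x` is not a column of the data and therefore resolves to the function argument, the literal string "Gene1". Calling mean() and sd() on that string directly reproduces both the NA values and the coercion warning:

```r
# Inside summary_function("Gene1"), x is the one-element character vector
# "Gene1", not the Gene1 column, so mean() and sd() receive a string.
x <- "Gene1"

# mean() warns "argument is not numeric or logical: returning NA" and returns NA
m <- mean(x, na.rm = TRUE)

# sd() hands the string to var(), which coerces it to double and gets NA;
# this is the "NAs introduced by coercion" warning shown above
s <- sd(x, na.rm = TRUE)

print(c(mean = m, sd = s))
```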
Could anyone explain why this works as a standalone call but not inside a function?
Many thanks in advance.
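A minimal sketch of one common fix, assuming dplyr 1.0 or later (needed for the `.groups` argument) and that column names are passed as strings. It uses dplyr's `.data` pronoun so that `.data[[col]]` looks the name up as a column instead of using the string itself. The helper name `summarise_gene`, the extra `df` argument, and the `set.seed()` call are illustrative additions, not part of the original code:

```r
library(dplyr)
library(purrr)

# Toy data with the same layout as in the question; the seed is an addition
# so that this sketch is reproducible.
set.seed(1)
dataSet <- data.frame(
  study_id  = rep(1:4, each = 4),
  Timepoint = rep(c(1, 6, 12, 18), times = 4),
  Secretor  = rep(c(0, 1, 0, 1), each = 4),
  Gene1 = runif(16, min = 0, max = 100),
  Gene2 = runif(16, min = 0, max = 100),
  Gene3 = runif(16, min = 0, max = 100),
  Gene4 = runif(16, min = 0, max = 100)
)

grouped_dataSet <- dataSet %>%
  group_by(Secretor, Timepoint)

# Hypothetical helper: `col` is a column name given as a string.
# .data[[col]] looks that name up in the (grouped) data frame, so mean()
# and sd() receive the numeric column rather than the string "Gene1".
summarise_gene <- function(df, col) {
  df %>%
    summarise(
      mean = mean(.data[[col]], na.rm = TRUE),
      sd   = sd(.data[[col]], na.rm = TRUE),
      n    = n(),
      .groups = "drop"
    )
}

gene1_summary <- summarise_gene(grouped_dataSet, "Gene1")

# Map over several columns with purrr and stack the results,
# recording which column each row came from in a `gene` column.
genes <- c("Gene1", "Gene2", "Gene3", "Gene4")
all_summaries <- genes %>%
  set_names() %>%
  map(~ summarise_gene(grouped_dataSet, .x)) %>%
  bind_rows(.id = "gene")

print(all_summaries)
```

Passing the grouped data frame as an argument, rather than reading `grouped_dataSet` from the global environment, keeps the helper reusable for other groupings.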