I'm a beginner-intermediate R user who started learning R for laboratory research a few months ago. Thanks for your patience, especially if this ends up being a really stupid simple problem.
Problem
The tables as a reproducible example
The following code generates tables similar to my set, first as tall data, second as wide data.
library(tibble)
#> Warning: package 'tibble' was built under R version 3.4.4
library(tidyr)
#> Warning: package 'tidyr' was built under R version 3.4.4
tall <- tibble(X = c(3999.387, 3999.387, 3999.387,
                     3999.066, 3999.066, 3999.066,
                     3998.745, 3998.745, 3998.745,
                     3998.423, 3998.423, 3998.423,
                     3998.102, 3998.102, 3998.102),
               Y = rnorm(15, mean = 2, sd = 1),
               S = c("s1", "s2", "s3", "s1", "s2", "s3", "s1", "s2", "s3", "s1", "s2", "s3", "s1", "s2", "s3"))
head(tall)
#> # A tibble: 6 x 3
#> X Y S
#> <dbl> <dbl> <chr>
#> 1 3999. 3.07 s1
#> 2 3999. 1.81 s2
#> 3 3999. 4.02 s3
#> 4 3999. 1.21 s1
#> 5 3999. 0.771 s2
#> 6 3999. 2.39 s3
wide <- spread(tall,X,Y)
head(wide)
#> # A tibble: 3 x 6
#> S `3998.102` `3998.423` `3998.745` `3999.066` `3999.387`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 s1 0.454 1.50 1.84 1.21 3.07
#> 2 s2 2.04 0.392 1.50 0.771 1.81
#> 3 s3 1.38 0.992 0.790 2.39 4.02
Created on 2018-11-08 by the reprex package (v0.2.1)
In the tall version, each unique value of X gets repeated for however many unique values of S there are. There are 5 unique X and 3 unique S. This is much more apparent in the wide data. In my real set I have 8010 unique X and 312 unique S. The tall data is nice because I can easily plot X vs Y and get one plotted line for each S.
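For illustration, the kind of plot described above might look roughly like the following sketch. It assumes ggplot2 is the plotting package (the post does not say which one is used), and the fixed seed, the rep() shorthand for building the table, and the name p are additions made here for the example.

```r
library(tibble)
library(ggplot2)  # assumption: the post does not name a plotting package

set.seed(1)  # assumption: fixed seed so the example is repeatable

# Same layout as the tall table above, built with rep() for brevity
tall <- tibble(
  X = rep(c(3999.387, 3999.066, 3998.745, 3998.423, 3998.102), each = 3),
  Y = rnorm(15, mean = 2, sd = 1),
  S = rep(c("s1", "s2", "s3"), times = 5)
)

# Mapping S to colour also groups the data, so each S gets its own line
p <- ggplot(tall, aes(x = X, y = Y, colour = S)) +
  geom_line()
p
```

Because S is mapped to colour, geom_line() draws one line per sample without any reshaping of the tall data.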
The Question
What if I want to average all of the Ys at each unique value of X? It would look like this:
> # A tibble: 5 x 2
> X Y
> <dbl> <dbl>
> 1 3998.102 2.29
> 2 3998.423 1.63
> 3 3998.745 1.36
> 4 3999.066 1.66
> 5 3999.387 1.33
In this case I used the wide table, calculated the mean of each X column, and then manually constructed a new table.
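To make that manual route concrete, here is a minimal sketch of what it could look like. It is not necessarily the exact code that produced the table above: the fixed seed, the use of colMeans(), and the names col_means and averaged are assumptions made for illustration.

```r
library(tibble)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the example is repeatable

# Same layout as the tall table above, built with rep() for brevity
tall <- tibble(
  X = rep(c(3999.387, 3999.066, 3998.745, 3998.423, 3998.102), each = 3),
  Y = rnorm(15, mean = 2, sd = 1),
  S = rep(c("s1", "s2", "s3"), times = 5)
)
wide <- spread(tall, X, Y)

# Mean of every column except S; the result is named by the X values
col_means <- colMeans(wide[, -1])

# Rebuild a two-column table by hand, turning the column names back into numbers
averaged <- tibble(
  X = as.numeric(names(col_means)),
  Y = unname(col_means)
)
averaged
```

The round trip through column names is the fragile part: X becomes text in the wide table and has to be converted back to numeric afterwards.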
Can I do this with map() functions from purrr? The documentation was confusing, probably because I have never used lapply() functions before.
Thanks for reading. I have a feeling this is really simple for most experienced users.