The dplyr
package in R does not do well in accepting formulas in the form of y~x
into its functions based on my research. So the other alternative is to calculate it someone manually. Now let me first inform you that slope = cor(x,y)*sd(y)/sd(x)
(reference found here: http://faculty.cas.usf.edu/mbrannick/regression/regbas.html) and that the intercept = mean(y) - slope*mean(x)
. Simple linear regression requires that we use the centroid as our point of reference when finding our intercept because it is an unbiased estimator. Using a single point will only get you the intercept of that individual point and not the overall intercept.
Now for this explanation, I will be using the mtcars
data set. I only wanted a subset of the data so I am using variables c('mpg', 'cyl', 'disp', 'hp', 'drat', 'wt', 'qsec')
to basically mimic your dataset. In my example, my grouping variable is 'cyl'
, which is the equivalent of your 'timepoint' variable. The variable 'mpg'
is the y-variable in this case, which is equivalent to 'Abs'
in your data.
Based on my explanation of slope and intercept above, it is clear that we need three tables/datasets: a correlation dataset for your y with respect to your x for each group, a standard deviation table for each variable and group, and a table of means for each group and each variable.
To get the correlation dataset, we want to group by 'cyl'
and calculate the correlation coefficients for , you should use:
df <- mtcars[c('mpg', 'cyl', 'disp', 'hp', 'drat', 'wt', 'qsec')]
corrs <- data.frame(k1 %>% group_by(cyl) %>% do(head(data.frame(cor(.[,c(1,3:7)])), n = 1)))
Because of the way my dataset is structured, the second variable (df[ ,2])
is 'cyl'
. For you, you should use
do(head(data.frame(cor(.[,c(2:40)])), n = 1)))
since your first column is the grouping variable and it is not numeric. Essentially, you want to go across all numeric variables. Not using head
will produce a correlation matrix, but since you are interested in finding the slope independent of each other x-variable, you only need the row that has the correlation coefficient of your y-variable equal to 1 (r_yy = 1
).
To get standard deviation and means for each group, each variable, use
sds <- data.frame(k1 %>% group_by(cyl) %>% summarise_each(funs(sd)))
means <- data.frame(k1 %>% group_by(cyl) %>% summarise_each(funs(mean)))
Your group names will be the first column, so make sure to rename your rows for each dataset corrs
, sds
, and means
and delete column 1.
rownames(corrs) <- rownames(means) <- rownames(sds) <- corrs[ ,1]
corrs <- corrs[ ,-1]; sds <- sds[ ,-1]; means <- means[ ,-1]
Now we need to calculate the sd(y)/sd(x)
. The best way I have done this, and seen it done is using an apply
affiliated function.
sdst <- data.frame(t(apply(sds, 1, function(X) X[1]/X)))
I use X[1]
because the first variable in sds
is my y-variable. The first variable after you have deleted timepoint
is Abs
which is your y-variable. So use that.
Now the rest is pretty straight forward. Since everything is saved as a data frame, to find slope, all it you need to do is
slopes <- sdst*corrs
inter <- slopes*means
intercept <- data.frame(t(apply(inter, 1, function(x) x[1]-x)))
Again here, since our y-variable is in the first column, we use x[1]
. To check if all is well, your slopes for your y-variable should be 1 and the intercept should be 0.