I need to build a formula for a linear regression model (using the glm() function), but I have too many variables to type out by hand. I am doing gene expression analysis, so I am looking for a way to concatenate all those variables (in this case, the column names of my data.frame) into a single string from which the formula can be built.
My data looks something like this (the actual data frame has 213 columns):
> df
Smoke PRR22 C15orf40 RAX2 GIMAP1 TM2D3 FAM167AAS1 LINC00161 SMCR8 CYP11B1
DP019 No 6.247058 4.609030 4.920439 3.531275 6.032196 1.576602 3.261709 5.752494 4.082924
DP021 Yes 5.767487 4.451362 4.834086 3.054192 6.049870 1.779412 2.618781 5.291328 4.274439
DP022 No 6.008855 4.841719 4.834774 3.354556 6.244215 1.580933 3.135989 4.989184 3.319836
DP025 Yes 5.390064 4.420183 4.923600 3.356938 5.516580 1.796413 2.984576 5.189582 3.833807
DP033 No 5.479384 5.987276 4.858381 3.454082 7.176767 1.640109 3.213976 5.378756 4.195856
DP035 No 5.439995 4.825332 5.469710 3.561561 6.357713 1.684058 3.635607 4.786237 3.792060
The first column ("Smoke") is my trait variable, and the remaining columns (gene names) hold the gene expression levels.
I would like to build something like this:
glm(Smoke ~ PRR22 + C15orf40 + RAX2 + GIMAP1... and so forth
My question is: how can I automate this so that all my variables end up in the formula?
Maybe concatenating the column names into one string would solve the problem? For example:
for (i in colnames(df)[-1]) {
  form <- as.formula(paste0("Smoke ~ ", i))
  glm(form, data = df)
}
But it is not working. I am sure I am missing something... or a lot. So, if anyone could help, that would be excellent!
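For reference, here is a minimal sketch of the single-string idea, not a definitive solution. It assumes the response column is named "Smoke" and that every other column is a predictor. The toy data frame only copies the first three gene columns from the sample above. The names `predictors`, `form` and `form2` are made up for illustration, and the binomial family in the commented-out call is an assumption based on "Smoke" being Yes/No.

```r
# Toy stand-in for the real data: first three gene columns of the sample above.
df <- data.frame(
  Smoke    = c("No", "Yes", "No", "Yes", "No", "No"),
  PRR22    = c(6.247058, 5.767487, 6.008855, 5.390064, 5.479384, 5.439995),
  C15orf40 = c(4.609030, 4.451362, 4.841719, 4.420183, 5.987276, 4.825332),
  RAX2     = c(4.920439, 4.834086, 4.834774, 4.923600, 4.858381, 5.469710),
  row.names = c("DP019", "DP021", "DP022", "DP025", "DP033", "DP035")
)

# Every column except the response becomes a predictor.
predictors <- setdiff(colnames(df), "Smoke")

# Collapse the names into "PRR22 + C15orf40 + RAX2" and build ONE formula,
# instead of overwriting a one-gene formula on each loop iteration.
form <- as.formula(paste("Smoke ~", paste(predictors, collapse = " + ")))

# reformulate() from the stats package builds the same formula directly.
form2 <- reformulate(predictors, response = "Smoke")

print(form)  # the formula Smoke ~ PRR22 + C15orf40 + RAX2

# The fit itself is left commented out: with only six toy rows it is not
# meaningful. family = binomial is an assumption for a Yes/No response,
# which would also need to be a factor.
# df$Smoke <- factor(df$Smoke)
# fit <- glm(form, data = df, family = binomial)
```

When every remaining column should be a predictor, the shorthand `Smoke ~ .` expresses the same thing without any string handling. Note also that the loop in the question builds a separate one-gene formula on each pass and never stores the result of glm().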