0

I have 40 SNPs and want to see the effect each individual SNP has on the age of menopause. To do this I need do a multiple linear regression each of the individual SNPs. I want to avoid typing the same command 40 different times (as in the future I'll be doing this with even more SNPs).

What I want to do is make a list of the SNPs in a csv file and call this x:

x <- read.csv("snps.csv")

Then I want to use this list in this command;

fit <- lm(a_menopause ~ "snps" + country, data=mydata)

Where snps is my list of the SNPs I need to analyse, but needs to be done one SNP at a time. I'd ideally like to print the results to a csv file.

jkdev
  • 11,360
  • 15
  • 54
  • 77
  • Why do you need separate models? Combine everything in one model. – Roland May 28 '15 at 13:48
  • I need to see the effect of each individual SNP on the age of menopause – Claire 'Scotty' Bear May 28 '15 at 14:22
  • loop through the list and fit a model to each `x` if want different intercepts – Rorschach May 28 '15 at 14:36
  • Hi 6pool thanks for your answer. I'm not entirely sure how I can do that. What I've got so far is a csv document with my SNPs (and the data for those variables). I've named that as snps. I then did > varlist <- names(snps) [then I know I should add numbers here but I'm not sure what these numbers are] then I should do this models <- lapply(varlist, function(x) # but I also don't understand what function(x) is exactly – Claire 'Scotty' Bear May 28 '15 at 14:54
  • 1
    Please contact a statistician. – Roland May 28 '15 at 17:39
  • @Claire'Scotty'Bear see [GenABEL package](http://www.genabel.org/sites/default/files/pdfs/GenABEL-tutorial.pdf). – zx8754 Jun 05 '15 at 17:34

1 Answers1

1

See below example:

#dummy data
set.seed(123)
#phenotype
phenotype <- data.frame(
  a_menopause=sample(c(0,1),10,replace=TRUE),
  country=sample(letters[1:3],10,replace=TRUE))
#genotype
genotype <- 
read.table(text="SNP1   SNP2    SNP3    SNP4
1   0   1   1
           2    0   2   1
           0    0   0   1
           0    0   0   1
           0    1   0   1
           1    1   0   1
           1    2   0   1
           1    2   1   2
           0    0   0   1
           0    1   0   1
           ",header=TRUE)
#data for lm
fitDat <- cbind(phenotype,genotype)

#get fit for all SNPs
fitAllSNPs <-
  lapply(colnames(fitDat)[3:6], function(SNP){
    fit <- lm(paste("a_menopause ~ country + ", SNP), 
              data=fitDat)
    })

#extract coef for each SNP
lapply(fitAllSNPs,coef)

#output
# [[1]]
# (Intercept)      countryb      countryc          SNP1 
# 1.000000e+00 -2.633125e-16 -1.000000e+00  9.462903e-17 
# 
# [[2]]
# (Intercept)      countryb      countryc          SNP2 
# 1.000000e+00 -1.755417e-16 -1.000000e+00  2.780094e-19 
# 
# [[3]]
# (Intercept)      countryb      countryc          SNP3 
# 1.000000e+00 -2.633125e-16 -1.000000e+00  1.079985e-16 
# 
# [[4]]
# (Intercept)      countryb      countryc          SNP4 
# 1.000000e+00 -1.755417e-16 -1.000000e+00  4.269835e-31 
zx8754
  • 52,746
  • 12
  • 114
  • 209