0

I wanted to model my snps array. I can do this one by one using the following code.

Data$DX=as.factor(Data$DX)
univariate=glm(relevel(DX, "CON") ~ relevel(rs6693065_D,"AA"), family = binomial, data = Data)
summary(univariate)
exp(cbind(OR = coef(univariate), confint(univariate)))

How can I do this for all other snps using a loop or apply? The snps are rs6693065_D, rs6693065_A and hundreds of them. From the above code only "rs6693065_D" will be replaced by all other snps. Best Regards Zillur

zillur rahman
  • 355
  • 1
  • 13
  • Are *snps* columns in *Data*? Specifically all columns but *DX*? Better yet `dput(head(Data))` for [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Parfait May 07 '19 at 18:22
  • Thank you. Yes. snps are columns in Data. DX is also a column with two levels ('CAS' and 'CON'). – zillur rahman May 07 '19 at 18:44
  • Are there other columns in *Data* except *snps* and *DX*? – Parfait May 07 '19 at 19:03
  • Thank you. Yes. There are many other columns. There are a total of 489 columns, among them 3:426 are SNPs. – zillur rahman May 07 '19 at 19:15

1 Answers1

1

Consider developing a generalized method to handle any snps. Then call it iteratively passing every snps column using lapply or sapply:

# GENERALIZED METHOD
proc_glm <- function(snps) {
   univariate <- glm(relevel(data$DX, "CON") ~ relevel(snps, "AA"), family = binomial)

   return(exp(cbind(OR = coef(univariate), confint(univariate))))
}

# BUILD LIST OF FUNCTION OUTPUT 
glm_list <- lapply(Data[3:426], proc_glm)

Use tryCatch in case of errors like relevel:

# BUILD LIST OF FUNCTION OUTPUT 
glm_list <- lapply(Data[3:426], function(col) 
                   tryCatch(proc_glm(col), error = function(e) e))

For building a data frame, adjust method and lapply call followed with a do.call + rbind:

proc_glm <- function(col){
  # BUILD FORMULA BY STRING
  univariate <- glm(as.formula(paste("y ~", col)), family = binomial, data = Data)

  # RETURN DATA FRAME OF COLUMN AND ESTIMATES
  cbind.data.frame(COL = col,
                   exp(cbind(OR = coef(univariate), confint(univariate)))
  )
}

# BUILD LIST OF DFs, PASSING COLUMN NAMES
glm_list <- lapply(names(Data)[3:426], 
                   tryCatch(proc_glm(col), error = function(e) NA))

# APPEND ALL DFs FOR SINGLE MASTER DF
final_df <- do.call(rbind, glm_list)
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • Thank you. In the output call, I am getting this error. " Error in relevel.factor(snps, "AA") : 'ref' must be an existing level ". I have other levels in snps including 'AA' – zillur rahman May 07 '19 at 19:29
  • There might be one of your 400+ that do not have `AA` level. See `tryCatch` edit where you can catch the error in `lapply` but returns as `NA` in sapply. – Parfait May 07 '19 at 19:36
  • Thank you very much. It works. Another question in downstream. The output I am getting is looks like: – zillur rahman May 07 '19 at 19:44
  • Thank you very much. It works. Another question in downstream. The output I am getting looks like: "$rs3024847_R OR 2.5 % 97.5 % (Intercept) 1.2285714 0.8970813 1.688313 relevel(snps, "AA")OTHER 0.7657037 0.5404940 1.081639" . Is it possible to convert the output into a data frame with the above columns (OR, 2.5% and 97.5%) and rownames as the names of the snps. – zillur rahman May 07 '19 at 20:02
  • See extension for building a list of data frames for `rbind` outside loop. – Parfait May 07 '19 at 20:28