0

I'm trying to (somewhat) elegantly fit 3 models (linear, exponential and quadratic) to a dataset with classes/factors and save p-values and R2 for each model and class/factor. Simple dataset with 3 variables: x,y, and class. What I can't figure out is how to force each of the 3 models to fit to each of the 3 classes. What I have now fits each model to the complete dataset. The next question is how I then output p-values & R2 to a table, for each model+class

My code looks like:

set.seed(100)
library(plyr)  
#create datast
nit <- within(data.frame(x = 3:32),
                 {
                   class <- rep(1:3, each = 10)
                   y <- 0.5 * x* (1:10) + rnorm(30)
                   class <- factor(class)                 # convert to a factor
                 }
                 )
x2<-nit$x*nit$x #for quadratic model
forms<- paste(c("y ~ x", "y ~ x+x2", "log(y) ~ x"), sep = "") # create 3 models
names(forms) <- paste("Model", LETTERS[1:length(forms)])
models <- llply(forms, lm, data = nit)
models                # shows coefficients for each of the 3 models

3 Answers3

1

There are a variety of ways to do this, but I liked how the names came out of nested lapply calls better than my mapply or do (from package dplyr) solutions even though the code looks a bit complicated. The names made it easier to tell the models apart (which forms and class combination each list element represented).

In this solution, it is important to actually add x2 to the nit dataset.

nit$x2 = nit$x*nit$x
models = lapply(forms, 
            function(x) {
                lapply(levels(nit$class), 
                    function(y) {lm(x, data = nit[nit$class == y,])} )
})

The output is a lists of lists, though, so I had to flatten this into a single list using unlist with recursive = FALSE.

models2 = unlist(models, recursive = FALSE)

Now you can easily pull out elements you want from the summary of each model. For example, here is how you might pull at the R-squared for each model:

lapply(models2, function(x) summary(x)$r.squared)

Or if you want a vector instead of a list:

unlist(lapply(models2, function(x) summary(x)$r.squared))
aosmith
  • 34,856
  • 9
  • 84
  • 118
0

You could consider to jump into linear and quadratic discrimination analyis, LDA and QDA. This guide provides an easy introduction

http://tgmstat.wordpress.com/2014/01/15/computing-and-visualizing-lda-in-r/

Soren Havelund Welling
  • 1,823
  • 1
  • 16
  • 23
0

Maybe like this? You can probably adapt it to do exactly what you want.

modsumm <- llply(models, summary)
ldply(modsumm, function(x) data.frame(term = row.names(x$coefficients),
                                      x$coefficients,
                                      R.sq = x$r.squared))

      .id        term     Estimate  Std..Error    t.value     Pr...t..      R.sq
1 Model A (Intercept) -12.60545292 11.37539598 -1.1081331 2.772327e-01 0.5912020
2 Model A           x   3.70767525  0.58265177  6.3634498 6.921738e-07 0.5912020
3 Model B (Intercept)  16.74908684 20.10241672  0.8331877 4.120490e-01 0.6325661
4 Model B           x  -0.73357356  2.60879262 -0.2811927 7.807063e-01 0.6325661
5 Model B          x2   0.12689282  0.07278352  1.7434279 9.263740e-02 0.6325661
6 Model C (Intercept)   1.79394266  0.32323588  5.5499490 6.184167e-06 0.5541830
7 Model C           x   0.09767635  0.01655626  5.8996644 2.398030e-06 0.5541830

Or, if you just want the p-value from the F statistic and the R squared

 ldply(modsumm, function(x) data.frame(F.p.val = pf(x$fstatistic[1],
                                                    x$fstatistic[2],
                                                    x$fstatistic[3],
                                                    lower.tail = F),
                                       R.sq = x$r.squared))

      .id      F.p.val      R.sq
1 Model A 6.921738e-07 0.5912020
2 Model B 1.348711e-06 0.6325661
3 Model C 2.398030e-06 0.5541830
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294