Loop linear regression and saving ALL coefficients

Question

Based on the linked question below, I wrote code that runs a regression on each subset of my data, where the subsets are defined by a variable.

Loop linear regression and saving coefficients

In this example I created a DUMMY variable (0 or 1) to define the subsets; in reality I have 3000 subsets.

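# Split the data by DUMMY, fit one model per subset, and stack the coefficients
res <- do.call(rbind, lapply(split(mydata, mydata$DUMMY), function(x) {
  fit <- lm(y ~ x1 + x2, data = x)
  # One row per coefficient, tagged with the subset's DUMMY value
  data.frame(DUMMY = unique(x$DUMMY), coeff = coef(fit))
}))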

This produces the following data frame:

              DUMMY       coeff
0.(Intercept)     0  22.8419956
0.x1              0 -11.5623064
0.x2              0   2.1006948
1.(Intercept)     1   4.2020874
1.x1              1  -0.4924303
1.x2              1   1.0917668

What I would like instead is one row per regression, with the variables in the columns. I also need the p-values and standard errors included:

DUMMY   intercept   coeffx1   p-valuex1   SEx1   coeffx2   p-valuex2   SEx2
0           22.84    -11.56        0.04   0.15      2.10        0.80   0.90
1            4.20     -0.49        0.10   0.60      1.09        0.60   1.20

Any idea how to do this?

Just as a side note: it's not always a good idea to judge the effectiveness of a linear model by its p-values alone. — aeongrail, Feb 10 '16 at 18:45
To get the SEs and p-values you need to call `summary.lm`. Start by modifying that code by Heroka so that it builds a vector from the `$coefficients` component of one model's summary, and then `rbind` those vectors. — IRTFM, Feb 10 '16 at 19:00
`library(nlme); fits <- lmList(y~x1 + x2 | DUMMY, data=mydata); summary(fits)` — Roland, Feb 10 '16 at 19:32
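Following IRTFM's suggestion, here is a base R sketch (no extra packages) that flattens each subset's `summary(fit)$coefficients` table into one named row and then `rbind`s the rows. The data frame below is simulated as a hypothetical stand-in for `mydata`, and `one_row` is an illustrative helper name, not part of any package.

```r
# Hypothetical stand-in for `mydata` (assumed columns: DUMMY, x1, x2, y)
set.seed(1)
mydata <- data.frame(
  DUMMY = rep(0:1, each = 50),
  x1 = rnorm(100),
  x2 = rnorm(100),
  y  = rnorm(100)
)

# Turn one subset into one named row: DUMMY, then Estimate / Std. Error / Pr(>|t|) per term
one_row <- function(d) {
  fit <- lm(y ~ x1 + x2, data = d)
  # summary.lm's coefficient table, keeping all columns except "t value"
  cf <- summary(fit)$coefficients[, c("Estimate", "Std. Error", "Pr(>|t|)"), drop = FALSE]
  # Transpose so the vector runs term by term: (Intercept) measures, then x1, then x2
  vals <- as.vector(t(cf))
  names(vals) <- paste(rep(rownames(cf), each = ncol(cf)), colnames(cf), sep = "_")
  c(DUMMY = unique(d$DUMMY), vals)
}

# Fit per subset and stack the rows (assumes every subset estimates the same terms)
res <- as.data.frame(do.call(rbind, lapply(split(mydata, mydata$DUMMY), one_row)))
```

The result has one row per DUMMY value and columns such as `x1_Estimate`, `x1_Std. Error` and `x1_Pr(>|t|)`. If some subset has an aliased (inestimable) term, its row would be shorter and the `rbind` would misalign, so with 3000 subsets it may be worth checking that all rows have the same length first.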

score 2 · Answer 1 · answered Feb 11 '16 at 10:50

While your desired output is (IMHO) not really tidy data, here is an approach using data.table and a custom extraction function. It can return the results in either wide or long form.

The extractor function takes an lm object and returns the estimates, standard errors and p-values for all variables.

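library(data.table)

extractor <- function(model, return_wide = FALSE){
  # Coefficient table without the t value column: Estimate, Std. Error, Pr(>|t|)
  coefs <- summary(model)$coefficients[, -3, drop = FALSE]
  model_summary <- as.data.table(coefs)
  # Use the table's own row names, which stay in sync with its rows even if a term is aliased
  model_summary[, variable := rownames(coefs)]
  # Reshape to long form: one row per variable/measure pair
  step2 <- melt(model_summary, id.vars = "variable", variable.name = "measure")
  if (!return_wide) {
    return(step2)
  }
  # Spread to a single wide row per model
  step3 <- dcast(step2, 1 ~ variable + measure, value.var = "value")
  return(step3)
}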
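To see why the extractor indexes with `[, -3]`, it helps to look at what `summary.lm` returns. The sketch below uses a small simulated data frame (hypothetical) and shows that column 3 is the t value, and that when a term is aliased, `coef()` still lists it (as `NA`) while the summary table omits that row. This is why taking variable names from `names(coef(model))` can disagree in length with the table.

```r
set.seed(123)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50), y = rnorm(50))
fit <- lm(y ~ x1 + x2, data = d)

# The coefficient matrix from summary.lm: one row per term, four columns
cf <- summary(fit)$coefficients
print(colnames(cf))         # Estimate, Std. Error, t value, Pr(>|t|)
print(colnames(cf[, -3]))   # the t value column is dropped

# Add a term that is an exact multiple of x1, so lm() cannot estimate it (aliased)
d$x3 <- 2 * d$x1
fit_alias <- lm(y ~ x1 + x2 + x3, data = d)
n_coef <- length(coef(fit_alias))                 # x3 is kept, with an NA estimate
n_rows <- nrow(summary(fit_alias)$coefficients)   # x3 is omitted from the table
```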

Demonstration:

res_wide <- dat[, extractor(lm(y ~ x1 + x2), return_wide = TRUE), by = dummy]
> res_wide
# dummy . (Intercept)_Estimate (Intercept)_Std. Error (Intercept)_Pr(>|t|)  x1_Estimate x1_Std. Error x1_Pr(>|t|) x2_Estimate x2_Std. Error x2_Pr(>|t|)
# 1:     0 .           0.04314707             0.04495702            0.3376461 -0.054364406    0.04441204   0.2214895  0.01333804    0.04620999   0.7729757
# 2:     1 .          -0.04137086             0.04471550            0.3553164  0.009864255    0.04533808   0.8278539  0.05272257    0.04507189   0.2426726


res_long <- dat[, extractor(lm(y ~ x1 + x2)), by = dummy]
# dummy    variable    measure        value
# 1:     0 (Intercept)   Estimate  0.043147072
# 2:     0          x1   Estimate -0.054364406
# 3:     0          x2   Estimate  0.013338043
# 4:     0 (Intercept) Std. Error  0.044957023
# 5:     0          x1 Std. Error  0.044412037
# 6:     0          x2 Std. Error  0.046209987
# 7:     0 (Intercept)   Pr(>|t|)  0.337646052
# 8:     0          x1   Pr(>|t|)  0.221489530

Data used:

library(data.table)
set.seed(123)
nobs <- 1000
dat <- data.table(
  dummy = sample(0:1, nobs, replace = TRUE),
  x1 = rnorm(nobs),
  x2 = rnorm(nobs),
  y = rnorm(nobs))

Thank you, that works perfectly. The only problem is that when I substitute my own variables and model, I get the following error: Error in `[.data.frame`(mydata, , extractor(lm(y ~ x + : unused argument (by = dummy) — research111, Feb 11 '16 at 15:13
You seem to have two commas there, and my example uses `data.table` syntax. — Heroka, Feb 11 '16 at 15:20
I have only one comma in the code (the error message shows two), and I do load data.table (I tried both 1.9.4 and 1.9.6): > res_wide <- masterfilesales[,extractor(lm(salesunits~price + pricepromo + display + feature), return_wide = T), by = PL] Error in `[.data.frame`(masterfilesales, , extractor(lm(salesunits ~ price + : unused argument (by = PL) — research111, Feb 11 '16 at 15:48
Did you use `setDT`? This is the error you get when you apply `data.table` syntax to a `data.frame`. You could run `setDT(masterfilesales)` before this code. — Heroka, Feb 11 '16 at 15:54
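A minimal sketch of the `setDT` fix, using a small simulated data frame as a hypothetical stand-in for `masterfilesales` (the column names `PL`, `price` and `salesunits` are taken from the comment; the values are made up). Before conversion, `[` dispatches to `[.data.frame`, which has no `by` argument; after `setDT()`, the same call uses data.table's `[`.

```r
library(data.table)

# Hypothetical data.frame standing in for masterfilesales
set.seed(42)
sales <- data.frame(
  PL = rep(c("a", "b"), each = 30),
  price = rnorm(60),
  salesunits = rnorm(60)
)

# On a plain data.frame this reaches `[.data.frame`, which rejects `by`
msg <- tryCatch(sales[, 1, by = PL], error = function(e) conditionMessage(e))

setDT(sales)  # convert to a data.table by reference (no copy is made)

# data.table syntax now works: one row of coefficients per PL group
coefs <- sales[, as.list(coef(lm(salesunits ~ price))), by = PL]
```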
