
I have a master data frame that contains ~200 unique 'IDs', and each of these IDs has about 200 'orders'. I have split the master data frame into 200 individual data frames using

list2env(split(df, df$id), envir = .GlobalEnv)

Now that I have one data frame per unique ID, I want to fit a GLM for each of them and collect the coefficients and R^2 value for every ID in another master data frame.

So instead of doing this (where '1' through '200' are all the IDs):

test1 <- glm(response_var ~ variableA + variableB + variableC, family=gaussian(), data=`1`)

and manually printing the coefficients, then repeating this for all 200 IDs, is there a function or loop I could use to get all the coefficients and R^2 values into a single data frame?

So for this example the end result would be 200 rows, one per ID, and 6 columns: ID, Intercept, Coefficient1, Coefficient2, Coefficient3, and R^2.

zx8754
dgssd
  • Why did you split your data into separate data.frame variables? It would have been much easier to `lapply()` over the list of data.frames or just use `by()` in the first place. If you want working code, it's best to include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – MrFlick Jun 01 '15 at 20:48
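A minimal sketch of the `by()` route mentioned in the comment above, assuming the column names used in the question (`id`, `response_var`, `variableA`, `variableB`, `variableC`). The dummy data is made up purely for illustration, and nothing is ever split into separate global variables.

```r
# Made-up data using the question's column names (an assumption)
set.seed(123)
df <- data.frame(
  id = rep(c(1, 2, 3), 10),
  response_var = rnorm(30),
  variableA = runif(30),
  variableB = runif(30),
  variableC = runif(30)
)

# by() hands each ID's rows to the function; rbind stacks the
# coefficient vectors into one matrix with one row per ID
coefs <- do.call(rbind, by(df, df$id, function(x) {
  coef(glm(response_var ~ variableA + variableB + variableC,
           family = gaussian(), data = x))
}))
```

`coefs` is a matrix here; wrap it in `as.data.frame()` if a data frame is needed.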

1 Answer

Try this example:

#dummy data
set.seed(123)
df <- data.frame(
  id=rep(c(1,2,3),10),
  response_var=rep(c(1,2),15),
  variableA=runif(30),
  variableB=runif(30),
  variableC=runif(30))

#split by id
df_list <- split(df,df$id)

#loop through every id
do.call(rbind,
        lapply(df_list, function(x){
          fit <- glm(response_var ~ variableA + variableB + variableC, family=gaussian(), data=x)
          coef(fit)
        }))

#output
#   (Intercept)  variableA   variableB   variableC
# 1    0.630746  1.4443321 -0.40875486  0.42797033
# 2    1.447003  0.7121737 -0.01226043 -0.93282962
# 3    1.450429 -0.2306031  0.47827197 -0.01190812

Note: R^2 for a GLM is a topic of its own; see "Pseudo R squared formula for GLMs" and "Is R2 useful or dangerous?".
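For the specific case in the question, a gaussian family with the default identity link, the GLM is equivalent to ordinary least squares, so `1 - deviance / null.deviance` gives the usual R^2. A rough sketch under that assumption follows; `fit_one` and the `r_squared` column are hypothetical names chosen for illustration, and this ratio should not be read as a general-purpose R^2 for other families.

```r
# Same dummy data as above
set.seed(123)
df <- data.frame(
  id = rep(c(1, 2, 3), 10),
  response_var = rep(c(1, 2), 15),
  variableA = runif(30),
  variableB = runif(30),
  variableC = runif(30)
)

# Hypothetical helper: one row of coefficients plus R^2 for a single ID
fit_one <- function(x) {
  fit <- glm(response_var ~ variableA + variableB + variableC,
             family = gaussian(), data = x)
  # Gaussian/identity only: deviance is the residual sum of squares and
  # null.deviance is the total sum of squares around the mean
  r2 <- 1 - fit$deviance / fit$null.deviance
  data.frame(id = x$id[1], t(coef(fit)), r_squared = r2,
             check.names = FALSE)
}

# One row per ID: id, intercept, three slopes, R^2
results <- do.call(rbind, lapply(split(df, df$id), fit_one))
```

With 200 IDs this should give the 200-row, 6-column layout asked for in the question.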

zx8754
  • Yes, that worked perfectly, and I do understand that R2 isn't in the summary for GLM for these reasons. If I wanted to put R2 in there, would I just have to change the coef(fit) line to: cor(df_list$response,predict(fit))^2 ? – dgssd Jun 01 '15 at 21:36
  • Also, let's say VariableB, for example, is a factor variable with 5 levels (in the master data frame), but for certain IDs there is only 1 level. How would I avoid the 'contrasts can be applied only to factors with 2 or more levels' error when using this? – dgssd Jun 01 '15 at 21:57
  • @dgssd Avoid asking extra questions in the comments; you need to define the problem clearly, provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and clarify the expected output. – zx8754 Jun 01 '15 at 22:00
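On the single-level factor raised in the comments, one possible workaround (a sketch only, not something confirmed in this thread) is to drop any predictor that is constant within an ID before building the formula. `fit_available` is a hypothetical helper and the data below is made up, with `variableB` stuck at one level for ID 2.

```r
# Made-up data: variableB has two levels for ID 1 but only one for ID 2
set.seed(1)
df <- data.frame(
  id = rep(c(1, 2), each = 10),
  response_var = rnorm(20),
  variableA = runif(20),
  variableB = factor(c(rep(c("a", "b"), 5), rep("a", 10)))
)

# Hypothetical helper: keep only predictors that vary within this ID
fit_available <- function(x, response = "response_var",
                          predictors = c("variableA", "variableB")) {
  varies <- vapply(predictors,
                   function(p) length(unique(x[[p]])) > 1, logical(1))
  # Fall back to an intercept-only model if nothing varies
  rhs <- if (any(varies)) predictors[varies] else "1"
  fit <- glm(reformulate(rhs, response = response),
             family = gaussian(), data = x)
  coef(fit)
}

# A list is kept here because the coefficient vectors can differ in length
coefs <- lapply(split(df, df$id), fit_available)
```

Because the IDs may end up with different sets of coefficients, the results would need to be aligned by name (filling the gaps with NA) before they could be stacked into one data frame.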