I am working on a very large dataset and have laid out a simplified version below:

group <- c(rep("A", 3), rep("B", 3), rep("C", 3))
X <- c(0, 1, 2, 0, 1, 2, 0, 1, 2)
Y <- c(0, 2, 4, 0, 3, 6, 0, 4, 8)   
df <- data.frame(group, X, Y)

I am trying to obtain, through linear regression, the coefficients of three lines, one for each of the groups A, B, and C (group is a factor variable), but I have had little luck with the code below.

I came across some R code suggesting that a ' * ' between the independent variable and the factor variable would, in the case of this example, calculate the slope of the line for each of A, B, and C.

lin.reg <- lm(Y ~ X*group, data = df)
coefficients_for_ABC <- summary(lin.reg)

I think the code I came across is incorrect and that I need to apply a by() function or something similar.
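For reference, here is a minimal sketch of how the per-group lines could be read off the interaction model above. It assumes R's default treatment contrasts, under which group A is the reference level, so the coefficient X is the slope for A and the X:groupB and X:groupC terms are differences from that slope. The names fit, b, slopes, and intercepts are illustrative only.

```r
# Rebuild the example data from the question.
group <- c(rep("A", 3), rep("B", 3), rep("C", 3))
X <- c(0, 1, 2, 0, 1, 2, 0, 1, 2)
Y <- c(0, 2, 4, 0, 3, 6, 0, 4, 8)
df <- data.frame(group, X, Y)

# X * group expands to X + group + X:group, i.e. a separate
# intercept and slope for every level of group.
fit <- lm(Y ~ X * group, data = df)
b <- coef(fit)

# Assumption: default treatment contrasts, with A as the reference level.
# The slope for B or C is the reference slope plus its interaction term.
slopes <- c(A = b[["X"]],
            B = b[["X"]] + b[["X:groupB"]],
            C = b[["X"]] + b[["X:groupC"]])

# Intercepts are combined the same way from the main-effect terms.
intercepts <- c(A = b[["(Intercept)"]],
                B = b[["(Intercept)"]] + b[["groupB"]],
                C = b[["(Intercept)"]] + b[["groupC"]])

print(slopes)
print(intercepts)
```

On this toy data the slopes come out as 2, 3, and 4 (up to floating-point error), so the ' * ' formula does describe three separate lines; the coefficients just need to be combined rather than read directly.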

Dharman
Ben
  • See this question for the answer: [http://stackoverflow.com/questions/1169539/linear-regression-and-group-by-in-r] – cccmir Apr 17 '16 at 08:19

1 Answer

This should work. It splits the data frame by group and fits a separate model to each piece; you can run whatever regression you want inside the function.

lapply(split(df, df$group), function(x) lm(Y ~ X, data = x))
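As a follow-up, the list of fitted models can be collapsed into one table of coefficients per group. This is only a sketch on the example data from the question; the names fits and coef_table are made up for illustration.

```r
# Rebuild the example data from the question.
group <- c(rep("A", 3), rep("B", 3), rep("C", 3))
X <- c(0, 1, 2, 0, 1, 2, 0, 1, 2)
Y <- c(0, 2, 4, 0, 3, 6, 0, 4, 8)
df <- data.frame(group, X, Y)

# One lm() fit per group, returned as a named list (A, B, C).
fits <- lapply(split(df, df$group), function(d) lm(Y ~ X, data = d))

# sapply() gathers the coefficient vectors as columns; t() flips the
# result so that each row is a group and each column a coefficient.
coef_table <- t(sapply(fits, coef))
print(coef_table)
```

Each row of coef_table then holds the intercept and slope for one group, which is usually easier to work with on a large data set than a list of model objects.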
Chirayu Chamoli
  • Thanks @ChirayuChamoli, this works well for the example data here. When I applied it to my large data set I received the following error: Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases. Do you know why this might be, or is this a question for another thread? I already checked for NAs using sapply(newdf, function(x) sum(is.na(x))) and none exist in the data frame. – Ben Apr 17 '16 at 11:20
  • @Ben This seems to work with NAs too. Anyway, add na.action = na.omit inside lm() if you are working with NAs. – Chirayu Chamoli Apr 17 '16 at 13:23
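One possible cause of the "0 (non-NA) cases" error, offered as a guess since the large data set is not shown: split() keeps unused factor levels by default, so a level with no rows yields an empty data frame, and lm() then has nothing to fit. A group whose rows are all NA in X or Y fails the same way, and na.action = na.omit does not help there because no rows are left after omission. The sketch below uses a made-up data frame df2 to reproduce both situations and skips such groups before fitting; the threshold of two complete rows is an assumption chosen for a straight-line fit.

```r
# Hypothetical data: level "C" is declared but has no rows,
# and every Y value in group "B" is missing.
df2 <- data.frame(
  group = factor(c("A", "A", "A", "B", "B", "B"), levels = c("A", "B", "C")),
  X = c(0, 1, 2, 0, 1, 2),
  Y = c(0, 2, 4, NA, NA, NA)
)

pieces <- split(df2, df2$group)  # "C" becomes a zero-row data frame

# Keep only groups with at least two rows where both X and Y are present.
has_data <- function(d) sum(!is.na(d$X) & !is.na(d$Y)) >= 2
usable <- Filter(has_data, pieces)

fits <- lapply(usable, function(d) lm(Y ~ X, data = d))
print(names(fits))  # groups that were actually fitted
```

If empty levels are the only problem, calling droplevels() on the data frame before splitting, or using split(..., drop = TRUE), should be enough.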