
My question is very similar to this one here, but I still can't solve my problem and would like a little more help to make it clear. The original data frame "ddf" looks like:

CONC <- c(0.15,0.52,0.45,0.29,0.42,0.36,0.22,0.12,0.27,0.14)
SPP <- c(rep('A',3),rep('B',3),rep('C',4))
LENGTH <- c(390,254,380,434,478,367,267,333,444,411)
ddf <- as.data.frame(cbind(CONC,SPECIES,LENGTH))

The regression model is constructed by species:

model <- dlply(ddf,.(SPP), lm, formula = CONC ~ LENGTH)

The regression works fine and returns an individual model for each species.

What I want is the residual and expected value for the 'Length' variable under each model (one model per species), added to my original dataset ddf as new columns. So the new dataset should look like:

SPP  LENGTH  CONC  EXPECTED  RESIDUAL

First, I used the following code to get the expected values:

model_pre <- lapply(model,function(x)predict(x,data = ddf))

I suspect there might be some mistakes in the above code, but it actually runs. The result comes with two columns (predicted value and species). My first question is whether I can trust the result of the above code. (Does R fully understand what I am aiming to do, which is getting the expected value of "length" under each of the different models?)

Then I used the following code to attach those data to ddf:

ddf_new <- cbind(ddf, model_pre)

This code runs fine as well, but here is the problem. It seems R just attaches the model_pre result directly to the original data frame. The result of model_pre is not sorted the same way as the original ddf, so the combined result is obviously wrong (judging by the species column in the original data frame versus the one in model_pre).

I used resid() with similar lapply and cbind code to get the residuals and attach them to the original ddf. The same problem occurs.

Therefore, how can I attach those results correctly, matched to length by species? (Please let me know if what I am trying to explain is confusing.)

Any help would be greatly appreciated!

Chuan

1 Answer


There are several problems with your code. You refer to the column SPP in the dlply call, but no column by that name exists in your data frame, which is built with a column named SPECIES.

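Your predict call passes data = ddf, but the argument that predict uses for an lm fit is newdata, so data is silently ignored and each model returns fitted values for its own subset only. Had you written newdata = ddf, the predicted values would be made on the entire dataset, not just the subset corresponding to that model (this may be intended, but seems strange with the later usage).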
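A quick way to check what that predict call is doing is to compare data = with newdata = on a single species. This is only a sketch: it rebuilds the example data with data.frame() (an assumption about the intended data, so that CONC and LENGTH stay numeric), and the names fit_a, p_data and p_newdata are made up for illustration.

```r
# Example data from the question, built with data.frame() so the numeric
# columns are not coerced to character (assumed to be the intent).
ddf <- data.frame(
  CONC    = c(0.15, 0.52, 0.45, 0.29, 0.42, 0.36, 0.22, 0.12, 0.27, 0.14),
  SPECIES = c(rep("A", 3), rep("B", 3), rep("C", 4)),
  LENGTH  = c(390, 254, 380, 434, 478, 367, 267, 333, 444, 411)
)

# Fit a model on species A only, as a per-species split would for one piece.
fit_a <- lm(CONC ~ LENGTH, data = ddf[ddf$SPECIES == "A", ])

# `data` is not an argument of predict() for lm fits, so it falls into `...`
# and is ignored: the result is the fitted values for the 3 rows of species A.
p_data <- predict(fit_a, data = ddf)

# `newdata` is the argument that is actually used: predictions for all 10 rows.
p_newdata <- predict(fit_a, newdata = ddf)

length(p_data)     # 3
length(p_newdata)  # 10
all.equal(unname(p_data), unname(fitted(fit_a)))  # TRUE
```

So with data = the values are per-species fitted values, and the remaining difficulty is only putting them back on the right rows.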

When you cbind a data frame to a list of data frames, does it really cbind the individual data frames?

Now to more helpful suggestions.

Why use dlply at all here? You could just fit a model with interactions that effectively fits a different regression line to each species:

fit <- lm(CONC ~ SPECIES * LENGTH, data= ddf)
fitted(fit)
predict(fit)
ddf$Pred <- fitted(fit)
ddf$Resid <- ddf$CONC - ddf$Pred
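For reference, here is a self-contained sketch of the same idea. It assumes the data are built with data.frame() rather than as.data.frame(cbind(...)), since cbind on mixed vectors produces a character matrix and lm would then not see CONC and LENGTH as numbers. The column names EXPECTED and RESIDUAL follow the layout asked for in the question; max_diff is just an illustrative check.

```r
# Example data; SPECIES is made a factor explicitly so the interaction
# model gets one set of terms per species.
ddf <- data.frame(
  CONC    = c(0.15, 0.52, 0.45, 0.29, 0.42, 0.36, 0.22, 0.12, 0.27, 0.14),
  SPECIES = factor(c(rep("A", 3), rep("B", 3), rep("C", 4))),
  LENGTH  = c(390, 254, 380, 434, 478, 367, 267, 333, 444, 411)
)

# One model with a separate intercept and slope for each species.
fit <- lm(CONC ~ SPECIES * LENGTH, data = ddf)

# fitted() and resid() return values in the row order of ddf,
# so they can be attached directly as new columns.
ddf$EXPECTED <- as.numeric(fitted(fit))
ddf$RESIDUAL <- as.numeric(resid(fit))

# Cross-check: the fitted values agree with separate per-species fits.
max_diff <- 0
for (s in levels(ddf$SPECIES)) {
  rows <- ddf$SPECIES == s
  sub_fit <- lm(CONC ~ LENGTH, data = ddf[rows, ])
  max_diff <- max(max_diff,
                  abs(as.numeric(fitted(sub_fit)) - ddf$EXPECTED[rows]))
}
max_diff  # numerically zero, up to floating-point error
```

The fitted values match the separate regressions because the interaction model gives each species its own line; only the pooled residual variance (and hence the standard errors) differs.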

Or, if there is some other reason to really use dlply and the problem is combining two data frames that have different ordering, then either use merge or reorder the data frames to match first (see functions like order, sort.list, and match).
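If separate models are needed, one way to sidestep the ordering problem is to attach the values inside each piece and only then put the pieces back together. The sketch below is a base-R stand-in using split and unsplit, not the dlply code from the question; the data are rebuilt with data.frame() and shuffled with an arbitrary fixed permutation purely to show that row order survives. Names such as pieces, models and ddf_new are illustrative.

```r
# Example data, deliberately shuffled so rows are not grouped by species.
ddf <- data.frame(
  CONC    = c(0.15, 0.52, 0.45, 0.29, 0.42, 0.36, 0.22, 0.12, 0.27, 0.14),
  SPECIES = c(rep("A", 3), rep("B", 3), rep("C", 4)),
  LENGTH  = c(390, 254, 380, 434, 478, 367, 267, 333, 444, 411)
)
ddf <- ddf[c(4, 1, 8, 2, 10, 5, 3, 7, 6, 9), ]

# One data frame and one model per species.
pieces <- split(ddf, ddf$SPECIES)
models <- lapply(pieces, function(d) lm(CONC ~ LENGTH, data = d))

# Attach fitted values and residuals inside each piece, so every value
# stays on the row it was computed from.
pieces <- Map(function(d, m) {
  d$EXPECTED <- as.numeric(fitted(m))
  d$RESIDUAL <- as.numeric(resid(m))
  d
}, pieces, models)

# unsplit() reverses split(): rows come back in the original order of ddf.
ddf_new <- unsplit(pieces, ddf$SPECIES)
ddf_new
```

The models list is still available afterwards, so coefficients can be pulled from it per species, for example with lapply(models, coef).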

Greg Snow
  • Hi, Greg. Thank you very much for your comments. Actually, my dataset has one more column (location, a factor variable), and that's why I used dlply in my original code: I was going to perform the regression at the species level and also at the location level, and I need to extract the regression coefficients. I thought it was unnecessary to mention location for this question, so I left it out of the example dataset. Sorry about the confusion. As for the cbind code, it really does run and attaches the result column of model_pre to ddf. The thing is that it doesn't attach it in the right order, so I can't use it. – Chuan Dec 02 '14 at 19:10
  • Your suggestion of using CONC ~ SPECIES * LENGTH makes sense to me, but R returns an error: contrasts can be applied only to factors with 2 or more levels. I am working on it and checking whether there is anything wrong with my dataset. Thank you. – Chuan Dec 02 '14 at 19:12
  • @ChuanTang, are you still using plyr? The error suggests that you are using a factor with only one level. Use this on the whole dataset, not on data already split by a plyr function. – Greg Snow Dec 02 '14 at 21:11
  • It works if I apply it to the whole dataset. The thing is, I am not sure how to interpret the interaction term right now. The reason why I have to use plyr is similar to the one described here: http://stackoverflow.com/questions/9014308/r-extract-regression-coefficients-from-multiply-regression-via-lapply-command Thank you for your help! – Chuan Dec 02 '14 at 22:08
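On reading the interaction terms: a small sketch, assuming R's default treatment contrasts and the example data rebuilt with data.frame(), shows how the coefficients of the joint model map onto a per-species line. Species A is the baseline level, and the names b, fit_b, intercept_b_joint and slope_b_joint are illustrative only.

```r
# Example data with SPECIES as a factor; level "A" is the baseline.
ddf <- data.frame(
  CONC    = c(0.15, 0.52, 0.45, 0.29, 0.42, 0.36, 0.22, 0.12, 0.27, 0.14),
  SPECIES = factor(c(rep("A", 3), rep("B", 3), rep("C", 4))),
  LENGTH  = c(390, 254, 380, 434, 478, 367, 267, 333, 444, 411)
)

fit <- lm(CONC ~ SPECIES * LENGTH, data = ddf)
b <- coef(fit)

# With treatment contrasts:
#   intercept for A = (Intercept)      slope for A = LENGTH
#   intercept for B = (Intercept) + SPECIESB
#   slope for B     = LENGTH + SPECIESB:LENGTH
intercept_b_joint <- b[["(Intercept)"]] + b[["SPECIESB"]]
slope_b_joint     <- b[["LENGTH"]] + b[["SPECIESB:LENGTH"]]

# The same two numbers come out of a separate regression on species B.
fit_b <- lm(CONC ~ LENGTH, data = ddf[ddf$SPECIES == "B", ])
coef(fit_b)
c(intercept_b_joint, slope_b_joint)
```

In other words, each interaction coefficient is the difference between that species' slope and the baseline species' slope, so the per-species coefficients can be recovered from the single model without splitting the data.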