nls regression and storing output coefficients and plots

Question

I tried to make reproducible data as suggested by How to make a great R reproducible example?:

structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("ANK.26.1", 
"ANK.35.10"), class = "factor"), DAY = c(2L, 3L, 
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 
18L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 
15L, 16L, 17L, 18L), carbon = c(1684.351094778, 3514.451339358, 
6635.877888654, 10301.700591252, 11361.360992769, 11891.934331254, 
12772.885869486, 13545.127224369, 14022.00520767, 14255.045990397, 
14479.813468278, 14611.749542181, 14746.382638335, 14942.733363567, 
14961.338739162, 15049.433738817, 15047.197961499, 1705.361701104, 
3293.593040601, 4788.872254899, 6025.622715999, 6670.80499518, 
7150.526272512, 7268.955557607, 7513.61998338, 7896.202773246, 
8017.953574608, 8146.09464786, 8286.148260324, 8251.229520243, 
8384.244997158, 8413.034235219, 8461.066691601, 8269.360979031
), g.rate.perc = c(NA, 1.08653133557123, 0.888168948119852,0.55242467750436, 
0.102862667394628, 0.0466998046116733, 0.0740797513417739, 0.060459426536321, 
0.0352066079115925, 0.0166196474238596, 0.0157675729725753, 0.00911172469120847, 
0.00921402983026387, 0.0133151790542558, 0.00124511193115184, 
0.00588817626489591, -0.000148562222127446, NA, 0.931316411333049, 
0.45399634862756, 0.258255053647507, 0.107073129133681, 0.0719135513148148, 
0.0165623173150578, 0.0336588143694119, 0.0509185706373581,0.0154189051191185, 
0.0159817679236518, 0.0171927308137518, -0.00421410998016991, 
0.0161206856006937, 0.00343373053515927, 0.00570929049366353, 
-0.0226573929218994), max.carb = c(15049.433738817, 15049.433738817, 
15049.433738817, 15049.433738817, 15049.433738817, 15049.433738817, 
15049.433738817, 15049.433738817, 15049.433738817, 15049.433738817, 
15049.433738817, 15049.433738817, 15049.433738817, 15049.433738817, 
15049.433738817, 15049.433738817, 15049.433738817, 8461.066691601, 
8461.066691601, 8461.066691601, 8461.066691601, 8461.066691601, 
8461.066691601, 8461.066691601, 8461.066691601, 8461.066691601, 
8461.066691601, 8461.066691601, 8461.066691601, 8461.066691601, 
8461.066691601, 8461.066691601, 8461.066691601, 8461.066691601
)), .Names = c("ID", "DAY", "carbon", "g.rate.perc", "max.carb"
), row.names = c(NA, 34L), class = "data.frame")

'data.frame':   34 obs. of  5 variables:
 $ ID         : Factor w/ 150 levels "ANK.26.1","ANK.35.10",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ DAY        : int  2 3 4 5 6 7 8 9 10 11 ...
 $ carbon     : num  1684 3514 6636 10302 11361 ...
 $ g.rate.perc: num  NA 1.087 0.888 0.552 0.103 ...
 $ max.carb   : num  15049 15049 15049 15049 15049 ...

In the sample data ID only has two levels not the indicated 150.

My nls looks like that:

res.sample <- ddply (
  d, .(ID),
  function(x){
  mod <- nls(carbon~phi1/(1+exp(-(phi2 + phi3 * DAY))),
         start=list(
           phi1 = x$max.carb,
           phi2 = int[1], 
           phi3 = mean(x$g.rate.perc)),
         data=x,trace=TRUE)
 return(coef(mod))
}
)

phi2 is actually the result for the intercept from

 int <- coef(lm(DAY~carbon,data=sample))

Unfortunately it doesnt work anymore, since i tried to wrap it into a ddply surrounding, but i cant go through all original 150 levels of ID manually.

On top of that, i would like to store all three output values for phi1 - phi3 in a data frame/ list with the respective ID. I meant to do that by

return(coef(mod))

The cherry on top would be the plots of the actual data and the fitted curve on top. Manually with subsetting i can do this as well but it is just too time consuming. My reduced ggplot code for that is

ggplot(data=n, aes(x = DAY, y = carbon))+ 
 geom_point(stat="identity", size=2) +
 geom_line( aes(DAY,predict(logMod) ))+
 ggtitle("ID")

if somehow the ID which contains a triplet of information, is less useful, here is how you can return it into another version

sep_sample <- sample %>% separate(ID, c("algae", "id", "nutrient"))

I feel like this is so much to ask but i have really tried and i can only spend so many days on this.

Here is a summary:

I need to run the model on every level of ID / every combination of algae & nutrient if you split it.

The output phi's should be stored in some kind of frame/list/table with their respective identification of where they belong to.

Ideally there is a way to include ggplots in all of this, which are automatically generated as well and stored.

as i said, the model in itself already worked, but when i put in in the ddply structure i get the following error message:

Error in numericDeriv(form[[3L]], names(ind), env) : 
  Missing value or an infinity produced when evaluating the model

I hope this is something you can work with somehow and this appears as a reasonable question to ask. If there are pages that already provide a similar solution that I havent found I will gladly take a look.

Thanks a lot!

Really, it's too difficult to paste the output of `dput(yourdataframe)` into the question? "it is starting to take up too much time" is not a valid excuse. — Roland, Dec 09 '16 at 14:53
I see my mistake now, i used dput(head(df)) instead of just dput, so I am sorry but for people who dont really work with code regularly, things are not as obvious sometimes, especially when you keep taking in a lot of information over a short period of time. I will adjust my question now and I am sorry for this apparent capital offense — i.love.broccoli, Dec 09 '16 at 15:17

i.love.broccoli · Answer 1 · 2016-12-09T16:13:05.963

soo i came up with this solution, which is not all i wanted, but i think a step closer, since it is running

coef_list <- list()
curve_list <- list()
for(i in levels(d$ALGAE)) {
for(j in levels(d$NUTRIENT)) {
dat = d[d$ALGAE == i & d$NUTRIENT == j,]

#int <- coef(lm(DAY~carbon,data=dat))

mod <- nls(carbonlog~phi1/(1+exp(-(phi2+phi3*DAY))),
           start=list(
             phi1=9.364,
             phi2=0,
             phi3= 0.135113),
           data=dat,trace=TRUE)
coef_list[[paste(i, j, sep = "_")]] = coef(mod)

plt <- ggplot(data = dat, aes(x = DAY, y = carbonlog)) + geom_point()+
  geom_line( aes(DAY,predict(mod) ))+
  ggtitle(paste(i,"RATIO",j,sep=" ")) + 
  theme.plot
curve_list[[paste(i, j, sep = "_")]] = plt
  }
}

unfortunately the paramters are static and not dependend on the respective factor combination. i reckon that the letter would be more helpful finding a fit.

if i apply

curve_list[["ANK_1"]]

i get an error message though:

Error: Aesthetics must be either length 1 or the same as the data (17): x, y

i only get the message when i use the log transformed carbon values. when i use carbon in its original format it plots everything

nls regression and storing output coefficients and plots

1 Answers1