
I need to fit a curve to each group of a large dataframe using the nlm function. I don't have any problems using it on a df with a single group:

#example data
df <- data.frame(var= cumsum(sort(rnorm(100, mean=20, sd=4))),
                 time= seq(from=0,to=550,length.out=100))
#create function
my_function <- function(Cini, time, theta,var){
  fy <- (theta[1]-(theta[1]- Cini)*exp((-theta[2]/100000)*(time-theta[3])))
  ssq<-sum((var-fy)^2)
  return(ssq)
}
th.start <- c(77, 148, 5)   #set starting parameters

#run nlm
my_fitt <- nlm(f=my_function, Cini=400, var = df$var,
               time=df$time, p=th.start)

Then, I tried to apply the function in a df with multiple groups using the dlply function:

#data with groups
df.2 <- data.frame(var= cumsum(sort(rnorm(300, mean=20, sd=4))),
                   time= rep(seq(from=0,to=1200,length.out=100),3),
                   groups=rep(c(1:3),each=100))
#run nlm
library(plyr)
my_fitt.2 <- dlply(df.2, .(groups),
               nlm(f=my_function, Cini=400, var  = df.2$var,time=df.2$time, p=th.start))

However, I get the message: Error in fs[[i]](x, ...) : attempt to apply non-function. I also tried removing the df.2$ prefix, which gives Error in time - theta[3] : non-numeric argument to binary operator in this example, and Error in f(x, ...) : object 'time.clos' not found in my original df (time.clos is one of the variables).

In addition, I thought of using the dplyr library:

library(dplyr)
df.2 %>%
  group_by(groups) %>%
  nlm(f=my_function, Cini=400, v= var,
      time=time, p=th.start)

obtaining Error in f(x, ...) : unused argument (.). What could be the problem?

Matt_4
  • Your function does not work on my computer: it can't find `Cini` and `theta`. Please test your example after restarting your session; I suspect that `theta` and `Cini` are already loaded on your computer but not on ours. – Bastien Jul 31 '18 at 12:21
  • I really think your function is wrong: it is declared as `function(df)`, but the `df` argument is never used in the function, and `Cini`, `theta`, `time` and `var` are missing. – Bastien Jul 31 '18 at 12:25
  • Sorry, my bad: when creating this example I didn't type the arguments inside the function. – Matt_4 Jul 31 '18 at 12:37
  • However, `theta` are the variables that I need to estimate. – Matt_4 Jul 31 '18 at 12:39

2 Answers


Consider base R's by (the object-oriented wrapper to tapply) which can subset a dataframe by factor(s) and pass subsetted dataframes into a method such as your nlm call, all to return a list of objects:

run_nlm <- function(sub_df) nlm(f=my_function, Cini=400, var=sub_df$var, 
                                time=sub_df$time, p=th.start)

# LIST OF nlm OUTPUTS (EQUAL TO NUMBER OF DISTINCT df.2$groups)
# NOTE: THE GROUPED DATA IS df.2; THE SINGLE-GROUP df HAS NO groups COLUMN
my_fitt_list <- by(df.2, df.2$groups, run_nlm)
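
Once the per-group fits exist, the estimates usually need to be pulled out of the list. A minimal, self-contained sketch follows, using the grouped df.2, my_function and th.start from the question. The fixed seed and the names estimates and codes are assumptions added for illustration, and the fitted values themselves depend on the random data.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

# grouped example data, as in the question
df.2 <- data.frame(var = cumsum(sort(rnorm(300, mean = 20, sd = 4))),
                   time = rep(seq(from = 0, to = 1200, length.out = 100), 3),
                   groups = rep(1:3, each = 100))

# objective function from the question: sum of squared residuals
my_function <- function(Cini, time, theta, var) {
  fy <- theta[1] - (theta[1] - Cini) * exp((-theta[2] / 100000) * (time - theta[3]))
  sum((var - fy)^2)
}
th.start <- c(77, 148, 5)

# wrapper that fits one subset of the data
run_nlm <- function(sub_df) nlm(f = my_function, Cini = 400, var = sub_df$var,
                                time = sub_df$time, p = th.start)

# one nlm result per group
my_fitt_list <- by(df.2, df.2$groups, run_nlm)

# matrix with one column per group and one row per element of theta
estimates <- sapply(my_fitt_list, function(fit) fit$estimate)

# nlm termination code for each group (see ?nlm for their meaning)
codes <- sapply(my_fitt_list, function(fit) fit$code)
```

Each element of my_fitt_list is an ordinary nlm result, so fit$minimum, fit$gradient and fit$iterations can be collected the same way.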
Parfait

I can't help much with the tidyverse, as I'm more of a base R kind of guy. I think the problem in your last call is that you're piping a grouped data.frame into a function that takes a function object as its first argument. That cannot work.

Let me propose a base R way of doing it, using split and lapply (only the pipe is not base R):

df.2 %>% 
  split(.$groups) %>% 
  lapply(function(xx) nlm(f=my_function, Cini=400, var = xx$var, time=xx$time, p=th.start))

This produces a list of length 3 (one element per group) containing your three results.
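
The same idea also explains the dlply error in the question: the third argument of dlply must be a function, but the question passes the result of an nlm call (a list), hence "attempt to apply non-function". A minimal sketch of that fix is below, assuming plyr is installed. run_nlm is a helper name borrowed from the other answer, the fixed seed is an added assumption, and the plyr call is guarded so the block still runs without the package.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

# grouped example data, as in the question
df.2 <- data.frame(var = cumsum(sort(rnorm(300, mean = 20, sd = 4))),
                   time = rep(seq(from = 0, to = 1200, length.out = 100), 3),
                   groups = rep(1:3, each = 100))

# objective function from the question: sum of squared residuals
my_function <- function(Cini, time, theta, var) {
  fy <- theta[1] - (theta[1] - Cini) * exp((-theta[2] / 100000) * (time - theta[3]))
  sum((var - fy)^2)
}
th.start <- c(77, 148, 5)

# wrap the nlm call in a function of the per-group data frame
run_nlm <- function(sub_df) nlm(f = my_function, Cini = 400, var = sub_df$var,
                                time = sub_df$time, p = th.start)

# base R version: split by group, then apply the wrapper to each piece
fits_base <- lapply(split(df.2, df.2$groups), run_nlm)

# plyr version: pass the function itself, not the result of calling it
if (requireNamespace("plyr", quietly = TRUE)) {
  fits_plyr <- plyr::dlply(df.2, "groups", run_nlm)
}
```

Both versions should give one nlm result per group; the only change from the question's call is that nlm is wrapped in a function instead of being evaluated on the whole df.2.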

Bastien
  • I'm a base guy myself and still wonder why `by` is so underused: `split` + `lapply` = `by`! – Parfait Jul 31 '18 at 15:29
  • @Parfait, I don't use `by`, mostly because I was succeeding without it... However, it's very elegant, so I'll try to use it more. Thanks for pointing this function out to me. – Bastien Jul 31 '18 at 17:01