tl;dr: This type of modeling is sensitive to the starting values. I detail how to successfully fix the issue. The three main things to aid in this fix are:
- Increase maximum iteration and function calls
- Acquire informed starting values
- Potentially restrict/subset the data as countries initially added have less information available about them
Briefly in overview:
I started tinkering by just blindly changing starting values. Changing c(1000, 20, 20)
to c(20000, 10, 10)
I got the model to run error/warning free as baseModel.naive
with a Log-likelihood: -23451.37
. Using increased maxima for iteration and function calls and acquiring informed starting values enabled the model to improve substantially with baseModel.bellcurve
reporting a Log-likelihood: -20487.86
.
Naively change starting values will get rid of errors
I adjusted the starting values. Also, I showed how to increase the maximum number of functional calls and iterations and use the verbose option. More can be found using ?nlme
and ?nlmeControl
. In my experience, these types of models can be sensitive to starting values and maximum iterations and function calls.
baseModel.naive <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
data = dat,
fixed = list(d ~ 1, mu ~ 1, var ~ 1),
random = d + mu + var ~ 1|Country.Region,
start = list(fixed = c(20000, 10, 10)),
na.action = na.omit,
control=list(maxIter=1e2, msMaxIter = 1e6, msVerbose = TRUE))
baseModel.naive
Output:
> baseModel.naive <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
+ data = dat,
+ fixed = list(d ~ 1, mu ~ 1, var ~ 1),
+ random = d + mu + var ~ 1|Country.Region,
+ start = list(fixed = c(20000, 10, 10)),
+ na.action = na.omit,
+ control=list(maxIter=1e2, msMaxIter = 1e6, msVerbose = TRUE))
0: 30455.774: 2.23045 10.6880 8.80935 -11177.6 3277.46 -1877.96
1: 30450.763: 2.80826 11.2726 9.37889 -11177.6 3277.46 -1877.96
2: 30449.789: 2.94771 11.5283 9.80691 -11177.6 3277.46 -1877.96
3: 30449.150: 3.30454 11.8277 10.0329 -11177.6 3277.46 -1877.96
4: 30448.727: 3.64209 12.2237 10.4571 -11177.6 3277.46 -1877.96
5: 30448.540: 3.97875 12.5637 10.8217 -11177.6 3277.46 -1877.96
6: 30448.445: 4.31754 12.9055 11.1826 -11177.6 3277.46 -1877.96
7: 30448.397: 4.65727 13.2480 11.5420 -11177.6 3277.46 -1877.96
8: 30448.372: 4.99746 13.5908 11.9008 -11177.6 3277.46 -1877.96
9: 30448.360: 5.33789 13.9337 12.2591 -11177.6 3277.46 -1877.96
10: 30448.354: 5.67844 14.2767 12.6173 -11177.6 3277.46 -1877.96
11: 30448.351: 6.01905 14.6197 12.9753 -11177.6 3277.46 -1877.96
12: 30448.349: 6.35969 14.9627 13.3333 -11177.6 3277.46 -1877.96
13: 30448.349: 6.70035 15.3058 13.6913 -11177.6 3277.46 -1877.96
14: 30448.348: 7.04102 15.6489 14.0493 -11177.6 3277.46 -1877.96
15: 30448.348: 7.38170 15.9919 14.4073 -11177.6 3277.46 -1877.96
16: 30448.348: 7.72237 16.3350 14.7652 -11177.6 3277.46 -1877.96
17: 30448.348: 8.06305 16.6781 15.1232 -11177.6 3277.46 -1877.96
18: 30448.348: 8.40373 17.0211 15.4811 -11177.6 3277.46 -1877.96
19: 30448.348: 8.74441 17.3642 15.8391 -11177.6 3277.46 -1877.96
20: 30448.348: 9.08508 17.7073 16.1971 -11177.6 3277.46 -1877.96
21: 30448.348: 9.42576 18.0503 16.5550 -11177.6 3277.46 -1877.96
0: 30108.384: 9.42573 18.0503 16.5550 -11183.6 3279.09 -1879.43
1: 30108.384: 9.42568 18.0503 16.5550 -11183.6 3279.09 -1879.43
2: 30108.384: 9.42544 18.0502 16.5548 -11183.6 3279.09 -1879.43
0: 30108.056: 9.42539 18.0502 16.5548 -11168.0 3239.13 -1879.51
1: 30108.056: 9.42526 18.0502 16.5549 -11168.0 3239.13 -1879.51
0: 30108.055: 9.42523 18.0502 16.5549 -11150.0 3223.70 -1879.28
1: 30108.055: 9.42523 18.0502 16.5549 -11150.0 3223.70 -1879.28
> baseModel.naive
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ bellcurve.model(d, mu, var, x = day)
Data: dat
Log-likelihood: -23451.37
Fixed: list(d ~ 1, mu ~ 1, var ~ 1)
d mu var
4290.47415 35.32178 38.44048
Random effects:
Formula: list(d ~ 1, mu ~ 1, var ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
d 1.402353e-01 d mu
mu 2.518250e-05 0
var 1.123208e-04 0 0
Residual 1.738523e+03
Number of Observations: 2641
Number of Groups: 126
But then maybe that isn't enough. See the Corr
with the 0's? Something seems off. So I took a page from Roland ( https://stackoverflow.com/a/27549369/2727349) and decided to get a self starting function that was similar to your bellcurve.model
.
I also tried subsetting the data to Country.Region
s that have a minimum number of days in case that was an issue. I am detailing my results/code here in sections and at the end (in the Appendix) it is all together for a quick copy and paste.
Preliminaries & Data exploration
To explore things, I tried limiting the data to countries that had a minimum number of days. I chose 45 days (5 countries) and slowly increased it up to 1 day (full dataset). I used data.table
to calculate and display pertinent things.
library(nlme)
library(nlraa)
library(data.table)
dat <- readRDS("dat.rds")
str(dat)
setDT(dat)
# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
f <- d*exp(-((x - mu)^2) / (2*var))
return(f)
}
dim(dat)
head(dat)
## how many countries?
dat[,uniqueN(Country.Region)]
## number of days/rows per country.region:
print(dat[,.N,by=Country.Region][order(N)], nrows=200)
dat[,N:=.N, by=Country.Region]
minimum_days <- 45
## model restricted to following countries/regions:
data.frame(Country.Region=(as.character(unique(dat[N>=minimum_days]$Country.Region))))
Get automatic starting values from nls() and the data
This is where Roland's answer in a related post comes into play. We need an automated way to get informed starting values, and one way of doing that is using nls()
and a self starting function. You can make a self starting function out of your bellcurve.model
but I found SSbell
that was similar and decided to make use of it. I run nls()
with SSbell
and get starting values for the SSbell
formulation which has ymax
(corresponds to d
of bellcurve.model
) and xc
(corresponds to mu
in bellcurve.model
). For the var
starting value in bellcurve.model
I first calculate each and every Country.Region
's variance and then selected one of the smaller ones, settling on the 20%-tile because that worked for minimum_days<-45
and minimum_days<-1
(full data).
## use this approach https://stackoverflow.com/a/27549369/2727349
## but instead of SSlogis
## use nlraa::SSbell
## we can get starting values for d and mu.
plot(new_cases~day, data=dat[N>=minimum_days])
nls_starting_values <- nls(new_cases ~ SSbell(day, ymax, a, b, xc), data=dat[N>=minimum_days])
summary(nls_starting_values)
## calculate a starting value for var: the median of Country.Region specific variances
variances <- sort(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1)
range(variances)
quantile(variances, seq(0,1,0.05))
##var.start <- median(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
##var.start <- min(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
var.start <- quantile(variances, 0.20)
var.start
Again -- I had to play around a bit -- but if you take the 20%-tile of empirical variances the nlme
call for bellcurve.model
the code will run for minimum_days <- 45
as well as minimum_days <- 1
(full dataset). With our starting values calculated and potentially our dataset restricted, we fit two nlme
models: one using SSbell
and the other bellcurve.model
. Both will run for minimum_days<-45
and only bellcurve.model
will converge for minimum_days<-1
(full dataset). Lucky!
Two nlme calls: one with SSbell
, the other for bellcurve.model
# NLME Model: using ssbell and nls_starting values for ymax, a, b, xc
baseModel.ssbell <- nlme(new_cases ~ SSbell(day, ymax, a, b, xc),
data = dat[N>=minimum_days],
fixed = list(ymax ~ 1, a ~ 1, b~1, xc~1),
random = ymax + a + b + xc ~ 1|Country.Region,
start = list(fixed = c( coef(nls_starting_values) )),
na.action = na.omit,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)
# NLME Model: using bellcurve.model and nls_starting values for ymax for d;
# NLME Model: using bellcurve.model and nls_starting values for xc for mu;
# NLME Model: using bellcurve.model and var.start for var
baseModel.bellcurve <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
data = dat[N>=minimum_days],
fixed = list(d ~ 1, mu ~ 1, var ~ 1),
random = d + mu + var ~ 1|Country.Region,
start = list(fixed = c(coef(nls_starting_values)[c(1,4)], var.start )),
na.action = na.omit,
control=list(maxIter=1e6,
msMaxIter = 1e6,
msVerbose = TRUE)
)
Compare the output
baseModel.ssbell
baseModel.bellcurve
Showing the output for minimum_days<-45
shows similar fits (look at likelihood).
## 5 countries DATA minimum_days <- 45
> baseModel.ssbell
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ SSbell(day, ymax, a, b, xc)
Data: dat[N >= minimum_days]
Log-likelihood: -2264.706
Fixed: list(ymax ~ 1, a ~ 1, b ~ 1, xc ~ 1)
ymax a b xc
1.026599e+03 -1.164625e-02 -1.626079e-04 1.288924e+01
Random effects:
Formula: list(ymax ~ 1, a ~ 1, b ~ 1, xc ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
ymax 1.731582e+03 ymax a b
a 4.467475e-06 -0.999
b 1.016493e-04 -0.998 0.999
xc 3.528238e+00 1.000 -0.999 -0.999
Residual 8.045770e+02
Number of Observations: 278
Number of Groups: 5
## 5 countries DATA minimum_days <- 45
> baseModel.bellcurve
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ bellcurve.model(d, mu, var, x = day)
Data: dat[N >= minimum_days]
Log-likelihood: -2267.406
Fixed: list(d ~ 1, mu ~ 1, var ~ 1)
d mu var
965.81525 12.73549 58.22049
Random effects:
Formula: list(d ~ 1, mu ~ 1, var ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
d 1.633432e+03 d mu
mu 2.815230e+00 1.00
var 5.582379e-03 -0.01 -0.01
Residual 8.127932e+02
Number of Observations: 278
Number of Groups: 5
Showing output for baseModel.bellcurve
for minimum_days<-1
(full dataset) shows an improvement from baseModel.naive
where I blindly and arbitrarily fiddled with the starting values for the sole purpose of getting rid of the errors and warnings.
## FULL DATA minimum_days <- 1
> baseModel.bellcurve
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ bellcurve.model(d, mu, var, x = day)
Data: dat[N >= minimum_days]
Log-likelihood: -20487.86
Fixed: list(d ~ 1, mu ~ 1, var ~ 1)
d mu var
1044.52101 25.00288 81.79004
Random effects:
Formula: list(d ~ 1, mu ~ 1, var ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
d 4.285702e+03 d mu
mu 5.423452e+00 0.545
var 3.008379e-03 0.028 0.050
Residual 4.985698e+02
Number of Observations: 2641
Number of Groups: 126
Log-likelihood: -20487.86
for baseModel.bellcurve
vs Log-likelihood: -23451.37
for baseModel.naive
The Corr matrix in baseModel.bellcurve
looks better, too.
Appendix: Code in one grab.
library(nlme)
library(nlraa)
library(data.table)
dat <- readRDS("dat.rds")
str(dat)
setDT(dat)
# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
f <- d*exp(-((x - mu)^2) / (2*var))
return(f)
}
dim(dat)
head(dat)
## how many countries?
dat[,uniqueN(Country.Region)]
## number of days/rows per country.region:
print(dat[,.N,by=Country.Region][order(N)], nrows=200)
dat[,N:=.N, by=Country.Region]
minimum_days <- 45
## model restricted to following countries/regions:
data.frame(Country.Region=(as.character(unique(dat[N>=minimum_days]$Country.Region))))
## use this approach https://stackoverflow.com/a/27549369/2727349
## but instead of SSlogis
## use nlraa::SSbell
## we can get starting values for d and mu.
plot(new_cases~day, data=dat[N>=minimum_days])
nls_starting_values <- nls(new_cases ~ SSbell(day, ymax, a, b, xc), data=dat[N>=minimum_days])
summary(nls_starting_values)
## calculate a starting value for var: the median of Country.Region specific variances
variances <- sort(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1)
range(variances)
quantile(variances, seq(0,1,0.05))
##var.start <- median(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
##var.start <- min(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
var.start <- quantile(variances, 0.20)
var.start
# NLME Model: using ssbell and nls_starting values for ymax, a, b, xc
baseModel.ssbell <- nlme(new_cases ~ SSbell(day, ymax, a, b, xc),
data = dat[N>=minimum_days],
fixed = list(ymax ~ 1, a ~ 1, b~1, xc~1),
random = ymax + a + b + xc ~ 1|Country.Region,
start = list(fixed = c( coef(nls_starting_values) )),
na.action = na.omit,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)
# NLME Model: using bellcurve.model and nls_starting values for ymax for d;
# NLME Model: using bellcurve.model and nls_starting values for xc for mu;
# NLME Model: using bellcurve.model and var.start for var
baseModel.bellcurve <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
data = dat[N>=minimum_days],
fixed = list(d ~ 1, mu ~ 1, var ~ 1),
random = d + mu + var ~ 1|Country.Region,
start = list(fixed = c(coef(nls_starting_values)[c(1,4)], var.start )),
na.action = na.omit,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)
baseModel.ssbell
baseModel.bellcurve