3

This question is related to: Selecting Percentile curves using gamlss::lms in R

I can get centile curve from following data and code:

age = sample(5:15, 500, replace=T) 
yvar = rnorm(500, age, 20)
mydata = data.frame(age, yvar)
head(mydata)
  age      yvar
1  12  13.12974
2  14 -18.97290
3  10  42.11045
4  12  27.89088
5  11  48.03861
6   5  24.68591

h = lms(yvar, age , data=mydata, n.cyc=30)
centiles(h,xvar=mydata$age, cent=c(90), points=FALSE)

enter image description here

How can I now get yvar on the curve for each of x value (5:15) which would represent 90th percentiles for data after smoothening?

I tried to read help pages and found fitted(h) and fv(h) to get fitted values for entire data. But how to get values for each age level at 90th centile curve level? Thanks for your help.

Edit: Following figure show what I need:

enter image description here

I tried following but it is correct since value are incorrect:

mydata$fitted = fitted(h)
aggregate(fitted~age, mydata, function(x) quantile(x,.9))
   age    fitted
1    5  6.459680
2    6  6.280579
3    7  6.290599
4    8  6.556999
5    9  7.048602
6   10  7.817276
7   11  8.931219
8   12 10.388048
9   13 12.138104
10  14 14.106250
11  15 16.125688

The values are very different from 90th quantile directly from data:

> aggregate(yvar~age, mydata, function(x) quantile(x,.9))
   age     yvar
1    5 39.22938
2    6 35.69294
3    7 25.40390
4    8 26.20388
5    9 29.07670
6   10 32.43151
7   11 24.96861
8   12 37.98292
9   13 28.28686
10  14 43.33678
11  15 44.46269
jmuhlenkamp
  • 2,102
  • 1
  • 14
  • 37
rnso
  • 23,686
  • 25
  • 112
  • 234
  • I'm confused by the question. You already have displayed a result that I would have said represented your requested outcome. – IRTFM Dec 16 '14 at 16:31
  • I need the numbers to make a table of 90th percentile for each age (5:15). This cannot be aggregate(yvar~age,mydata,function(x)quantile(x, 0.9)) since above curve is a smooth version of those actual percentile values. I have added a figure to show the value that I need. – rnso Dec 16 '14 at 16:42
  • After looking at the documentation in multiple places, it looks to me that one would need to hack the `centiles` function. I believe the lines that do the work are: `ll <- eval(newcall); if (plot) { lines(oxvar, ll, col = col.centiles[ii], lty = lty.centiles[ii], lwd = lwd.centiles[ii], ...)`. So you would need to return the values of `ll` and either the `oxvar` values or recover the ordered xvar values. – IRTFM Dec 16 '14 at 17:03
  • I was trying a solution which I have added to my question above. Will that be correct? – rnso Dec 16 '14 at 17:08
  • I'd say it would not be your desired result. The `fitted` value of the `h`-object is the local mean. I wouldn't expect it to reflect the sampling variation. and so only if you also recovered the estimate for `sigma` could you construct an upper quantile estimate. – IRTFM Dec 16 '14 at 17:23
  • Yes, I saw they are all means. There are many sigma vectors in h (see output of command: str(h) ), including "sigma.fv". Which one should I use and how? – rnso Dec 16 '14 at 17:48

1 Answers1

3

See if this makes sense. The 90th percentile of a normal distribution with mean and sd of 'smn' and 'ssd' is qnorm(.9, smn, ssd): So this seems to deliver (somewhat) sensible results, albeit not the full hack of centiles that I suggested:

 plot(h$xvar, qnorm(.9, fitted(h), h$sigma.fv))

(Note the massive overplotting from only a few distinct xvars but 500 points. Ande you may want to set the ylim so that the full range can be appreciated.)

enter image description here

The caveat here is that you need to check the other parts of the model to see if it is really just an ordinary Normal model. In this case it seems to be:

> h$mu.formula
y ~ pb(x)
<environment: 0x10275cfb8>
> h$sigma.formula
~1
<environment: 0x10275cfb8>
> h$nu.formula
NULL
> h$tau.formula
NULL

So the model is just mean-estimate with a fixed-variance (the ~1) across the range of the xvar, and there are no complications from higher order parameters like a Box-Cox model. (And I'm unable to explain why this is not the same as the plotted centiles. For that you probably need to correspond with the package authors.)

IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • Thanks for your answer. I have to create charts (similar to growth charts) for children of ages 5:15 for a health variable which is non-negative, continuous and in the range of 50-150 (with a few values outside this range). I have to create 90th, 95th and 99th percentile curves and also create tables for these percentiles. Is gamlss used as above the best method or should I use some other method in R? The sample size is about 8000. – rnso Dec 17 '14 at 01:39
  • I think you have now crossed over into the "please give me statistical advice"-territory which is better addressed on CrossValidated.com or as I suggested above to the package authors. You are welcome to cite this exchange. I'd be interested in what the real statisticians have up their sleeves. – IRTFM Dec 17 '14 at 02:04
  • Thank you, Sir. I appreciate your time and help. I will post this at crossvalidated and let you know their comments. – rnso Dec 17 '14 at 02:23