0

I would like to extract the fitted values from my bam() model in order to be able to plot the figures in either prism or with ggplot. I use GAM to model brain responses in EEG data, the data are thus many 400ms time series of four different categories. The model contains a by-factor for these categories(uV ~ s(time, by = CatInt)). I would like to get the fitted values for these 4 categories separately. I found the function fitted() or fitted.values() but this just outputs a string with the same length as the number of data points I put into the model, while I would like one average for the data over time per category. Ideally it would include either SD, SE or CI. Is this possible?

Ekatef
  • 1,061
  • 1
  • 9
  • 12
R Kat
  • 3
  • 1
  • Welcome to SO! It seems, that you are looking for the `summary( )` function. Unfortunately, it is difficult to give a more specific answer, unless you provide the data to reproduce the problem. You may find [this guidance](https://stackoverflow.com/help/minimal-reproducible-example) helpful to formulate the questions more precisely. – Ekatef Jul 15 '19 at 13:47
  • Thanks for your fast response. It is difficult to provide data as I work with massive datasets. Only one trial would already be 200 datapoints (and I have 10.000 trials). I am familiar with the summary function, this indeed gives me information about the smooth terms of the different categories, but not the actual fitted non-linear pattern. It gives me this: – R Kat Jul 16 '19 at 06:25
  • Approximate significance of smooth terms: edf Ref.df F p-value s(Time):CatIntDev_High 44.554 46.83 251.074 < 2e-16 *** s(Time):CatIntDev_Low 38.240 41.85 106.552 < 2e-16 *** s(Time):CatIntStd_High 45.897 47.51 2400.426 < 2e-16 *** s(Time):CatIntStd_Low 44.355 46.72 664.190 < 2e-16 *** – R Kat Jul 16 '19 at 06:28
  • 1
    I would like to plot different combinations of the non-linear patterns of these four categories in the same plot. I can plot the non-linear pattern of one category with plot(m1,select =X), with select being 1-4 depending on the line you want to plot. But it doesnt allow me to plot two at the same time. That's why I want to extract the data of these non-linear patterns so that I can plot them elsewhere. – R Kat Jul 16 '19 at 06:31
  • Ok, thank you for clarification. Have a look on the answer bellow, please. – Ekatef Jul 16 '19 at 13:27
  • Probably, the detailed answer https://stackoverflow.com/a/15843897/8465924 on a similar question will be also helpful for you. – Ekatef Jul 16 '19 at 13:29
  • I'm reasonably sure the model you are fitting is wrong. You need to include the `CatInt` effect as a fixed effect (or random effect depending on the setting, though here you do want the former) because the smooths are centred at 0 and don't include the differences in means of the response for the respective levels. Your model should be: `uV ~ CatInt + s(time, by = CatInt)`, assuming that `CatInt` is a factor. – Gavin Simpson Jul 16 '19 at 15:46
  • Thanks Gavin, my model does indeed include that, and random effects and auto-correlation correction. I left that out to make it a bit shorter as I thought it was not directly relevant for my question, I'm sorry. Thanks a lot for alerting me :) – R Kat Jul 17 '19 at 06:25
  • Oh, OK; in that case, the answer below by @Ekatef will get you a partial effect plot, which may be what you want if you don't care so much (for this particular visualisation) how the groups vary in their intercepts but only how they vary about their respective intercepts. If you do want actual predictions (the smooth on the correct scale, including group means etc) then you will need to use `predict()` and provide `newdata` the covariate combinations you want to draw at (say grid of points over `time` repeated for each level of the factor), or `vis.gam()` which helps with this. – Gavin Simpson Jul 17 '19 at 19:43
  • With `predict()` and possibly `vis.gam()` you'll need to provide a dummy random effect level otherwise *mgcv* will complain about you not providing all covariates used in fitting, but then you'll need to exclude it's effect from the predictions. – Gavin Simpson Jul 17 '19 at 19:45

1 Answers1

0

If you are interested in customization of the plot.gam() output, you may keep the plot as an object.

First of all, let's generate some testing data with the built-in gamSim( ) function and construct a model:

library(mgcv)
set.seed(2) # R version 3.6.0 used
dat <- gamSim(5, n = 200, scale = 2)
b <- bam(y ~ x0 + s(x1, by = x0), data = dat)

Then you may just keep the plot object which is a list with all the data needed to generate a customized plot:

pl_data <- plot(b, pages = 1)

> str(pl_data)
List of 4
 $ :List of 11
  ..$ x      : num [1:100] 0.00711 0.01703 0.02694 0.03686 0.04678 ...
  ..$ scale  : logi TRUE
  ..$ se     : num [1:100] 3.25 3.04 2.83 2.64 2.47 ...
  ..$ raw    : num [1:200] 0.185 0.702 0.573 0.168 0.944 ...
  ..$ xlab   : chr "x1"
  ..$ ylab   : chr "s(x1,3.33):x01"
  ..$ main   : NULL
  ..$ se.mult: num 2
  ..$ xlim   : num [1:2] 0.00711 0.9889
  ..$ fit    : num [1:100, 1] -4.76 -4.64 -4.52 -4.4 -4.28 ...
  ..$ plot.me: logi TRUE
....

Finely, you may extract the pl_data components which are of interest with usual subsetting approaches, for example:

first_item_pl <- pl_data[[1]][c("x", "fit", "se")]

to extract different items of the first list or to extract the same item from all four nested lists

fit_list <- lapply(pl_data, "[[", "fit")
Ekatef
  • 1,061
  • 1
  • 9
  • 12
  • That worked! Thank you so much! A few small questions. My original dataset contained 201 datapoints per time-series but the fitted data only has 100, how does that work and could I set it back to the original number of data points? What would be the code to extract for example $fit from the 1st of the 4 lists in pl_data? – R Kat Jul 17 '19 at 06:35
  • Thank you for the feedback! I'm glad that it was helpful. I have updated the question to demonstrate extraction of different items from the list. By the way, if you feel that the answer solves your problem well, you may accept it ;) – Ekatef Jul 17 '19 at 16:10
  • As for the points number used to plotting, you may adjust it with `n` argument of the `plot.gam()` (by default `n = 100`). But I am not sure if it is a good idea to match `n` with the raw data length. – Ekatef Jul 17 '19 at 17:48
  • Thanks a lot for all your help. I managed now. I will accept the answer :) – R Kat Jul 18 '19 at 09:15