R GAM visualisation, geom_smooth not fit to all observed data

Question

I've made a GAM model in R using the following code:

library(mgcv)
mod_gam1 <- gam(y ~ s(ï..x), data = Bird.data, method = "REML")
plot(mod_gam1)
coef(mod_gam1)
plot(mod_gam1, residuals = TRUE, pch = 1)
coef(mod_gam1)

mod_gam1$fitted.values

result <- data.frame(data = c(mod_gam1$fitted.values, Bird.data$y), Year = rep(1991:2019, times = 2), 
                     'source' = c(rep('Modelled', times = 29), rep('Observed', times = 29)))
ggplot(result, aes(x = Year, y = data, colour = source)) +
  geom_point() +
  geom_smooth(span = 0.8) +
  labs(x = "Year", y = "Bird Island Total Debris Count") +
  scale_y_continuous(limits = c(0, 1000))

and the output looks ok but the shaded area of the geom_smooth error doesn't extend to the whole of my dataset (stops short of my first two datapoints) and I am not sure why.

Any help would be appreciated!

I can't upload a picture as I am new to the site, but yeah basically I have two datasets (observed and GAM modelled values) which both have their SE confidence ribbon, but these start two datapoints in to my datasets not at the first points.

These are my datapoints: Bird.data

ï..x	y
1991	17
1992	76
1993	328
1994	131
1995	425
1996	892
1997	501
1998	419
1999	297
2000	277
2001	310
2002	282
2003	189
2004	278
2005	322
2006	444
2007	412
2008	241
2009	242
2010	255
2011	289
2012	335
2013	279
2014	628
2015	500
2016	174
2017	636
2018	420
2019	447

Fitted Values

 [1]  95.56189 177.01468 255.17074 324.97532 380.28813 415.71334 428.67793 420.86624 398.18522 369.06325
[11] 341.72715 321.65585 310.33971 305.81158 304.53360 303.60521 302.21413 301.75501 304.77184 313.43400
[21] 328.37279 348.39076 371.04203 393.66222 414.29754 432.15104 447.48020 461.14595 474.09266

Negative binomial model plot (image not included)

Interestingly, when I increased the geom_smooth span from 0.8 to 1.1, both confidence ribbons moved to include my first two datapoints, but now the GAM has been smoothed quite a lot, too much. — Meghan F, Feb 04 '22 at 18:34
[See here](https://stackoverflow.com/q/5963269/5325862) on making a reproducible example that is easier for folks to help with. Data shouldn't be in a picture anyway, so it's okay that you can't upload images. Without your data we can't run any of the code or see the chart you're trying to change — camille, Feb 04 '22 at 19:08

score 1 · Accepted Answer · answered Feb 05 '22 at 17:28

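It is because of the limits you have set with scale_y_continuous. The lower edge of the confidence ribbon drops below 0 at the first points, and values outside the scale limits are not drawn. If you remove that line (or lower the bottom limit so that it allows the minimum y value of the smooth), you will see the ribbon fill completely.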
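As a rough illustration of the point above, the sketch below compares scale limits with coord_cartesian(), which zooms the view without discarding anything. coord_cartesian() is not part of the original answer; it is offered here only as one possible alternative. The data frame name bird and the use of the observed counts alone (rather than the combined result data frame) are assumptions made to keep the example self-contained.

```r
library(ggplot2)

# Observed counts copied from the Bird.data table in the question
# (the data frame name `bird` is an assumption for this sketch).
bird <- data.frame(
  Year = 1991:2019,
  y = c(17, 76, 328, 131, 425, 892, 501, 419, 297, 277, 310, 282, 189, 278,
        322, 444, 412, 241, 242, 255, 289, 335, 279, 628, 500, 174, 636, 420,
        447)
)

p <- ggplot(bird, aes(Year, y)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.8) +
  labs(x = "Year", y = "Bird Island Total Debris Count")

# Scale limits: values outside c(0, 1000) are replaced with NA by default,
# which can leave gaps in the ribbon where its edge falls below 0.
p_scale <- p + scale_y_continuous(limits = c(0, 1000))

# Coordinate limits: only the visible window changes, so the ribbon is
# kept whole and simply clipped at the panel edge.
p_zoom <- p + coord_cartesian(ylim = c(0, 1000))

# The ribbon data behind the zoomed plot (layer 2 is geom_smooth).
ribbon_zoom <- layer_data(p_zoom, 2)
```

Printing p_scale and p_zoom side by side should show whether the missing part of the ribbon comes from the limits.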

However, you have a larger problem here. You are not actually showing the GAM model in the smooth, only a default smooth drawn through the GAM point predictions. There are a couple of ways to fix this. The easiest might be to feed Bird.data directly to the ggplot function and use the method and formula parameters of geom_smooth() to request the GAM smooth directly:

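# Assumes the year column has been renamed from ï..x to x,
# e.g. names(Bird.data)[1] <- "x"
ggplot(Bird.data, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "gam", formula = y ~ s(x)) +
  labs(x = "Year", y = "Bird Island Total Debris Count")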

The drawback of this approach is that you don't get the prediction points as well. That can be fixed as follows.

First, add the standard errors directly to the result data frame (the Observed rows get NA):

result$se <- c(predict(mod_gam1, se.fit = TRUE)$se.fit, rep(NA, 29))

Then use ggplot as before, but draw the band with geom_ribbon, setting ymin and ymax directly:

ggplot(result, aes(x = Year, y = data, colour = source, fill=source))+
  geom_point()+ 
  geom_ribbon(aes(ymin=data-1.96*se, ymax=data+1.96*se), alpha=0.2) +
  labs(x="Year", y = "Bird Island Total Debris Count")+
  scale_y_continuous(limits = c(-200,1000))
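A variation on the ribbon approach, sketched here under some assumptions, is to predict on a fine grid of years so that the fitted curve and its band are smooth rather than evaluated only at the 29 observed years. The names bird, grid, and the 200-point grid are illustrative choices, not part of the original answer, and the year column is assumed to be called x.

```r
library(mgcv)
library(ggplot2)

# Observed counts copied from the question (column name x is assumed).
bird <- data.frame(
  x = 1991:2019,
  y = c(17, 76, 328, 131, 425, 892, 501, 419, 297, 277, 310, 282, 189, 278,
        322, 444, 412, 241, 242, 255, 289, 335, 279, 628, 500, 174, 636, 420,
        447)
)

# Same Gaussian GAM as in the question.
mod <- gam(y ~ s(x), data = bird, method = "REML")

# Predict on a fine grid of years, with standard errors.
grid <- data.frame(x = seq(1991, 2019, length.out = 200))
pr <- predict(mod, newdata = grid, se.fit = TRUE)
grid$fit <- as.numeric(pr$fit)
grid$lower <- grid$fit - 1.96 * as.numeric(pr$se.fit)
grid$upper <- grid$fit + 1.96 * as.numeric(pr$se.fit)

# Observed points, fitted curve, and approximate 95% band in one plot.
p <- ggplot() +
  geom_ribbon(data = grid, aes(x = x, ymin = lower, ymax = upper), alpha = 0.2) +
  geom_line(data = grid, aes(x = x, y = fit)) +
  geom_point(data = bird, aes(x = x, y = y)) +
  labs(x = "Year", y = "Bird Island Total Debris Count")
```

No y limits are set here, so nothing is dropped; limits can be added afterwards if the default range is too wide.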

Thank you so much! That is perfect, so helpful!!! And I definitely learnt another thing or two about R (which I am very new to I'm sure you could tell!). — Meghan F, Feb 06 '22 at 09:58
@MeghanF, you can and should upvote/accept an answer if it solves your issue. — NelsonGon, Feb 06 '22 at 11:27
Just to add a further question, I tried doing a negative binomial gam with the same code (just adjusted the code when adding the data with family=nb) and it worked fine but just looking at the graph the error shade for the model seems much lower than the first gaussian model (but I suspect this is just because the model has a better fit?). I just wanted to double check the graph is ok! I embedded the image into the original question. — Meghan F, Feb 07 '22 at 16:35
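One thing that may be worth checking for the negative binomial version: predict() on an mgcv gam object defaults to type = "link", so with a log link the standard errors are on the log scale, while fitted.values are on the count scale. Adding and subtracting those standard errors from the fitted counts could make the ribbon look much narrower than it should, independent of how well the model fits. A minimal sketch, assuming the model was fitted with family = nb() and the year column is called x, builds the interval on the link scale and back-transforms it:

```r
library(mgcv)

# Observed counts copied from the question (column name x is assumed).
bird <- data.frame(
  x = 1991:2019,
  y = c(17, 76, 328, 131, 425, 892, 501, 419, 297, 277, 310, 282, 189, 278,
        322, 444, 412, 241, 242, 255, 289, 335, 279, 628, 500, 174, 636, 420,
        447)
)

# Negative binomial GAM (assumed to match the model described in the comment).
mod_nb <- gam(y ~ s(x), data = bird, family = nb(), method = "REML")

# Default type = "link": fit and se.fit are on the log scale here.
pr <- predict(mod_nb, se.fit = TRUE)
fit_link <- as.numeric(pr$fit)
se_link <- as.numeric(pr$se.fit)

# Build the interval on the link scale, then back-transform to counts.
inv <- mod_nb$family$linkinv
band <- data.frame(
  Year = bird$x,
  fit = inv(fit_link),
  lower = inv(fit_link - 1.96 * se_link),
  upper = inv(fit_link + 1.96 * se_link)
)
```

The lower and upper columns can then be passed to geom_ribbon as ymin and ymax in place of data ± 1.96 * se.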
