
I've made a GAM model in R using the following code:

library(mgcv)     # gam()
library(ggplot2)  # ggplot(), geom_smooth()

# the year column was read in as "ï..x" (a UTF-8 byte-order mark in the CSV), so rename it
names(Bird.data)[1] <- "x"

mod_gam1 <- gam(y ~ s(x), data = Bird.data, method = "REML")
plot(mod_gam1)
plot(mod_gam1, residuals = TRUE, pch = 1)
coef(mod_gam1)

mod_gam1$fitted.values

# stack modelled and observed values (29 years each) into one long data frame
result <- data.frame(data = c(mod_gam1$fitted.values, Bird.data$y), Year = rep(1991:2019, times = 2),
                     source = c(rep('Modelled', times = 29), rep('Observed', times = 29)))
ggplot(result, aes(x = Year, y = data, colour = source)) +
  geom_point() +
  geom_smooth(span = 0.8) +
  labs(x = "Year", y = "Bird Island Total Debris Count") +
  scale_y_continuous(limits = c(0, 1000))

The output looks OK, but the shaded confidence band from geom_smooth() doesn't extend across the whole dataset: it stops short of my first two data points, and I'm not sure why.

Any help would be appreciated!

I can't upload a picture because I'm new to the site, but basically I have two series (observed and GAM-modelled values), each with its own SE confidence ribbon, and both ribbons start two data points into the series rather than at the first point.

These are my data points (Bird.data):

x y
1991 17
1992 76
1993 328
1994 131
1995 425
1996 892
1997 501
1998 419
1999 297
2000 277
2001 310
2002 282
2003 189
2004 278
2005 322
2006 444
2007 412
2008 241
2009 242
2010 255
2011 289
2012 335
2013 279
2014 628
2015 500
2016 174
2017 636
2018 420
2019 447
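For anyone who wants to run the code, the table above can be typed in directly as a data frame. This is a sketch that assumes the year column is named x; the garbled header "ï..x" is what read.csv() produces when a CSV file starts with a UTF-8 byte-order mark.

```r
# Bird.data entered directly from the table above.
# Assumption: the year column is called x. When reading from a CSV instead,
# read.csv(file, fileEncoding = "UTF-8-BOM") avoids the garbled "ï..x" name.
Bird.data <- data.frame(
  x = 1991:2019,
  y = c(17, 76, 328, 131, 425, 892, 501, 419, 297, 277,
        310, 282, 189, 278, 322, 444, 412, 241, 242, 255,
        289, 335, 279, 628, 500, 174, 636, 420, 447)
)
str(Bird.data)
```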

Fitted values (mod_gam1$fitted.values):

 [1]  95.56189 177.01468 255.17074 324.97532 380.28813 415.71334 428.67793 420.86624 398.18522 369.06325
[11] 341.72715 321.65585 310.33971 305.81158 304.53360 303.60521 302.21413 301.75501 304.77184 313.43400
[21] 328.37279 348.39076 371.04203 393.66222 414.29754 432.15104 447.48020 461.14595 474.09266

[Figure: Negative Binomial]

Shawn Hemelstrand
Meghan F
  • Interestingly, when I increased the geom_smooth span from 0.8 to 1.1, both confidence ribbons extended to include my first two data points; however, the GAM curve is now smoothed a lot, too much. – Meghan F Feb 04 '22 at 18:34
  • [See here](https://stackoverflow.com/q/5963269/5325862) on making a reproducible example that is easier for folks to help with. Data shouldn't be in a picture anyway, so it's okay that you can't upload images. Without your data we can't run any of the code or see the chart you're trying to change – camille Feb 04 '22 at 19:08

1 Answer


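This happens because of the limits you set with scale_y_continuous(). Scale limits treat anything outside them as out of bounds, so wherever the lower edge of the confidence band drops below 0 (which is presumably what happens at your first two years), that part of the ribbon is dropped. If you remove that line (or lower the minimum y limit so that it allows the smallest value of the smooth's band), you will see the ribbon fill completely.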
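One way to keep the 0 to 1000 view without losing part of the band, a standard ggplot2 alternative not mentioned in the answer, is coord_cartesian(), which zooms the view instead of censoring values. The sketch below uses only the observed data and a loess smooth with span = 0.8, as in the question; layer_data() is used to inspect the band that ggplot2 actually built.

```r
library(ggplot2)

Bird.data <- data.frame(
  x = 1991:2019,
  y = c(17, 76, 328, 131, 425, 892, 501, 419, 297, 277,
        310, 282, 189, 278, 322, 444, 412, 241, 242, 255,
        289, 335, 279, 628, 500, 174, 636, 420, 447)
)

base <- ggplot(Bird.data, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.8)

# Scale limits: values outside 0-1000, including any part of the
# confidence band, are treated as out of bounds and become NA
p_limits <- base + scale_y_continuous(limits = c(0, 1000))

# coord_cartesian(): the band is kept in full; only the view is zoomed
p_zoom <- base + coord_cartesian(ylim = c(0, 1000))

# layer_data() returns the built data for a layer; layer 2 is the smooth
sum(is.na(layer_data(p_limits, 2)$ymin))  # lower-band values censored by the limits
sum(is.na(layer_data(p_zoom, 2)$ymin))    # the coordinate zoom censors nothing
```

Because coord_cartesian() acts after the statistics are computed and never marks values as missing, the ribbon is drawn for every year and is simply clipped at the panel edge.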

However, you have a larger problem here. You are not actually showing the GAM in the smooth: geom_smooth() is fitting its own loess curve through the GAM's point predictions. There are a couple of ways to fix this. The easiest might be to pass Bird.data directly to ggplot() and use the method and formula arguments of geom_smooth() to request the GAM smooth directly:

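library(ggplot2)  # mgcv must also be installed for method = "gam"
ggplot(Bird.data, aes(x, y)) +  # assumes the year column is named x
  geom_point() +
  # fit y ~ s(x) with mgcv::gam inside geom_smooth()
  geom_smooth(method = "gam", formula = y ~ s(x)) +
  labs(x = "Year", y = "Bird Island Total Debris Count")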
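To check that the curve drawn by geom_smooth(method = "gam") is the same model as mod_gam1 (rather than a loess curve drawn through its fitted values), one can compare the curve that ggplot2 computes with predict() on the model. This is a sketch assuming the year column is named x; method.args = list(method = "REML") is passed so that geom_smooth() uses the same smoothing-parameter method as mod_gam1.

```r
library(ggplot2)
library(mgcv)

Bird.data <- data.frame(
  x = 1991:2019,
  y = c(17, 76, 328, 131, 425, 892, 501, 419, 297, 277,
        310, 282, 189, 278, 322, 444, 412, 241, 242, 255,
        289, 335, 279, 628, 500, 174, 636, 420, 447)
)

# The model from the question (year column named x)
mod_gam1 <- gam(y ~ s(x), data = Bird.data, method = "REML")

# The same model requested through geom_smooth(); method.args is passed on to gam()
p_gam <- ggplot(Bird.data, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "gam", formula = y ~ s(x),
              method.args = list(method = "REML"))

# Layer 2 holds the smooth, evaluated on a grid of x values from 1991 to 2019
smooth_df <- layer_data(p_gam, 2)
pred <- predict(mod_gam1, newdata = data.frame(x = smooth_df$x))

# Largest gap between the drawn curve and mod_gam1's predictions
max(abs(smooth_df$y - as.numeric(pred)))
```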

The problem with this approach is that you don't also get the predicted points. This can be fixed as follows:

  1. Add the standard errors directly to the result data frame (NA for the observed rows):
result$se = c(predict(mod_gam1, se.fit = TRUE)$se.fit, rep(NA, 29))
  2. Use ggplot as before, but with geom_ribbon(), setting ymin and ymax directly:
ggplot(result, aes(x = Year, y = data, colour = source, fill = source)) +
  geom_point() +
  # approximate 95% confidence band: fitted value +/- 1.96 standard errors
  geom_ribbon(aes(ymin = data - 1.96 * se, ymax = data + 1.96 * se), alpha = 0.2) +
  labs(x = "Year", y = "Bird Island Total Debris Count") +
  scale_y_continuous(limits = c(-200, 1000))


langtang
  • Thank you so much! That is perfect, so helpful! And I definitely learnt another thing or two about R (which I am very new to, as I'm sure you could tell!). – Meghan F Feb 06 '22 at 09:58
  • @MeghanF, you can and should upvote/accept an answer if it solves your issue. – NelsonGon Feb 06 '22 at 11:27
  • Just to add a further question: I tried a negative binomial GAM with the same code (I only added family = nb when fitting the model), and it worked fine, but looking at the graph, the error band for the model seems much smaller than for the first Gaussian model. I suspect this is just because the model fits better? I just wanted to double-check that the graph is OK. I embedded the image in the original question. – Meghan F Feb 07 '22 at 16:35
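One thing worth checking for the negative binomial version: predict.gam() returns se.fit on the link scale by default (type = "link", which is the log scale for nb()), while fitted.values are on the response scale. If the se column was built exactly as in the answer, the band would be fitted counts ± 1.96 × (log-scale SE), which would look far narrower than it should. A sketch of the usual fix, assuming the year column is named x and a default nb() family, builds the interval on the log scale and back-transforms it:

```r
library(mgcv)

Bird.data <- data.frame(
  x = 1991:2019,
  y = c(17, 76, 328, 131, 425, 892, 501, 419, 297, 277,
        310, 282, 189, 278, 322, 444, 412, 241, 242, 255,
        289, 335, 279, 628, 500, 174, 636, 420, 447)
)

# Negative binomial GAM; nb() estimates theta and uses a log link by default
mod_nb <- gam(y ~ s(x), family = nb(), data = Bird.data, method = "REML")

# predict.gam() defaults to type = "link", so fit and se.fit are on the log scale
pr <- predict(mod_nb, se.fit = TRUE)

# Build the ~95% interval on the log scale, then back-transform with exp()
nb_band <- data.frame(
  Year  = Bird.data$x,
  fit   = as.numeric(exp(pr$fit)),
  lower = as.numeric(exp(pr$fit - 1.96 * pr$se.fit)),
  upper = as.numeric(exp(pr$fit + 1.96 * pr$se.fit))
)
head(nb_band)
```

The resulting band is asymmetric around the fitted counts and never goes below zero; it can be drawn with geom_ribbon(aes(ymin = lower, ymax = upper)) in place of the data ± 1.96 * se columns.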