2

I am fitting SMA to allometric data using the smatr packing in R, and I am having difficulty plotting the 95% confidence intervals calculated by the sma() command.

Taking the example data in the package documentation, how would I add the upper and lower 95% confidence lines to the plot of xy data and SMA fit?

# Load leaf lifetime dataset:
data(leaflife)

# Fit SMA
ft <- sma(longev~lma, data=leaflife, log="xy", method="SMA")

#plot data and fit
plot(ft, log="xy")

enter image description here

How do I now add lines for the 95% confidence intervals to my plot?

Thank you!

user2100721
  • 3,557
  • 2
  • 20
  • 29
Katie T
  • 21
  • 2
  • Possible duplicate of [Plotting Confidence Intervals](http://stackoverflow.com/questions/14069629/plotting-confidence-intervals) – user2100721 Sep 12 '16 at 15:26
  • I have looked at this, but have been unable to use this regression example with my SMA analysis and produce confidence intervals. Help specific to this SMA example would be wonderful! – Katie T Sep 14 '16 at 15:09

1 Answers1

3

I had this problem recently and the smatr package does not return confidence intervals around the predictions.

However, these could be achieved through bootstrapping, where random datasets are made over and over the same model fit to them. Extracting all of the intercepts and slopes allow you to get 95% confidence intervals around your predictions.

There are probably many ways to do this, but I have done it using the tidyverse. This uses the packages model, purrr, dplyr etc. Many more lines of code than I would prefer so if anyone has more elegant solutions let me know.

I log10 transform prior to the running the model as it makes it easier to plot the points and predictions in ggplot2.

# smatr bootstrap example

# load packages
library(smatr)
library(dplyr)
library(purrr)
library(tidyr)
library(ggplot2)

# load data
data(leaflife)

# make columns log scale
leaflife <- mutate(leaflife, log_longev = log10(longev),
                   log_lma = log10(lma))

# fit sma
mod <- sma(log_longev ~ log_lma, data=leaflife, method="SMA")

# plot model
plot(mod)

# create new data set of log_lma at a high resolution (200 points from min to max)
preds <- data.frame(expand.grid(log_lma = seq(min(leaflife$log_lma, na.rm = T), max(leaflife$log_lma, na.rm = T), length.out = 200), stringsAsFactors = FALSE))

# bootstrap data and get predictions
preds <- leaflife %>%
  # create new bootstrapped data sets
  modelr::bootstrap(n = 1000, id = 'boot_num') %>%
  # fit sma to every bootstrap
  group_by(boot_num) %>%
  mutate(., fit = map(strap, ~ sma(log_longev ~ log_lma, data=data.frame(.), method="SMA"))) %>%
  ungroup() %>%
  # extract intercept and slope from each fit
  mutate(., intercept = map_dbl(fit, ~coef(.x)[1]),
         slope = map_dbl(fit, ~coef(.x)[2])) %>%
  select(., -fit) %>%
  # get fitted values for each bootstrapped model
  # uses the preds dataframe we made earlier
  group_by(boot_num) %>%
  do(data.frame(fitted = .$intercept + .$slope*preds$log_lma, 
                log_lma = preds$log_lma)) %>%
  ungroup() %>%
  # calculate the 2.5% and 97.5% quantiles at each log_lma value
  group_by(., log_lma) %>%
  dplyr::summarise(., conf_low = quantile(fitted, 0.025),
                   conf_high = quantile(fitted, 0.975)) %>%
  ungroup() %>%
  # add fitted value of actual unbootstrapped model
  mutate(., log_longev = coef(mod)[1] + coef(mod)[2]*log_lma)

# plot with ggplot
ggplot(leaflife, aes(log_lma, log_longev)) +
  geom_point() +
  geom_line(data = preds) +
  geom_ribbon(aes(ymin = conf_low, ymax = conf_high), alpha = 0.1, preds) +
  theme_bw()

This then gives this plot

enter image description here