2

I'm hoping to get some help with presenting regression outputs for my Masters thesis. I am assessing the impacts of elephants on woody vegetation, particularly in relation to artificial waterholes. In addition to generally declining with distance from waterholes, the impacts differ substantially between the two vegetation types involved.

I've figured out what seems to me a satisfactory way to of plotting this using visreg. In the model output shown below, both distance to waterhole and veg type explained damage, hence my attempt to show both. However, the issue is that I only have samples at the furthest distances for waterholes (x-axis) from the red vegetation type. As you can see, the regression line for the blue veg type is extending beyond the last points for this vegetation type. Is there anyway I can get the blue line to stop at a smaller distance from the waterhole (x axis value) than for the red to avoid this?

See code for the model and plot below the visreg plot.

Damage to vegetation (y axis) against distance to waterhole (currently scaled, hence the funny units). The two colours correspond to the two vegetation types

Sample data and code

> dput(vegdata[21:52, c(4,7,33)])
structure(list(distance = c(207L, 202L, 501L, 502L, 1001L, 1004L, 
2010L, 1997L, 4003L, 3998L, 202L, 194L, 499L, 494L, 1004L, 1000L, 
2008L, 1993L, 4008L, 3998L, 493L, 992L, 1941L, 2525L, 485L, 978L, 
1941L, 3024L, 495L, 978L, 1977L, 2952L), vegtype = structure(c(1L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label =  c("teak", 
"term"), class = "factor"), toedl = c(35.48031025, 47.30482718, 
25.16709533, 22.29360164, 17.6546533, 12.81605101, 20.34136734, 
18.45809334, 11.3578081, 3.490830751, 60.54870317, 44.9863128, 
18.81010698, 20.4777188, 30.36994386, 18.7417214, 21.52247156, 
18.29685939, 30.26217664, 8.945486104, 43.95749178, 43.54799495, 
44.42693993, 50.06207783, 48.05538594, 35.31220933, 52.37339094, 
40.51569938, 41.45677007, 58.86629306, 37.80203313, 46.35633342
)), row.names = 21:52, class = "data.frame")

m1<-lm(toedl~vegtype+distance, data=vegdata)
summary(m1)

library(visreg)
visreg(oedl6, 'sexactd', by='vegtype',overlay=TRUE, gg=TRUE,    points=list(size=2.5), ylab='% old elephant damage', xlab='distance from waterhole')
  • 3
    Can you make this [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), including a sample of data and the packages used? – camille Dec 12 '19 at 20:27
  • 1
    Done, hope I've done it correctly. – Luke Wilson Dec 12 '19 at 21:02
  • 1
    Making it reproducible means providing sample data. – akash87 Dec 12 '19 at 21:08
  • Note that you could add noise to the actual study data and post that, so long as it would reproduce the problem. – James Phillips Dec 12 '19 at 21:20
  • I don't know visreg, and we don't have any reproducible data, but something like `ggplot(your data, aes(your x, your y, color = vegtype)) + geom_point() + geom_smooth(method = 'lm')` should work – tjebo Dec 12 '19 at 21:45
  • Sorry everyone I had added the code to make it reproducible but something must have gone wrong when I tried to save edits. It should be okay now finally. – Luke Wilson Dec 13 '19 at 06:46

2 Answers2

3

Regarding the comments about a reproducible example, you can just make a small dataframe with representative data like below, also a general comment that you should avoid naming your variables names of base functions like 'all'.

I'm not sure whether it's possible to use visreg to do what you want, but you can extract the information from your model using predict, then use ggplot to plot it, which may be preferable because ggplot is really good for customizing plots.

library(ggplot2)
library(visreg)

# Create reproducible data example
allData <- data.frame(vegtype = rep(c("t1", "t2"), each = 10),
                      oedl = c(seq(from = 35, to = 20, length.out = 10),
                               seq(from = 20, to = 5, length.out = 10)),
                      sexactd = c(seq(from = -1, to = 1, length.out = 10),
                                  seq(from = -1, to = 2, length.out = 10)))

# Make linear model
oedl6 <- lm(formula = oedl ~ sexactd + vegtype, data = allData)

# Predict the data using the linear model
odelPred <- cbind(allData, predict(oedl6, interval = 'confidence'))

ggplot(odelPred, aes(sexactd, oedl, color = vegtype, fill = vegtype)) +
  geom_point() + geom_line(aes(sexactd, fit)) + 
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.3)
mikeHoncho
  • 317
  • 2
  • 11
  • Great, I've tried that and it is cutting the line for the 'term' veg type at the correct point now! Next step will be to figure out how to does this for GAMs, as curvature is evident in some of the models. – Luke Wilson Dec 13 '19 at 08:45
  • 1
    Should be do-able using the same strategy using mgcv gam models. Good luck! https://stats.stackexchange.com/questions/33327/confidence-interval-for-gam-model/33328 – mikeHoncho Dec 13 '19 at 20:23
  • Sorry to bother again, but I'm getting the error: Error in FUN(X[[i]], ...) : object 'fit' not found when I try use this method for visualizing a model with an interaction between vegtype and distance. Still working great on models without an interaction, but do you know any trick for getting this method to work where there are different slopes due to the interaction term? – Luke Wilson Dec 14 '19 at 20:02
  • Are you trying to add the interaction term to the linear models with something like `oedl6 <- lm(formula = oedl ~ sexactd + vegtype + sexactd:vegtype, data = allData)`. Or to the gam models? – mikeHoncho Dec 14 '19 at 20:14
  • Sorry my mistake, did some testing and turns out the issue wasn't the interaction but rather the fact that the model in question was a gamma GLM. I believe I would need to back transform the coefficients in this case? – Luke Wilson Dec 14 '19 at 20:28
1

MR Macarthurs solution is great, and (s)he deserved the accepted answer. Visualising a multiple regression model with several predictors in a 2 dimensional graph is... difficult. Basically, you are limited to one predictor. And can add the interaction (in your case: vegtype). One can simply use geom_smooth for it.

Using your data:

library(tidyverse)
ggplot(vegdata, aes(toedl, distance, color = vegtype)) + 
  geom_point() +
  geom_smooth(method = 'lm')

Created on 2019-12-13 by the reprex package (v0.3.0)

tjebo
  • 21,977
  • 7
  • 58
  • 94
  • Thanks, MR Macarthur's solution is working great, although I've encountered difficulty using it for a model where there is an interaction. I'll see if (s)he can suggest any round this but if not, using the 'lm' function in ggplot may be my best bet. – Luke Wilson Dec 14 '19 at 20:03