ggplot2: scatterplot with two variables (measured on the same scale) on the y-axis: how do I change the aesthetics & add separate regression lines?

Question

For my thesis, I am making scatterplots in APA format in R. So far, my code is as follows, and it works great for plotting just one variable with confidence interval and regression line:

  scatterplot=ggplot(dat, aes(x=STAIT, y=valence))+
    geom_point()+
    geom_smooth(method=lm,se=T, fullrange=T,colour='black')+
    labs(x='STAI-T score', y='Report length')+
    apatheme

However, I have two variables that were initially measured on the same 0-100 scale: valence and arousal. Instead of two separate plots, I thought it would be nice to show both variables in a single plot, using 'valence/arousal score' as the y-axis label and open/closed dots to indicate which data points come from which variable, a bit like in this example I found online. In that example, however, the data come from different groups, so that code doesn't work on my data. I've tried different things, and the closest I've gotten is with the following code:

sp.both=ggplot(dat, aes(x=STAIT))+
  geom_point(aes(y=valence)) +
  geom_point(aes(y=arousal)) +
  apatheme

This gives me a scatterplot with the data points of both variables in the same plot. However, I need the data points of one score to be visually distinct from the other, and I want to add a separate regression line for each variable. Everything I've tried so far has resulted in errors, and I cannot find any examples online of people trying to do the same thing.

Any help would be highly appreciated!

Welcome to SO! To help us to help you could you please make your issue reproducible by sharing a sample of your **data**? See [how to make a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Simply type `dput(NAME_OF_DATASET)` into the console and copy & paste the output starting with `structure(....` into your post. If your dataset has a lot of observations you could do `dput(head(NAME_OF_DATASET, 20))` for the first twenty rows of data. — stefan, Jan 09 '21 at 12:25

score 0 · Accepted Answer · answered Jan 09 '21 at 12:46

Using some random example data, you could achieve your desired result like so:

It's best to reshape your data to long format using e.g. tidyr::pivot_longer, which gives us two new columns: one with the names of the variables and one with the corresponding values. After reshaping, you can map the values to y and get different shapes and linetypes by mapping the variable column to shape and linetype:

library(ggplot2)
library(tidyr)

set.seed(42)
dat <- data.frame(
  STAIT = runif(20, 0, 1),
  valence = runif(20, 0, 1),
  arousal = runif(20, 0, 1)
)

dat_long <- dat %>%
  pivot_longer(c(valence, arousal), names_to = "var", values_to = "value")

ggplot(dat_long, aes(x = STAIT, y = value, linetype = var, shape = var)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "black", size = .5)
#> `geom_smooth()` using formula 'y ~ x'
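
To see what the reshaping step actually does, here is a minimal sketch on a hypothetical three-row data frame (the name `dat_small` and all values are made up for illustration). Each original row becomes two rows, one per variable, so the `var` column can drive the shape and linetype aesthetics:

```r
library(tidyr)

# Hypothetical toy data; the values are invented for illustration only
dat_small <- data.frame(
  STAIT   = c(30, 45, 60),
  valence = c(55, 40, 20),
  arousal = c(10, 35, 70)
)

# Wide -> long: `var` holds the former column names, `value` the scores
dat_small_long <- pivot_longer(
  dat_small,
  cols = c(valence, arousal),
  names_to = "var",
  values_to = "value"
)

# Three original rows times two variables gives six rows
nrow(dat_small_long)
names(dat_small_long)
```

Because STAIT is repeated for each of the two variables, a single `geom_point()` and a single `geom_smooth()` are enough to draw both sets of points and both regression lines.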

score 0 · Answer 2 · answered Jan 09 '21 at 14:14

I figured out a way to do it, with the following code:

sp.both = ggplot(dat,aes(x=STAIT)) +
  geom_point(shape = 16, aes(y=arousal)) +
  geom_point(shape = 1, aes(y=valence)) +
  labs(x='STAI-T score', y= 'valence/arousal score')+
  geom_smooth(method=lm,se=T,fullrange=T,colour='black',aes(y=arousal))+
  geom_smooth(method=lm,se=T,fullrange=T,linetype ='dashed',colour='black',aes(y=valence))+
  apatheme

The only thing I hadn't figured out yet was how to add a legend showing both the linetype (solid/dashed) and the corresponding data point (solid/open) together with the variable each belongs to. But Stefan's example solves this problem, and I prefer the way the plot then looks visually as well. So that's definitely a better solution to this problem. Thanks!
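
For reference, a rough sketch of how the long-format approach could be combined with the solid/open points and solid/dashed lines used above. The example data are randomly generated and purely hypothetical, and the legend title "Variable" is my own choice. It relies on ggplot2 merging the shape and linetype legends when both scales share the same name and labels. My own `apatheme` object is not defined here, so it is left out; it can be added at the end as before.

```r
library(ggplot2)
library(tidyr)

set.seed(42)
# Hypothetical example data on a 0-100 scale; replace with your own `dat`
dat <- data.frame(
  STAIT   = runif(20, 20, 80),
  valence = runif(20, 0, 100),
  arousal = runif(20, 0, 100)
)

dat_long <- pivot_longer(dat, c(valence, arousal),
                         names_to = "var", values_to = "value")

p <- ggplot(dat_long, aes(x = STAIT, y = value, shape = var, linetype = var)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE,
              fullrange = TRUE, colour = "black") +
  # Same legend name in both scales so the two legends are merged into one
  scale_shape_manual(name = "Variable",
                     values = c(arousal = 16, valence = 1)) +
  scale_linetype_manual(name = "Variable",
                        values = c(arousal = "solid", valence = "dashed")) +
  labs(x = "STAI-T score", y = "Valence/arousal score")

p
```

Shape 16 is a filled circle and shape 1 an open circle, matching the two `geom_point()` calls in my original attempt.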
