For my thesis, I am making scatterplots in APA format in R. So far, my code is as follows, and it works well for plotting a single variable with a confidence interval and a regression line:

  scatterplot <- ggplot(dat, aes(x = STAIT, y = valence)) +
    geom_point() +
    geom_smooth(method = "lm", se = TRUE, fullrange = TRUE, colour = "black") +
    labs(x = "STAI-T score", y = "Report length") +
    apatheme

However, I have two variables that were initially measured on the same 0-100 scale: valence and arousal. Instead of two separate plots, I thought it would be nice to show both variables in a single plot, using 'valence/arousal score' as the y-axis label and open/closed dots to indicate which variable each data point comes from, a bit like in this example I found online. In that example, however, the data come from different groups, so that code doesn't work on my data. I've tried different things, and the closest I've come is the following code:

sp.both=ggplot(dat, aes(x=STAIT))+
  geom_point(aes(y=valence)) +
  geom_point(aes(y=arousal)) +
  apatheme

This gives me a scatterplot with the data points of both variables in the same plot. However, I need the data points of one score to be visually distinct from those of the other, and I want to add a separate regression line for each variable. Everything I've tried so far has resulted in errors, and I cannot find any examples online of people trying to do the same thing.

Any help would be highly appreciated!

Eva Beunk
    Welcome to SO! To help us to help you could you please make your issue reproducible by sharing a sample of your **data**? See [how to make a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Simply type `dput(NAME_OF_DATASET)` into the console and copy & paste the output starting with `structure(....` into your post. If your dataset has a lot of observations you could do `dput(head(NAME_OF_DATASET, 20))` for the first twenty rows of data. – stefan Jan 09 '21 at 12:25

2 Answers

Using some random example data, you could achieve your desired result like so:

It's best to reshape your data to long format, e.g. using tidyr::pivot_longer, which gives us two new columns: one with the names of the variables and one with the corresponding values. After reshaping, you can map the values column on y and get different shapes and linetypes by mapping the variables column on shape and linetype:

library(ggplot2)
library(tidyr)

set.seed(42)
dat <- data.frame(
  STAIT = runif(20, 0, 1),
  valence = runif(20, 0, 1),
  arousal = runif(20, 0, 1)
)

dat_long <- dat %>%
  pivot_longer(c(valence, arousal), names_to = "var", values_to = "value")

ggplot(dat_long, aes(x = STAIT, y = value, linetype = var, shape = var)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "black", size = .5)
#> `geom_smooth()` using formula 'y ~ x'
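
As a possible extension of the code above, here is a minimal sketch of how the open/closed dots and solid/dashed lines from the question could be set explicitly in the long-format approach. The shape codes (16 = filled, 1 = open), the legend title "Variable", the 0-100 example data, and `theme_classic()` as a stand-in for the `apatheme` object (which is not defined in the post) are all assumptions for illustration.

```r
library(ggplot2)
library(tidyr)

# Random example data; the value ranges are made up for illustration
set.seed(42)
dat <- data.frame(
  STAIT   = runif(20, 20, 80),
  valence = runif(20, 0, 100),
  arousal = runif(20, 0, 100)
)

# Long format: one row per observation and variable
dat_long <- pivot_longer(dat, c(valence, arousal),
                         names_to = "var", values_to = "value")

p_long <- ggplot(dat_long, aes(x = STAIT, y = value, shape = var, linetype = var)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE,
              fullrange = TRUE, colour = "black") +
  # Assumed styling: filled dots + solid line for arousal,
  # open dots + dashed line for valence
  scale_shape_manual(values = c(arousal = 16, valence = 1)) +
  scale_linetype_manual(values = c(arousal = "solid", valence = "dashed")) +
  # Giving both scales the same title lets ggplot2 merge them into one legend
  labs(x = "STAI-T score", y = "Valence/arousal score",
       shape = "Variable", linetype = "Variable") +
  theme_classic()  # stand-in for apatheme, which is not shown in the post

p_long
```

Because shape and linetype are mapped to the same column and share a legend title, ggplot2 should draw a single combined legend rather than two separate ones.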

stefan

I figured out a way to do it, with the following code:

sp.both <- ggplot(dat, aes(x = STAIT)) +
  geom_point(shape = 16, aes(y = arousal)) +
  geom_point(shape = 1, aes(y = valence)) +
  labs(x = "STAI-T score", y = "valence/arousal score") +
  geom_smooth(method = "lm", se = TRUE, fullrange = TRUE, colour = "black", aes(y = arousal)) +
  geom_smooth(method = "lm", se = TRUE, fullrange = TRUE, linetype = "dashed", colour = "black", aes(y = valence)) +
  apatheme

The only thing I haven't figured out yet is how to add a legend that shows both the linetype (solid/dashed) and the corresponding data point style (filled/open) along with the variable each belongs to. But Stefan's example solves this problem, and I also prefer the way the resulting plot looks. So that's definitely a better solution to this problem. Thanks!
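
For reference, a legend can in principle also be produced without reshaping, by moving the shape and linetype settings inside `aes()` as constant labels and then assigning the actual values with manual scales. The sketch below is an illustration under assumptions: the example data are random, the legend title "Variable" is made up, and `theme_classic()` stands in for `apatheme`, which is not defined in the post.

```r
library(ggplot2)

# Random example data in wide format; ranges are made up for illustration
set.seed(1)
dat_wide <- data.frame(
  STAIT   = runif(20, 20, 80),
  valence = runif(20, 0, 100),
  arousal = runif(20, 0, 100)
)

p_wide <- ggplot(dat_wide, aes(x = STAIT)) +
  # A constant string inside aes() creates a legend entry with that label
  geom_point(aes(y = arousal, shape = "arousal")) +
  geom_point(aes(y = valence, shape = "valence")) +
  geom_smooth(aes(y = arousal, linetype = "arousal"), method = "lm",
              formula = y ~ x, se = TRUE, fullrange = TRUE, colour = "black") +
  geom_smooth(aes(y = valence, linetype = "valence"), method = "lm",
              formula = y ~ x, se = TRUE, fullrange = TRUE, colour = "black") +
  # Same name and labels on both scales, so the legends can be merged
  scale_shape_manual(name = "Variable", values = c(arousal = 16, valence = 1)) +
  scale_linetype_manual(name = "Variable",
                        values = c(arousal = "solid", valence = "dashed")) +
  labs(x = "STAI-T score", y = "valence/arousal score") +
  theme_classic()  # stand-in for apatheme, which is not shown in the post

p_wide
```

This keeps the four-layer structure of the code above, at the cost of repeating each layer per variable; the long-format approach scales better if more variables are added.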

Eva Beunk