Incorrect mapping of group aesthetic in geom_line() to a variable

Question

I am trying to plot the results of a study with within-subjects experimental manipulations in ggplot2. I would like geom_line() to connect the points of the same subject between different trial types but not between conditions (see figure below). When I try to map subject to group in my data, it produces lines connecting the same subject across the whole plot, which is not what I want.

This is how it looks:

[Image: What I get]

Now what I would like to obtain - lines are orange for better visibility here. It looks crap because I generated only a few data points, but this is the idea:

[Image: lines connecting individuals only within condition]

A small, workable example (apologies in advance for my inelegant code):

# produce fake data
signal <- c(0.74660393, 1.69004752, 0.38833258, 1.00708169, 0.72820926, 0.63489489,
            1.46378383, 0.67635374, 0.15536748, 0.19220099, 0.32673839, 1.64773836,
            0.99743467, 0.47589479, 0.83547231, 1.98466375, 0.13243662, 1.26484889,
            1.67973564, 1.52482770, 1.87735472, 1.00273947, 1.71527739, 0.23374901) 
subject <- rep(c("Sub1", "Sub2", "Sub3", "Sub4"), 6)
condition <- c(rep("Congruent", 8), rep("Incongruent", 8), rep("Neutral", 8))
trial <- c("alt", "rep", "alt", "rep", "alt", "rep",
           "alt", "rep", "alt", "rep", "alt", "rep",
           "alt", "rep", "alt", "rep", "alt", "rep",
           "alt", "rep", "alt", "rep", "alt", "rep")
hemisphere <- c(rep("right", 4), rep("left", 4),
                rep("right", 4), rep("left", 4),
                rep("right", 4), rep("left", 4))

df <- data.frame(subject, signal, condition, trial, hemisphere)

Now the plot.

library(ggplot2)

p <- ggplot(data = df, aes(y = signal, x = condition, fill = trial))
p + geom_point(aes(fill = trial, color = trial),
               position = position_dodge(width = 0.6),
               alpha = 0.3) +
  geom_line(aes(group = subject), alpha = 0.1) +
  geom_boxplot(position = position_dodge(width = 0.6), alpha = 0.05) +
  facet_wrap(~hemisphere) +
  stat_summary(fun = "mean",
               geom = "crossbar",
               aes(colour = trial),
               position = position_dodge(width = 0.6),
               width = 0.6) +
  theme_classic()

I already tried:

putting the group aesthetic in ggplot() instead of in geom_line(), but it screws up the whole thing
producing a pairing variable for geom_line() as paste(df$subject, df$condition, df$hemisphere), but it doesn't produce what I want: there are no lines anymore

Any other solutions?

Thanks in advance!

related https://stackoverflow.com/questions/44656299/lines-connecting-jittered-points-dodging-by-multiple-groups — tjebo, Apr 02 '23 at 16:20
in your data, each subject/hemisphere combination has only one trial type. — tjebo, Apr 02 '23 at 16:25

score 2 · Accepted Answer · answered Apr 02 '23 at 16:27

IMHO this can't be achieved with the group aes alone: on the one hand we have to group by subject to get a line per subject, and on the other hand we have to group by condition and trial to dodge the start and end positions, and you can't do both at the same time. Instead I would suggest dodging the lines manually, i.e. computing the start and end positions yourself. To this end, convert condition and trial to numerics, rescale trial_num to the interval c(-1, 1), and finally account for (half of) the width of the dodge and the number of categories in trial, i.e. divide the width by 4.

Note: Your example data contained only one obs. per subject, condition and hemisphere so I adjusted your data slightly so that we have two obs.
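
The note above can be checked with a quick sketch. It rebuilds only the grouping columns of the question's data, using compact rep() calls that I assume are equivalent to the longer vectors in the question, and counts the rows per cell:

```r
# Grouping columns as in the question (signal is not needed for the count)
subject <- rep(c("Sub1", "Sub2", "Sub3", "Sub4"), 6)
condition <- rep(c("Congruent", "Incongruent", "Neutral"), each = 8)
trial <- rep(c("alt", "rep"), 12)
hemisphere <- rep(rep(c("right", "left"), each = 4), 3)

# Rows per subject x condition x hemisphere cell
cell_counts <- table(subject, condition, hemisphere)
range(cell_counts)

# Trial types recorded for each subject
trial_by_subject <- table(subject, trial)
trial_by_subject
```

Every cell holds a single row and each subject only ever has one trial type, so there is nothing for geom_line() to connect within a condition.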

library(dplyr, warn.conflicts = FALSE)
library(ggplot2)

width <- .75
pd <- position_dodge(width = width)

df <- df %>%
  mutate(
    condition_num = as.numeric(factor(condition)),
    trial_num = as.numeric(factor(trial)),
    trial_num = scales::rescale(trial_num, to = c(-1, 1)),
    x_line = condition_num + width / 4 * trial_num
  )

p <- ggplot(data = df, aes(y = signal, x = condition, fill = trial))
p + geom_point(aes(fill = trial, color = trial),
  position = pd,
  alpha = 0.3
) +
  geom_line(aes(x = x_line, group = interaction(subject, condition)),
    color = "orange"
  ) +
  geom_boxplot(position = pd, alpha = 0.05, width = .6) +
  facet_wrap(~hemisphere) +
  stat_summary(
    fun = "mean",
    geom = "crossbar",
    aes(colour = trial),
    position = pd,
    width = 0.6
  ) +
  theme_classic()
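
The width / 4 factor in x_line can be checked by hand. The sketch below assumes that position_dodge() spreads n equally wide groups evenly across the dodge width; dodge_offset() is a hypothetical helper written for this illustration, not a ggplot2 function:

```r
# Offset of group k (1..n) from the centre of its category, assuming
# n equally wide groups share a total dodge width of `width`.
# dodge_offset() is a hypothetical helper, not part of ggplot2.
dodge_offset <- function(k, n, width) {
  width * ((k - 0.5) / n - 0.5)
}

width <- 0.75

# Two trial levels: offsets are -width / 4 and +width / 4
offsets <- dodge_offset(1:2, n = 2, width = width)
offsets

# Line ends for the second condition (condition_num = 2)
x_line <- 2 + offsets
x_line

# Three levels would give -width / 3, 0 and +width / 3 instead
offsets3 <- dodge_offset(1:3, n = 3, width = width)
```

So the division by 4 is specific to two trial levels; with more levels the factor would need to change accordingly.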

DATA

signal <- c(
  0.74660393, 1.69004752, 0.38833258, 1.00708169, 0.72820926, 0.63489489,
  1.46378383, 0.67635374, 0.15536748, 0.19220099, 0.32673839, 1.64773836,
  0.99743467, 0.47589479, 0.83547231, 1.98466375, 0.13243662, 1.26484889,
  1.67973564, 1.52482770, 1.87735472, 1.00273947, 1.71527739, 0.23374901
)
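subject <- rep(c("Sub1", "Sub2", "Sub3", "Sub4"), 6)
condition <- rep(c("Congruent", "Incongruent", "Neutral"), each = 8)
# trial has 8 values and is recycled to 24 rows by data.frame()
trial <- rep(c("alt", "rep"), each = 4)
hemisphere <- rep(c("right", "left"), 12)

df <- data.frame(subject, signal, condition, trial, hemisphere)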

Thanks a lot! :) This worked very well on my real data. P.S. You are right, my fake data would have needed at least two observations per subject. — Nolandinoparty, Apr 10 '23 at 14:54

tjebo · Answer 2 · 2023-04-02T16:58:38.460

I agree with Stefan that manual dodging is the way to go. I'd use as.integer instead of as.numeric: I always try to keep data types as simple as possible, and integers avoid floating-point values where none are needed. As discussed in "Lines connecting jittered points - dodging by multiple groups", this is a fairly commonly requested visualisation, but in my opinion it's not doing your readers/reviewers a big favour. With its double visualisation of the same subject, it results in severe clutter from overlapping geometries (lines).

You have paired data, and this is much better visualised as a scatter plot, or simply by plotting the difference between your measurements. If you only have four values, I'd not use a boxplot, which is a bit unnecessary. You could then also use the space you gain to label your subjects directly, which makes for a very intuitive read.

Like Stefan, I have also modified your data, because your mock data doesn't seem to represent your actual data as you describe it.

library(tidyverse)

## your data creation, slightly different from here:
trial <- rep(c("alt", "rep"), each = 4, times = 3)
df <- data.frame(subject, signal, condition, trial)

df_wide <- df %>%
  pivot_wider(names_from = trial, values_from = signal) 

ggplot(df_wide, aes(alt, rep)) +
  geom_text(aes(label = subject)) +
  facet_grid(~condition) +
  geom_abline(slope = 1, lty = 2, linewidth = .5) +
  coord_equal()

Or, I think in this case even better, show the difference:

ggplot(df_wide, aes(x = condition, y = rep - alt)) +
  geom_text(aes(label = subject, color = subject), show.legend = FALSE)

Created on 2023-04-02 with reprex v2.0.2
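
If tidyr is not available, the same paired differences can be sketched in base R with merge(). This is an alternative to pivot_wider(), not what the answer uses, and the names alt_rows, rep_rows and paired are made up for the illustration:

```r
# Data as in the question, with the modified trial vector from this answer
signal <- c(
  0.74660393, 1.69004752, 0.38833258, 1.00708169, 0.72820926, 0.63489489,
  1.46378383, 0.67635374, 0.15536748, 0.19220099, 0.32673839, 1.64773836,
  0.99743467, 0.47589479, 0.83547231, 1.98466375, 0.13243662, 1.26484889,
  1.67973564, 1.52482770, 1.87735472, 1.00273947, 1.71527739, 0.23374901
)
subject <- rep(c("Sub1", "Sub2", "Sub3", "Sub4"), 6)
condition <- rep(c("Congruent", "Incongruent", "Neutral"), each = 8)
trial <- rep(c("alt", "rep"), each = 4, times = 3)
df <- data.frame(subject, signal, condition, trial)

# Split by trial type, then pair the rows by subject and condition
alt_rows <- df[df$trial == "alt", ]
rep_rows <- df[df$trial == "rep", ]
paired <- merge(alt_rows, rep_rows,
  by = c("subject", "condition"),
  suffixes = c("_alt", "_rep")
)

# One difference per subject and condition
paired$diff <- paired$signal_rep - paired$signal_alt
paired[, c("subject", "condition", "diff")]
```

The diff column then plays the role of rep - alt in the plot above.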

Thanks a lot, plotting the difference is a great idea too! Interestingly, these lines are actually a request from a reviewer. But I agree with you, this kind of visualization can get really messy and the same message can be better conveyed by doing something like this. — Nolandinoparty, Apr 10 '23 at 14:57
