
I have a dataset with two groups, Experimental and Control. Each participant contributes two responses (Tr and Sr), which represent different learning styles. These are shown in the box plots with jitter produced by the code below. I would like to use ggplot to connect each participant's two responses with a line (so each red point in the Control group would be joined to the corresponding turquoise point in the Control group), but I can't figure out how to do this within the conditions. Can someone please help? I am new to R and really need guidance.

Then, I need to change the color of the lines within the conditions to black if Increase = TRUE and red if Increase = FALSE.

Ideally, it should look like Jon's example in "Connecting grouped points with lines in ggplot", but with black or red lines depending on whether Increase is TRUE or FALSE.

The data and ggplot code look like this:

library(tidyverse)

d <- data.frame(
  Subject = c("1", "2", "3", "4"),
  Group  = c("Exp", "Exp", "Control", "Control"),
  Tr = c("14", "11", "4", "23"),
  Sr = c("56", "78", "12", "10"),
  Increase = c("TRUE", "TRUE", "TRUE", "FALSE")
)

# put the data in long format
d <- d %>%
  gather(key = "Strategy", value = "raw", Tr, Sr)

d %>%
  ggplot(aes(x = Group, y = raw, color = Strategy)) +
  geom_boxplot(width = 0.5, lwd = 0.5) +
  geom_jitter(width = 0.15) +
  geom_line(aes(group = raw),
            color = "grey",
            arrow = arrow(type = "closed",
                          length = unit(0.075, "inches"))) 
EllaM
  • Please share sample data as copy/pasteable code in valid R syntax, not as a screenshot of a table. `dput()` is a great command for that, `dput(data[1:12, ])` will give a copy/pasteable version of the first 12 rows of data including all class and structure information. – Gregor Thomas May 12 '22 at 13:42
  • From the picture of your data, it's not clear what points should be connected. Is there an ID column or something to indicate which pairs of points go together? – Gregor Thomas May 12 '22 at 13:42
  • Thanks, Gregor. I just added a copy/pasteable code - is that better? – EllaM May 12 '22 at 13:59
  • The points that need to be connected are the Tr and Sr per each subject in each Exp and Ctr group. So for the first one, the points that need to be connected within the Exp group are 14 and 23 – EllaM May 12 '22 at 14:00
  • This is much clearer. `position_jitterdodge` works well for point and boxplot [as per this answer](https://stackoverflow.com/q/48954358/903061), but with lines also I think the only option is to jitter manually - adding the noise as columns in your data. I don't have time to write an answer up now, but later this evening if no one else has answered I'll try to take a look. – Gregor Thomas May 12 '22 at 16:02
  • Thanks, Gregor. I looked at the example above and the position_jitterdodge worked well, however I still can't get it to connect the two values per subject in each Ctr and Exp group. I am not sure what you mean by adding the noise as columns in the data, can you kindly clarify or add an example please? – EllaM May 13 '22 at 06:23

2 Answers


Inspired by the answer you linked to (@Jon's answer).

There are a few key things to understand about the solution:

  1. Since the points and lines need to connect, both must use exactly the same random jitter. The simplest way to guarantee this is to jitter the data before it goes into plotting, which is what I did.
  2. The variable to jitter on, Group, is not a number. It helps to know that ggplot treats Group as a factor and places its levels at x = 1, 2, 3, ... in level order. Hence we create a numeric vector group_jit with values around 1 and 2, offset by the colouring variable Strategy so that each strategy sits slightly left or right of the group centre.
  3. Since there are two independent colour scales, it is best to map Strategy to fill (boxes and points) and Increase to colour (lines), to avoid a single legend with four entries.
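To make point 2 concrete, here is a minimal base-R sketch of how the x positions are built, using hypothetical toy vectors that mirror the long-format data (four rows, Tr listed before Sr). It first computes the deterministic "dodged" positions, then adds noise with jitter() in the same way as the group_jit line in the code below. The variable names here are illustrative only.

```r
# Hypothetical toy vectors mirroring the long-format data (illustration only)
group    <- factor(c("Exp", "Exp", "Control", "Control"),
                   levels = c("Exp", "Control"))
strategy <- factor(c("Tr", "Sr", "Tr", "Sr"), levels = c("Tr", "Sr"))
width_jitter <- 0.2

# ggplot places discrete x categories at 1, 2, ... in level order
centre <- as.numeric(group)

# Tr (level 1) -> -0.5, Sr (level 2) -> +0.5, scaled to +/- width_jitter
offset   <- (as.numeric(strategy) - 1.5) * width_jitter * 2
x_dodged <- centre + offset
print(x_dodged)  # 0.8 1.2 1.8 2.2

# Same idea with noise: jitter() perturbs the +/-0.5 offsets before scaling,
# so each point stays within 0.08 of its dodged position here
set.seed(1)
x_jittered <- centre + jitter(as.numeric(strategy) - 1.5) * width_jitter * 2
print(x_jittered)
```

Because these positions are stored as data, geom_point() and geom_line() read the very same numbers, so the line ends land exactly on the points.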

Here's the code:

library(tidyverse)

# Load data
d <- data.frame (
  Subject = c("1", "2", "3", "4"),
  Group  = c("Exp", "Exp", "Control", "Control"),
  Tr = c("14", "11", "4", "23"),
  Sr = c("56", "78", "12", "10"),
  Increase = c("TRUE", "TRUE", "TRUE", "FALSE")
)

set.seed(1) # make the random jitter reproducible between runs
width_jitter <- 0.2 # 1 means full width between points

# put the data in long format
d_jit <- d %>%
  gather(key = "Strategy", value = "raw", Tr, Sr) %>% 
  
  # type conversions
  mutate(across(c(Group, Strategy), as_factor)) %>% # convert to factors
  mutate(raw = as.numeric(raw)) %>% # make raw as numbers
  
  # position on x axis is based on combination of Group and jittered Strategy. Mix to taste.
  mutate(group_jit = as.numeric(Group) + jitter(as.numeric(Strategy) - 1.5) * width_jitter * 2,
         grouping = interaction(Subject, Strategy))

# plotting
d_jit %>%
  ggplot(aes(x = Group, y = raw, fill = Strategy)) +
  geom_boxplot(width = 0.5, lwd = 0.5, alpha = 0.05, show.legend = FALSE) +
  geom_point(aes(x = group_jit), size = 3, shape = 21) +
  
  geom_line(aes(x = group_jit,
                group = Subject,
                colour = Increase),
            alpha = 0.5,
            arrow = arrow(type = "closed",
                          length = unit(0.075, "inches"))
            ) + 
  scale_colour_manual(values = c("TRUE" = "black", "FALSE" = "red")) # match colours to Increase by value

Created on 2022-05-14 by the reprex package (v2.0.1)
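A note on the line colours: scale_colour_manual() accepts either an unnamed vector, which is matched by position to the sorted levels of Increase (for this character column that is "FALSE" first, then "TRUE"), or a named vector such as c("TRUE" = "black", "FALSE" = "red"), which is matched by value and so does not depend on level order. The base-R sketch below illustrates both lookups on the sample Increase values; it is an analogy for how the matching works, not ggplot2's internal code.

```r
# Increase is stored as character in the sample data
increase <- c("TRUE", "TRUE", "TRUE", "FALSE")

# Unnamed colours are paired with the sorted levels: "FALSE" first, then "TRUE"
sorted_levels <- levels(factor(increase))
print(sorted_levels)  # "FALSE" "TRUE"

# A named vector is looked up by value instead of by position
line_colours <- c("TRUE" = "black", "FALSE" = "red")
per_line <- unname(line_colours[increase])
print(per_line)  # "black" "black" "black" "red"
```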

For completeness' sake, a different and more elegant way to do the jitter is to pass a position argument to the geom_point() and geom_line() calls. This argument is a position adjustment that adds the random jitter, like this (source: @erocoar's answer):

position = ggplot2::position_jitterdodge(dodge.width = 0.75, jitter.width = 0.3, seed = 1)

This way the data itself is not changed, and the plotting takes care of the jittering details.

  • position_jitterdodge() does both the dodge (shifting the differently coloured or filled points side by side within each x category) and the jitter (adding small random noise to the points).
  • The seed argument is key here, since it ensures that the same random values are returned to the point and line layers, which call it independently.
  • Thank you so much, Prashant! This is exactly what I needed! The example is really clear and the explanations really helped me understand what I needed to do. Much appreciated :) – EllaM May 15 '22 at 08:23
  • You're welcome! If the question has been answered satisfactorily, could you `accept` it by clicking on the check box on the top left so that the question is marked settled for future people finding this post, Thanks! – Prashant Bharadwaj May 15 '22 at 21:27
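To illustrate why the seed matters for the position_jitterdodge() approach, here is a base-R analogy (not ggplot2's actual internals): two independent draws of random noise only agree if each starts from the same seed. The spread of 0.15 and the variable names are arbitrary choices for the example.

```r
# Two independent "layers" each drawing jitter for the same four points
n <- 4

set.seed(1)
jitter_for_points <- runif(n, -0.15, 0.15)

set.seed(1)
jitter_for_lines <- runif(n, -0.15, 0.15)

# Same seed, same draws: line ends would land exactly on the points
print(identical(jitter_for_points, jitter_for_lines))  # TRUE

# Without resetting the seed, the next draw is different
jitter_unseeded <- runif(n, -0.15, 0.15)
print(identical(jitter_for_points, jitter_unseeded))   # FALSE
```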

Not a direct answer to your question, but I wanted to suggest an alternative visualisation.

You are dealing with paired data, and a scatter plot of Tr against Sr is a much more convincing visualisation. It uses both dimensions of the page rather than squeezing your two measurements onto a single axis. You can compare the control and experimental groups more easily and see immediately which subjects got better or worse.

library(tidyverse)

d <- data.frame (
  Subject = c("1", "2", "3", "4"),
  Group  = c("Exp", "Exp", "Control", "Control"),
  Tr = c("14", "11", "4", "23"),
  Sr = c("56", "78", "12", "10"),
  Increase = c("TRUE", "TRUE", "TRUE", "FALSE")
) %>%
  ## convert to numeric first
  mutate(across(c(Tr, Sr), as.integer))

## set coordinate limits
lims <- range(c(d$Tr, d$Sr))

ggplot(d) +
  geom_point(aes(Tr, Sr, color = Group)) +
  ## adding a line of equality and setting limits equal helps guide the eye
  geom_abline(intercept = 0, slope = 1, lty = "dashed") +
  coord_equal(xlim = lims , ylim = lims )
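Reading the scatter plot: with Tr on the x axis and Sr on the y axis, a subject whose point lies above the dashed line of equality scored higher on Sr than on Tr. In the sample data this reproduces the Increase column exactly, which suggests (an assumption, since the question does not say how Increase was defined) that Increase means Sr > Tr. A quick base-R check:

```r
# Same sample data, with Tr and Sr converted to integers
d <- data.frame(
  Subject  = c("1", "2", "3", "4"),
  Group    = c("Exp", "Exp", "Control", "Control"),
  Tr       = as.integer(c("14", "11", "4", "23")),
  Sr       = as.integer(c("56", "78", "12", "10")),
  Increase = c("TRUE", "TRUE", "TRUE", "FALSE")
)

# A point lies above the dashed y = x line exactly when Sr > Tr
above_line <- d$Sr > d$Tr
print(above_line)  # TRUE TRUE TRUE FALSE

# In this sample, that matches the Increase column
print(identical(above_line, as.logical(d$Increase)))  # TRUE
```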


tjebo