I am trying to add sub-group labels and order the observations on the x-axis in my ggplot2 plot. There are multiple questions about this on here already, but the responses all recommend faceting (e.g. here). My plot is already faceted, so these responses don't work for me. I tried using reorder(x, by_this_variable), but this only seems to work if by_this_variable is the y-axis variable. Why? If I try to reorder by a different variable, I receive a warning:

argument is not numeric or boolean

To be more specific, I am plotting two points (percentages per participant obtained in two different tasks) for each discrete x-axis value (one per participant), with arrows connecting the two dots of each participant. This indicates whether participant behavior changed negatively or positively across tasks. My facets are two different (treatment) conditions that participants were randomly assigned to. I would now like to group these dot-arrow graphs by participant origin (a possible predictor of different responses to the treatment) and add this information as a label on the x-axis, but all I can achieve right now is to have the values sorted alphabetically (the default).

This plot might end up looking too busy. If there is a better way to plot all of this information (relative change of behavior by task, by participant, by condition, by origin) in one graph, I am open to suggestions!

My code:

Data <- data.frame(c(28.5, 20, 55.4, 30.5, 66.6, 45.4, 43.2, 43.1, 28.5, 55.4, 30.5, 
                   66.6, 45.4, 20), c("Participant 1", "Participant 1", 
                   "Participant 2", "Participant 2", "Participant 3", 
                   "Participant 3","Participant 4", "Participant 4","Participant 5", 
                   "Participant 5", "Participant 6", "Participant 6", "Participant 7", 
                   "Participant 7"),c("India", "India", "India", "India", "Algeria", 
                   "Algeria", "Algeria", "Algeria", "India", "India", "India", 
                   "India", "Algeria", "Algeria"),c("Treatment A", "Treatment A", 
                   "Treatment B", "Treatment B","Treatment A", "Treatment A", 
                   "Treatment B", "Treatment B", "Treatment A", "Treatment A", 
                   "Treatment B", "Treatment B", "Treatment A", "Treatment A"),
                   c("Task 1", "Task 2", "Task 1", "Task 2", "Task 1", "Task 2", 
                   "Task 1", "Task 2", "Task 1", "Task 2", "Task 1", "Task 2", 
                   "Task 1", "Task 2"))
colnames(Data) <- c("Percentage", "Participant", "Origin", "Treatment", "Task")

library(ggplot2)
ggplot(Data, aes(y=Percentage, x = Participant, group = Participant))+
   geom_point(aes(color = Task))+ 
   geom_line(arrow = arrow(length=unit(0.30,"cm"), type = "closed"), size = .3)+
   facet_grid(~Treatment, scales = "free_x", space = "free_x")+ 
   theme(axis.text.x = element_text(angle = 90, hjust = 1))

This produces the following plot:

[Image: the resulting plot]

Participants 1 & 5 are from India and 3 & 7 from Algeria, so I would like to group them together on the x-axis and add a label for origin.

EDIT:

The warning above seems to stem from the fact that Origin is a multi-level factor, while reorder appears to work only with numeric values. Setting x = reorder(Participant, as.numeric(Origin)) therefore orders the values according to Origin. But how can I add appropriate Origin labels below the plot?
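
To make the reorder behavior concrete: by default reorder() summarises its second argument with mean() for each level of the first, which is why a numeric variable such as the y-axis values works while a factor or character column triggers the warning. Below is a minimal sketch on toy data (the data frame `d` and the helper name `origin_code` are made up for illustration). Wrapping Origin in factor() first is an assumption that keeps the conversion working when Origin is stored as character, as it is by default in R 4.0 and later.

```r
# Toy data mirroring the question's columns (one row per participant).
d <- data.frame(
  Participant = c("Participant 1", "Participant 3", "Participant 5", "Participant 7"),
  Origin      = c("India", "Algeria", "India", "Algeria"),
  stringsAsFactors = FALSE
)

# reorder() needs something mean() can handle, so turn Origin into integer codes.
# factor() sorts its levels alphabetically: Algeria = 1, India = 2.
origin_code <- as.integer(factor(d$Origin))

# Levels of the result are ordered by the mean code per participant;
# participants with the same code keep their existing relative order.
d$Participant_sorted <- reorder(d$Participant, origin_code)

levels(d$Participant_sorted)
```

Mapping x to `Participant_sorted` should then group the Algerian participants before the Indian ones on the axis.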

  • You can nest the facets like `~Treatment + Origin` – camille Dec 17 '18 at 17:09
  • Thanks! I tried that but I think it is overwhelming, particularly when there are more subgroups to be added. I would simply like to sort the participants on the x-axis and add a label for their origin (as indicated in the plot here: https://stackoverflow.com/questions/23207878/ggplot2-group-x-axis-discrete-values-into-subgroups) – IzzyBizzy Dec 17 '18 at 17:13
  • Maybe related: [Axis labels on two lines with nested x variables (year below months)](https://stackoverflow.com/questions/44616530/axis-labels-on-two-lines-with-nested-x-variables-year-below-months) – Henrik Dec 17 '18 at 17:23

1 Answer

One suggestion is to use an ordered factor. For the levels of the factor, concatenate Origin and Participant; for the labels of the factor, concatenate Participant and Origin.

# The unique values of the 'Origin_Participant' column will act as the levels
# of the factor. Because 'Origin' comes first, sorting on this column groups
# participants from the same country together.
Data$Origin_Participant <- paste(Data$Origin, Data$Participant, sep = "\n")
# The unique values of the 'Participant_Origin' column will be used as the
# factor's labels (what will end up on the plot).
Data$Participant_Origin <- paste(Data$Participant, Data$Origin, sep = "\n")
# Order the data.frame by 'Origin_Participant'. This also matters because the
# levels must line up with the labels when the factor is created below.
Data <- Data[order(Data$Origin_Participant),]
# Or in decreasing order, if you need that instead:
# Data <- Data[order(Data$Origin_Participant, decreasing = TRUE),]

# Finally, create the needed factor.
Data$Origin_Participant <- factor(x = Data$Origin_Participant,
                                  levels = unique(Data$Origin_Participant),
                                  labels = unique(Data$Participant_Origin),
                                  ordered = TRUE)

library(ggplot2)
# Reuse your code, but map the factor `Origin_Participant` to x. I think there
# is no need for a separate grouping variable. I also added vjust = 0.5 to
# center the labels vertically.
ggplot(Data, aes(y=Percentage, x = Origin_Participant))+
  geom_point(aes(color = Task))+ 
  geom_line(arrow = arrow(length=unit(0.30,"cm"), type = "closed"), size = .3)+
  facet_grid(~Treatment, scales = "free_x", space = "free_x")+ 
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

If you do not mind that Origin appears first in the labels, then it is a few steps shorter:

Data$Origin_Participant <- factor(x = paste(Data$Origin, Data$Participant, sep = "\n"),
                                  ordered = TRUE)
ggplot(Data, aes(y=Percentage, x = Origin_Participant))+
  geom_point(aes(color = Task))+ 
  geom_line(arrow = arrow(length=unit(0.30,"cm"), type = "closed"), size = .3)+
  facet_grid(~Treatment, scales = "free_x", space = "free_x")+ 
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
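
If repeating the country under every participant becomes too busy with many participants, one possible variation is to print the origin only under the first participant of each origin group. This is a rough sketch under assumptions, not a built-in ggplot2 feature: the one-row-per-participant table `d` and the names `first_in_group` and `axis_labels` are made up for illustration, and the alphabetical sort would need adjusting once participant numbers reach two digits.

```r
# One row per participant, using the question's column names and values.
d <- data.frame(
  Participant = paste("Participant", 1:7),
  Origin      = c("India", "India", "Algeria", "Algeria", "India", "India", "Algeria"),
  Treatment   = c("Treatment A", "Treatment B", "Treatment A", "Treatment B",
                  "Treatment A", "Treatment B", "Treatment A"),
  stringsAsFactors = FALSE
)

# Sort within each facet by origin, then by participant.
d <- d[order(d$Treatment, d$Origin, d$Participant), ]

# Factor levels follow the sorted row order, which fixes the axis order.
d$Participant <- factor(d$Participant, levels = unique(d$Participant))

# TRUE for the first participant of each origin within each treatment.
first_in_group <- !duplicated(paste(d$Treatment, d$Origin))

# Named label vector: the origin is appended only for the flagged participants.
axis_labels <- setNames(
  ifelse(first_in_group,
         paste(d$Participant, d$Origin, sep = "\n"),
         as.character(d$Participant)),
  as.character(d$Participant)
)

axis_labels
```

With the full data, `d` could be derived with unique(Data[c("Participant", "Origin", "Treatment")]), the same levels applied to Data$Participant, and the vector passed to the plot via scale_x_discrete(labels = axis_labels).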

  • Interesting solution. Any thoughts as to why ordering only applies to Treatment A? – IzzyBizzy Dec 18 '18 at 19:14
  • Hey @IzzyBizzy, because I didn't realize at first that this was needed in *Treatment B* as well. I updated my answer. The order is simply set by the order one gives in the `levels` of the new factor `Participant_Origin`. It is highly encouraged in the `ggplot2` universe to use (ordered) factors in such cases. Hope that my answer was helpful. – Valentin_Ștefan Dec 18 '18 at 19:51
  • I think this is a good workaround for small data sets. However, overall I have over 60 participants and this solution introduces a lot of redundancy (repeated countries for neighboring values) and becomes more involved with more participants. I am considering just manually inserting one label per country as a text box (right side up where the x-axis label "Participant_origin" is displayed right now) after sorting the values with the reorder function and printing the plot to a pdf. I can't believe that there is no x-axis aesthetics setting that allows me to adjust this in ggplot automatically. – IzzyBizzy Dec 19 '18 at 20:10
  • I didn't know that you have a lot of levels for `Participant`. I generalized the solution with an ordered factor. See the updates. – Valentin_Ștefan Dec 19 '18 at 22:17