I ask this with some trepidation, but I really have looked at other questions and haven't found an example that seems to work for me.

I would like to have the character labels on the y-axis of a ggplot sorted based on other columns of the data frame. I believe that this is a matter of setting up factors and levels correctly, prior to using ggplot, but I am having difficulty with the specifics of how to do this.

Here is a simplified example (to the point of potentially not seeming to make sense):

library(tidyverse)
library(ggplot2)

set.seed(1)
num_rows <- 12
sample_names <- do.call(paste0, replicate(5, sample(letters, num_rows, TRUE), FALSE))
df1 <- data.frame(region=sample(c("N", "S", "E", "W"), num_rows, replace = TRUE), 
                  sub_region=sample(c("High", "Medium", "Low"), num_rows, replace = TRUE),
                  my_order = seq(1,num_rows), 
                  my_name = sample_names,
                  var_1 = sample(100, num_rows, replace = TRUE))

#try using arrange
df2 <- df1 %>% arrange(factor(df1$region, levels = c("N","E","S","W")), 
                       factor(df1$sub_region, levels = c("High","Medium","Low")))
df2 %>% ggplot() + geom_point(aes(x = var_1, y = my_name, color=sub_region))

#try using order
df3 <- df1
df3$region <- factor(df1$region, levels = c("N","E","S","W"))
df3$sub_region <- factor(df1$sub_region, levels = c("High","Medium","Low"))
df4 <- df3[order(df1$region, df1$sub_region, df1$my_order),]
df4 %>% ggplot() + geom_point(aes(x = var_1, y = my_name, color=sub_region))

I'm hoping to have my_name and the corresponding values sorted by region, then sub_region, then my_order (as a tie-breaker) in the plot (without, for now at least, showing any of these in the chart), but my_name continues to appear in alphabetical order, whether I use arrange (from dplyr) or order. I realize that the arrange attempt doesn't include the my_order column yet, but since the first two levels of the sort aren't working, I thought I would hold off on that.

I am looking for the y-axis to be in this order (from the top, down):

qymni fswvl jjkcs ouasm xziqg fqvar

etc.

Clearly, I'm doing something wrong, but I'm not sure what. I would appreciate any help. Also, am I correct that once I have this working, using group_by and summarize from dplyr will preserve the order of my_name?

1 Answer

First off, you can set the order of factor levels for columns like region in your original dataframe. Then you don't end up with all these different slightly modified versions of the same data. Then sort the dataframe how you want it, and use forcats::fct_inorder to reassign the factor levels for my_name based on their current order in the dataframe:

library(tidyverse)
library(ggplot2)
library(forcats)

set.seed(1)
num_rows <- 12
sample_names <- do.call(paste0, replicate(5, sample(letters, num_rows, TRUE), FALSE))
df1 <- data.frame(region=sample(c("N", "S", "E", "W"), num_rows, replace = TRUE), 
                  sub_region=sample(c("High", "Medium", "Low"), num_rows, replace = TRUE),
                  my_order = seq(1,num_rows), 
                  my_name = sample_names,
                  var_1 = sample(100, num_rows, replace = TRUE))

df1$region <- factor(df1$region, levels = c("N","E","S","W"))
df1$sub_region <- factor(df1$sub_region, levels = c("High","Medium","Low"))
df1 <- df1[order(df1$region, df1$sub_region, df1$my_order, decreasing = TRUE), ]
# Order my_name levels based on current order
df1$my_name <- fct_inorder(df1$my_name)
df1 %>% ggplot() + geom_point(aes(x = var_1, y = my_name, color = sub_region))

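Note that I had to use decreasing = TRUE in the order() call because ggplot draws the first factor level at the bottom of the y-axis, so the sort has to be reversed for the plot to read top to bottom.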
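If a descending sort feels awkward, one alternative is to sort in the natural ascending order and then flip the levels with forcats::fct_rev. This is only a sketch: the small data frame `toy` below is made up for illustration and stands in for df1.

```r
library(forcats)

# Hypothetical toy data standing in for df1 (not from the question)
toy <- data.frame(
  region   = factor(c("S", "N", "E", "N"), levels = c("N", "E", "S", "W")),
  my_order = 1:4,
  my_name  = c("ccc", "aaa", "bbb", "ddd"),
  stringsAsFactors = FALSE
)

# Sort ascending: region first, my_order as the tie-breaker
toy <- toy[order(toy$region, toy$my_order), ]

# Take levels from the current row order, then reverse them so that
# the first row ends up at the top of a ggplot y-axis
toy$my_name <- fct_rev(fct_inorder(toy$my_name))
levels(toy$my_name)
```

After the sort the rows run aaa, ddd, bbb, ccc. Reversing the levels puts aaa last, which ggplot would draw at the top of the axis.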

For categorical variables like my_name, it's the order of the factor levels that determines the order ggplot plots them in, not the row order of the dataframe, which is what your example code was changing. This makes the tools in forcats very useful when you need to control the order in a plot.
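The difference between row order and level order can be seen with base R alone. The sketch below uses three of the names from the question and assumes they are already in the desired row order. Passing levels = unique(x) to factor() is a base R way to take the levels from order of first appearance, so it should also work where forcats is not available.

```r
# Three names from the question, assumed to be in the desired row order
my_name <- c("qymni", "fswvl", "jjkcs")

# Default: levels are sorted alphabetically, regardless of row order
alphabetical <- factor(my_name)
levels(alphabetical)

# Levels taken from order of first appearance instead
by_row <- factor(my_name, levels = unique(my_name))
levels(by_row)
```

Both factors hold the same values in the same rows. Only the levels differ, and the levels are what ggplot reads when it lays out a discrete axis.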

Marius
  • Thank you! I wouldn't have come up with this, as I find the documentation for forcats a bit sparse and I didn't realize I needed to use order in this way. I assume that there is probably a way to do this without forcats, but I'm happy to use this solution. – Jonathan Sibley Jun 27 '17 at 13:06