I ask this with some trepidation, but I really have looked at other questions and haven't found an example that seems to work for me.
I would like to have the character labels on the y-axis of a ggplot sorted based on other columns of the data frame. I believe that this is a matter of setting up factors and levels correctly, prior to using ggplot, but I am having difficulty with the specifics of how to do this.
Here is a simplified example (to the point of potentially not seeming to make sense):
library(tidyverse)
library(ggplot2)
set.seed(1)
num_rows <- 12
sample_names <- do.call(paste0, replicate(5, sample(letters, num_rows, TRUE), FALSE))
df1 <- data.frame(region = sample(c("N", "S", "E", "W"), num_rows, replace = TRUE),
                  sub_region = sample(c("High", "Medium", "Low"), num_rows, replace = TRUE),
                  my_order = seq(1, num_rows),
                  my_name = sample_names,
                  var_1 = sample(100, num_rows, replace = TRUE))
# try using arrange
df2 <- df1 %>% arrange(factor(region, levels = c("N", "E", "S", "W")),
                       factor(sub_region, levels = c("High", "Medium", "Low")))
df2 %>% ggplot() + geom_point(aes(x = var_1, y = my_name, color = sub_region))
# try using order
df3 <- df1
df3$region <- factor(df1$region, levels = c("N", "E", "S", "W"))
df3$sub_region <- factor(df1$sub_region, levels = c("High", "Medium", "Low"))
# order on the factor columns in df3 so the custom level order is used
df4 <- df3[order(df3$region, df3$sub_region, df3$my_order), ]
df4 %>% ggplot() + geom_point(aes(x = var_1, y = my_name, color = sub_region))
I'm hoping to have my_name and the corresponding values sorted in the plot by region, then sub_region, then my_order (as a tie-breaker), without, for now at least, showing any of these three columns in the chart. However, my_name continues to appear in alphabetical order on the y-axis, whether I use arrange (from dplyr) or order. I realize that the arrange attempt doesn't use the my_order column, but since the first two levels of the sort aren't working, I thought I would hold off on that.
I am looking for the y-axis to be in this order (from the top, down):
qymni fswvl jjkcs ouasm xziqg fqvar
etc.
Clearly, I'm doing something wrong, but I'm not sure what. I would appreciate any help. Also, am I correct that once this is working, using group_by and summarize from dplyr will preserve the order of my_name?
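For reference, here is a minimal sketch of the direction I suspect is needed, run on a small made-up data frame rather than the example above. The data and the names toy, toy_sorted and toy_summary are hypothetical. The assumption is that ggplot orders a discrete axis by factor levels rather than by row order, with the first level at the bottom, so my_name itself has to become a factor whose levels follow the sorted rows in reverse.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data, not the data from the question
toy <- data.frame(
  region     = c("S", "N", "E", "N"),
  sub_region = c("Low", "Medium", "High", "High"),
  my_order   = 1:4,
  my_name    = c("ccc", "aaa", "ddd", "bbb"),
  var_1      = c(10, 40, 25, 70)
)

toy_sorted <- toy %>%
  # give the sort columns an explicit level order
  mutate(region     = factor(region, levels = c("N", "E", "S", "W")),
         sub_region = factor(sub_region, levels = c("High", "Medium", "Low"))) %>%
  # sort rows by region, then sub_region, then my_order as tie-breaker
  arrange(region, sub_region, my_order) %>%
  # fix the axis order: levels follow the sorted rows, reversed so that
  # the first sorted row ends up at the top of the y-axis
  mutate(my_name = factor(my_name, levels = rev(unique(my_name))))

p <- ggplot(toy_sorted) +
  geom_point(aes(x = var_1, y = my_name, color = sub_region))

# grouping by the factor column; the levels should carry through
toy_summary <- toy_sorted %>%
  group_by(my_name) %>%
  summarise(total = sum(var_1))
```

If this is right, printing p should show the names from the top down in the sorted order, and toy_summary should keep my_name as a factor with the same levels, which would also answer the group_by question.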