This is how I want the plot to look:

flights$carrier[seq(1, length(flights$carrier), 20)] <- NA

flights %>% 
  count(carrier) %>% 
  top_n(10) %>% 
  ggplot() +
  geom_col(aes(x = reorder(carrier, n), y = n))

But I'd like to rename the NA as "Unknown". Whenever I do, the position of the bar changes:

flights %>% 
  count(carrier) %>% 
  mutate(
    carrier = coalesce(carrier, 'Unknown')
    ) %>% 
  top_n(10) %>% 
  ggplot() +
  geom_col(aes(x = reorder(carrier, n), y = n))

I've tried several different things, including attempting to more or less manually relabel with scale_x_discrete and others. Even if that worked, it wouldn't scale well.

userABC123
  • You should set the `levels` for your `carrier` variable – Tung Sep 20 '18 at 22:45
  • The position of the bars is based on the order of the `factor` levels of `carrier`. Simply reorder the levels of `carrier` to reflect the order that you want. – Maurits Evers Sep 20 '18 at 22:45

2 Answers

As already mentioned in the comments, and as explained in the linked question, you need to set the order of the factor levels yourself to order the bars. The levels determine the position of the bars in the plot.

I used factor(x, levels = c(...)) for this, as Gavin Simpson does in his answer to the linked question. For other approaches and solutions, see the rest of that question.
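
To make the role of the levels concrete, here is a minimal base R sketch. The data frame `toy` and its counts are made up for illustration and are not the flights data; it only compares the level order you get from a plain factor(), from reorder(), and from explicit levels.

```r
# Toy data (hypothetical values, not the real flights counts)
toy <- data.frame(
  carrier = c("AA", "B6", "UA", "Unknown"),
  n       = c(50, 30, 40, 20),
  stringsAsFactors = FALSE
)

# A plain factor() sorts the levels alphabetically
alphabetical <- levels(factor(toy$carrier))

# reorder() sorts the levels by n (ascending), so "Unknown" moves with its count
by_count <- levels(reorder(toy$carrier, toy$n))

# Explicit levels give full control: sorted by n, with "Unknown" forced last
explicit <- levels(factor(toy$carrier, levels = c("B6", "UA", "AA", "Unknown")))

print(alphabetical)
print(by_count)
print(explicit)
```

ggplot2 draws a discrete axis in level order, so whichever of these factors is mapped to x decides where the "Unknown" bar ends up.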

The following is an example of how it could be done with your reproducible data.

df <- flights %>% 
  count(carrier) %>% 
  mutate(
    carrier = coalesce(carrier, 'Unknown')
  ) %>% 
  top_n(10)

# Set the level order by hand, with "Unknown" last
df$carrier <- factor(df$carrier, levels = c("WN", "9E", "US", "MQ", "AA", "DL", "EV", "B6", "UA", "Unknown"))


ggplot(data = df) +
  geom_col(aes(x = carrier, y = n))

This produces the desired plot.

For a general approach:

You can read the levels as a character vector and remove the level you want to move. Then append it again so that it ends up in the last position (or insert it wherever you want).

I split this into a few steps so it's easier to follow:

foo <- levels(factor(reorder(df$carrier, df$n)))  # levels sorted by n
foo <- foo[foo != "Unknown"]                      # drop "Unknown"
foo <- append(foo, "Unknown")                     # put it back at the end

Now just use foo for the levels:

df$carrier<- factor(df$carrier, levels=foo)
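
If you need this in more than one place, the three steps can be wrapped in a small function. This is only a sketch: `levels_with_last()` is a hypothetical helper name, not part of any package, and `toy` is made-up data rather than the flights counts.

```r
# Hypothetical helper: sort the levels by a score, then move one level to the end
levels_with_last <- function(x, score, last) {
  lv <- levels(reorder(x, score))  # levels sorted by ascending score
  # Drop `last` and append it again. Note that `last` is appended even if it
  # is not among the levels, which would create an unused level.
  c(setdiff(lv, last), last)
}

# Toy data (hypothetical values)
toy <- data.frame(
  carrier = c("AA", "B6", "Unknown", "UA"),
  n       = c(50, 30, 45, 40),
  stringsAsFactors = FALSE
)

toy$carrier <- factor(
  toy$carrier,
  levels = levels_with_last(toy$carrier, toy$n, "Unknown")
)

print(levels(toy$carrier))
```

This works the same way for 10 or 100 levels, because the order comes from the counts rather than from a hand-written vector.
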
mischva11
  • Thank you. This works for my example, but imagine that I have 100 levels to reorder. Something like `df$carrier <- factor(df$carrier, levels=c(df$carrier[1:9], "Unknown"))` doesn't appear to work. – userABC123 Sep 20 '18 at 23:54
  • @snd added general approach, it's more a "how to reorder a string" problem. Hope this helps – mischva11 Sep 21 '18 at 00:03
  • Thanks a lot! I spent way too much time on that... I didn't know that I apparently don't fully know how to reorder a string, and that was the underlying issue ¯\_(ツ)_/¯. Thanks for effectively answering two questions. – userABC123 Sep 21 '18 at 00:16
Since you are already using the tidyverse, you can solve your problem by simply using fct_relevel() to set "Unknown" as the last level before plotting.

This alternative is quite nice, as you don't need to know in advance how many levels there are, nor arrange them in a separate step.

flights %>% 
    count(carrier) %>% 
    mutate(
        carrier = coalesce(carrier, 'Unknown')
    ) %>% 
    top_n(10) %>% 
    ggplot() +
    geom_col(aes(x = fct_relevel(reorder(carrier, n), "Unknown", after = Inf), y = n)) +
    labs(x = "carrier")

HAVB