I have a question concerning the order of data in my geom_bar.

This is my dataset:

  SM_P,Spotted melanosis on palm,16.2
  DM_P,Diffuse melanosis on palm,78.6
  SM_T,Spotted melanosis on trunk,57.3
  DM_T,Diffuse melanosis on trunk,20.6
  LEU_M,Leuco melanosis,17
  WB_M,Whole body melanosis,8.4
  SK_P,Spotted keratosis on palm,35.4
  DK_P,Diffuse keratosis on palm,23.5
  SK_S,Spotted keratosis on sole,66
  DK_S,Diffuse keratosis on sole,52.8
  CH_BRON,Dorsal keratosis,39
  LIV_EN,Chronic bronchities,6
  DOR,Liver enlargement,2.4
  CARCI,Carcinoma,1

I assign the following colnames:

  colnames(df) <- c("abbr", "derma", "prevalence") # Assign column names

Then I plot:

  ggplot(data=df, aes(x=derma, y=prevalence)) + geom_bar(stat="identity") + coord_flip()

Plot

Why does ggplot2 change the order of my data, seemingly at random? I would like the bars to appear in the same order as the rows of my data.frame.

Any help is much appreciated!

Karolis Koncevičius
Stücke
  • It's not random, it's alphabetical. See here for solution http://stackoverflow.com/questions/3253641/change-the-order-of-a-discrete-x-scale – arvi1000 Jun 30 '16 at 19:26
  • First of all, thanks for your response. If I apply `derma_table <- table(df$derma); derma_levels <- names(derma_table)[order(df$prevalence)]; df$derma2 <- factor(df$derma, levels = derma_levels)` and then plot with `ggplot(data=df, aes(x=derma, y=prevalence)) + geom_bar(stat="identity") + coord_flip()`, the plot is exactly the same as in my question. In fact, the commands only put the data.frame into alphabetical order, which is exactly what I would like to avoid. – Stücke Jun 30 '16 at 19:53
  • you are re-leveling the `derma2` factor, but then using `x=derma` – arvi1000 Jun 30 '16 at 20:15
  • Hey arvi, first of all thanks for your patience. I don't fully understand, because if I open my `df`, `df$derma` and `df$derma2` have exactly the same order anyway. So whichever of the two columns I plot, it doesn't make a difference. – Stücke Jun 30 '16 at 20:24
  • See below for plot ordering either by 'native order' or sorted by prevalence – arvi1000 Jun 30 '16 at 20:37
  • The order of rows in your data frame doesn't matter at all. The order that matters is the order of the levels of the factor: `levels(df$derma)`. You put those in whatever order you want to plot. – Gregor Thomas Jun 13 '18 at 21:04
  • @jaap I think this question (i.e. how to stop reordering of cols) is slightly (but importantly) different to the question how to reorder cols. I think it would be very useful to reopen on that basis. The reason being, if an order has been determined earlier in a workflow, the current best answers make this happen: i) order determined, ii) ggplot reorders, iii) (best answers) reorder again. Which doesn't make much sense if the data were originally in the correct order and some (any) way exists to stop geom_bar from reordering – stevec Mar 13 '20 at 22:38

1 Answer

Posting as an answer because the comment thread is getting long. You have to specify the order via the factor levels of the variable you map with aes(x=...):

# lock in factor level order
df$derma <- factor(df$derma, levels = df$derma)

# plot
ggplot(data=df, aes(x=derma, y=prevalence)) + 
    geom_bar(stat="identity") + coord_flip()

Result: same order as in df.

# or, order by prevalence:
df$derma <- factor(df$derma, levels = df$derma[order(df$prevalence)])

The same plot command then gives the bars ordered by prevalence.
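To see why setting the levels controls the bar order, it can help to inspect the levels directly. This is a minimal base-R sketch on a hypothetical three-row subset of the data (the name `small` and the `*_levels` variables are made up for illustration). It assumes, as described in the comments above, that ggplot2 lays out a discrete axis in factor-level order and falls back to sorted order for plain character columns.

```r
# Hypothetical three-row subset of the question's data, for illustration only
small <- data.frame(
  derma = c("Spotted melanosis on palm", "Diffuse melanosis on palm", "Carcinoma"),
  prevalence = c(16.2, 78.6, 1),
  stringsAsFactors = FALSE
)

# factor() without explicit levels sorts them alphabetically
default_levels <- levels(factor(small$derma))

# Passing the column itself as levels keeps the row order of the data frame
row_order_levels <- levels(factor(small$derma, levels = small$derma))

# Passing the column reordered by prevalence gives ascending-prevalence levels
by_prevalence_levels <- levels(
  factor(small$derma, levels = small$derma[order(small$prevalence)])
)

print(default_levels)
print(row_order_levels)
print(by_prevalence_levels)
```

Calling `levels(df$derma)` before plotting is a quick way to check which order the axis is likely to use.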


I read in the data like this:

df <- read.table(text=
"SM_P,Spotted melanosis on palm,16.2
DM_P,Diffuse melanosis on palm,78.6
SM_T,Spotted melanosis on trunk,57.3
DM_T,Diffuse melanosis on trunk,20.6
LEU_M,Leuco melanosis,17
WB_M,Whole body melanosis,8.4
SK_P,Spotted keratosis on palm,35.4
DK_P,Diffuse keratosis on palm,23.5
SK_S,Spotted keratosis on sole,66
DK_S,Diffuse keratosis on sole,52.8
CH_BRON,Dorsal keratosis,39
LIV_EN,Chronic bronchities,6
DOR,Liver enlargement,2.4
CARCI,Carcinoma,1", header=FALSE, sep=',')
colnames(df) <- c("abbr", "derma", "prevalence") # Assign column names
arvi1000
  • Thanks for your effort! I really appreciate your help! Did you remove some lines from the code which you posted? I don't get the same tick marks when I try the code. – Stücke Jun 30 '16 at 20:47
  • the only thing I didn't post was the code I used to read in your data. now added. – arvi1000 Jun 30 '16 at 21:03
  • Strange ... I get a different axis. Anyway, thank you very much for your effort! Much appreciated! :) – Stücke Jun 30 '16 at 21:48
  • If anyone is wondering how to do this with data where levels are present more than once in the variable (i.e., when _not_ using `stat = "identity"` but rather the default count stat), you can add the `unique()` function in the first step. For example: `df$var <- factor(df$var, levels = unique(df$var))` – stragu Jan 27 '19 at 06:38
  • @stragu if you are not using `unique()` in the newer versions of R, you may encounter problems. Thanks for that tip – Abel Callejo May 11 '20 at 05:34
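The `unique()` variant from the comments can be sketched as follows. The vector `var` and its values are hypothetical; the point is only that `unique()` keeps each value once, in order of first appearance, so the levels follow the data rather than the alphabet.

```r
# Hypothetical variable in which values repeat (as with the default count stat)
var <- c("b", "a", "b", "c", "a")

# unique() keeps first-appearance order and drops the repeats
f <- factor(var, levels = unique(var))

print(levels(f))  # levels in order of first appearance
print(table(f))   # counts per level, reported in that same order
```

A bar chart built on `f` with the default count stat would then be expected to show its bars in this first-appearance order.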