I'm struggling to get the right ordering of variables in a graph I made with ggplot2 in R.

Suppose I have a dataframe such as:

set.seed(1234)
my_df<- data.frame(matrix(0,8,4))
names(my_df) <- c("year", "variable", "value", "vartype")
my_df$year <- rep(2006:2007)
my_df$variable <- c(rep("VX",2),rep("VB",2),rep("VZ",2),rep("VD",2))
my_df$value <- runif(8, 5,10) 
my_df$vartype<- c(rep("TA",4), rep("TB",4))

which yields the following table:

  year variable    value vartype
1 2006       VX 5.568517      TA
2 2007       VX 8.111497      TA
3 2006       VB 8.046374      TA
4 2007       VB 8.116897      TA
5 2006       VZ 9.304577      TB
6 2007       VZ 8.201553      TB
7 2006       VD 5.047479      TB
8 2007       VD 6.162753      TB

There are four variables (VX, VB, VZ and VD), belonging to two groups of variable types, (TA and TB).

I would like to plot the values as horizontal bars on the y axis, ordered vertically first by variable groups and then by variable names, faceted by year, with values on the x axis and fill colour corresponding to variable group. (i.e. in this simplified example, the order should be, top to bottom, VB, VX, VD, VZ)

1) My first attempt has been to try the following:

ggplot(my_df,        
    aes(x=variable, y=value, fill=vartype, order=vartype)) +
       # adding or removing the aesthetic "order=vartype" doesn't change anything
     geom_bar() + 
     facet_grid(. ~ year) + 
     coord_flip()

However, the variables are listed in reverse alphabetical order rather than by vartype: the order=vartype aesthetic is ignored.

2) Following an answer to a similar question I posted yesterday, I tried the following, based on the post Order Bars in ggplot2 bar graph:

my_df$variable <- factor(
  my_df$variable, 
  levels=rev(sort(unique(my_df$variable))), 
  ordered=TRUE
)

This approach does get the variables in vertical alphabetical order in the plot, but ignores the fact that the variables should be ordered first by variable groups (with TA-variables on top and TB-variables below).

3) The following gives the same as 2 (above):

my_df$vartype <- factor(
  my_df$vartype, 
  levels=sort(unique(my_df$vartype)), 
  ordered=TRUE
)

... which has the same issues as the first approach (variables listed in reverse alphabetical order, groups ignored)

4) Another approach, based on the original answer to Order Bars in ggplot2 bar graph, also gives the same plot as 2, above:

my_df <- within(my_df, 
                vartype <- factor(vartype, 
                levels=names(sort(table(vartype),
                decreasing=TRUE)))
                ) 

I'm puzzled by the fact that, despite several approaches, the aesthetic order=vartype is ignored. Still, it seems to work in an unrelated problem: http://learnr.wordpress.com/2010/03/23/ggplot2-changing-the-default-order-of-legend-labels-and-stacking-of-data/

I hope that the problem is clear and welcome any suggestions.

Matteo

I posted a similar question yesterday but, unfortunately, I made several mistakes when describing the problem and providing a reproducible example. I've listened to several suggestions since, thoroughly searched Stack Overflow for similar questions and applied, to the best of my knowledge, every suggested combination of solutions, to no avail. I'm posting the question again hoping to be able to solve my issue and, hopefully, be helpful to others.

MatteoS
  • Duplicate of: http://stackoverflow.com/q/5208679/602276 – Andrie Sep 04 '11 at 13:46
  • It's not a duplicate of stackoverflow.com/q/5208679/602276. Please read the question carefully. – MatteoS Sep 04 '11 at 13:48
  • It is indeed the same question. You need to specify the levels of your factor **in the order that you want them in your plot**. The linked answer tells you how to do that. – Andrie Sep 04 '11 at 13:54
  • Which, based on the answer you deleted, involves defining the order manually. As I explained in the comment you deleted, I have several large data frames that I need to change often and would like to avoid writing a string of 30-40 11-char variables every time. – MatteoS Sep 04 '11 at 13:56
  • The closest I've got to solving the issue is: `ggplot(my_df, aes(x=reorder(variable,-as.numeric(vartype)), y=value, fill=vartype, order=vartype)) + geom_bar() + facet_grid(. ~ year) + coord_flip()` The TA variables are on top, but, within the groups, they are in reverse-alphabetical order, therefore it's still not a solution. Based on http://stackoverflow.com/questions/1735540/creating-a-pareto-chart-with-ggplot2-and-r – MatteoS Sep 04 '11 at 13:57
  • +1 for learning to provide reproducible code. – Roman Luštrik Sep 04 '11 at 13:58
  • @MatteoS You are now asking a different question. This question as posed is a duplicate and will no doubt be closed. Your real question seems to be about intertwining and sorting two different variables. I suggest you isolate and ask this as a new question. – Andrie Sep 04 '11 at 14:02
  • @Andrie: from the original question "I would like to plot the values as horizontal bars on the y axis, ordered vertically first by variable groups and then by variable names, faceted by year, with values on the x axis and fill colour corresponding to variable group. (i.e. in this simplified example, the order should be, top to bottom, VB, VX, VD, VZ)" **this** is the original question. Should I post another one? – MatteoS Sep 04 '11 at 14:04
  • @MatteoS Please join the R chat group here: http://chat.stackoverflow.com/rooms/106/r – Andrie Sep 04 '11 at 14:07
  • I'd be glad to, but unfortunately I just registered and do not have the reputation points required... – MatteoS Sep 04 '11 at 14:10
  • @MatteoS The consensus in the chat group is that you have an interesting question to ask about sorting vectors in a non-alphabetical way. I suggest you post a new question about your sorting problem. Simplify your example, remove the ggplot code, and ask a new question about the sorting only. – Andrie Sep 04 '11 at 14:22
  • I see (as you guessed, I'm able to read in the chat room but, unfortunately, cannot write). I'll try to frame the question in a more generic way, but, as far as I see it, it **is** related to ggplot2 plotting issues, as the variable ordering sometimes seems to have a mind of its own. I'm glad that my question was non-trivial, but I'm still looking for a silver bullet as far as my plot is concerned. – MatteoS Sep 04 '11 at 14:24
  • More generally, I believe there is an issue related to coord_flip() when ordering variables. In my original data frame (not the one shown above), the order of groups in the legend is correct and corresponds to that of the dataframe, but the vertical order of variables is upside-down. (although the plot is conceptually different, the issue is similar to this http://learnr.files.wordpress.com/2010/03/order_variable-0041.png?w=600 ). As far as I can see, this is something beyond an order issue of the dataframe, but an issue concerning the order reversal in ggplot2, possibly related to coord_flip. – MatteoS Sep 04 '11 at 14:41
  • If you use the code you showed, `variable` and `vartype` are *not* factors. **ggplot** will coerce them to factors and thus you get alphabetical ordering. Your question has almost *nothing* to do with **ggplot** and is all about generating an appropriate ordering. – Gavin Simpson Sep 04 '11 at 15:31
  • +1 for a clear, well thought out and obviously researched question. Doesn't look like a dupe to me, there's nothing about the effect of `coord_flip()` on SO. I think there's a difference between two questions that are exactly the same and a solution that happens to be the same for two different questions. – Brandon Bertelsen Sep 05 '11 at 01:11
  • With hindsight, I could have made the question clearer. Anyway, it seems that the coord_flip() issue **has** already been discussed here http://stackoverflow.com/questions/3744178/ggplot2-sorting-a-plot/7310754#7310754 though, for some reason, it did not feature in my search results: sorry about that. I think I've found a general solution, and posted it there. – MatteoS Sep 05 '11 at 16:44

1 Answer

This has little to do with ggplot, but is instead a question about generating an ordering of variables to use to reorder the levels of a factor. Here is your data, implemented using the various functions to better effect:

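set.seed(1234)
## stringsAsFactors = TRUE keeps `variable` and `vartype` as factors in R >= 4.0.0
df2 <- data.frame(year = rep(2006:2007, times = 4), 
                  variable = rep(c("VX","VB","VZ","VD"), each = 2),
                  value = runif(8, 5,10),
                  vartype = rep(c("TA","TB"), each = 4),
                  stringsAsFactors = TRUE)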

Note that this way variable and vartype are factors (from R 4.0.0 onwards, data.frame() only converts strings to factors when stringsAsFactors = TRUE is given). If they aren't factors, ggplot() will coerce them and then you get left with alphabetical ordering. I have said this before and will no doubt say it again: get your data into the correct format first, before you start plotting or doing data analysis.
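To make the coercion concrete, here is a small base R sketch (an illustration only; the object names are made up for the example). It shows that converting a character vector to a factor sorts the levels alphabetically, that an explicit `levels` argument overrides this, and that `ordered = TRUE` does not change the sequence of the levels.

```r
# Character vector in the order it appears in the data frame.
variable <- c("VX", "VX", "VB", "VB", "VZ", "VZ", "VD", "VD")

# Default conversion: the levels are the sorted unique values.
default_factor <- factor(variable)
print(levels(default_factor))   # "VB" "VD" "VX" "VZ"

# Explicit levels: this is the sequence a discrete axis will follow.
custom_factor <- factor(variable, levels = c("VB", "VX", "VD", "VZ"))
print(levels(custom_factor))    # "VB" "VX" "VD" "VZ"

# ordered = TRUE changes the class of the factor, not the sequence of its levels.
ordered_factor <- factor(variable, levels = c("VB", "VX", "VD", "VZ"),
                         ordered = TRUE)
print(identical(levels(ordered_factor), levels(custom_factor)))  # TRUE
```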

You want the following ordering:

> with(df2, order(vartype, variable))
[1] 3 4 1 2 7 8 5 6

where you should note that we get the ordering by vartype first and only then by variable within the levels of vartype. If we use this to reorder the levels of variable we get:

> with(df2, reorder(variable, order(vartype, variable)))
[1] VX VX VB VB VZ VZ VD VD
attr(,"scores")
 VB  VD  VX  VZ 
1.5 5.5 3.5 7.5 
Levels: VB VX VD VZ

(ignore the attr(,"scores") bit and focus on the Levels). This has the right ordering, but ggplot() will draw them bottom to top and you wanted top to bottom. I'm not sufficiently familiar with ggplot() to know if this can be controlled, so we will also need to reverse the ordering using decreasing = TRUE in the call to order().

Putting this all together we have:

## reorder `variable` on `variable` within `vartype`
df3 <- transform(df2, variable = reorder(variable, order(vartype, variable,
                                                         decreasing = TRUE)))

Which when used with your plotting code:

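ggplot(df3, aes(x=variable, y=value, fill=vartype)) +
       ## originally geom_bar(); current ggplot2 needs stat = "identity" when y is mapped
       geom_bar(stat = "identity") + 
       facet_grid(. ~ year) + 
       coord_flip()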

produces this:

(Figure: reordered bar plot)
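As a variation on the approach above, the level order can also be read directly off the sorted rows. This is only a sketch: `sorted_rows`, `top_to_bottom` and `df4` are names invented for the example. One reason to consider it is that `order()` returns row indices rather than ranks, so using its result as the score in `reorder()` gives the intended levels for this data layout but is not guaranteed to for arbitrary row orders.

```r
# Same data as in the answer; stringsAsFactors = TRUE keeps the columns as factors.
set.seed(1234)
df2 <- data.frame(year = rep(2006:2007, times = 4),
                  variable = rep(c("VX", "VB", "VZ", "VD"), each = 2),
                  value = runif(8, 5, 10),
                  vartype = rep(c("TA", "TB"), each = 4),
                  stringsAsFactors = TRUE)

# Sort rows by group, then by name, and keep the first occurrence of each name.
sorted_rows <- with(df2, order(vartype, variable))
top_to_bottom <- unique(as.character(df2$variable[sorted_rows]))

# A flipped discrete axis is drawn bottom to top, so store the levels reversed.
df4 <- df2
df4$variable <- factor(df4$variable, levels = rev(top_to_bottom))

print(top_to_bottom)          # "VB" "VX" "VD" "VZ"
print(levels(df4$variable))   # "VZ" "VD" "VX" "VB"
```

`df4` can then be passed to the plotting code in place of `df3`.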

Gavin Simpson
  • I thank you for your solution! It works. However, I've also found, with a thorough search, that my original issue is a particular case of a common nuisance when using coord_flip(). – MatteoS Sep 04 '11 at 15:38
  • @MatteoS Do you understand now why people felt this was another duplicate? The answer is the same - reorder the levels of the factor in the order you want them. The issue here was how to derive that ordering. All the **ggplot** code was superfluous and distracting. It does help to boil problems down to their base level and also tell us exactly what you want. Andrie's Answer was almost spot on until you happened to mention in comments you didn't want to enter the ordering by hand. – Gavin Simpson Sep 04 '11 at 15:43
  • Now I see, but ggplot2 is the issue here. With coord_flip(), the axes are flipped, the variables that are originally ordered L -> R are then ordered B -> T, while the legend does not match them. – MatteoS Sep 04 '11 at 15:44
  • The problem has been documented in http://groups.google.com/group/ggplot2/browse_thread/thread/dcc58f13ec230109/cced681ea653318e?lnk=gst&q=coord_flip%28%29#cced681ea653318e http://www.mail-archive.com/r-help@r-project.org/msg145281.html and http://stackoverflow.com/questions/4000670/how-to-order-breaks-with-ggplot-geom-bar with more or less fiddly workarounds. Unfortunately, I wasn't able to provide a sufficiently generic dataframe to illustrate my issue and had to resort to the one I provided. I will try to apply your solution to my cases, though! – MatteoS Sep 04 '11 at 15:44
  • @MatteoS I don't see how that is the problem - the ordering is all wrong without `coord_flip()` using code you supplied. – Gavin Simpson Sep 04 '11 at 15:46
  • Indeed it is, in this particular dataframe, this is why I believed it was an ordering issue in the first place and framed it as such. I apologize for the confusion, but I believe the ordering and the coord_flip() are intertwined. – MatteoS Sep 04 '11 at 15:48
  • @MatteoS As I understand it, it is convention in **ggplot** to plot the levels of factor-based axes Left-Right on the x-axis and Bottom-Top on the y-axis. It is not meant to replot as though the figure had been rotated. The ordering in the legend is based on `levels(vartype)`; it isn't supposed to be in the implied order of (the reordered) `variable`. The ordering and `coord_flip()` are only intertwined in the sense of conventions of how factors are represented. I think you are seeing 2 and 2 and jumping to an answer of 5 here. – Gavin Simpson Sep 04 '11 at 15:57
  • You're right: the variables on the x-axis are ordered L->R based on `levels(variable)` (or, equivalently, B->T with `coord_flip()`), while the legend order is based on `levels(vartype)`, notwithstanding the fact that neither of the variables is an **ordered** factor. The order is just the one in which they appear in the dataframe (sorted or not). I previously believed that the **default** order on the plot had something to do with the variables being **ordered factors**, but I was mistaken. – MatteoS Sep 04 '11 at 16:06
  • @MatteoS You aren't the first to think that ordered factors were involved. There are several prominent contributors to the r tag here that have made that mistake. Anyway, glad this got sorted in the end. – Gavin Simpson Sep 04 '11 at 16:09
  • To sum up (correct me if I'm mistaken): the default order of variables in a ggplot2 plot corresponds to the one found in `levels(variable)`, unless otherwise specified by coercing `variable` into an `ordered factor` with custom ordering. – MatteoS Sep 04 '11 at 16:11
  • Thank you. I'm sorry I made such a mess in explaining the issue, but you have to admit that the relation between plot and variable ordering is somehow misleading (to a beginner such as myself, of course) but that the underlying issue was non-trivial (if poorly explained) – MatteoS Sep 04 '11 at 16:13
  • Gavin, you provided an elegant solution to the problem of non-alphabetical sorting of data-frames. Following the discussion here http://chat.stackoverflow.com/rooms/106/r it seems that this is a question worth answering in itself and you did. However, is there a more general approach to solving the issue of variable ordering when flipping axes in ggplot2 that does not involve specifying the order manually? I would like to get both the variables T->B in the order of `levels(variable)` AND the legend entries in the order of `levels(vartype)`, in the general case, without manual ordering... :-) – MatteoS Sep 04 '11 at 16:34
  • @MatteoS I didn't manually set the ordering of `variable`; I did it programmatically. **ggplot** will only plot according to the order of the levels of factors. There is no other way to specify this AFAICT, but I don't use **ggplot** that much. Note this has *nothing* to do with *ordered* factors. Forget about them, they are a red herring. The critical issue is to get the order of the *levels* of the factor as you want them and *then* plot. If there were a **ggplot** way of doing this, it would have been suggested here or in the various duplicates that can also be found on SO. – Gavin Simpson Sep 04 '11 at 16:44
  • +1 For showing that `ggplot` sorts variables in the order of the factor levels. – Andrie Sep 04 '11 at 16:50
  • Ok, got it. I understand the (lack of direct) relevance of ordered factors and the issue concerning the sequence of `levels(variable)` and `levels(vartype)`. If this is a common issue with no trivial solution, it may be worth suggesting to Hadley to add an option `top.down=TRUE` to `coord_flip()` for a future release of `ggplot2`, what do you say? – MatteoS Sep 04 '11 at 16:58
  • @MatteoS Ask away, but I don't see the need for this given the general solution of getting the factor levels in the order you want. – Gavin Simpson Sep 04 '11 at 17:56
  • @MatteoS `scale_fill_discrete(guide = guide_legend(reverse=TRUE))` would be the equivalent for `top.down=TRUE` to reverse the order in legend. – mlt Dec 06 '12 at 06:16
  • With the current `geom_bar` the original code no longer runs: `Error : Mapping a variable to y and also using stat="bin"`. Solution: `geom_bar(stat="identity")` – Iris Oct 23 '15 at 10:59
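
Pulling the points from the answer and the comments together, a minimal sketch for a current ggplot2 might look like the following. It assumes ggplot2 is installed and recent enough to provide `geom_col()`, which draws bars whose length is taken from `y` (the same effect as `geom_bar(stat = "identity")`). The names `df_plot`, `row_order` and `p` are made up for the example, and the plotting part is skipped if ggplot2 is not available.

```r
set.seed(1234)
df_plot <- data.frame(year = rep(2006:2007, times = 4),
                      variable = rep(c("VX", "VB", "VZ", "VD"), each = 2),
                      value = runif(8, 5, 10),
                      vartype = rep(c("TA", "TB"), each = 4),
                      stringsAsFactors = TRUE)

# Build the levels in the reverse of the desired top-to-bottom order
# (group first, then name), because a flipped discrete axis runs bottom to top.
row_order <- with(df_plot, order(vartype, variable, decreasing = TRUE))
df_plot$variable <- factor(
  df_plot$variable,
  levels = unique(as.character(df_plot$variable[row_order]))
)
print(levels(df_plot$variable))   # "VZ" "VD" "VX" "VB"

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df_plot, aes(x = variable, y = value, fill = vartype)) +
    geom_col() +              # bar length comes from `value`
    facet_grid(. ~ year) +
    coord_flip()
  # print(p) draws the plot
}
```

With this data the legend (TA, then TB) already matches the top-to-bottom order of the bars. Where it does not, adding `guides(fill = guide_legend(reverse = TRUE))` to the plot reverses the legend, along the lines of mlt's comment.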