12

I have a data frame containing order data for each of 20+ products from each of 20+ countries. I have put it in a highlight table using ggplot2 with code similar to this:

require(ggplot2)
require(reshape)
require(scales)

mydf <- data.frame(industry = c('all industries','steel','cars'), 
    'all regions' = c(250,150,100), americas = c(150,90,60), 
     europe = c(150,60,40), check.names = FALSE)
mydf

mymelt <- melt(mydf, id.var = c('industry'))
mymelt

ggplot(mymelt, aes(x = industry, y = variable, fill = value)) +
    geom_tile() + geom_text(aes(label = value))

Which produces a plot like this:

highlight table

In the real plot, the 450-cell table very nicely shows the 'hotspots' where orders are concentrated. The last refinement I want to implement is to arrange the items on both the x-axis and y-axis in alphabetical order. So in the plot above, the y-axis (variable) would be ordered as all regions, americas, then europe, and the x-axis (industry) would be ordered all industries, cars, then steel. In fact the x-axis is already ordered alphabetically, but I wouldn't know how to achieve that if it were not already the case.

I feel somewhat embarrassed about having to ask this question as I know there are many similar ones on SO, but sorting and ordering in R remains my personal bugbear and I cannot get this to work. Although I do try, in all except the simplest cases I get lost in a welter of calls to factor, levels, sort, order and with.

Q. How can I arrange the above highlight table so that both y-axis and x-axis are ordered alphabetically?

EDIT: The answers from smillig and joran below do resolve the question with the test data but with the real data the problem remains: I can't get an alphabetical sort. This leaves me scratching my head as the basic structure of the data frame looks the same. Clearly I have omitted something, but what??

> str(mymelt)
'data.frame':   340 obs. of  3 variables:
 $ Industry: chr  "Animal and vegetable products" "Food and beverages" "Chemicals" "Plastic and rubber goods" ...
 $ variable: Factor w/ 17 levels "Other areas",..: 17 17 17 17 17 17 17 17 17 17 ...
 $ value   : num  0.000904 0.000515 0.007189 0.007721 0.000274 ...

However, applying the with statement doesn't result in levels with an alphabetical sort.

> with(mymelt,factor(variable,levels = rev(sort(unique(variable)))))

  [1] USA                   USA                   USA                  
  [4] USA                   USA                   USA                  
  [7] USA                   USA                   USA                  
 [10] USA                   USA                   USA                  
 [13] USA                   USA                   USA                  
 [16] USA                   USA                   USA                  
 [19] USA                   USA                   Canada               
 [22] Canada                Canada                Canada               
 [25] Canada                Canada                Canada               
 [28] Canada                Canada                Canada    

All the way down to:

 [334] Other areas           Other areas           Other areas          
 [337] Other areas           Other areas           Other areas          
 [340] Other areas

And if you do a levels() it seems to show the same thing:

 [1] "Other areas"           "Oceania"               "Africa"               
 [4] "Other Non-Eurozone"    "UK"                    "Other Eurozone"       
 [7] "Holland"               "Germany"               "Other Asia"           
[10] "Middle East"           "ASEAN-5"               "Singapore"            
[13] "HK/China"              "Japan"                 "South Central America"
[16] "Canada"                "USA"  

That is, the non-reversed version of the above.

The following shot shows what the plot of the real data looks like. As you can see, the x-axis is sorted and the y-axis is not. I'm perplexed. I'm missing something but can't see what it is.

screenshot of plot with real data

SlowLearner

5 Answers

6

The y-axis on your chart is also already ordered alphabetically, but starting from the origin, so it reads bottom to top. I think you can achieve the order of the axes that you want by using xlim and ylim. This assumes both industry and variable are factors, since levels() returns NULL for a character column. For example:

ggplot(mymelt, aes(x = industry, y = variable, fill = value)) +
    geom_tile() + geom_text(aes(label = value)) +
    ylim(rev(levels(mymelt$variable))) + xlim(levels(mymelt$industry))

will order the y-axis with all regions at the top, followed by americas, and then europe at the bottom (which is reverse alphabetical order when read from the origin upward). The x-axis is alphabetically ordered from all industries to steel with cars in between.
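
If the columns are character vectors rather than factors (for example with options(stringsAsFactors = FALSE)), the limits can be computed from the values instead of from levels(). The following is only a sketch under that assumption; mylong is a hypothetical hand-built stand-in for mymelt, and x_lims and y_lims are illustrative names.

```r
library(ggplot2)

# Hypothetical long-format copy of the question's toy data, built by hand
# so that both label columns are plain character vectors.
mylong <- data.frame(
  industry = rep(c("all industries", "steel", "cars"), times = 3),
  variable = rep(c("all regions", "americas", "europe"), each = 3),
  value    = c(250, 150, 100, 150, 90, 60, 150, 60, 40),
  stringsAsFactors = FALSE
)

# Alphabetical limits taken from the values themselves, not from levels()
x_lims <- sort(unique(as.character(mylong$industry)))
y_lims <- rev(sort(unique(as.character(mylong$variable))))

p <- ggplot(mylong, aes(x = industry, y = variable, fill = value)) +
  geom_tile() +
  geom_text(aes(label = value)) +
  xlim(x_lims) + ylim(y_lims)
```

Because as.character() is applied first, the same two lines also work when the columns are factors whose levels are not in alphabetical order.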


smillig
  • thank you for your answer. If I take the `ggplot` call above and slot it into my code, I get an `Error in UseMethod("limits") : no applicable method for 'limits' applied to an object of class "NULL"` error. Was there something else you did in addition to adding the `xlim` and `ylim` statements? – SlowLearner Jul 22 '12 at 11:24
  • No, I just ran your code exactly as it appears above except for adding the `xlim` and `ylim` bits. I'm sorry that I don't know what that error means. – smillig Jul 22 '12 at 11:36
  • I suspect it's because `levels(mymelt$industry)` returns NULL. I appreciate the attempt. – SlowLearner Jul 22 '12 at 11:47
  • It shouldn't though. For me, `levels(mymelt$industry)` gives `[1] "all industries" "cars" "steel"`. What does `str(mymelt)` tell you? Both `industry` and `variable` should be `Factor`s. – smillig Jul 22 '12 at 11:59
  • Well, machines vary. I have `options(stringsAsFactors=FALSE)` in my startup - that's probably the cause. – SlowLearner Jul 22 '12 at 12:01
  • Then wouldn't `mymelt$industry<-as.factor(mymelt$industry)` solve your problem? – smillig Jul 22 '12 at 12:04
  • I added `mydf$industry <- as.factor(mydf$industry)` and the example above now seems to work. Will experiment with the real data and report back. Cheers – SlowLearner Jul 22 '12 at 12:04
  • The suggested changes do reverse the y-axis (`mymelt$variable`) but it doesn't leave it sorted alphabetically. – SlowLearner Jul 22 '12 at 12:15
4

As smillig says, the default is already to order the axes alphabetically, but the y axis will be ordered from the lower left corner up.

The basic rule with ggplot2 that applies to almost anything that you want in a specific order is:

  • If you want something to appear in a particular order, you must make the corresponding variable a factor, with the levels sorted in your desired order.

In this case, all you should need to do is this:

mymelt$variable <- with(mymelt,factor(variable,levels = rev(sort(unique(variable)))))

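which should work regardless of whether you're running R with stringsAsFactors = TRUE or FALSE, provided variable is a character vector or a factor whose levels are already in alphabetical order. If variable is a factor with its levels in some other order, sort(unique(variable)) follows that existing level order, so wrap variable in as.character() before sorting.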

This principle applies to ordering axis labels, ordering bars, ordering segments within bars, ordering facets, etc.
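
As a small base-R illustration of that rule (the vector and variable names here are made up for the example, not taken from the real data):

```r
# Illustrative region names in arbitrary order
regions <- c("USA", "Canada", "Japan", "UK")

# With no levels argument, factor() sorts the unique values,
# which is where the default alphabetical ordering comes from.
default_f <- factor(regions)

# Explicit levels fix the order a discrete axis will follow.
custom_f <- factor(regions, levels = c("USA", "UK", "Japan", "Canada"))

levels(default_f)
levels(custom_f)
```

The data values are identical in both factors; only the level order differs, and the level order is what ggplot2 uses to position discrete items.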

For continuous variables there is a convenient scale_*_reverse() but apparently not for discrete variables, which would be a nice addition, I think.
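
In more recent ggplot2 releases (3.3.0 or later, to the best of my knowledge) the limits argument of a discrete scale also accepts a function, so a discrete axis can be reversed without touching the factor. A hedged sketch, using a hypothetical hand-built data frame mylong in place of mymelt:

```r
library(ggplot2)

# Hypothetical long-format copy of the question's toy data
mylong <- data.frame(
  industry = rep(c("all industries", "steel", "cars"), times = 3),
  variable = rep(c("all regions", "americas", "europe"), each = 3),
  value    = c(250, 150, 100, 150, 90, 60, 150, 60, 40)
)

# limits = rev reverses whatever order the scale would otherwise use,
# assuming a ggplot2 version that accepts a function for limits.
p <- ggplot(mylong, aes(x = industry, y = variable, fill = value)) +
  geom_tile() +
  geom_text(aes(label = value)) +
  scale_y_discrete(limits = rev)
```

This only flips the existing order; if the levels are not alphabetical to begin with, they still need to be re-leveled first.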

joran
  • @SlowLearner This is very simple: character variables -> default alphabetical ordering. factor variables -> ordered in the order their levels are in. That's all there is to it. – joran Jul 22 '12 at 22:48
  • Thanks, this is actually the bit I can't get right! I have tried something like `within(mymelt, variable <- factor(mymelt$variable, levels = mymelt$variable[order(mymelt$variable, decreasing = T)], ordered = TRUE))` but this didn't do the trick. – SlowLearner Jul 22 '12 at 22:56
  • @SlowLearner (1) Using `within` means you can omit the `mymelt$`, (2) try it with `levels = sort(levels(variable))` (or `rev` it if needed). – joran Jul 22 '12 at 22:59
  • appreciate the help; as per below it's not quite getting there as variable still isn't sorted `within(mymelt, variable <- factor(variable, levels = sort(levels(variable)), ordered = TRUE)) Industry variable value 1 Animal and vegetable products USA 9.039006e-04 2 Food and beverages USA 5.152928e-04` – SlowLearner Jul 22 '12 at 23:06
  • @SlowLearner If you actually bother to give me a reproducible data set I will demonstrate _exactly_ how to do this. You will be able to copy+paste it and it will work. Until then, I can't help any more. – joran Jul 22 '12 at 23:10
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/14255/discussion-between-joran-and-slowlearner) – joran Jul 22 '12 at 23:15
1

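Another possibility is to use fct_reorder from the forcats package. Note that this orders the levels by value (by default, the median of value within each level) rather than alphabetically.

library(dplyr)    # %>% and mutate()
library(tidyr)    # pivot_longer()
library(forcats)  # fct_reorder()
library(ggplot2)

mydf %>%
  pivot_longer(cols = c('all regions', 'americas', 'europe')) %>%
  mutate(name1 = fct_reorder(name, value, .desc = FALSE)) %>%
  ggplot(aes(x = industry, y = name1, fill = value)) +
  geom_tile() + geom_text(aes(label = value))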
Xiaojie Zhou
0

Maybe this Stack Overflow question can help:

Order data inside a geom_tile

specifically the first answer by Brandon Bertelsen:

"Note it's not an ordered factor, it's a factor in the right order"

It helped me to get the right order of the y-axis in a ggplot2 geom_tile plot.
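
A short base-R sketch of the distinction in that quote, using made-up size labels rather than the real data:

```r
sizes <- c("medium", "small", "large")
wanted <- c("small", "medium", "large")

# An ordered factor: the levels carry a ranking (small < medium < large)
ord <- factor(sizes, levels = wanted, ordered = TRUE)

# A plain factor whose levels are simply listed in the order we want
plain <- factor(sizes, levels = wanted)

is.ordered(ord)
is.ordered(plain)
identical(levels(ord), levels(plain))
```

Both factors have the same level order, which is what determines positions on a discrete axis; ordered = TRUE is not needed just to control that order.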

tryptofame
0

Maybe a little bit late,

with(mymelt,factor(variable,levels = rev(sort(unique(variable)))))

This call doesn't reorder anything alphabetically, because variable is already a factor: sort() on a factor follows the existing level order, not alphabetical order.

You should first convert the variable to character with the as.character function, like so:

with(mymelt,factor(variable,levels = rev(sort(unique(as.character(variable))))))
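
To see the difference on a small scale, here is a sketch using a subset of the region names from the question; the construction of variable below is an assumption meant to mimic a factor whose levels follow the original column order.

```r
# Subset of the region names, in the non-alphabetical order shown by levels()
regions <- c("Other areas", "Oceania", "Africa", "UK", "Japan", "Canada", "USA")
variable <- factor(regions, levels = regions)

# sort() on the factor keeps the existing level order, so nothing changes
by_level <- levels(factor(variable, levels = sort(unique(variable))))

# Converting to character first gives a true alphabetical sort
by_alpha <- levels(factor(variable,
                          levels = sort(unique(as.character(variable)))))

by_level
by_alpha
```

Wrapping the sorted vector in rev(), as in the line above, then puts the first alphabetical item at the top of the y-axis.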
franz