This question asks about ordering a bar graph according to an unsummarized table. I have a slightly different situation. Here's part of my original data:

experiment,pvs_id,src,hrc,mqs,mcs,dmqs,imcs
dna-wm,0,7,9,4.454545454545454,1.4545454545454546,1.4545454545454541,4.3939393939393945
dna-wm,1,7,4,2.909090909090909,1.8181818181818181,0.09090909090909083,3.9090909090909087
dna-wm,2,7,1,4.818181818181818,1.4545454545454546,1.8181818181818183,4.3939393939393945
dna-wm,3,7,8,3.4545454545454546,1.5454545454545454,0.4545454545454546,4.272727272727273
dna-wm,4,7,10,3.8181818181818183,1.9090909090909092,0.8181818181818183,3.7878787878787876
dna-wm,5,7,7,3.909090909090909,1.9090909090909092,0.9090909090909092,3.7878787878787876
dna-wm,6,7,0,4.909090909090909,1.3636363636363635,1.9090909090909092,4.515151515151516
dna-wm,7,7,3,3.909090909090909,1.7272727272727273,0.9090909090909092,4.030303030303029
dna-wm,8,7,11,3.6363636363636362,1.5454545454545454,0.6363636363636362,4.272727272727273

I only need a few variables from this, namely mqs and imcs, grouped by their pvs_id, so I create a new table:

m = melt(t, id.var="pvs_id", measure.var=c("mqs","imcs"))
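
For reference, the molten table has one row per pvs_id and variable. Here is a minimal base-R sketch of that long layout, built by hand from the first three rows above. The names `t3` and `m3` are hypothetical and used only for this illustration, the values are rounded, and melt() itself is not called here:

```r
# First three rows of the data above (rounded); t3 is a hypothetical name.
t3 <- data.frame(
  pvs_id = c(0, 1, 2),
  mqs    = c(4.454545, 2.909091, 4.818182),
  imcs   = c(4.393939, 3.909091, 4.393939)
)

# Long ("molten") layout: one row per pvs_id and variable,
# with the same three columns a melt() call like the one above yields.
m3 <- data.frame(
  pvs_id   = rep(t3$pvs_id, times = 2),
  variable = factor(rep(c("mqs", "imcs"), each = nrow(t3)),
                    levels = c("mqs", "imcs")),
  value    = c(t3$mqs, t3$imcs)
)

print(m3)
```

Each pvs_id appears twice, once per variable, which is why the bar order has to be set on pvs_id itself and not on the value column.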

I can plot this as a bar graph where one can see the correlation between MQS and IMCS.

ggplot(m, aes(x=pvs_id, y=value)) +
  geom_bar(aes(fill=variable), position="dodge", stat="identity")

However, I'd like the resulting bars to be ordered by the MQS value, from left to right, in decreasing order. The IMCS values should be ordered with those, of course.

How can I accomplish that? More generally, given any molten data frame (which seems useful for graphing in ggplot2; today is the first time I've come across one), how do I specify the order for one variable?

slhck

2 Answers

It's all in making pvs_id a factor and supplying the appropriate levels to it:

dat$pvs_id <- factor(dat$pvs_id, levels = dat[order(-dat$mqs), 2])

m = melt(dat, id.var="pvs_id", measure.var=c("mqs","imcs"))

ggplot(m, aes(x=pvs_id, y=value))+ 
    geom_bar(aes(fill=variable), position="dodge", stat="identity")

This produces the following plot:

EDIT: Since pvs_id was numeric, it was plotted in numeric order. With a factor, no inherent order is assumed; the levels are plotted in the order you give them. So even though its labels are numbers, pvs_id is really a factor (a nominal variable). As for dat[order(-dat$mqs), 2]: order() with a negative sign returns the row indices that sort the data frame from largest to smallest mqs. You want the pvs_id values in that order, so you index that column, which is the second one. Run on its own, the expression gives:

> dat[order(-dat$mqs), 2]
[1] 6 2 0 5 7 4 8 3 1

Now you supply that to the levels argument of factor and this orders the factor as you want it.
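
The two steps can be pulled apart in a small stand-alone sketch. It assumes a cut-down `dat` holding only pvs_id and mqs from the question (values rounded), and it selects the column by name instead of by position, because pvs_id is not the second column in this reduced data frame:

```r
# pvs_id and mqs from the question's data, rounded (assumption: reduced copy of dat).
dat <- data.frame(
  pvs_id = 0:8,
  mqs = c(4.4545, 2.9091, 4.8182, 3.4545, 3.8182,
          3.9091, 4.9091, 3.9091, 3.6364)
)

# Step 1: row indices that sort mqs from largest to smallest.
idx <- order(-dat$mqs)

# Step 2: the pvs_id values taken in that row order.
lev <- dat$pvs_id[idx]

# Step 3: use them as the factor levels; the axis follows the level order.
dat$pvs_id <- factor(dat$pvs_id, levels = lev)

print(idx)
print(levels(dat$pvs_id))
```

The data values themselves are untouched; only the order of the levels changes, and that is what ggplot2 uses to lay out a discrete x axis.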

Tyler Rinker
  • PS don't name a data set t as it writes over a pretty important base install function that transposes. – Tyler Rinker Sep 18 '12 at 12:24
  • Actually, it doesn't. It can tell data sets from functions. Try it: `t <- 1:5; t(matrix(1:9, nrow = 3))`. If you were to say `t <- function(x) 1:x`, you would get into trouble, though. – Roman Luštrik Sep 18 '12 at 13:22
  • Perfect, thanks. I'm not really into the nomenclature here so I wasn't aware of what "factors" were. And `dat[order(-dat$mqs), 2]` returns the `pvs_id` values in order of decreasing `mqs`, right? How exactly does that work? Would you mind explaining that part a little more, so future visitors might be able to adapt this more easily? – slhck Sep 18 '12 at 14:42
With newer tidyverse functions, this becomes much more straightforward (or at least, easy to read for me):

library(tidyverse)

d %>%
  mutate_at("pvs_id", as.factor) %>%
  mutate(pvs_id = fct_reorder(pvs_id, mqs)) %>%
  gather(variable, value, c(mqs, imcs)) %>% 
  ggplot(aes(x = pvs_id, y = value)) + 
    geom_col(aes(fill = variable), position = position_dodge())

What it does is:

  • create a factor if not already
  • reorder it according to mqs (ascending by default; use desc(mqs) or .desc = TRUE for the decreasing order asked for in the question)
  • gather into individual rows (same as melt)
  • plot as geom_col (same as geom_bar with stat="identity")
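
The reordering step can also be tried without any packages. As a rough sketch, base R's reorder() is swapped in here for fct_reorder() (it is not what the pipeline above calls), and `d4` is a hypothetical four-row subset of the question's data with rounded values:

```r
# Four rows of the question's data (rounded); d4 is a hypothetical name.
d4 <- data.frame(
  pvs_id = c(0, 1, 2, 3),
  mqs    = c(4.4545, 2.9091, 4.8182, 3.4545)
)

# reorder() sorts levels by ascending score, so negate mqs
# to get the decreasing order the question asks for.
d4$pvs_id <- reorder(factor(d4$pvs_id), -d4$mqs)

print(levels(d4$pvs_id))
```

Negating the sort key plays the same role here as desc(mqs) in the tidyverse version.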

slhck