I am having trouble producing a side-by-side bar plot of two datasets in R. I previously used the code below to create a plot in which corresponding bars from the two datasets were juxtaposed, with bars from dataset 1 colored red and bars from dataset 2 colored blue. Now, when I run the same code on any pair of datasets, including the originals (still untouched in my saved workspace), I get a separate plot for each dataset, side by side, and within each plot the bars alternate between red and blue from one bin to the next. The documentation is not giving me any obvious clues as to what I've done to change the display. Please help!

## Sample data
set.seed(47)
BG.restricted.hs = round(runif(100, min = 47, max = 1660380))
FG.hs = round(runif(100, min = 0, max = 1820786))

BG.restricted.hs <- data.matrix(BG.restricted.hs, rownames.force = NA)
groups.bg.restricted.hs <- cut(x=BG.restricted.hs, breaks = seq(from = 0, to = 1900000, by = 10000))
rowsums.bg.restricted.hs <- tapply(BG.restricted.hs, groups.bg.restricted.hs, sum)
norm.bg.restricted.hs <- (rowsums.bg.restricted.hs / nrow(BG.restricted.hs))

FG.hs <- data.matrix(FG.hs, rownames.force = NA)
groups.fg.hs <- cut(x=FG.hs, breaks = seq(from = 0, to = 1900000, by = 10000))
rowsums.fg.hs <- tapply(FG.hs, groups.fg.hs, sum)
norm.fg.hs <- (rowsums.fg.hs / nrow(FG.hs))

data <- cbind(norm.fg.hs, norm.bg.restricted.hs)
barplot(height = data, xlab = "TSS Distance", ylab = "Density", col=c("red","blue"), beside = TRUE)

Data files contain only a single column of integers.

user3396385
  • Could you make this [reproducible](http://stackoverflow.com/q/5963269/903061)? Saying you have code that used to work and now doesn't isn't much to go on, and when it depends on data we don't have... if the data is just integers, the easiest way would be for you to simulate something appropriate, maybe `BG.restricted.hs = round(runif(100, min = 0, max = 1e6))`, but it's up to you to choose appropriate values for the length, min, and max for each vector. – Gregor Thomas Mar 09 '15 at 18:16
  • Well, I get the same sort of output if I follow your suggestion: `BG.restricted.hs = round(runif(100, min = 47, max = 1660380))` and `FG.hs = round(runif(100, min = 0, max = 1820786))` Hope this helps! – user3396385 Mar 09 '15 at 18:29
  • Right, the point is that we can't get any output because we don't have the data. – Gregor Thomas Mar 09 '15 at 18:33
  • Tried to edit the body but it seems you beat me there. – user3396385 Mar 09 '15 at 18:36
  • So, now I'm confused that you're trying to plot these two on top of each other (yes?), but your binwidths are different... and your bin starts and stops... is that really what you want? Why is FG `by = 10910` but BG is `by = 10000`? I'd assume you would want the same `breaks` for both series. – Gregor Thomas Mar 09 '15 at 18:46
  • This was to get the number of bins equal. After some thought, I see what you're saying and it would have been better to specify the breaks more explicitly instead. Will edit the code in the main body to reflect this. – user3396385 Mar 09 '15 at 19:02
  • See my answer---combining the vectors makes getting the same breaks easy. – Gregor Thomas Mar 09 '15 at 19:09

1 Answer

See if this is more or less what you want. It uses ggplot2, but could be adapted for barplot if you prefer:

set.seed(47)
BG.restricted.hs = round(runif(100, min = 47, max = 1660380))
FG.hs = round(runif(100, min = 0, max = 1820786))

We combine the vectors into one column (keeping track of their source in another column) so that we can simultaneously bin both of them.

dat = data.frame(x = c(BG.restricted.hs, FG.hs), 
                 source = c(rep("BG", length(BG.restricted.hs)),
                            rep("FG", length(FG.hs))))
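# Start at 0 and round the upper limit up so that every value lands in a bin;
# with seq(min, max, by) the minimum and any value above the last break become NA.
dat$bin = cut(dat$x, breaks = seq(from = 0, to = ceiling(max(dat$x) / 10000) * 10000, by = 10000),
              include.lowest = TRUE)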

Plot:

library(ggplot2)
ggplot(dat, aes(x = bin, fill = source)) +
    geom_bar(position = "dodge") +
    theme_bw() +
    scale_x_discrete(breaks = NULL)
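
For anyone who prefers base `barplot`, here is a minimal sketch of the same idea. With `beside = TRUE`, `barplot` treats each column of a matrix as one group of bars, so the matrix needs one row per dataset and one column per bin. The `cbind()` call in the question produces the opposite orientation (one column per dataset), which is consistent with the two separate groups and alternating colors described there. Note the assumptions: this sketch counts values per bin with `table()`, whereas the question's code sums them with `tapply(..., sum)`, and the names `breaks`, `counts.fg`, `counts.bg`, and `counts` are illustrative only.

```r
set.seed(47)
BG.restricted.hs <- round(runif(100, min = 47, max = 1660380))
FG.hs <- round(runif(100, min = 0, max = 1820786))

# Shared breaks so both datasets are binned identically (190 bins).
breaks <- seq(from = 0, to = 1900000, by = 10000)

# Count the values in each bin; cut() returns a factor, so table() reports
# empty bins as 0 instead of dropping them.
counts.fg <- as.vector(table(cut(FG.hs, breaks = breaks, include.lowest = TRUE)))
counts.bg <- as.vector(table(cut(BG.restricted.hs, breaks = breaks, include.lowest = TRUE)))

# One ROW per dataset, one COLUMN per bin: this is the layout that
# barplot(beside = TRUE) needs to put the two datasets next to each other.
counts <- rbind(FG = counts.fg, BG = counts.bg)

barplot(height = counts, beside = TRUE, col = c("red", "blue"),
        xlab = "TSS Distance", ylab = "Count", legend.text = TRUE)
```

If a matrix has already been built with `cbind()`, passing `t(data)` as `height` should give the same orientation.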
Gregor Thomas
  • This is closer to what I want but looks to be stacking the bars on top of each other, whereas I would like them side by side to show relative heights. – user3396385 Mar 09 '15 at 19:15
  • Ooops, you're right. I added the `position = "dodge"` which will fix that. – Gregor Thomas Mar 09 '15 at 19:27
  • Perfect! Just a little more work to normalize the bar height and I'll be golden. Thanks! – user3396385 Mar 09 '15 at 19:39
  • Alright, I have to ask one more question related to my last comment: in my real data, the "FG" set has about 10 times the number of rows as the "BG" set and what I am interested in is the relative frequencies in each bin. I'm having trouble figuring out how to normalize to the number of rows in your example, though. Sorry to bother you again with this! – user3396385 Mar 09 '15 at 20:00
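
One possible way to get relative frequencies per source, offered as a sketch rather than a definitive answer: tabulate counts by source and bin, divide each source's row by its own total with `prop.table(..., margin = 1)`, and plot the precomputed proportions with `geom_col()`. The sample sizes below (100 for BG, 1000 for FG) are hypothetical stand-ins for the real data, and the names `counts`, `props`, `props.df`, and `rel.freq` are illustrative.

```r
set.seed(47)
BG.restricted.hs <- round(runif(100, min = 47, max = 1660380))
FG.hs <- round(runif(1000, min = 0, max = 1820786))  # hypothetical: 10x as many rows as BG

dat <- data.frame(x = c(BG.restricted.hs, FG.hs),
                  source = rep(c("BG", "FG"),
                               times = c(length(BG.restricted.hs), length(FG.hs))))
dat$bin <- cut(dat$x, breaks = seq(from = 0, to = 1900000, by = 10000),
               include.lowest = TRUE)

# Counts with one row per source and one column per bin.
counts <- table(source = dat$source, bin = dat$bin)

# Divide each row by its own total, so each source sums to 1.
props <- prop.table(counts, margin = 1)

# Long format for plotting: one row per source/bin combination.
props.df <- as.data.frame(props, responseName = "rel.freq")

# Plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(props.df, aes(x = bin, y = rel.freq, fill = source)) +
    geom_col(position = "dodge") +
    theme_bw() +
    scale_x_discrete(breaks = NULL)
  print(p)
}
```

Because each source is normalized by its own row count, bar heights are comparable even when one set is much larger than the other.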