I am fairly new to R and ggplot2 and am having some trouble plotting multiple variables in the same histogram plot.

My data is already grouped and just needs to be plotted. The data is by week and I need to plot the number for each category (A, B, C and D).

Date          A    B    C    D
01-01-2011    11   0    11   1
08-01-2011    12   0    3    3
15-01-2011    9    0    2    6

I want the Dates as the x axis and the counts plotted as different colors according to a generic y axis. I am able to plot just one of the categories at a time, but am not able to find an example like mine.

This is what I use to plot one category. I am pretty sure I need to use position="dodge" to plot multiple as I don't want it to be stacked.

ggplot(df, aes(x=Date, y=A)) + geom_histogram(stat="identity") + 
labs(title = "Number in Category A") +
ylab("Number") + 
xlab("Date") +
theme(axis.text.x = element_text(angle = 90))

Also, this gives me a histogram with spaces in between the bars. Is there any way to remove this? I tried spaces=0 as you would do when plotting bar graphs, but it didn't seem to work.

I read some previous questions similar to mine, but the data was in a different format and I couldn't adapt it to fit my data. This is some of the help I looked at:

Creating a histogram with multiple data series using multhist in R

http://www.cookbook-r.com/Graphs/Plotting_distributions_%28ggplot2%29/

I'm also not quite sure what the bin width is. I think it is how the data should be spaced or grouped, which doesn't apply to my question since it is already grouped. Please advise me if I am wrong about this.

Any help would be appreciated. Thanks in advance!

o.o
  • take a look at the documentation: http://docs.ggplot2.org/current/ – marbel Jan 27 '14 at 02:50
  • see also http://www.cookbook-r.com/Graphs/Plotting_distributions_(ggplot2)/#histogram-and-density-plots-with-multiple-groups. I tried overlaying histograms with alpha but the density plots were the clearest for me when I just wanted to give an idea of the different distributions on a single plot. – TooTone Jan 14 '15 at 17:17

1 Answer

You're not really plotting histograms, you're just plotting a bar chart that looks kind of like a histogram. I personally think this is a good case for faceting:

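library(ggplot2)
library(reshape2) # for melt()
# reshape to long format: one row per Date/category pair
melt_df <- melt(df, id.vars = "Date")
head(melt_df) # so you can see it

ggplot(melt_df, aes(Date, value, fill = Date)) + 
 geom_bar(stat = "identity") + # the counts are already summarized
 facet_wrap(~ variable)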
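
If side-by-side bars are still wanted instead of facets, the same long-format data can be drawn with position="dodge", as the question suggests. This is only a minimal sketch: it assumes df is rebuilt by hand from the three rows shown in the question, and the title and legend label are made up for illustration.

```r
library(ggplot2)
library(reshape2)  # for melt()

# Assumption: df rebuilt by hand from the three rows shown in the question
df <- data.frame(
  Date = c("01-01-2011", "08-01-2011", "15-01-2011"),
  A = c(11, 12, 9),
  B = c(0, 0, 0),
  C = c(11, 3, 2),
  D = c(1, 3, 6)
)

# Long format: one row per Date/category pair
melt_df <- melt(df, id.vars = "Date")

# Dodged bars: one bar per category within each date,
# coloured by category so the legend lists A, B, C and D
p <- ggplot(melt_df, aes(x = Date, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Number per category", x = "Date", y = "Number",
       fill = "Category") +
  theme(axis.text.x = element_text(angle = 90))
print(p)
```

Because fill is mapped to variable here rather than Date, the legend shows the categories. Passing width = 1 to geom_bar() should also close the gap between neighbouring dates, though that may make the groups harder to tell apart.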


However, I think that, in general, changes over time are much better represented by a line chart:

ggplot(melt_df,aes(Date,value,group=variable,color=variable)) + geom_line() 
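
For a line chart it may help to turn the Date column into real dates first, so the x axis is spaced by time rather than treated as text labels. The sketch below assumes the dates are day-month-year (the weekly steps 01, 08, 15 point that way) and that df is rebuilt by hand from the question's table; geom_point() is added on top of geom_line() to mark each weekly value.

```r
library(ggplot2)
library(reshape2)  # for melt()

# Assumption: df rebuilt by hand from the three rows shown in the question
df <- data.frame(
  Date = c("01-01-2011", "08-01-2011", "15-01-2011"),
  A = c(11, 12, 9),
  B = c(0, 0, 0),
  C = c(11, 3, 2),
  D = c(1, 3, 6)
)

# Long format: one row per Date/category pair
melt_df <- melt(df, id.vars = "Date")

# Assumption: dates are day-month-year, so parse them with %d-%m-%Y
melt_df$Date <- as.Date(melt_df$Date, format = "%d-%m-%Y")

# One line per category, with a point at every observed week
p_line <- ggplot(melt_df, aes(Date, value, group = variable, color = variable)) +
  geom_line() +
  geom_point()
print(p_line)
```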


Brandon Bertelsen
  • The legend gives me the date values rather than the categories. I tried changing the `fill=Date` to something like `fill=names(df)[-1]` thinking that would help, but I got an aesthetics error so it didn't plot anything. Is there a way around this? – o.o Jan 27 '14 at 01:57
  • If you're using faceting, the colours reference the date, because each facet represents the variables. – Brandon Bertelsen Jan 27 '14 at 01:59
  • Oh right, that's true. I should have realized that. Thank you so much for your help. Also, to remove a warning when plotting, I added `stat="identity"` as my data was already summarized. – o.o Jan 27 '14 at 02:04
  • Your last graph with lines is beautiful, but I have a small question: is it possible to add points to each line at the values on the x axis? @BrandonBertelsen – Duck Jan 27 '14 at 14:28