Plotting DENSITY histograms with ggplot

Question

How can I plot the density of the values in these two vectors as two histograms in the same plot:

interactors1 = c(-6.4, -3.7, -7.7, -4.6, -2.0, -5.5, -3.3, -5.8, -5.0, -4.5,
                  3.2, -0.1, -3.0, -9.2, -3.1, -8.5, -5.4, -9.1, -7.7,  2.2,
                  1.7,  3.4, -8.6, -0.5, -8.1)

and

noninteractors1 = c(-1, 0.1, 2.7, 0.4, 4.3)

Before you ask, yes I did check out this post

I want to use ggplot and not hist, because the plots look much better. When I melt the data into a data frame and plot counts, everything is fine:

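library(ggplot2)
library(reshape2)  # assumed source of melt()

# data.frame() recycles noninteractors1 (5 values) to match the 25 rows of interactors1
interactors=data.frame(interactors1,noninteractors1)

ggplot(melt(interactors), aes(value, fill = variable)) +
      geom_histogram(position = "dodge")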

However, I don't need counts, I need densities.

When I do

ggplot(melt(interactors), aes(value, fill = variable)) +
   geom_histogram(aes(y = ..density..), position = "dodge")

I get an odd-looking result. That can't be right, because the sum of density*binwidth over the bins exceeds 1. What am I doing wrong? Any help would be appreciated.

P.S. I tried posting the plots, but it's not letting me...
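One way to check what the bars should add up to, without ggplot2, is base R's `hist(..., plot = FALSE)`, which reports density as count / (n × bin width), the same normalisation ggplot2's `stat_bin()` uses. A minimal sketch, assuming unit-width breaks from -10 to 5 (an illustrative choice, not from the question): each group's bars have area 1 on their own, because the density is normalised separately for each group (ggplot2 also normalises each `fill` group separately), so adding up the bars of both groups gives 2.

```r
# The two vectors from the question
interactors1 <- c(-6.4, -3.7, -7.7, -4.6, -2.0, -5.5, -3.3, -5.8, -5.0, -4.5,
                   3.2, -0.1, -3.0, -9.2, -3.1, -8.5, -5.4, -9.1, -7.7,  2.2,
                   1.7,  3.4, -8.6, -0.5, -8.1)
noninteractors1 <- c(-1, 0.1, 2.7, 0.4, 4.3)

# Unit-width breaks covering both vectors (illustrative choice)
brks <- seq(-10, 5, by = 1)
h_int <- hist(interactors1, breaks = brks, plot = FALSE)
h_non <- hist(noninteractors1, breaks = brks, plot = FALSE)

# Total bar area of each group: sum of density * bin width
area_int <- sum(h_int$density * diff(h_int$breaks))
area_non <- sum(h_non$density * diff(h_non$breaks))

print(c(interactors = area_int, noninteractors = area_non))  # each is 1
print(area_int + area_non)  # both groups' bars together: 2
```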

You should add `binwidth=1` if you want it to add up to 1. Otherwise the densities add up to `1/binwidth`, and the binwidth is set automatically by `ggplot` (with a warning). In general, I would advise against ignoring warning messages... — shadow, Jun 30 '14 at 13:45
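shadow's point can be checked numerically: with equal-width bins the heights alone add up to 1/binwidth, and multiplying by the width brings the total back to 1. A base R sketch on `interactors1`, with a bin width of 0.5 picked only for illustration:

```r
interactors1 <- c(-6.4, -3.7, -7.7, -4.6, -2.0, -5.5, -3.3, -5.8, -5.0, -4.5,
                   3.2, -0.1, -3.0, -9.2, -3.1, -8.5, -5.4, -9.1, -7.7,  2.2,
                   1.7,  3.4, -8.6, -0.5, -8.1)

binwidth <- 0.5
h <- hist(interactors1, breaks = seq(-10, 5, by = binwidth), plot = FALSE)

print(sum(h$density))             # 1 / binwidth, i.e. 2
print(sum(h$density * binwidth))  # 1
```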

score 2 · Answer 1 · answered Jul 06 '15 at 21:54

If you don't want to set binwidth = 1, you can multiply the y value by the binwidth. For example:

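m <- ggplot(melt(interactors), aes(value, fill = variable))  # assumed base plot, as in the question
m + geom_histogram(binwidth = 0.5, aes(y = (..density..)*0.5))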

This lets you vary the binwidth and still get relative-frequency histograms, in which each group's bar heights add up to 1.
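A self-contained version of this answer, as a sketch assuming ggplot2 ≥ 3.3.0 (where `after_stat(density)` is the current spelling of `..density..`). It builds the long data frame directly instead of calling `melt()`, which also avoids `data.frame()` recycling the 5-element vector; the names `df`, `value` and `variable` are illustrative. `position = "identity"` with transparency keeps the two groups from being stacked, and `layer_data()` is used to confirm that each group's bar heights add up to 1.

```r
library(ggplot2)

interactors1 <- c(-6.4, -3.7, -7.7, -4.6, -2.0, -5.5, -3.3, -5.8, -5.0, -4.5,
                   3.2, -0.1, -3.0, -9.2, -3.1, -8.5, -5.4, -9.1, -7.7,  2.2,
                   1.7,  3.4, -8.6, -0.5, -8.1)
noninteractors1 <- c(-1, 0.1, 2.7, 0.4, 4.3)

# Long format built directly, so the shorter vector is not recycled
df <- data.frame(
  value = c(interactors1, noninteractors1),
  variable = rep(c("interactors1", "noninteractors1"),
                 times = c(length(interactors1), length(noninteractors1)))
)

p <- ggplot(df, aes(value, fill = variable)) +
  # density * binwidth = share of the group's values in each bin;
  # the 0.5 inside after_stat() must match binwidth
  geom_histogram(aes(y = after_stat(density * 0.5)), binwidth = 0.5,
                 position = "identity", alpha = 0.5)

# Bar heights summed within each fill group
d <- layer_data(p)
print(tapply(d$y, d$group, sum))
```

If you change `binwidth`, the factor inside `after_stat()` has to change with it.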

score 1 · Accepted Answer · answered Jun 30 '14 at 13:51


Try this

data <- melt(interactors)  # long format: columns variable, value
ggplot(data, aes(x = value, fill = variable)) +
  geom_histogram(aes(y = ..density..), binwidth = 1)
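The same answer written with the current `after_stat()` syntax, as a sketch assuming ggplot2 ≥ 3.3.0 and an illustrative long data frame `df`. One thing worth knowing when reading the plot: `geom_histogram()` stacks groups by default, so the visible top of the upper group is a cumulative height, not that group's density. The unstacked `density` column returned by `layer_data()` is what sums to 1 per group when `binwidth = 1`.

```r
library(ggplot2)

interactors1 <- c(-6.4, -3.7, -7.7, -4.6, -2.0, -5.5, -3.3, -5.8, -5.0, -4.5,
                   3.2, -0.1, -3.0, -9.2, -3.1, -8.5, -5.4, -9.1, -7.7,  2.2,
                   1.7,  3.4, -8.6, -0.5, -8.1)
noninteractors1 <- c(-1, 0.1, 2.7, 0.4, 4.3)

df <- data.frame(
  value = c(interactors1, noninteractors1),
  variable = rep(c("interactors1", "noninteractors1"),
                 times = c(length(interactors1), length(noninteractors1)))
)

p <- ggplot(df, aes(x = value, fill = variable)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 1)

d <- layer_data(p)
# d$y holds the stacked heights; d$density is the per-group density,
# which with binwidth = 1 sums to 1 within each group
print(tapply(d$density, d$group, sum))
```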

answered Jun 30 '14 at 13:51

Philippe


When I specify the bins and explicitly set binsidth=1, I get a histogram where the sum of bins*density doesn't seem to add up to 1. It definitely looks better than not specifying bin breaks, but how do I interpret the results? Cheers everyone. – Stefan Jul 01 '14 at 16:47
The `binwidth` argument seems to have the same effect as `breaks`. In fact, `geom_histogram` uses the `stat_bin` function, so you can write your histogram like this: `ggplot(data, aes(x=value, fill=variable)) + stat_bin(aes(y=..density..), breaks = seq(-10,5,by=1), position = "dodge")` and get the same result as before. The benefit here is that it uses the same syntax as the generic `hist` method. Are you sure about the `binsidth` argument? I see nothing about that. The second option was to compare your result with the generic method. – Philippe Jul 02 '14 at 08:15
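The comparison with the generic method can be done directly: build the `stat_bin()` version with fixed breaks, pull out the computed densities with `layer_data()`, and compare them with `hist()` on the same breaks. A sketch assuming ggplot2 ≥ 3.3.0; `df` and the helper `group_density()` are hypothetical names introduced here.

```r
library(ggplot2)

interactors1 <- c(-6.4, -3.7, -7.7, -4.6, -2.0, -5.5, -3.3, -5.8, -5.0, -4.5,
                   3.2, -0.1, -3.0, -9.2, -3.1, -8.5, -5.4, -9.1, -7.7,  2.2,
                   1.7,  3.4, -8.6, -0.5, -8.1)
noninteractors1 <- c(-1, 0.1, 2.7, 0.4, 4.3)

df <- data.frame(
  value = c(interactors1, noninteractors1),
  variable = rep(c("interactors1", "noninteractors1"),
                 times = c(length(interactors1), length(noninteractors1)))
)

brks <- seq(-10, 5, by = 1)
p <- ggplot(df, aes(x = value, fill = variable)) +
  stat_bin(aes(y = after_stat(density)), breaks = brks, position = "dodge")
d <- layer_data(p)

# Densities of one fill group, ordered left to right (dodging may reorder rows)
group_density <- function(g) {
  rows <- d[d$group == g, ]
  rows$density[order(rows$x)]
}

# Base R densities on the same breaks; both use right-closed bins by default
h_int <- hist(interactors1, breaks = brks, plot = FALSE)$density
h_non <- hist(noninteractors1, breaks = brks, plot = FALSE)$density

print(all.equal(group_density(1), h_int))
print(all.equal(group_density(2), h_non))
```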
This produces the same results as before. My question is: can I specify binwidths AND get everything to add up to 1? If not, how do I interpret the data? This is an example histogram I have (not from the two vectors above; see image below). How do I know what's going on if the values don't add up to 1? Can I directly compare the two distributions? – Stefan Jul 02 '14 at 09:52
For that, the `geom_density()` function is better suited, I think. It calls the `density` function from the base package `stats` and uses the same arguments and algorithms. You can find all the details of the two functions with `?stat_density` and especially `?density`. I hope that helps. – Philippe Jul 02 '14 at 13:13
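A kernel density estimate sidesteps the bin-width question: `stats::density()` returns a curve whose area is approximately 1 whatever bins you might otherwise have chosen. A base R sketch using the default Gaussian kernel and bandwidth on `interactors1`, approximating the area with a Riemann sum over the equally spaced evaluation grid:

```r
interactors1 <- c(-6.4, -3.7, -7.7, -4.6, -2.0, -5.5, -3.3, -5.8, -5.0, -4.5,
                   3.2, -0.1, -3.0, -9.2, -3.1, -8.5, -5.4, -9.1, -7.7,  2.2,
                   1.7,  3.4, -8.6, -0.5, -8.1)

# Gaussian kernel estimate with the default bandwidth
kde <- density(interactors1)

# Riemann-sum approximation of the area under the curve
step <- diff(kde$x[1:2])
area <- sum(kde$y) * step
print(area)  # close to 1
```

In ggplot2 the corresponding layer could be added to the question's plot as `geom_density(alpha = 0.5)`; each `fill` group then gets its own normalised curve, so the two distributions can be compared directly.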

score -2 · Answer 3 · answered Jul 02 '14 at 14:54

Thanks to Jaap and Philippe for their replies, much appreciated. I found what I was looking for on Google, but will post it here as well - the more info there is, the better.

In order for the sum of the densities to equal 1, binwidth=1 must be specified (thanks Jaap). When the bin width is not 1, the sum of the densities is not 1; rather, the sum of the products density*binwidth equals 1. When binwidth=1, the height of each rectangle (the density) equals the probability of your variable falling in that bin. When binwidth!=1, that probability equals the area of the rectangle, i.e. density*binwidth.
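The same rule (probability of a bin = density × bin width) holds even when the bins have different widths. A base R sketch with deliberately unequal breaks, chosen only for illustration, showing that density × width equals the share of values in each bin and that these shares sum to 1:

```r
interactors1 <- c(-6.4, -3.7, -7.7, -4.6, -2.0, -5.5, -3.3, -5.8, -5.0, -4.5,
                   3.2, -0.1, -3.0, -9.2, -3.1, -8.5, -5.4, -9.1, -7.7,  2.2,
                   1.7,  3.4, -8.6, -0.5, -8.1)

# Deliberately unequal bin widths (illustrative choice)
brks <- c(-10, -6, -4, -2, 0, 5)
h <- hist(interactors1, breaks = brks, plot = FALSE)

prob  <- h$density * diff(h$breaks)       # area of each rectangle
share <- h$counts / length(interactors1)  # fraction of values in each bin

print(rbind(prob, share))  # same values in both rows
print(sum(prob))           # 1
```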

Cheers everyone. :)

I think it would be better if you just edited your question to include this answer and deleted this answer — Marcin, Nov 03 '15 at 10:48
