
I am using R to generate random samples and plot their distribution. The distribution of the variable X does not show up in the chart: I get a blank chart in my R plots. I used the following code:

library(ggplot2)
set.seed(1)
d = data.frame(X=rbeta(20000,2,5))
p = ggplot(d, aes(x=X))
p + geom_bar(aes(y=(..count..)/sum(..count..))) + ylab("Frequency Percent")

Am I missing something?

rcs
JMCR

2 Answers


Use geom_histogram for continuous data:

?geom_histogram

Display a 1d distribution by dividing into bins and counting the number of observations in each bin. ...

R> p <- ggplot(d, aes(x=X))
R> p + geom_histogram(aes(y=(..count..)/sum(..count..))) +
   ylab("Frequency Percent") 

[Figure: the resulting histogram]

geom_histogram uses stat_bin by default, which bins data in ranges and counts the cases in each range. It differs from stat_count (default stat for geom_bar), which counts the number of cases at each x position (without binning into ranges). stat_bin requires continuous x data, whereas stat_count can be used for both discrete and continuous x data.
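A rough way to see the difference described above, using only base R as a stand-in: `table()` counts at each distinct x value (similar in spirit to stat_count), while `hist()` bins into ranges first (similar in spirit to stat_bin). These are analogies chosen for illustration, not what ggplot2 calls internally, and the variable names are made up for this sketch.

```r
set.seed(1)
x <- rbeta(20000, 2, 5)

# Counting at each distinct x position (analogous to stat_count):
# continuous draws almost never repeat, so nearly every count is 1.
per_value <- table(x)
n_positions <- length(per_value)
largest_count <- max(per_value)

# Binning into ranges first (analogous to stat_bin), then converting
# the per-bin counts to proportions that sum to 1.
h <- hist(x, breaks = 30, plot = FALSE)
bin_prop <- h$counts / sum(h$counts)
```

With close to 20000 separate positions, each holding about one observation, every bar would be extremely thin and extremely short relative to the total, which is presumably why the original geom_bar chart looks blank.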

rcs
  • `stat_bin()` using `bins = 30`. Pick better value with `binwidth`. – JMCR Apr 11 '16 at 09:00
  • e.g. `geom_histogram(aes(y=(..count..)/sum(..count..)), binwidth=0.025)` – rcs Apr 11 '16 at 09:06
  • See `?stat_bin`: By default, `stat_bin` uses 30 bins - this is not a good default, but the idea is to get you experimenting with different binwidths. You may need to look at a few to uncover the full story behind your data. – rcs Apr 11 '16 at 09:09
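Putting the comments above together, here is a sketch of the histogram with an explicit binwidth. It assumes a ggplot2 release that provides `after_stat()`, which newer versions recommend in place of the `..count..` notation; the binwidth of 0.025 is simply the value suggested in the comment and should be tuned to your data.

```r
library(ggplot2)

set.seed(1)
d <- data.frame(X = rbeta(20000, 2, 5))

# after_stat(count / sum(count)) turns per-bin counts into proportions;
# binwidth = 0.025 replaces the default of 30 bins.
p <- ggplot(d, aes(x = X)) +
  geom_histogram(aes(y = after_stat(count / sum(count))), binwidth = 0.025) +
  ylab("Frequency Percent")

# layer_data() returns the computed bins, handy for checking the heights.
bins <- layer_data(p, 1)
```

Printing `p` draws the chart; inspecting `bins$y` shows the bar heights, which should add up to 1.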

You are missing stat = "bin"

library(ggplot2)
set.seed(1)
d = data.frame(X=rbeta(2000,2,5))
p = ggplot(d,aes(x=X))
p + geom_bar(aes(y=(..count..)/sum(..count..)), stat="bin") +
    ylab("Frequency Percent")

This SO answer is helpful here.

Community
m-dz
  • `stat_bin()` using `bins = 30`. Pick better value with `binwidth`. – JMCR Apr 11 '16 at 09:09
  • Is this a question or an observation? – m-dz Apr 11 '16 at 09:13
  • For another plot I used stat="bin", but it shows a warning message: Removed 5035 rows containing non-finite values (stat_bin). – JMCR Apr 11 '16 at 09:25
  • Check for `NA`s, `NaN`s and +/-`Inf`s in your data, `geom_bar` and `geom_histogram` remove those for counting purposes. – m-dz Apr 11 '16 at 09:53
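The check suggested in the last comment can be sketched in base R as follows. The vector `x` here is made-up example data, not the asker's; `is.finite()` returns FALSE for `NA`, `NaN`, `Inf` and `-Inf` alike, so one call covers all the cases mentioned.

```r
# Hypothetical data containing every kind of non-finite value
x <- c(0.2, NA, 0.5, NaN, Inf, -Inf, 0.9)

# How many values would be dropped before counting
n_bad <- sum(!is.finite(x))

# Keep only the values that can actually be binned
x_clean <- x[is.finite(x)]
```

Comparing `n_bad` on your own column with the number reported in the warning is a quick way to confirm that the removed rows are the non-finite ones.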