
Could anyone help point out why my code is not accurately displaying the number of users who signed up for this event? I found the code online and really liked the visual. However, while the hours axis works, the y-axis does not reflect the thousands of sign-ups.

plot_multi_histogram <- function( Cap2021Hour, feature, label_column) {
  plt <- ggplot(Cap2021Hour, aes(x=eval(parse(text=feature)), fill=eval(parse(text=label_column)))) +
    geom_histogram(alpha=1, position="identity", aes(y = ..density..), color="black") +
    geom_density(alpha=1) +
    geom_vline(aes(xintercept=mean(eval(parse(text=feature)))), color="black", linetype="dashed", size=1) +
    labs(x=feature, y = "Registered")
  plt + guides(fill=guide_legend(title=label_column))
}

plot_multi_histogram(Cap2021Hour, 'hours','typev1')

Resulting image: [Histogram Image]

Data used

Hour  Registered  Hour Type
0     11387       Night
1     6387        Night
2     4087        Night
3     3876        Night
4     989         Night
5     14387       Morning
6     15387       Morning
7     54387       Morning
8     50387       Morning
9     52387       Morning
10    48787       Morning
11    44387       Morning
12    34387       Morning
13    36387       Evening
14    34387       Evening
15    44387       Evening
16    54387       Evening
17    64387       Evening
18    67387       Evening
19    54387       Evening
20    24387       Evening
21    22387       Night
22    20387       Night
23    19387       Night
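
For anyone who wants to reproduce this, the table can be rebuilt as a data frame roughly like the sketch below. The column names `hours` and `typev1` are taken from the function call above; `registered` is an assumed name for the "Registered" column, since the real column names in `Cap2021Hour` are not shown.

```r
# Rebuild the posted table as a data frame (one row per hour).
# `hours` and `typev1` follow the function call in the question;
# `registered` is an assumed column name, not confirmed by the post.
Cap2021Hour <- data.frame(
  hours = 0:23,
  registered = c(
    11387, 6387, 4087, 3876, 989,                                   # hours 0-4
    14387, 15387, 54387, 50387, 52387, 48787, 44387, 34387,         # hours 5-12
    36387, 34387, 44387, 54387, 64387, 67387, 54387, 24387,         # hours 13-20
    22387, 20387, 19387                                             # hours 21-23
  ),
  typev1 = rep(c("Night", "Morning", "Evening", "Night"),
               times = c(5, 8, 8, 3))
)

# Inspect the structure: 24 rows, one per hour.
str(Cap2021Hour)
```

Pasting the output of `dput(Cap2021Hour)` into a question serves the same purpose and preserves the real column names and types.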
  • It's easier to help you if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input that can be used to test and verify possible solutions. Note that density curves cannot really be used for counts. It's the area under the curve that tells you the proportion of observations found, not the height of the curve. – MrFlick Sep 14 '22 at 15:14
  • @mrFlick Thank you! Just added an example of the data used – FondaThaGr8t Sep 14 '22 at 15:27
  • `..density..` gives an estimate of the density, where the total area under the curve should be 1--not a display of counts. If you want counts maybe use `geom_freqpoly` instead, and you'll have to scale up the `geom_density` as well. It might be enough to put `stat = "count"` in the `geom_density` layer, not sure. – Gregor Thomas Sep 14 '22 at 15:37
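
Building on the comments above, one possible way to get sign-up counts on the y-axis is sketched below. It assumes the data really has one row per hour with the counts already aggregated, as in the posted table. In that case a histogram of `hours` only bins the 24 rows and never looks at the sign-up numbers, so the counts can be mapped directly to `y` with `geom_col()` instead. The column name `registered` is an assumption, and this swaps the histogram and density layers for bars rather than reproducing the original visual.

```r
library(ggplot2)

# Assumed layout: one row per hour, sign-ups already aggregated.
# `registered` is a hypothetical column name; `hours` and `typev1`
# come from the function call in the question.
Cap2021Hour <- data.frame(
  hours = 0:23,
  registered = c(
    11387, 6387, 4087, 3876, 989,
    14387, 15387, 54387, 50387, 52387, 48787, 44387, 34387,
    36387, 34387, 44387, 54387, 64387, 67387, 54387, 24387,
    22387, 20387, 19387
  ),
  typev1 = rep(c("Night", "Morning", "Evening", "Night"),
               times = c(5, 8, 8, 3))
)

# Mean hour weighted by sign-ups, so busy hours pull the line toward them.
mean_hour <- weighted.mean(Cap2021Hour$hours, Cap2021Hour$registered)

# geom_col() draws bar heights straight from the y aesthetic,
# so the y-axis shows the actual number of sign-ups per hour.
plt <- ggplot(Cap2021Hour, aes(x = hours, y = registered, fill = typev1)) +
  geom_col(color = "black") +
  geom_vline(xintercept = mean_hour, color = "black", linetype = "dashed") +
  labs(x = "hours", y = "Registered", fill = "typev1")

print(plt)
```

If the raw data instead had one row per sign-up, `geom_histogram()` without `aes(y = ..density..)` would show counts by default, and a density overlay would need to be rescaled to match.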
