My issue is that when I try to make my two histograms show percentages instead of counts, the counts are divided by the overall total and not by the total within each category.

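library(dplyr)    # provides %>%
library(ggplot2)

# Histogram of session counts, filled by churn status, with density on the y axis
p <- temp.df %>%
  ggplot(aes(x = x, y = stat(density), fill = churn)) +
  geom_histogram(binwidth = 5, alpha = 0.35) +
  xlim(-5, 100) + xlab("Session count")
print(p)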

x is a numeric value between 0 and a few hundred that represents the number of sessions opened by a user in a particular month, and churn is a boolean telling me whether the user used the app the month after. The data frame has ~100,000 rows.

The graph I get looks like this: [plot: histogram of session counts for churn and return users]

My problem is that I want the sum of all blue bins to equal 1, and the same for the red bins. Right now it is the sum of all blue and red bins together that equals 1. The density calculation isn't done separately for each category (churn and not churn).
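To make the two normalizations concrete, here is a minimal base R sketch on made-up data. The data frame `toy` and its values are hypothetical stand-ins for `temp.df`, and the bins are built by hand with `cut()` rather than by ggplot2, so this only illustrates the numbers I am after, not how the plot should be produced.

```r
# Hypothetical toy data standing in for temp.df (not the real data)
set.seed(1)
toy <- data.frame(
  x     = c(rpois(300, 10), rpois(700, 30)),
  churn = rep(c(TRUE, FALSE), times = c(300, 700))
)

# Bin x into width-5 intervals [0, 5), [5, 10), ... similar to binwidth = 5
breaks  <- seq(0, max(toy$x) + 5, by = 5)
toy$bin <- cut(toy$x, breaks = breaks, right = FALSE)

# Count table: one row per churn value, one column per bin
counts <- table(toy$churn, toy$bin)

# Normalization I do NOT want: every cell divided by the overall total,
# so the bins of both groups together sum to 1
overall <- counts / sum(counts)

# Normalization I DO want: every cell divided by its own group's total,
# so the bins of each group sum to 1 separately
within <- prop.table(counts, margin = 1)

print(sum(overall))     # total over both groups
print(rowSums(within))  # one total per churn group
```

With `within`, the FALSE row and the TRUE row each add up to 1, which is what I would like the blue and red bars to do.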

Any help will be appreciated!

Thanks a lot! ;)

Pieter V.