I have a dataset with the number of impressions for each unique user and whether that user converted (1) or not (0). I want to create a column chart that displays the conversion rate for intervals of 20 impressions. For each interval, the conversion rate is the number of converted users in that interval of impressions divided by the number of unique users in that interval.

So for instance, for this dataset:

# A tibble: 19 x 2
   converted tot_impr
       <dbl>    <dbl>
 1         0       19
 2         0        4
 3         1       19
 4         0       13
 5         0       18
 6         1        9
 7         1       17
 8         1        8
 9         1        8
10         1       11
11         0        8
12         0       19
13         1        8
14         0        8
15         1       18
16         0       12
17         1        5
18         1       12
19         0        1

I should be seeing these conversion rates:

[image: expected conversion rates per interval]

I have managed to plot the number of converted users per interval with ggplot2's geom_col, using the following code:

ggplot(data = db) + 
  geom_col(mapping = aes(x = tot_impr, y = converted), width=5)

I am struggling to make geom_col display on the y-axis not the count of converted users, but the percentage of converted users relative to the total number of users in that interval of impressions.

Could someone help me out?

Thank you in advance!

SandPiper
  • Have you tried to calculate the percentage first and pass it to `y` in the `aes` call? – Algo7 Jan 10 '21 at 21:12
  • Yes, probably better to calculate first. See this too for how to post your dataset: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – william3031 Jan 10 '21 at 21:25
  • Where is the information about users in your data? How do you get `count` and `converted` as 3 and 1 respectively for 0-5 bin? – Ronak Shah Jan 11 '21 at 05:23

1 Answer

Try this. It is better to compute your variables before plotting:

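library(dplyr)
library(ggplot2)
#Code
df %>%
  # Bin impressions into [0,5], (5,10], (10,15], (15,20]
  mutate(Cut = cut(tot_impr, breaks = seq(0, 20, by = 5), include.lowest = TRUE,
                   right = TRUE, dig.lab = 10)) %>%
  group_by(Cut) %>%
  # Users and converted users per bin
  summarise(N = n(), converted = sum(converted)) %>%
  # Conversion rate = converted users / all users in the bin
  mutate(conv_rate = converted / N) %>%
  ggplot(aes(x = Cut, y = conv_rate)) +
  geom_col(fill = 'magenta') +  # same as geom_bar(stat = 'identity')
  scale_y_continuous(labels = scales::percent)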

Output:

[image: bar chart of conversion rate per impression interval]
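As an alternative sketch: because `converted` is coded 0/1, its mean within a bin equals the conversion rate, so ggplot2 can do the aggregation itself through `stat_summary()`. This assumes ggplot2 3.3.0 or later, where the argument is named `fun` (older versions call it `fun.y`), and it reuses the same breaks as above. The object name `p` is only for illustration.

```r
library(ggplot2)

# Sample data from the question
df <- data.frame(
  converted = c(0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0),
  tot_impr  = c(19, 4, 19, 13, 18, 9, 17, 8, 8, 11, 8, 19, 8, 8, 18, 12, 5, 12, 1)
)

# Bin on the fly in aes(); stat_summary() then takes the mean of the
# 0/1 `converted` values in each bin, which is the conversion rate.
p <- ggplot(df, aes(x = cut(tot_impr, breaks = seq(0, 20, by = 5),
                            include.lowest = TRUE),
                    y = converted)) +
  stat_summary(fun = mean, geom = "col", fill = "magenta") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "tot_impr interval", y = "conversion rate")

print(p)
```

Pre-computing the rates, as in the code above, still has the advantage that the per-bin counts stay available for inspection.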

Some data used:

#Data
df <- structure(list(converted = c(0L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 
1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L), tot_impr = c(19L, 
4L, 19L, 13L, 18L, 9L, 17L, 8L, 8L, 11L, 8L, 19L, 8L, 8L, 18L, 
12L, 5L, 12L, 1L)), row.names = c(NA, -19L), class = "data.frame")
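
As a quick check of the numbers behind the plot, here is a minimal base R sketch that rebuilds the per-bin counts and rates without dplyr. It assumes the same breaks as the code above; the names `bins`, `n_users`, `n_converted` and `conv_rate` are only for illustration.

```r
# Sample data from the question (same values as the `df` defined above)
df <- data.frame(
  converted = c(0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0),
  tot_impr  = c(19, 4, 19, 13, 18, 9, 17, 8, 8, 11, 8, 19, 8, 8, 18, 12, 5, 12, 1)
)

# Assign each user to an impression interval
bins <- cut(df$tot_impr, breaks = seq(0, 20, by = 5), include.lowest = TRUE)

n_users     <- as.vector(table(bins))                    # users per bin
n_converted <- as.vector(tapply(df$converted, bins, sum))  # converted users per bin
conv_rate   <- n_converted / n_users                     # conversion rate per bin

print(data.frame(interval = levels(bins), n_users, n_converted,
                 conv_rate = round(conv_rate, 3)))
```

For this sample the bins hold 3, 6, 4 and 6 users, of whom 1, 4, 2 and 3 converted, giving rates of about 33%, 67%, 50% and 50%.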
Duck