I have a dataframe that contains information on business bids. I want to create a histogram that shows the percentage of bids that were booked for each bin. The x axis should be price, and the y axis should be the percent of bids that booked within that bin.

My code filters the data for two years, then uses the ggplot2 library to create the histogram. In this example, "booked" is a logical variable and "price" is a continuous number. Here's the code I have:

exp_data %>% filter(yr>=2019 & yr<=2020) %>% 
ggplot(aes(price)) +
geom_histogram(aes(y=mean(booked)),binwidth=500)

When I run the code I get the following error:

Error: stat_bin() must not be used with a y aesthetic.

I've also tried this

exp_data %>% filter(yr>=2019 & yr<=2020) %>% 
  ggplot(aes(price)) +
  geom_histogram(aes(y=((..booked==TRUE..)/(..count..))),binwidth=500)

That code produces this error: Error in FUN(X[[i]], ...) : object '..booked' not found

I'm guessing I'm simply ignorant of the right way to ask for the math functions I'm trying to get. Can someone help me learn how to do this?

Here's a sample data frame:

exp_data <- data.frame(price=abs(rnorm(100))*10000, booked=sample(c(TRUE,FALSE),size=100,replace = TRUE))

With this code, I can plot the counts of TRUE and FALSE and stack or dodge them, but all I really want to know is the percent of TRUE.

exp_data %>% ggplot(aes(price)) + geom_histogram(aes(fill=booked),position="dodge")

plot of dodged counts

  • A small sample of your data makes solving your problem much easier. – MrCorote May 26 '20 at 17:36
  • In the documentation, I see "stat_bin() is suitable only for continuous x data. If your x data is discrete, you probably want to use stat_count()." That may be part of your problem. Also note that it appears to expect to be associated with x aesthetic, not y. Look at the examples there. – Carl Witthoft May 26 '20 at 17:51
  • Hi Viper, welcome to Stack Overflow. As Pedro suggests, it will be much easier to help if you provide at least a sample of your data with `dput(exp_data)` or if your data is very large `dput(exp_data[1:10,])`. You can edit your question and paste the output. You can surround it with three backticks (```) for better formatting. See [How to make a reproducible example](https://stackoverflow.com/questions/5963269/) for more info. – Ian Campbell May 26 '20 at 17:57

1 Answer

Edit after OP updated post and provided example data.

As mentioned previously, you should start by creating the data you want to plot rather than implementing too much of the statistics in your plot call. `price %/% 500` is integer division: it divides each price by 500 and rounds down, which gives the index of the $500 bin that the price falls into. Below I generate a range label such as "0-500$" for each bin. You have to specify the factor levels explicitly; otherwise the buckets are ordered alphabetically, character by character, and you would end up with 1000 before 200. Finally, rotate the axis labels so that they do not overlap.

Most importantly, the chart below is a customised bar plot and not a histogram. A histogram counts the number of observations in each bucket along the x axis, whereas you are showing a summary statistic per bucket.
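To make the binning and the level ordering concrete, here is a small base R sketch. The five prices are made up for illustration, and the labels are built from each bin's lower bound (`bin * 500`) and upper bound (`(bin + 1) * 500`), which is an assumption of this sketch.

```r
# Hypothetical prices, chosen only to illustrate the binning
price <- c(120, 499, 500, 1250, 1730)

# Integer division: each price maps to the index of its 500-wide bin
bin <- price %/% 500
bin
# 0 0 1 2 3

# Label each bin with its lower and upper bound
label <- paste0(bin * 500, "-", (bin + 1) * 500, "$")

# Without explicit levels, factor() sorts the labels as text,
# so "1000-1500$" lands before "500-1000$"
alphabetical <- levels(factor(label))
alphabetical

# With explicit levels, the bins keep their numeric order
ordered_levels <- levels(factor(label, levels = unique(label)))
ordered_levels
```

The second `levels()` call is the ordering you want on the x axis.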

exp_data <- data.frame(price=abs(rnorm(100))*10000, 
                       booked=sample(c(TRUE,FALSE),size=100,replace = TRUE))


library(tidyverse)
perc_booked <- exp_data %>% 
  mutate(bin = price %/% 500) %>% 
  group_by(bin) %>% 
  summarise(percent = mean(booked) * 100)   

perc_booked %>% 
  # bin 0 holds prices in [0, 500), so its label runs from bin * 500 to (bin + 1) * 500
  mutate(label = paste0(bin * 500, "-", (bin + 1) * 500, "$"),
         # fix the level order so the bins are not sorted alphabetically
         label = factor(label, levels = label)) %>% 
  ggplot(aes(x = label, y = percent)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90)) + 
  xlab("") +
  ylab("Percent Booked")
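If you would rather keep price on a continuous axis, one possible alternative (not what the code above does) is to let ggplot2 do the binning with `stat_summary_bin()`, which summarises the y values within each x bin. This is only a sketch: it assumes ggplot2 3.3.0 or later for the `fun` argument, the `set.seed(1)` call is added purely for reproducibility, and the exact bin alignment is left to ggplot2's defaults.

```r
library(ggplot2)

# Same simulated data as in the question; set.seed() is an addition for reproducibility
set.seed(1)
exp_data <- data.frame(price = abs(rnorm(100)) * 10000,
                       booked = sample(c(TRUE, FALSE), size = 100, replace = TRUE))

# 100 * booked turns TRUE/FALSE into 100/0, so the mean within each
# 500-wide price bin is the percentage of bids that booked
p <- ggplot(exp_data, aes(x = price, y = 100 * booked)) +
  stat_summary_bin(fun = mean, geom = "bar", binwidth = 500) +
  labs(x = "Price", y = "Percent booked")

p
```

Because price stays numeric here, the tick marks sit at price values rather than on bin labels, and empty bins simply show no bar.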
SBFin
  • That's not exactly what I'm trying to do. In business terms, I'm trying to create a histogram of the rate of bids that won (conversion rate) for each $500 increment in price. – ViperDriver May 26 '20 at 23:22
  • Changed my answer. You can of course choose any x axis label you like. Unless you shift the labels, they will appear _between_ the tick marks, i.e. you will not label the tick marks 0, 500 and so on but the range between them. – SBFin May 27 '20 at 00:17