
Suppose I have a data frame like the one below:

df <- data.frame(x = runif(100))
df$x2 = df$x*100
cut = quantile(df$x2, 0.75)
df$label = ifelse(df$x2>cut, 1, 0)

          x       x2 label
1 0.1431888 14.31888     0
2 0.9131599 91.31599     1
3 0.5659831 56.59831     0
4 0.8358059 83.58059     1
5 0.3125397 31.25397     0
6 0.8823542 88.23542     1

The task is:

First, show the histogram of x, which can be done with geom_histogram().

Second, color each bin by the fraction of points in that bin whose label equals 1.

I am not sure how to achieve this. I need to know both the number of points with label 1 and the total number of points in each bin, and I don't know how to get these (the binwidth is not fixed). Searching this site, I only found examples where geom_histogram() colors the bins by x, for example in this link.

The output I want looks like this:

[image: plot_example]

The image was generated by the following code:

ggplot(df, aes(x = x, fill = ..x..)) + geom_histogram()

In this example the color depends on x in each bin, but I want the color to depend on the fraction of points with label equal to 1 (the third column) in each bin.

MSR
Xi Wang

1 Answer


We can use the hist() function to compute the breaks and counts manually, and then take the mean of label within each bin of the histogram:

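library(dplyr)

# Compute the histogram breaks without drawing the plot
H <- hist(df$x, breaks = 30, plot = FALSE)

# Assign each x to its bin (include.lowest = TRUE matches hist()), then summarise per bin
plotdf <- df %>%
  mutate(bins = cut(x, breaks = H$breaks, include.lowest = TRUE)) %>%
  group_by(bins) %>%
  summarise(label = mean(label), n = n())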
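
As a cross-check on the binning step above, here is a minimal base R sketch that recomputes the per-bin counts and fractions and compares the counts with what hist() reports. The seed and the helper names (n_per_bin, ones_per_bin, frac) are assumptions made for illustration and are not part of the original answer.

```r
# Illustrative data, built the same way as in the question (the seed is an assumption)
set.seed(1)
df <- data.frame(x = runif(100))
df$x2 <- df$x * 100
df$label <- ifelse(df$x2 > quantile(df$x2, 0.75), 1, 0)

# Histogram breaks and counts, without drawing anything
H <- hist(df$x, breaks = 30, plot = FALSE)

# Assign each x to its bin; include.lowest = TRUE mirrors hist()'s default
bins <- cut(df$x, breaks = H$breaks, include.lowest = TRUE)

n_per_bin    <- as.vector(table(bins))                   # points per bin, empty bins included
ones_per_bin <- as.vector(tapply(df$label, bins, sum))   # NA for empty bins
frac         <- ones_per_bin / n_per_bin                 # fraction of label == 1 per bin

# The manual counts should agree with the counts hist() computed
stopifnot(all(n_per_bin == H$counts))
```

If the check passes, frac holds the same per-bin value that mean(label) gives in the dplyr pipeline, with NA for any bin that contains no points.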

From here, we plot the bins on x and the counts on y, and fill each bar with the mean of label, i.e. the fraction of points with label == 1:

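ggplot(plotdf, aes(x = bins, y = n, fill = label)) +
  geom_col() +
  # midpoint = 0.5 centers the mid color on the 0-1 range of label
  scale_fill_gradient2(low = "#f6e1e1", mid = "#ff9d76", high = "#eb4d55", midpoint = 0.5) +
  # drop = FALSE keeps empty bins, so the labels stay aligned with H$mids
  scale_x_discrete(labels = H$mids, drop = FALSE)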


StupidWolf
  • I have another question: how can I adjust the geom_col() width to close the gaps between the bars? I tried geom_col(width=0.05) in this case, but the bars got much narrower. -_- thx~ – Xi Wang Dec 23 '19 at 14:22
  • @XiWang, try something like geom_col(width=0.98,col="black")? – StupidWolf Dec 23 '19 at 14:51
  • thx~ I finally set geom_col(width=1,color="black"). It works well. – Xi Wang Dec 23 '19 at 15:04
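
The comments above close the gaps by setting width = 1 on the discrete axis. As a possible alternative (a sketch, not taken from the answer), the bin midpoints can be placed on a continuous x axis and the bar width set to the bin width, so that the bars touch the way they do in geom_histogram(). The seed, the bin_id helper, and the two-color scale_fill_gradient() are assumptions made for illustration.

```r
library(ggplot2)

# Illustrative data, built the same way as in the question (the seed is an assumption)
set.seed(1)
df <- data.frame(x = runif(100))
df$x2 <- df$x * 100
df$label <- ifelse(df$x2 > quantile(df$x2, 0.75), 1, 0)

H <- hist(df$x, breaks = 30, plot = FALSE)

# Integer bin index for every point; include.lowest = TRUE mirrors hist()
bin_id <- cut(df$x, breaks = H$breaks, include.lowest = TRUE, labels = FALSE)

# One row per bin, empty bins included (their label is NA and their count is 0)
plotdf <- data.frame(
  mid   = H$mids,
  n     = H$counts,
  label = as.vector(tapply(df$label,
                           factor(bin_id, levels = seq_along(H$mids)),
                           mean))
)

# hist() with a single number for breaks yields equally spaced breaks,
# so the first difference is the common bin width
p <- ggplot(plotdf, aes(x = mid, y = n, fill = label)) +
  geom_col(width = diff(H$breaks)[1], color = "black") +
  scale_fill_gradient(low = "#f6e1e1", high = "#eb4d55")
print(p)
```

With a continuous axis the tick labels are ordinary numbers, so scale_x_discrete() is not needed, and empty bins appear as gaps at the correct x position.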