I have a data frame with values and their associated weights. I want to make a histogram, such that each bar's height corresponds to the number of values in that bin and the bar's color corresponds to their total weight. How do I do that?

Example:

D <- data.frame(
    x = c(-0.39, 0.12, 0.94, 1.67, 1.76, 2.44, 3.72, 4.28, 4.92, 5.53, 0.06,
          0.48, 1.01, 1.68, 1.80, 3.25, 4.12, 4.60, 5.28, 6.22),
    w = c(0.1810479, 0.2209460, 0.2974134, 0.3768152, 0.3871925, 0.4682943,
          0.6220371, 0.6838944, 0.7473117, 0.7993555, 0.2159428, 0.2526883,
          0.3046069, 0.3779629, 0.3918383, 0.5667588, 0.6667623, 0.7166747,
          0.7790540, 0.8480375))

library(ggplot2)

ggplot(D, aes(x)) +
    geom_histogram(aes(y=..density..), binwidth=0.5, boundary=0.5)

Solution

Based on eipi10's answer, but using base R functions instead of dplyr:

breaks <- seq(-0.5, 6.5, 0.5)
bins   <- cut(D$x, breaks)

h <- data.frame(
    x      = head(breaks, -1) + 0.25,
    count  = sapply(split(D$x, bins), length),
    weight = sapply(split(D$w, bins), sum))
# density = count / (n * bin width), matching ..density.. in geom_histogram
h$density <- h$count / (sum(h$count) * diff(breaks))

ggplot(h) + geom_bar(aes(x, density, fill=weight), stat='identity')
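
One way to sanity-check the binning above is to compare it against `hist(..., plot = FALSE)`, which by default uses the same right-closed intervals as `cut`. This is only a minimal sketch and not part of the original solution; the names `count`, `weight`, and `ref` are introduced here for illustration.

```r
D <- data.frame(
    x = c(-0.39, 0.12, 0.94, 1.67, 1.76, 2.44, 3.72, 4.28, 4.92, 5.53, 0.06,
          0.48, 1.01, 1.68, 1.80, 3.25, 4.12, 4.60, 5.28, 6.22),
    w = c(0.1810479, 0.2209460, 0.2974134, 0.3768152, 0.3871925, 0.4682943,
          0.6220371, 0.6838944, 0.7473117, 0.7993555, 0.2159428, 0.2526883,
          0.3046069, 0.3779629, 0.3918383, 0.5667588, 0.6667623, 0.7166747,
          0.7790540, 0.8480375))

breaks <- seq(-0.5, 6.5, 0.5)
bins   <- cut(D$x, breaks)

# Per-bin number of values and total weight.
count  <- as.vector(table(bins))
weight <- as.vector(tapply(D$w, bins, sum))
weight[is.na(weight)] <- 0   # tapply returns NA for empty bins

# Reference binning from base graphics, computed without drawing anything.
ref <- hist(D$x, breaks = breaks, plot = FALSE)

stopifnot(all(count == ref$counts))                   # same counts per bin
stopifnot(isTRUE(all.equal(sum(weight), sum(D$w))))   # no weight was dropped
```

`ref$density` is `counts / (n * binwidth)`, which is the quantity that `..density..` plots in `geom_histogram`.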

EM visualization using this method

Don Reba
  • For ideas on tweaking the aesthetics to be some arbitrary statistical transform, see also [Customizing aesthetics of faceted barplot](http://stackoverflow.com/questions/6297677/customizing-aesthetics-of-faceted-barplot) – smci Dec 12 '16 at 00:44

2 Answers

Another option is to pre-summarise the data:

library(dplyr)

D_bins = D %>% 
  mutate(bins = cut(x, seq(-0.5,6.5,0.5), labels=seq(-0.25,6.5,0.5)),
         bins = as.numeric(as.character(bins))) %>%
  group_by(bins) %>%
  summarise(count_x = n(),
            sum_w = sum(w))

ggplot(D_bins) +
  geom_bar(aes(bins, count_x, fill=sum_w), colour="white", stat="identity") 

You could also use two sets of opposing bars, rather than a fill aesthetic:

ggplot(D_bins) +
  geom_bar(aes(bins, count_x), colour="white", fill="blue", stat="identity") +
  geom_bar(aes(bins, -sum_w), colour="white", fill="red", stat="identity") +
  scale_x_continuous(breaks=-1:10) +
  scale_y_continuous(limits=c(-2,4), breaks=seq(-2,5,1), labels=c(2,1,0:5)) +
  labs(y = c("Sum of w                       Count of x            ")) +
  coord_flip()

eipi10
  • I suggested that, but I was trying to find out how to do it with `stat_bin` and suchlike. This does kind of sidestep the original question... – smci Dec 12 '16 at 03:28
  • Ended up going with this approach. @smci, thanks for the effort. – Don Reba Dec 12 '16 at 03:42
ggplot(D, aes(x)) + geom_histogram(aes(y=..density.., fill=..count..), binwidth=0.5, boundary=0.5)
  • By "the bar's color (actually: fill) corresponds to their total weight", you mean the sum. There is no built-in ..sum.. analogous to ..count.., so you may need to preprocess the data into bins.
smci
  • How do I use `stat_sum` here? – Don Reba Dec 12 '16 at 00:32
  • I'm still looking at the options for `stat_density` and `stat_bin`... it's gotta be there somewhere. Presumably you use one of their builtins like `..density.., ..count..` etc. – smci Dec 12 '16 at 00:50
  • But then, the problem is that the default statistic works on the `x` column. – Don Reba Dec 12 '16 at 00:53
  • Does `fill=..count..` do what you want or not? Please view what that plots. You can always customize the palette. There may well be some more statistically-correct preprocessing we could do, but it's all the same difference really if it's just for tweaking one plot's aesthetics. – smci Dec 12 '16 at 00:57
  • `fill=..count..` colours the bars according to the number of values in the bin. I want the colour to show total weight (sum of w) instead. – Don Reba Dec 12 '16 at 01:00
  • I played with the `stat_*` functions and read a lot of SO on this. There may be an obscure way with ggplot, rather than manually preprocessing the bin data. But I can't currently find it. – smci Dec 12 '16 at 03:29
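
For what it's worth, one thing `stat_bin` does support is a `weight` aesthetic, which makes it sum the weights in each bin instead of counting rows. The sketch below only illustrates that behaviour. It does not answer the question as asked, because the weighted sum replaces `count` rather than sitting alongside it, so as far as I can tell a single histogram layer gives you either the count or the sum of `w`, not both. The names `p` and `built` are made up for the example.

```r
library(ggplot2)

D <- data.frame(
    x = c(-0.39, 0.12, 0.94, 1.67, 1.76, 2.44, 3.72, 4.28, 4.92, 5.53, 0.06,
          0.48, 1.01, 1.68, 1.80, 3.25, 4.12, 4.60, 5.28, 6.22),
    w = c(0.1810479, 0.2209460, 0.2974134, 0.3768152, 0.3871925, 0.4682943,
          0.6220371, 0.6838944, 0.7473117, 0.7993555, 0.2159428, 0.2526883,
          0.3046069, 0.3779629, 0.3918383, 0.5667588, 0.6667623, 0.7166747,
          0.7790540, 0.8480375))

# weight = w makes stat_bin add up w in each bin, so the bar height
# here is the total weight, not the number of values.
p <- ggplot(D, aes(x, weight = w)) +
    geom_histogram(binwidth = 0.5, boundary = 0.5)

# Inspect what the stat computed for each bin.
built <- layer_data(p)
built[, c("xmin", "xmax", "count")]
```

Because `count` now holds the weighted sum, pre-binning the data (as in the accepted approach) remains the straightforward way to get count as height and sum of `w` as fill in the same bar.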