
I am trying to generate a histogram in rbokeh.

The direct approach, ly_hist, produces unexpected counts (fig below, top). The indirect approach, ly_bar, gives an x-axis that is not sorted by factor level (fig below, bottom).

[Figure: rbokeh output; top: ly_hist, bottom: ly_bar]

ggplot2 gives the expected results.

[Figure: ggplot2 output]

code:

library(data.table)
library(rbokeh)
library(ggplot2)

# generate data ==============
set.seed(123)
x = data.table(
  hour = sample.int(n = 24, size = 100, replace = TRUE)
)

# summarize
y = x[, .N, keyby = hour]

# ggplot ======================
theme_set(theme_bw())

g1 = ggplot(x) + 
  geom_histogram(aes(hour), bins = 24, fill = "steelblue", col = "white", alpha = 0.5 ) + 
  scale_x_continuous(breaks = seq(1, 24, 1))

g2 = ggplot(y) + 
  geom_bar(aes(hour, N), stat = "identity", fill = "steelblue", alpha = 0.5)


# rbokeh ==================
b1 = figure() %>%
  ly_hist(hour, data = x, breaks = 24)

y[, hour := factor(hour)]

b2 = figure() %>%
  ly_bar(hour, N, data = y)

Q: (1) how can I generate a histogram using rbokeh that produces the expected result (as in ggplot2) and (2) how can I get the x-axis to be sorted in the right order?

bigreddot
Henk
  • What makes you think the rbokeh histogram is incorrect? – Hong Ooi Oct 20 '16 at 12:26
  • I meant the output is _unexpected_. I compared the plot with the "y" object (see code) and the ggplot result. – Henk Oct 20 '16 at 12:32
  • Well, your Q (1) is "how can I generate a correct histogram using rbokeh" which implies it was incorrect in the first place. In any case, you can play with the arguments to `ly_hist`, which are exactly the same as those to `hist`. – Hong Ooi Oct 20 '16 at 12:34
  • (Btw, this is why statisticians tend to favour kernel densities over histograms. There is nothing wrong with the ly_hist output; it illustrates that a histogram can be quite sensitive to the placement of bins. But everybody still uses histograms, including statisticians.) – Hong Ooi Oct 20 '16 at 12:36
  • I have tried that. What is your suggestion? – Henk Oct 20 '16 at 12:37

1 Answer


The ly_hist function treats the data as continuous and therefore bins it, so the counts it shows depend on where the bin edges fall. The ly_hist output is expected behavior, not a bug.
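To see how bin placement changes the counts, here is a minimal sketch in base R only. It assumes, as the comment above suggests, that ly_hist treats `breaks` the same way base `hist()` does; that is an assumption, not something checked against the rbokeh source. The variable names are made up for illustration.

```r
# Base R only; rbokeh is not needed for this check.
set.seed(123)
hour <- sample.int(n = 24, size = 100, replace = TRUE)

# A single number for `breaks` is only a suggestion to hist();
# it picks "pretty" cut points, so two hours can share one bin.
h_default <- hist(hour, breaks = 24, plot = FALSE)

# Explicit breaks centred on the integers give exactly one bin per hour.
h_explicit <- hist(hour, breaks = seq(0.5, 24.5, by = 1), plot = FALSE)

# Plain tabulation of the data, for comparison with the bin counts.
counts_table <- tabulate(hour, nbins = 24)

# TRUE when every bin holds exactly one hour's observations.
all(h_explicit$counts == counts_table)
```

Under the same assumption, `ly_hist(hour, data = x, breaks = seq(0.5, 24.5, by = 1))` should give per-hour counts matching the ggplot2 histogram.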

For ly_bar, you can control the x axis by either specifying the xlim argument to figure():

figure(xlim = as.character(1:24)) %>%
  ly_bar(hour, N, data = y)

or by piping the figure through the x_range() function:

figure() %>%
  ly_bar(hour, N, data = y) %>%
  x_range(as.character(1:24))
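The category order passed to xlim or x_range() does not have to be typed by hand. As a hedged sketch in base R, it can be taken from a factor whose levels are set explicitly. The names `hour_f` and `axis_order` are illustrative only.

```r
set.seed(123)
hour <- sample.int(n = 24, size = 100, replace = TRUE)

# Explicit levels keep the hours in numeric order and include
# hours that happen not to occur in the sample.
hour_f <- factor(hour, levels = 1:24)

# levels() returns a character vector, the type the answer passes to xlim.
axis_order <- levels(hour_f)

# Alphabetical sorting of the same labels is not numeric order
# ("10" sorts before "2"), so sort() is not a substitute.
alpha_order <- sort(as.character(1:24))

identical(axis_order, as.character(1:24))
```

Assuming figure() accepts any character vector for a categorical axis, as in the answer's own calls, `figure(xlim = axis_order)` would then be equivalent to `figure(xlim = as.character(1:24))`.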

Also note that if you do not want to summarize the data up front, you can pass just the x variable and ly_bar will do the counting for you.

figure(xlim = as.character(1:24)) %>%
  ly_bar(as.character(hour), data = x)

Ideally, rbokeh should honor factor level ordering for axes by default and handle inputs of unexpected types more gracefully (avoiding the as.character() workaround). Both will be addressed in future updates.

Ryan