6

Is it possible to create a time-series histogram like the one described in this presentation (slides 36-39) using either R or D3.js? Or is there a better way to show bucketed data as a time series?

Edit: Here is some pre-bucketed sample data. Ideally, D3 or R would do the bucketing by itself. And yes, if it wasn't clear, I understand that I could write this myself. I'm just wondering if there's already a package that does this and I just haven't come across it yet. Thanks!

epo3
septagram
  • 6
    These types of questions call for `fortunes::fortune("Yoda")`. _Of course_ it is possible. But if you are asking "has someone already done this in a package?", the answer may be no. That said, the chart is interesting and someone should probably code up such a time-series histogram. Maybe you can do it? – Dirk Eddelbuettel Jul 27 '12 at 14:56
  • 1
    This should be easy to do using `ggplot2`. Perhaps if you post some sample data I might be tempted to make such a plot. – Andrie Jul 27 '12 at 15:01

3 Answers

12

Here's a version in D3, modeled after @bdemarest's answer using ggplot2:

D3 Heatmap

This version uses tiled rect elements. If you have a large dataset, you might get better performance from a pixel-based heatmap.

If you want to compute the buckets using D3, you can use `d3.nest` to group the data by day and by value. There's also `d3.layout.histogram`, but since you presumably want uniformly-spaced bins and the same bins for every day, `d3.nest` should be sufficient.
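As a rough illustration of that grouping, here is a minimal sketch in plain JavaScript. It deliberately makes no D3 calls, so it only shows what the nesting step would need to compute. The `bucketByDay` helper, the `{time, value}` record shape, and the bucket width of 100 are assumptions for illustration, and days are taken as UTC midnight-to-midnight.

```javascript
// Hypothetical helper: count raw observations per (day, bucket) cell.
// Assumes each record looks like {time: Date, value: number}.
function bucketByDay(records, step) {
  const counts = new Map(); // key "YYYY-MM-DD|bucket" -> count
  for (const r of records) {
    const day = r.time.toISOString().slice(0, 10);    // UTC calendar day
    const bucket = Math.floor(r.value / step) * step; // floored, not rounded
    const key = day + "|" + bucket;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Flatten into rows shaped like the sample data: date, bucket, cnt.
  return Array.from(counts, ([key, cnt]) => {
    const [date, bucket] = key.split("|");
    return { date, bucket: Number(bucket), cnt };
  });
}

// Made-up observations, only to exercise the helper.
const sample = [
  { time: new Date("2012-07-20T01:00:00Z"), value: 815 },
  { time: new Date("2012-07-20T13:30:00Z"), value: 899 },
  { time: new Date("2012-07-20T23:59:00Z"), value: 900 },
  { time: new Date("2012-07-21T00:00:00Z"), value: 815 },
];
const rows = bucketByDay(sample, 100);
console.log(rows);
```

Rows in this shape can be bound directly to tiled rect elements, one per (date, bucket) pair.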

One subtle consideration: I placed the tick marks on the scale in-between tiles so as to indicate visually how the values are binned. For example, the bottom-left bucket corresponds to all values between 800 and 900 on July 20 (where July 20 is the midnight-to-midnight interval); at least, that’s what I assumed from looking at your data. This is slightly clearer than labeling the middle of the rect because it indicates that the values are floored rather than rounded.
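To make the tick placement concrete, here is a small sketch in plain JavaScript. Each tile covers the half-open interval from its bucket value up to the next one, so the ticks go on the tile edges rather than the tile centers. The `linearScale` function, the bucket width of 100, and the 300-pixel range are assumptions for illustration and are not taken from the chart's actual code.

```javascript
// Hypothetical stand-in for a linear scale: maps [d0, d1] onto [r0, r1].
function linearScale(d0, d1, r0, r1) {
  return (v) => r0 + ((v - d0) * (r1 - r0)) / (d1 - d0);
}

const step = 100;                 // assumed bucket width
const buckets = [800, 900, 1000]; // lower edges of the tiles
const lo = buckets[0];
const hi = buckets[buckets.length - 1] + step; // upper edge of the top tile

// SVG y grows downward, so the largest value maps to pixel 0.
const y = linearScale(lo, hi, 300, 0);

// One rect per bucket, spanning from its lower edge to its upper edge.
const tiles = buckets.map((b) => ({
  bucket: b,
  top: y(b + step),
  height: y(b) - y(b + step),
}));

// Ticks sit on the edges between tiles: one more tick than there are tiles.
const ticks = buckets.concat([hi]).map((v) => ({ value: v, y: y(v) }));

console.log(tiles, ticks);
```

Because a tick at 800 lines up with the bottom edge of the 800 tile, a reader can see that the tile holds values from 800 up to, but not including, 900.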

mbostock
11

Here is one possible solution using R and ggplot2.

Your data, ready to paste into R console:

dat = structure(list(date = structure(c(15541, 15541, 15541, 15541, 
    15541, 15541, 15541, 15541, 15541, 15541, 15541, 15541, 15541, 
    15541, 15541, 15541, 15541, 15542, 15542, 15542, 15542, 15542, 
    15542, 15542, 15542, 15542, 15542, 15542, 15542, 15542, 15542, 
    15542, 15543, 15543, 15543, 15543, 15543, 15543, 15543, 15543, 
    15543, 15543, 15543, 15543, 15543, 15543, 15543, 15543, 15543, 
    15543, 15543, 15544, 15544, 15544, 15544, 15544, 15544, 15544, 
    15544, 15544, 15544, 15544, 15544, 15544, 15544, 15544, 15544, 
    15544, 15544, 15544, 15544, 15544, 15545, 15545, 15545, 15545, 
    15545, 15545, 15545, 15545, 15545, 15545, 15545, 15545, 15545, 
    15545, 15545, 15545, 15545, 15546, 15546, 15546, 15546, 15546, 
    15546, 15546, 15546, 15546, 15546, 15546, 15546, 15546, 15546, 
    15546, 15546, 15546, 15547, 15547, 15547, 15547, 15547, 15547, 
    15547, 15547, 15547, 15547, 15547, 15547, 15547, 15547, 15547, 
    15547, 15547, 15547, 15547), class = "Date"), bucket = c(800L, 
    900L, 1000L, 1100L, 1200L, 1300L, 1400L, 1500L, 1600L, 1700L, 
    1800L, 1900L, 2000L, 2100L, 2200L, 2300L, 2400L, 800L, 900L, 
    1000L, 1100L, 1200L, 1300L, 1400L, 1500L, 1600L, 1700L, 1800L, 
    1900L, 2000L, 2100L, 2200L, 900L, 1000L, 1100L, 1200L, 1300L, 
    1400L, 1500L, 1600L, 1700L, 1800L, 1900L, 2000L, 2100L, 2200L, 
    2300L, 2400L, 2500L, 2600L, 2800L, 800L, 900L, 1000L, 1100L, 
    1200L, 1300L, 1400L, 1500L, 1600L, 1700L, 1800L, 1900L, 2000L, 
    2100L, 2200L, 2300L, 2400L, 2500L, 2600L, 2700L, 2800L, 800L, 
    900L, 1000L, 1100L, 1200L, 1300L, 1400L, 1500L, 1600L, 1700L, 
    1800L, 1900L, 2000L, 2100L, 2200L, 2300L, 2400L, 800L, 900L, 
    1000L, 1100L, 1200L, 1300L, 1400L, 1500L, 1600L, 1700L, 1800L, 
    1900L, 2000L, 2100L, 2200L, 2300L, 2400L, 1300L, 1400L, 1500L, 
    1600L, 1700L, 1800L, 1900L, 2000L, 2100L, 2200L, 2300L, 2400L, 
    2500L, 2600L, 2700L, 2800L, 2900L, 3000L, 3200L), cnt = c(119L, 
    123L, 173L, 226L, 284L, 257L, 268L, 244L, 191L, 204L, 187L, 177L, 
    164L, 125L, 140L, 109L, 103L, 123L, 165L, 237L, 278L, 338L, 306L, 
    316L, 269L, 271L, 241L, 188L, 174L, 158L, 153L, 132L, 154L, 241L, 
    246L, 300L, 305L, 301L, 292L, 253L, 251L, 214L, 189L, 179L, 159L, 
    161L, 144L, 139L, 132L, 136L, 105L, 120L, 156L, 209L, 267L, 299L, 
    316L, 318L, 307L, 295L, 273L, 283L, 229L, 192L, 193L, 170L, 164L, 
    154L, 138L, 101L, 115L, 103L, 105L, 156L, 220L, 255L, 308L, 338L, 
    318L, 255L, 278L, 260L, 235L, 230L, 185L, 145L, 147L, 157L, 109L, 
    104L, 191L, 201L, 238L, 223L, 229L, 286L, 256L, 240L, 233L, 202L, 
    180L, 184L, 161L, 125L, 110L, 101L, 132L, 117L, 124L, 154L, 167L, 
    137L, 169L, 175L, 168L, 188L, 137L, 173L, 164L, 167L, 115L, 116L, 
    118L, 125L, 104L)), .Names = c("date", "bucket", "cnt"), 
    class = "data.frame", row.names = c(NA, -125L))

Plotting code:

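library(ggplot2)

# One tile per (date, bucket) pair; the fill color encodes the count.
plot_1 <- ggplot(dat, aes(x = date, y = bucket, fill = cnt)) +
  geom_tile() +
  # Light-to-dark blue gradient from the lowest to the highest count.
  scale_fill_continuous(low = "#F7FBFF", high = "#2171B5") +
  theme_bw()

# Write the plot to a 6 x 4 inch PNG in the working directory.
ggsave("plot_1.png", plot_1, width = 6, height = 4)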

The plot might look better if you include rows for zero bucket values in your data. Then you could change low="#F7FBFF" to low="white".

bdemarest
4

Put your numbers in a matrix and use 'image(mat)'? That looks to be all it is. A grid. A raster. Or am I missing something?

There are also ways to do this with ggplot, raster, and probably others.

Spacedman