I created an R Shiny application that has a dygraph based on a data table that is dynamically subset by a checkboxGroupInput. My problem is that when I attempt to load large amounts of data (millions of records), it loads very slowly and/or crashes.

After doing some more research, I stumbled upon a "lazy-load" technique from here. As I understand it, this technique downsamples the data by loading only as many data points as the width of the dygraph window. As the user zooms in, it drills down and loads more data within the dyRangeSelector max/min dates. I suspect this will solve my problem, because it loads significantly less data for any given dygraph interaction. However, all of the examples provided in this link are in JavaScript, and I'm having trouble translating them to R.
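To make the downsampling idea concrete, here is a minimal sketch in R of the data-side step only. The helper name `downsample_window` and the `max_points` argument are hypothetical (they are not part of dygraphs or xts), and the plain every-k-th-row thinning is an assumption on my part; the JavaScript example may aggregate differently.

```r
library(xts)

# Hypothetical helper (not part of dygraphs or xts): return at most
# `max_points` rows of the xts object `x` that fall inside [from, to].
downsample_window <- function(x, from, to, max_points = 1000) {
  idx <- index(x)
  win <- x[idx >= from & idx <= to]   # rows inside the visible window
  n <- nrow(win)
  if (n <= max_points) return(win)    # zoomed in far enough: keep raw data
  # Otherwise keep evenly spaced rows (simple decimation, no averaging)
  keep <- unique(round(seq(1, n, length.out = max_points)))
  win[keep]
}

# Usage with one series shaped like the data in this question
times <- seq(as.POSIXct("2018-07-09", tz = "UTC"), by = 0.5, length.out = 345601)
series <- xts(sample(0:1000, length(times), replace = TRUE), order.by = times)

# Whole range: thinned down to max_points rows
overview <- downsample_window(series, min(times), max(times), max_points = 800)

# Five-minute window: few enough rows that the raw data is returned
zoomed <- downsample_window(series,
                            as.POSIXct("2018-07-10 00:00:00", tz = "UTC"),
                            as.POSIXct("2018-07-10 00:05:00", tz = "UTC"),
                            max_points = 800)
```

Plain decimation like this can hide short spikes; keeping the min and max of each bucket would preserve them at the cost of twice as many points.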

I also attempted to treat the GraphDataProvider.js file as a dygraph plugin, but I was unable to get it to work properly.

A couple of quick notes on my implementation:

  • Each element of data_dict in the server is an xts object.
  • The do.call.cbind function call in the server is based on this SO implementation, and it is very fast.

My current setup is essentially like this (I refactored it to make it generic):

Data Setup:

library(shiny)
library(shinydashboard)
library(dygraphs)
library(xts)
library(data.table)

start <- as.POSIXlt("2018-07-09 00:00:00","UTC")
end   <- as.POSIXlt("2018-07-11 00:00:00","UTC")
x <- seq(start, end, by=0.5)

data <- data.frame(replicate(4,sample(0:1000,345601,rep=TRUE)))
data$timestamp <- x
data <- data[c("timestamp", "X1", "X2", "X3", "X4")]
data <- as.data.table(data)

filters <- c("X1","X2","X3","X4")
data_dict <- vector(mode="list", length=4)
names(data_dict) <- filters

data_dict[[1]] <- as.xts(data[,c('timestamp','X1')]); data_dict[[2]] <- as.xts(data[,c('timestamp','X2')])
data_dict[[3]] <- as.xts(data[,c('timestamp','X3')]); data_dict[[4]] <- as.xts(data[,c('timestamp','X4')])

# Needed to quickly cbind the xts objects
do.call.cbind <- function(lst){
  while(length(lst) > 1) {
    idxlst <- seq(from=1, to=length(lst), by=2)
    lst <- lapply(idxlst, function(i) {
      if(i==length(lst)) { return(lst[[i]]) }
      return(cbind(lst[[i]], lst[[i+1]]))})}
  lst[[1]]}

UI:

header <- dashboardHeader(title = "App")
body <- dashboardBody(
        fluidRow(
            column(width = 8,
                box(
                    width = NULL,
                    solidHeader = TRUE,
                    dygraphOutput("graph")
                )
            ),
            column(width = 4,
                box(
                    width = NULL,
                    checkboxGroupInput(
                        "data_selected",
                        "Filter",
                        choices = filters,
                        selected = filters[1]
                    ),
                    radioButtons(
                        "data_format",
                        "Format",
                        choices=c("Rolling Averages","Raw"),
                        selected="Rolling Averages",
                        inline=TRUE
                    )
                )
            )
        )
)

ui <- dashboardPage(
    header,
    dashboardSidebar(disable=TRUE),
    body
)

Server:

server <- function(input, output) {
    # Reactively subsets the dataset based on checkboxGroupInput filters
    the_data <- reactive({
        req(input$data_selected) # Wait until at least one filter is checked
        do.call.cbind(data_dict[input$data_selected]) # Column bind multiple xts objects
    })

    output$graph <- renderDygraph({
        graph <- dygraph(the_data()) %>%
            dyRangeSelector(c("2018-07-10 00:00:00","2018-07-10 02:00:00")) %>%
            dyOptions(useDataTimezone = TRUE, connectSeparatedPoints = TRUE)
        if(input$data_format == "Rolling Averages") graph <- graph %>% dyRoller(rollPeriod = 100)
        graph
    })
}

Make App:

shinyApp(ui, server)
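For reference, this is roughly the wiring I have in mind on the R side, written as a small standalone sketch rather than a working solution. It assumes that dygraphs reports the visible range to Shiny as `input$graph_date_window` (output id plus `_date_window`) and that the two values arrive as ISO 8601 strings; the helpers `thin` and `lazy_view` are hypothetical names, and I have not verified how this behaves together with `useDataTimezone` or `dyRoller`.

```r
library(shiny)
library(dygraphs)
library(xts)

# Hypothetical helper: thin an xts object to at most `n` evenly spaced rows.
thin <- function(x, n) {
  if (nrow(x) <= n) return(x)
  x[unique(round(seq(1, nrow(x), length.out = n)))]
}

# Hypothetical helper: coarse overview of the full series, with finer data
# inside `window` (a length-2 POSIXct vector).
lazy_view <- function(x, window, n = 1000) {
  idx <- index(x)
  inside <- idx >= window[1] & idx <= window[2]
  # Window covers nothing or everything: a single thinning pass is enough
  if (!any(inside) || all(inside)) return(thin(x, n))
  # rbind on xts objects returns the rows in time order
  rbind(thin(x[!inside], n), thin(x[inside], n))
}

# Small stand-in for one element of data_dict
times <- seq(as.POSIXct("2018-07-09", tz = "UTC"), by = 0.5, length.out = 345601)
series <- xts(sample(0:1000, length(times), replace = TRUE), order.by = times)
colnames(series) <- "X1"
initial_window <- as.POSIXct(c("2018-07-10 00:00:00", "2018-07-10 02:00:00"),
                             tz = "UTC")

ui <- fluidPage(dygraphOutput("graph"))

server <- function(input, output, session) {
  # Assumption: the visible range arrives as two ISO 8601 strings,
  # e.g. "2018-07-10T00:00:00.000Z"
  current_window <- reactive({
    w <- input$graph_date_window
    if (is.null(w)) return(initial_window)
    parsed <- as.POSIXct(unlist(w), format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")
    if (anyNA(parsed)) initial_window else parsed
  })
  # Recompute only after zooming/panning has paused for half a second
  settled_window <- debounce(current_window, 500)

  output$graph <- renderDygraph({
    w <- settled_window()
    dygraph(lazy_view(series, w)) %>%
      dyRangeSelector(dateWindow = w)  # re-apply the current zoom on redraw
  })
}

app <- shinyApp(ui, server)  # start with runApp(app)
```

The overview rows outside the window are kept so that the range selector still spans the full date range; the drawback is that the whole widget is redrawn on every zoom, which is part of what I'm unsure how to avoid.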

I would appreciate any help I can get on this; it has stumped me for a while now. Thank you!

fowtom
  • Do you mean slow to load into R or slow for the dygraph to load? It is not usually recommended to plot millions of points on a graph. Can you subset for your graphic? – MLavoie Aug 11 '18 at 09:18
  • @MLavoie I'd assume the plot, because if I try to plot the data using `plot()` or `ggplot()` it's still slow. Each filter that I have is at least 500,000 points, and when all 4 filters from the checkboxGroupInput are activated, it attempts to plot ~4M points. I'm trying to subset my data such that the number of points to plot is equal to the width of the dygraph window (unless it's zoomed in enough to see the raw data). I'm trying to figure out the best way to implement this. – fowtom Aug 13 '18 at 15:09
  • Have you tried to plot the same graphic outside shiny and see how fast is it? 4M is still quite a lot of points on a graph :-) – MLavoie Aug 13 '18 at 15:34
  • @MLavoie yeah I have tried plotting it outside of shiny and it is still slow. My end goal is not to plot ~4M individual points at once, I'm trying to downsample this so a much smaller subset is plotted instead. However, I should be able to view the raw data if I am zoomed in close enough. The lazy-load link I provided with the dygraph example does a good job at explaining what I'm trying to achieve. – fowtom Aug 13 '18 at 16:48
  • @MLavoie if you have any ideas implementation-wise, I'd really appreciate it. And I can elaborate if there's anything that's still unclear. Based on the research I've done, it doesn't seem like anybody has implemented this specific dygraph technique in R before. – fowtom Aug 13 '18 at 23:34
  • As it is, your example is not reproducible. The `ui` is not working. You could also add the necessary libraries to run your code. – MLavoie Aug 14 '18 at 10:46
  • @MLavoie I just updated the code so it can be executed. It actually replicates my problem well, because if you try to load too much data at once, it will likely crash. Let me know if you're unable to run it. – fowtom Aug 14 '18 at 17:50
  • I just realized that your question is similar to [my post](https://stackoverflow.com/questions/52738757/dygraphs-plugin-for-zooming-into-huge-timeseries-in-r). The idea of reloading data into dygraphs still seems to be unsolved in R. – Egus Oct 24 '18 at 09:20

0 Answers