
I'm building an interactive time-series heatmap in R using Plotly and Shiny. As part of this, I recode the heatmap values from continuous to ordinal: six colours represent specific count categories, and those categories are derived from aggregated count values. However, this makes creating the heatmap with ggplotly() very slow. I've traced the slowdown to the tooltip argument of ggplotly(), which controls what the interactive hover boxes show. The label data from my heatmap somehow overloads it, so that it runs very slowly even if I add just a single label component to tooltip. I'm using a processed subset of COVID-19 outbreak data from the Johns Hopkins CSSE repository. Here is simplified heatmap code, which also uses The Simpsons colour theme from ggsci:

#Load packages
library(shiny)
library(plotly)
library(tidyverse)
library(RCurl)
library(ggsci)

#Read example data from Gist
confirmed <- read_csv("https://gist.githubusercontent.com/GeekOnAcid/5638e37c688c257b1c381a15e3fb531a/raw/80ba9704417c61298ca6919343505725b8b162a5/covid_selected_europe_subset.csv")

#Wrap ggplot of time-series heatmap in ggplotly, call "tooltip"  
ggplot_ts_heatmap <- confirmed %>%
  ggplot(aes(as.factor(date), reorder(`Country/Region`,`cases count`), 
             fill=cnt.cat, label = `cases count`, label2 = as.factor(date), 
             text = paste("country:", `Country/Region`))) + 
  geom_tile(col=1) +
  theme_bw(base_line_size = 0, base_rect_size = 0, base_size = 10) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),legend.title = element_blank()) +
  scale_fill_manual(labels = levels(confirmed$cnt.cat),
                    values = pal_simpsons("springfield")(7)) +
  labs(x = "", y = "")
ggplotly(ggplot_ts_heatmap, tooltip = c("text","label","label2"))

Performance improves once tooltip = c("text","label","label2") is reduced (for instance to tooltip = c("text")). I know the delay here is not "massive", but I'm integrating this with a Shiny app, and once it's integrated and scaled up with more data, it becomes really, really slow. I don't even show all variables in the tooltip and it's still slow; you can see it in the current version of the app when you click on 'confirmed' cases.
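
One way to check how much the tooltip selection matters is to time the conversion directly. The following is only a rough sketch on synthetic data: the `toy` data frame and its column names are made up for illustration and merely mimic the structure of the real data, and timings will vary by machine.

```r
library(ggplot2)
library(plotly)

set.seed(1)
# Synthetic stand-in for the real data (assumption): 10 countries x 30 dates
toy <- expand.grid(country = paste("Country", LETTERS[1:10]),
                   date = as.character(as.Date("2020-03-01") + 0:29),
                   stringsAsFactors = FALSE)
toy$cases <- rpois(nrow(toy), lambda = 200)
toy$cnt.cat <- cut(toy$cases, breaks = c(-Inf, 180, 190, 200, 210, 220, Inf))

p <- ggplot(toy, aes(factor(date), country, fill = cnt.cat,
                     label = cases, label2 = factor(date),
                     text = paste("country:", country))) +
  geom_tile()

# Time the ggplot -> plotly conversion with three tooltip entries vs. one
t_all  <- system.time(w_all  <- ggplotly(p, tooltip = c("text", "label", "label2")))
t_text <- system.time(w_text <- ggplotly(p, tooltip = "text"))

# Elapsed seconds for each variant
c(all_three = t_all[["elapsed"]], text_only = t_text[["elapsed"]])
```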

Any suggestions? I've considered alternative interactive heatmap packages like d3heatmap, heatmaply and shinyHeatmaply, but those solutions are intended more for correlation heatmaps and lack the customisation options of ggplot.


Geek On Acid
  • It's a nice graphic; however, I would consider ordering the labels and colors depending on the number of cases (e.g. more than 1000 cases should be the first label). Also, I think the gray and blue should be replaced by colors that let us see a sort of gradient. – Andrés González Mar 20 '20 at 01:11
  • I appreciate your answer but you're not addressing my question. The code above is just placeholder example code to illustrate the performance issue with Plotly. As I pointed out in my question, you can view the prototype of the app I'm making online. – Geek On Acid Mar 20 '20 at 10:26
  • Is it possible to rewrite as "pure" plotly code? Maybe the conversion from ggplot to plotly takes some time? And did you check out this [link](https://plotly-r.com/performance.html)? – SeGa Mar 23 '20 at 08:52
  • [Here](https://community.plot.ly/t/poor-javascript-heatmap-performance/3683/5) you can read about the problem with regards to the underlying plotly JS library. – ismirsehregal Mar 23 '20 at 14:57

1 Answer


If you rewrite it as "pure" plotly (without the ggplotly conversion), it will be much faster: roughly 3000 times faster in this case. Here's the result of a very small benchmark:

Unit: milliseconds
 expr       min        lq       mean     median        uq       max neval
    a 9929.8299 9929.8299 9932.49130 9932.49130 9935.1527 9935.1527     2
    b    3.1396    3.1396    3.15665    3.15665    3.1737    3.1737     2
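
The code behind these timings isn't shown here; a comparison along these lines could be set up roughly as follows. This is a sketch on synthetic data, not the original benchmark: `toy`, `z`, `countries` and `dates` are made-up names, and the microbenchmark package is assumed to be installed. Since `plot_ly()` objects are evaluated lazily, both expressions are wrapped in `plotly_build()` here so that each one produces a fully built figure.

```r
library(ggplot2)
library(plotly)
library(microbenchmark)

set.seed(1)
# Synthetic stand-in (assumption): 10 countries x 30 dates, 6 categories
countries <- paste("Country", LETTERS[1:10])
dates <- as.character(as.Date("2020-03-01") + 0:29)
toy <- expand.grid(country = countries, date = dates, stringsAsFactors = FALSE)
toy$cat_id <- sample(1:6, nrow(toy), replace = TRUE)

# Wide matrix for the native heatmap: rows = countries (y), columns = dates (x)
z <- matrix(toy$cat_id, nrow = length(countries),
            dimnames = list(countries, dates))

# Discrete fill, as in the question
p_gg <- ggplot(toy, aes(date, country, fill = factor(cat_id))) + geom_tile()

bench <- microbenchmark(
  a = plotly_build(ggplotly(p_gg)),
  b = plotly_build(plot_ly(x = dates, y = countries, z = z, type = "heatmap")),
  times = 2
)
bench
```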

The reason ggplotly is so much slower is that it doesn't recognize the input as a heatmap and instead creates a scatterplot in which each rectangle is drawn separately with all the necessary attributes. You can look at the resulting JSON if you wrap the result of ggplotly or plot_ly in plotly_json().

You can also inspect the object.size of the plots: the ggplotly object is around 4616.4 Kb, while the plotly heatmap is just 40.4 Kb.
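
As a rough illustration of both checks, the sketch below builds the two variants on synthetic data and lists the trace types and object sizes. The data and the names `toy`, `z`, `w_gg` and `w_hm` are made up for this example, and the sizes will differ from the figures quoted for the real data.

```r
library(ggplot2)
library(plotly)

set.seed(1)
# Synthetic stand-in (assumption): 10 countries x 30 dates, 6 categories
countries <- paste("Country", LETTERS[1:10])
dates <- as.character(as.Date("2020-03-01") + 0:29)
toy <- expand.grid(country = countries, date = dates, stringsAsFactors = FALSE)
toy$cat_id <- sample(1:6, nrow(toy), replace = TRUE)
z <- matrix(toy$cat_id, nrow = length(countries),
            dimnames = list(countries, dates))

# ggplot tiles with a discrete fill, converted via ggplotly()
w_gg <- ggplotly(ggplot(toy, aes(date, country, fill = factor(cat_id))) +
                   geom_tile())
# Native plotly heatmap, built explicitly so the traces can be inspected
w_hm <- plotly_build(plot_ly(x = dates, y = countries, z = z, type = "heatmap"))

# Helper (hypothetical): trace types found in a built plotly widget
trace_types <- function(w) {
  vapply(w$x$data,
         function(tr) if (is.null(tr$type)) NA_character_ else tr$type,
         character(1))
}
gg_types <- trace_types(w_gg)
hm_types <- trace_types(w_hm)
table(gg_types)
table(hm_types)

# In-memory size of each widget object
format(object.size(w_gg), units = "Kb")
format(object.size(w_hm), units = "Kb")
```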

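# Build a stepped (discrete) colorscale: each of the 7 categories gets a flat
# band, i.e. two consecutive stops on the 0-1 scale that share one colour.
df_colors <- data.frame(range = 0:13, colors = 0:13)
color_s <- setNames(data.frame(df_colors$range, df_colors$colors), NULL)
for (i in 1:14) {
  # Stops 1,2 use colour 1; stops 3,4 use colour 2; and so on
  color_s[[2]][[i]] <- pal_simpsons("springfield")(13)[[(i + 1) %/% 2]]
  # Stop positions: 0, 2/14, 2/14, 4/14, ..., 12/14, 1
  color_s[[1]][[i]] <- i / 14 - (i %% 2) / 14
}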

plot_ly(data = confirmed) %>%
  plotly::add_heatmap(x = ~as.factor(date), 
                      y = ~reorder(`Country/Region`, `cases count`),
                      z = ~as.numeric(factor(confirmed$`cnt.cat`, ordered = TRUE, 
                                             levels = unique(confirmed$`cnt.cat`))),
                      xgap = 0.5,
                      ygap = 0.5,
                      colorscale = color_s,
                      colorbar = list(tickmode='array',
                                      title = "Cases",
                                      tickvals=c(1:7),
                                      ticktext=levels(factor(x = confirmed$`cnt.cat`,
                                                             levels = unique(confirmed$`cnt.cat`),
                                                             ordered = TRUE)), len=0.5),
                      text = ~paste0("country: ", `Country/Region`, "<br>",
                                    "Number of cases: ", `cases count`, "<br>",
                                    "Category:  ", `cnt.cat`),
                      hoverinfo ="text"
  ) %>% 
  layout(plot_bgcolor='black',
         xaxis = list(title = ""),
         yaxis = list(title = ""))
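
The stepped colorscale built by the loop above can also be written without a loop. The helper below is a hypothetical sketch (`discrete_colorscale` is not a plotly or ggsci function), shown with three placeholder hex colours rather than the ggsci palette:

```r
# Hypothetical helper: turn n colours into a stepped (discrete) colorscale
discrete_colorscale <- function(cols) {
  n <- length(cols)
  # Stops 0, 1/n, 1/n, 2/n, ..., 1: each colour gets a start and an end stop
  stops <- rep(0:n, each = 2)[2:(2 * n + 1)] / n
  # Unnamed two-column data frame, the same structure as color_s above
  setNames(data.frame(stops, rep(cols, each = 2), stringsAsFactors = FALSE),
           NULL)
}

# Placeholder colours (assumption), standing in for the palette
cs <- discrete_colorscale(c("#FED439", "#709AE1", "#8A9197"))
cs[[1]]  # stop positions
cs[[2]]  # colour at each stop
```

Under these assumptions, `discrete_colorscale(pal_simpsons("springfield")(13)[1:7])` should give the same stops and colours as the loop and could be passed as `colorscale` in the same way.
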
SeGa
  • Thank you, it works better at first glance; I'm testing it in Shiny as we speak. Two questions regarding your solution: (1) Any chance I could get a categorical legend of colours (like in my example and app), rather than a continuous colour legend when using `plot_ly()`? (2) Regarding your suggestion to use performance optimisers such as `toWebGL()` and `partial_bundle()` from the Plotly eBook - do you just wrap `plot_ly()` in those other functions to improve performance? – Geek On Acid Mar 23 '20 at 15:19
  • I tried several options, without success unfortunately. I'll edit my answer when I've figured it out. I found another [SO question](https://stackoverflow.com/questions/42524450/using-discrete-custom-color-in-a-plotly-heatmap) that achieved that, but I didn't manage to. Also, I think the values are currently wrong, although the coloring looks fine. Those two calls are just piped together, so you would add them at the end, but it will not always work, especially if you have other plotly plots in your Shiny App. – SeGa Mar 23 '20 at 15:38
  • I updated my answer. I think now the values and the tooltips are correct, but the legend is still continuous. – SeGa Mar 23 '20 at 18:56
  • Another update, now with discrete legend. You have to define a custom colorscale. Puh, that was tough :D – SeGa Mar 23 '20 at 19:06
  • Forget about `partial_bundle` and `toWebGL`. They don't work with heatmaps and just ruin your plot. But it should be fast enough now anyway. Plus, the legend/colors are now correctly ordered. ;) – SeGa Mar 23 '20 at 19:43
  • Great work on finding the key issue here. I've integrated your code with my Shiny app and it greatly improved the performance. Thank you - this reputation is well deserved. – Geek On Acid Mar 23 '20 at 22:37