3

I'm trying to add some interactivity to my plotly charts using crosstalk (specifically using filter_checkbox and filter_select) and I've run into a bit of snag. I produce my plots firstly through ggplot then I convert it to plotly using ggplot function.

While I can generate the chart fine (and there's plenty of interactivity on the markdown), I have a couple of problems. Firstly, when I wish to filter (either via filter_select or filter_checkbox), the 'recent' data disappears from the chart entirely, and cannot be recovered without refreshing the html. A similar thing happens with the actual data I'm filtering; I cannot revert the chart to its original state without refreshing the page.

Does anyone know why this might be? Copy of my code + data is below.

Below is a snippet of my data (data=historic):

structure(list(date = c("23-03-2019", "23-03-2019", "23-03-2019", 
"23-03-2019", "05-05-2020", "05-05-2020", "05-05-2020", "05-05-2020", 
"17-06-2021", "17-06-2021", "17-06-2021", "17-06-2021"), cumvol = c(0.004, 
0.034, 0.054, 0.057, 0.005, 0.048, 0.068, 0.075, 2.009, 2.029, 
2.049, 2.064), time = structure(c(26457, 26636, 26658, 27216, 
25152, 25614, 25667, 25668, 56966, 57268, 57303, 58986), units = "secs", class = c("hms", 
"difftime")), Year = c("2019", "2019", "2019", "2019", "2020", 
"2020", "2020", "2020", "2021", "2021", "2021", "2021"))

On top of this I plot another line from a separate df (data=recent).

structure(list(date = structure(c(19038, 19038, 19038, 19038), class = "Date"), 
cumvol = c(0.029, 0.034, 0.07, 0.075), time = structure(c(29674, 29674, 29691, 29719), 
class = c("hms", "difftime"), units = "secs")), Year = c("2022", "2022", "2022", "2022"))

I then convert the data to Shared data, create a ggplot using that data, then covert that plot to ggplot as follows (the variable "most_recent" refers the to the most recent entry in the 'recent' dataframe, produced by recent[nrow(recent),]):

sharedhistoric <- SharedData$new(historic, key = ~Date)
sharedrecent <- SharedData$new(recent, key = ~Date)

plot <- ggplot()+geom_line(data=sharedhistoric,aes(x=time, y=cumvol, group=date),color='#BAB0AC', alpha=0.5)+
      geom_line(data=sharedrecent ,aes(x=time, y=cumvol, group=date),size=1.2,color='#E15758')+
      geom_point(data=most_recent, aes(x=time,y=cumvol), color='#E15759',size=3)+geom_hline(yintercept = 0)+  theme(title=element_text(size=12),panel.background = element_rect(fill='white',color='black'),legend.position='right')+
        labs(title = "Vol",subtitle = "Cum Vol so far", x = "Time", y = "Vol")

Finally, I convert the chart to plotly and use the following bcols:

chartyplot <- plotly::ggplotly(plot)
bscols(widths = c(4, 9),
       list(
         crosstalk::filter_checkbox("Year", 
                         label = "Select Year",
                       sharedhistoric, 
                        group = ~Year),
         crosstalk::filter_select("Date", 
                       label = "Date",
                      sharedhistoric, 
                       group = ~Date)
         ), chartyplot)

Thanks for any assistance/advice.

alec22
  • 735
  • 2
  • 12

1 Answers1

1

As far as I can tell, there are two effects that contribute to this behavior

  1. (non-)unique keys of the SharedData objects
  2. anything not selected from crosstalk::filter_* is dropped from the plot

TL;DR: The way to make this work, is by ensuring unique keys, and assigning the different datasets to the same group. Any data not part of a SharedData object is lost as soon as any filter is applied. And we can cheat some data to always remain on the plot by fixing some data via HTML tags.

1 Keys

Looking at the keys section of the crosstalk documentation, the keys should be unique within the dataset. Therefore date may not be a good choice for this in the given dataset. Instead, we can simply create keys based on the row number (which is also the default behavior when no key is supplied)

sharedhistoric <- SharedData$new(historic %>% mutate(key = as.character(row_number())), key = ~key)
sharedrecent <- SharedData$new(recent %>% mutate(key = as.character(row_number())), key = ~key)

...but now only the "recent" data shows up (for year 2019, which should select rows 1-4 of the "historic" data) (left plot). Switching the order of the geom_line statements (first "recent", then "historic") leads to correct behavior for the "historic" data, but the "recent" is gone (right plot). This essentially means, that the filtered keys are only applied to the last SharedData object added to ggplot.

enter image description here

The next step to making interactive use of both datasets, is assigning them to the same group, which per documentation can be used to link multiple instances of datasets.

sharedhistoric <- SharedData$new(..., key = ~key, group = "mydata")
sharedrecent <- SharedData$new(..., key = ~key, group = "mydata")

This already looks better and we keep filtering and recovering both "historic", and "recent" data - but 2019 is now associated to the first 4 rows of both datasets (because of duplicate keys):

enter image description here

A possible workaround is to globally define unique keys and assign the sub-datasets to the same group:

historic <- as.data.frame(historic) %>%
  dplyr::mutate(date = as.character(date), 
                set = "historic")
recent <- as.data.frame(recent) %>%
  dplyr::mutate(date = as.character(date), 
                set = "recent")
all <- bind_rows(historic, recent) %>%
  dplyr::mutate(key = as.character(row_number()))

sharedall <- SharedData$new(all, key = ~key, group = "mydata")
sharedhistoric <- SharedData$new(all %>% dplyr::filter(set == "historic"), key = ~key, group = "mydata")
sharedrecent <- SharedData$new(all %>% dplyr::filter(set == "recent"), key = ~key, group = "mydata")

#--- no changes to the plot ---
# but choose filter options from full dataset

bscols(widths = c(4, 8),
       list(
         crosstalk::filter_checkbox("id_year", 
                                    label = "Select Year",
                                    sharedall, 
                                    group = ~Year),
         crosstalk::filter_select("id_date", 
                                  label = "Date",
                                  sharedall, 
                                  group = ~date)
       ),
       chartyplot)

Now we can correctly (de)select the historic and recent data.

enter image description here

2 "Lost" Data points

If the previous example had worked with filter selection only based on "historic" data, then the "recent" data (from a different year) appears to not be recoverable after the first filter is selected. Similarly for the "most_recent" data point, which is not a SharedData object. This data point is plotted initially, but is dropped as soon as the first filter is set.

However, this can be adressed the same way, by defining a "most recent" SharedData object belonging to the same group:

sharedmostrecent <- SharedData$new(all %>% tail(1), key = ~key, group = "mydata")

#plot adjustment
...
geom_line(...) +
geom_point(data=sharedmostrecent, aes(x=time,y=cumvol), color='#E15759',size=3) +
...

enter image description here

With this, we can select and deselect all data in the plot, without loosing any.

3 Fix some data

In order to ensure the "recent" data to stick around regardless of (manual filter values), we can manipulate the HTML output. First we separate the datasets by not filtering by year, but by recent/historic & year (or any other suitable subsets of the data):

all <- bind_rows(historic, recent) %>%
  dplyr::mutate(key = as.character(row_number()),
                dataset = paste0(set, " ", Year))
...
out <- bscols(widths = c(4, 8),
       list(
         crosstalk::filter_checkbox("id_year", 
                                    label = "Select dataset",
                                    sharedall, 
                                    group = ~dataset)
       ),
       chartyplot) 

Then we check and disable the checkbox for our recent dataset - this can probably be done more elegantly, but it works in my Rmd:

library(htmltools)
out_tags <- htmltools::renderTags(out)
out_tags$html <- stringr::str_replace(
  out_tags$html, 
  '<input type="checkbox" name="id_year" value="recent 2022"/>',
  '<input type="checkbox" name="id_year" value="recent 2022" disabled="disabled" checked="checked"/>'
  )
out_tags$html <- HTML(out_tags$html)
as.tags(out_tags)

enter image description here

pholzm
  • 1,719
  • 4
  • 11
  • Thanks so much for all the assistance thus far, it's been really helpful. I still have one issue though. A large portion of the most 'recent line' (including the point at the end) is missing from the chart unless today's date is selected from filter_select. Also if any of the year filters are selected (even the current year) most of the recent line disappears. Is this a grouping issue? – alec22 Mar 02 '22 at 15:48
  • Strangely, I get the same result by putting 'historic' within the bscols rather than sharedall. – alec22 Mar 02 '22 at 15:58
  • @alec22 From my (very limited) experience with this, it looks like anything that is not a selected part of the shareddata won't show up - unfortunately I didn't manage to configure this in a way that some data (say the "recent" data) always remains on the plot independent of the filters. Maybe it would be a feasible workaround to not filter by date/year, but set up a filter on datasets with a dedicated categorical column (recent/historic-2019/historic-2020/...) – pholzm Mar 02 '22 at 17:24
  • @alec22 I've added some workaround to fix parts of the dataset so they can not be deselected. Feels dirty, but maybe it helps... – pholzm Mar 03 '22 at 19:28
  • Ahh a bit more progress, but it still seems to be associating portions of the of the red line with other irrelevant years. However, you've given a lot of assistance so I think you should get my bounty :) – alec22 Mar 04 '22 at 13:42
  • 1
    Scratch that, little bit of fiddling fixed it :) – alec22 Mar 04 '22 at 13:49
  • 1
    I'm also not really satisfied with this as a general solution. But glad to hear that it at least kind of does the job for your case. :) – pholzm Mar 04 '22 at 14:19