14

I have an app where I want to pull out values from a lookup table based on user inputs. The reference table is a statistical test, based on a calculation that'd be too slow to do for all the different combinations of user inputs. Hence, a lookup table for all the possibilities.

But... right now the table is about 60 MB (as .Rdata) or 214 MB (as .csv), and it'll get much larger if I expand the possible user inputs. I've already reduced the number of significant figures in the data (to 3) and removed the row/column names.

Obviously, I can preload the lookup table outside the reactive server function, but it'll still take a decent chunk of time to load in that data. Does anyone have any tips on dealing with large amounts of data in Shiny? Thanks!
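For concreteness, the lookup works along these lines (the column names `n1`, `n2`, `stat` and the values are illustrative, not my real table):

```r
# Precomputed test statistic for each combination of user inputs
# (toy values; the real table has millions of rows)
lookup <- data.frame(n1   = c(10, 10, 20),
                     n2   = c(10, 20, 20),
                     stat = c(1.81, 1.73, 1.68))

# Given the user's inputs, pull out the matching value
get_stat <- function(n1, n2) {
  lookup$stat[lookup$n1 == n1 & lookup$n2 == n2]
}
```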

Laura Hughes
  • 251
  • 3
  • 11
  • Have you tried `fread()` or `readRDS()`? I wonder if they make a difference for you. – jazzurro Sep 04 '14 at 06:42
  • If jazzurro's suggestion is still too slow you can consider using a database. mongodb works well with R through rmongodb. This way you can lookup only what you need and it should be very fast. – Jan Stanstrup Sep 05 '14 at 13:24
  • thanks for the suggestions, jazzurro and jan. readRDS cuts the table down to 25 MB, so more manageable. i'll look into database options if the initial read is still too slow. – Laura Hughes Sep 08 '14 at 03:44

1 Answer

9

flaneuse, we are still working with a smaller set than you, but we have been experimenting with:

  1. Use rds for our data

    As @jazzurro mentioned .rds above, you seem to know how to do this already, but the syntax for others is below.

    The .rds format stores a single R object, so you can rename it on load if need be.

    In your prep data code, for example:

    mystorefile <- file.path("/my/path","data.rds")
    # ... do data stuff
    
    # Save down (assuming mydata holds your data frame or table)
    saveRDS(mydata, file = mystorefile)
    

    In your shiny code:

    #  Load in my data
    x <- readRDS(mystorefile)
    

    Remember to copy your data .rds file into your app directory when you deploy. We use a data directory /myapp/data and then file.path for store file is changed to "./data" in our shiny code.
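    Put together, the round trip looks like this (a self-contained sketch: it writes to `tempdir()` so it runs anywhere; in the deployed app the path would be `"./data/data.rds"` as described above):

```r
# Save an object as .rds, then read it back under a new name
mydata <- data.frame(x = 1:3, p = c(0.01, 0.05, 0.1))  # stand-in data
mystorefile <- file.path(tempdir(), "data.rds")

saveRDS(mydata, file = mystorefile)  # in the prep script
x <- readRDS(mystorefile)            # in the shiny code
```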

  2. global.R

    We have placed our readRDS calls to load in our data in this global file (instead of in server.R before the shinyServer() call), so that it is run once and is available for all sessions, with the added bonus that it can be seen by ui.R.

    See this scoping explanation for R Shiny.
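    A minimal global.R along these lines (the ./data path and file name are illustrative):

```r
# global.R -- sourced once when the app starts; objects defined here
# are visible to both ui.R and server.R, shared across sessions
full.dt <- readRDS(file.path("./data", "data.rds"))
```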

  3. Slice and dice upfront

    The standard daily reports use the most recent data, so I make a small latest.dt in my global.R from a subset of my data. The landing page with the latest charts then works with this smaller data set, which gives faster charts.

    The custom view that uses the full.dt is then on a separate tab. It is slower, but at that stage the user is more patient and is thinking about which dates and other parameters to choose.

    This subset idea may help you.
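The subsetting step can be sketched like this in base R (the .dt naming in my app comes from data.table, but the idea is the same; the table and dates below are made up for illustration):

```r
# Stand-in for the full table loaded in global.R
full.dt <- data.frame(
  date  = as.Date(c("2014-09-01", "2014-09-02", "2014-09-02")),
  value = c(1.2, 3.4, 5.6)
)

# Keep only the most recent day's rows for the landing-page charts
latest.dt <- full.dt[full.dt$date == max(full.dt$date), ]
```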

Would be interested in what others (with more demanding data sets) have tried!

micstr
  • 5,080
  • 8
  • 48
  • 76