0

UPDATED: An example of the problem is shown below the code for the app

I'm building an dynamic ML app where the user can upload a dataset to get a prediction of the first column in the dataset (the response variable should be located in column 1 of the uploaded dataset). The user can select a value for the variables in the uploaded dataset and get a prediction of the response variable.

I'm currently trying to create a datatable that stores all the selected values, timestamp and the prediction.

The table is suppose to store the previous saved values, but only for that perticular dataset. By this I mean that if I save values from the iris dataset, the table uses the variables from the iris dataset as columns. This causes problems when uploading another dataset and saving those values, since the columns from the iris dataset would still be there and not the variables/columns from the new dataset.

My question is: How do I create a unique datatable for each dataset uploaded to the app?

If this sound confusion, try to run the app, calculate a prediction and save the data. Do this for two different datasets and look at the datatable under the "log" tab.

If you don't have two datasets, you can use these two datasets, they are build into R as default and already have the response variable positioned in column 1.

write_csv(attitude, "attitude.csv")
write_csv(ToothGrowth, "ToothGrowth.csv")

You will find the code regarding the datatable under the 'Create the log' section in the server function.

This is the code for the app:

library(shiny)
library(tidyverse)
library(shinythemes)
library(data.table)
library(RCurl)
library(randomForest)
library(mlbench)
library(janitor)
library(caret)
library(recipes)
library(rsconnect)



# UI -------------------------------------------------------------------------
ui <- fluidPage(
  navbarPage(title = "Dynamic ML Application",
               
    tabPanel("Calculator", 
  
            sidebarPanel(
              
              h3("Values Selected"),
              br(),
              tableOutput('show_inputs'),
              hr(),
              actionButton("submitbutton", label = "calculate", class = "btn btn-primary", icon("calculator")),
              actionButton("savebutton", label = "Save", icon("save")),
              hr(),
              tableOutput("tabledata")
              ), # End sidebarPanel
            
            mainPanel(
              
              h3("Variables"),
              uiOutput("select")
              ) # End mainPanel
            
  ), # End tabPanel Calculator
  
          
  tabPanel("Log",
           br(),
           DT::dataTableOutput("datatable18", width = 300), 
  ), # End tabPanel "Log"
  
  tabPanel("Upload file",
           br(),
        sidebarPanel(
           fileInput(inputId = "file1", label="Upload file"),
           checkboxInput(inputId ="header", label="header", value = TRUE),
           checkboxInput(inputId ="stringAsFactors", label="stringAsFactors", value = TRUE),
           radioButtons(inputId = "sep", label = "Seperator", choices = c(Comma=",",Semicolon=";",Tab="\t",Space=" "), selected = ","),
           radioButtons(inputId = "disp", "Display", choices = c(Head = "head", All = "all"), selected = "head"),

        ), # End sidebarPanel
        
        mainPanel(
           tableOutput("contents")
        )# End mainPanel
  ) # EndtabPanel "upload file"
  
  
  ) # End tabsetPanel
) # End UI bracket


# Server -------------------------------------------------------------------------
server <- function(input, output, session) {
  

# Upload file content table
  get_file_or_default <- reactive({
    if (is.null(input$file1)) {
      paste("No file is uploaded yet")
    } else { 
      df <- read.csv(input$file1$datapath,
                     header = input$header,
                     sep = input$sep,
                     quote = input$quote)
      
      if(input$disp == "head") {
        return(head(df))
      }
      else {
        return(df)
      }
    }
  })
  output$contents <- renderTable(get_file_or_default())
  
  
# Create input widgets from dataset  
  output$select <- renderUI({
    req(input$file1)
    if (is.null(input$file1)) {
      "No dataset is uploaded yet"
    } else {
      df <- read.csv(input$file1$datapath,
               header = input$header,
               sep = input$sep,
               quote = input$quote)
    
      tagList(map(
      names(df[-1]),
      ~ ifelse(is.numeric(df[[.]]),
               yes = tagList(sliderInput(
                 inputId = paste0(.),
                 label = .,
                 value = mean(df[[.]], na.rm = TRUE),
                 min = round(min(df[[.]], na.rm = TRUE),2),
                 max = round(max(df[[.]], na.rm = TRUE),2)
               )),
               no = tagList(selectInput(
                 inputId = paste0(.),
                 label = .,
                 choices = sort(unique(df[[.]])),
                 selected = sort(unique(df[[.]]))[1],
               ))
      ) # End ifelse
    )) # End tagList
    }
  })
  
  
# creating dataframe of selected values to be displayed
  AllInputs <- reactive({
    req(input$file1)
    if (is.null(input$file1)) {
      
    } else {
      DATA <- read.csv(input$file1$datapath,
                     header = input$header,
                     sep = input$sep,
                     quote = input$quote)
    }
    id_exclude <- c("savebutton","submitbutton","file1","header","stringAsFactors","input_file","sep","contents","head","disp")
    id_include <- setdiff(names(input), id_exclude)
    if (length(id_include) > 0) {
          myvalues <- NULL
      for(i in id_include) {
        if(!is.null(input[[i]]) & length(input[[i]] == 1)){
          myvalues <- as.data.frame(rbind(myvalues, cbind(i, input[[i]])))
        }
      }
      names(myvalues) <- c("Variable", "Selected Value")
      myvalues %>% 
        slice(match(names(DATA[,-1]), Variable))
    }
  })
  
  
# render table of selected values to be displayed
  output$show_inputs <- renderTable({
    if (is.null(input$file1)) {
    paste("No dataset is uploaded yet.")
    } else {
    AllInputs()
    }
  })
  
  
# Creating a dataframe for calculating a prediction
  datasetInput <- reactive({ 
    req(input$file1)
    DATA <- read.csv(input$file1$datapath,
                     header = input$header,
                     sep = input$sep,
                     quote = input$quote)
    
    DATA <- as.data.frame(unclass(DATA), stringsAsFactors = TRUE)
    response <- names(DATA[1])
    model <- randomForest(eval(parse(text = paste(names(DATA)[1], "~ ."))), 
                          data = DATA, ntree = 500, mtry = 3, importance = TRUE)
    
    df1 <- data.frame(AllInputs(), stringsAsFactors = FALSE)
    input <- transpose(rbind(df1, names(DATA[1])))
    
    write.table(input,"input.csv", sep=",", quote = FALSE, row.names = FALSE, col.names = FALSE)
    test <- read.csv(paste("input.csv", sep=""), header = TRUE)
    
    
# Defining factor levels for factor variables
    cnames <- colnames(DATA[sapply(DATA,class)=="factor"])
    if (length(cnames)>0){
      lapply(cnames, function(par) {
        test[par] <<- factor(test[par], levels = unique(DATA[,par]))
      })
    }
    
# Making the actual prediction and store it in a data.frame     
    Prediction <- predict(model,test)
    Output <- data.frame("Prediction"=Prediction)
    print(format(Output, nsmall=2, big.mark=","))
    
  })
  
# display the prediction when the submit button is pressed
  output$tabledata <- renderTable({
    if (input$submitbutton>0) { 
      isolate(datasetInput()) 
    } 
  })


  # -------------------------------------------------------------------------

# Create the Log 
  saveData <- function(data) {
    data <- as.data.frame(t(data))
    if (exists("datatable18")) {
      datatable18 <<- rbind(datatable18, data)
    } else {
      datatable18 <<- data
    }
  }
  
  loadData <- function() {
    if (exists("datatable18")) {
      datatable18
    }
  }
  
# Whenever a field is filled, aggregate all form data
  formData <- reactive({
    DATA <- read.csv(input$file1$datapath,
                     header = input$header,
                     sep = input$sep,
                     quote = input$quote)
    fields <- c(colnames(DATA[,-1]), "Timestamp", "Prediction")
    data <- sapply(fields, function(x) input[[x]])
    data$Timestamp <- as.character(Sys.time())
    data$Prediction <- as.character(datasetInput())
    data
  })
  
# When the Submit button is clicked, save the form data
  observeEvent(input$savebutton, {
    saveData(formData())
  })
  
# Show the previous responses
# (update with current response when Submit is clicked)
  output$datatable18 <- DT::renderDataTable({
    input$savebutton
    loadData()
  })
  


  
} # End server bracket

# ShinyApp -------------------------------------------------------------------------
shinyApp(ui, server)

UPDATED HERE

To get an idea about how the problem occurs take a look at this:

  1. I upload the iris dataset to the application. iris dataset gets uploaded

  2. I then make some predictions and save them. enter image description here

  3. The predictions, as well as the selected inputs and a timestamp of when the save-button was pressed can now be seen under the "Log" tab. Saved inputs from iris dataset

  4. I upload a new dataset (attitude), which of course have different variables included (attitude dataset has 7 variables total, iris dataset has 5). enter image description here

  5. I calculate a prediction, hit the save button and the app crashes. This happens because the number of columns in the dataset now has changed, so I get this errormessage:

Error in rbind: numbers of columns of arguments do not match

This can be fixed by renaming the datatable object in the server, since this creates a new datatable without any specified columns yet. But as soon as the Save button is pressed for the first time, the datatable locks-in the columns so they can't be changed again.

I can still access the old datatables if I switch the name of the datatable in the server function back the original name. So I'm thinking that if the name of the datatable object can be dynamic dependend on the dataset uploaded to the app, then the correct datatable can be shown.

So I think a better question could be: How do I create a dynamic/reactive datatable output object

Sound
  • 85
  • 1
  • 7
  • (1) this is the wrong use of `require`, see https://stackoverflow.com/a/51263513/3358272. (2) using the real file names renders this unpredictable (useless) if more than one person, I suggest using `tempfile(fileext=".csv")` for storing them; even better, consider sqlite or duckdb for a trivial or even in-memory database, both work great for this type of use. – r2evans Nov 30 '20 at 13:59
  • Oh, I forgot to remove the req() after messing around with potential solutions. I'm pretty new to creating databases and tempfiles, do you have a link on how to use them? or could you show me how it should be implemented in my code? – Sound Nov 30 '20 at 14:12
  • A good one is https://shiny.rstudio.com/articles/persistent-data-storage.html. It discusses several options: file-based, DBMS, and non-relational "database" (e.g., mongodb, redis). If this is low-user-count, I'd start small. – r2evans Nov 30 '20 at 17:20
  • To be clear, I'm commenting about your use of `require(.)`, not `req(.)`, two very different things. (On a cursory look, your use of `req(.)` looks appropriate.) – r2evans Nov 30 '20 at 19:04
  • I used the `require()` function because it temporarily fixed a problem which has now been fixed, so I've changed all back to `library()` again. Thanks for the link to the article! My current datatable log is actually already inspired by that article, but I don't seem to remember the article talking about the specific issue I describe in the post. The problem I'm having is that the table tries to store datainputs from new datasets in the old one containing the old columns. I'm trying to create an unique datatable for each dataset uploaded to the app. Is this mentioned somewhere in the article? – Sound Dec 01 '20 at 07:52
  • A potential solution could be to save the datatables as different names depending on the filename. Right now the datatable is just called **datatable15**. But if I could somehow save the datatable as `output$datatable_iris` when the iris dataset is uploaded and something else when a different dataset is uploaded, it might work. However, I don't know how to use the `paste()` function to save directly to an output object like `output$datatable15`. – Sound Dec 01 '20 at 08:16

1 Answers1

0

Here's a simple shiny app that demonstrates a technique of storing a list of data (and properties). I'll store it in alldata (a reactive-value), and each dataset has the following properties:

  • name, just the name, redundant with the name of the list itself
  • depvar, stored dependent-variable, allowing the user to select which of the variables is used; in the displayed table, this is shown as the first column, though the original data is in its original column-order
  • data, the raw data (data.frame)
  • created and modified, timestamps; you said timestamps, but I didn't know if you meant on a particular dataset/prediction/model or something else, so I did this instead

Note that the same data can be uploaded multiple times: while I don't know if this is needed, it is allowed since all referencing is done on the integer index within the alldata list, not the names therein.

library(shiny)

NA_POSIXt_ <- Sys.time()[NA] # for class-correct NA
defdata <- list(
  mtcars = list(
    name = "mtcars",
    depvar = "mpg",
    data = head(mtcars, 10),
    created = Sys.time(),
    modified = NA_POSIXt_
  ),
  CO2 = list(
    name = "CO2",
    depvar = "uptake",
    data = head(CO2, 20),
    created = Sys.time(),
    modified = NA_POSIXt_
  )
)

makelabels <- function(x) {
  out <- mapply(function(ind, y) {
    cre <- format(y$created, "%H:%M:%S")
    mod <- format(y$modified, "%H:%M:%S")
    if (is.na(mod)) mod <- "never"
    sprintf("[%d] %s (cre: %s ; mod: %s)", ind, y$name, cre, mod)
  }, seq_along(x), x)
  setNames(seq_along(out), out)
}

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      selectInput("seldata", label = "Selected dataset", choices = makelabels(defdata)),
      selectInput("depvar", label = "Dependent variable", choices = names(defdata[[1]]$data)),
      hr(),
      fileInput("file1", label = "Upload data"),
      textInput("filename1", label = "Data name", placeholder = "Derive from filename"),
      checkboxInput("header", label = "Header", value = TRUE),
      checkboxInput("stringsAsFactors", label = "stringsAsFactors", value = TRUE),
      radioButtons("sep", label = "Separator",
                   choices = c(Comma = ",", Semicolon = ";", Tab = "\t", Space = " "),
                   select = ","),
      radioButtons("quote", label = "Quote",
                   choices = c(None = "", "Double quote" = '"', "Single quote" = "'"),
                   selected = '"')
    ),
    mainPanel(
      tableOutput("contents")
    )
  )
)

server <- function(input, output, session) {
  alldata <- reactiveVal(defdata)

  observeEvent(input$seldata, {
    dat <- alldata()[[ as.integer(input$seldata) ]]
    choices <- names(dat$data)
    selected <- 
      if (!is.null(dat$depvar) && dat$depvar %in% names(dat$data)) {
        dat$depvar
      } else names(dat$data)[1]
    updateSelectInput(session, "depvar", choices = choices, selected = selected)
    # ...
    # other things you might want to update when the user changes dataset
  })
  observeEvent(input$depvar, {
    ind <- as.integer(input$seldata)
    alldat <- alldata()
    if (alldat[[ ind ]]$depvar != input$depvar) {
      # only update alldata() when depvar changes
      alldat[[ ind ]]$depvar <- input$depvar
      alldat[[ ind ]]$modified <- Sys.time()
      lbls <- makelabels(alldat)
      sel <- as.integer(input$seldata)
      updateSelectInput(session, "seldata", choices = lbls, selected = lbls[sel])
      alldata(alldat)
    }
  })

  observeEvent(input$file1, {
    req(input$file1)
    df <- tryCatch({
      read.csv(input$file1$datapath,
               header = input$header, sep = input$sep,
               stringsAsFactors = input$stringsAsFactors,
               quote = input$quote)
    }, error = function(e) e)
    if (!inherits(df, "error")) {
      if (!NROW(df) > 0 || !NCOL(df) > 0) {
        df <- structure(list(message = "No data found"), class = c("simpleError", "error", "condition"))
      }
    }
    if (inherits(df, "error")) {
      showModal(modalDialog(title = "Error loading data", "No data was found in the file"))
    } else {
      nm <-
        if (nzchar(input$filename1)) {
          input$filename1
        } else tools:::file_path_sans_ext(basename(input$file1$name))
      depvar <- names(df)[1]
      newdat <- setNames(list(list(name = nm, depvar = depvar, data = df,
                                   created = Sys.time(), modified = NA_POSIXt_)),
                         nm)
      alldat <- alldata()
      alldata( c(alldat, newdat) )
      # update the selectInput to add this new dataset
      lbls <- makelabels(alldata())
      sel <- length(lbls)
      updateSelectInput(session, "seldata", choices = lbls, selected = lbls[sel])
    }
  })

  output$contents <- renderTable({
    req(input$seldata)
    seldata <- alldata()[[ as.integer(input$seldata) ]]
    # character
    depvar <- seldata$depvar
    othervars <- setdiff(names(seldata$data), seldata$depvar)
    cbind(seldata$data[, depvar, drop = FALSE], seldata$data[, othervars, drop = FALSE])
  })
}

shinyApp(ui, server)

There is no ML, no modeling, nothing else in this shiny app, it just shows one possible method for switching between multiple datasets.

sample shiny app

For your functionality, you'll need to react to input$seldata to find when the user changes dataset. Note that (1) I'm returning the integer of the list index, and (2) selectInput always returns a string. From this, if the user selects the second dataset in the pull-down, you will get "2", which will obviously not index by itself. Your data must be referenced as alldata()[[ as.integer(input$seldata) ]].

To support repeated-data with less ambiguity, I added the timestamps to the selectInput text, so you can see the "when" of some data. Perhaps overkill, easily removed.

select input augmented labels

r2evans
  • 141,215
  • 6
  • 77
  • 149
  • 1
    That's a great example of how to switch between multiple datasets, and a smart way of doing it! However, it is not the dataset I want to switch between, it's the datatables (the log) created from the input widgets (plus the prediction and timestamp of when the save button was pressed). I probably could have explained the problem in more detail. The thing is that if I change the name of `output$datatable15` to for example `output$datatable16` and all the places i refer to it in the code, the log works for new datasets. – Sound Dec 02 '20 at 13:15
  • So I just need a way of saving a dynamic output name if it is called that. I don't know if it is possible, but if the datatable could get saved as `output$datatable_[insert_uploaded_dataset_name]`, I think it could work. – Sound Dec 02 '20 at 13:21
  • I thought that my use of `output$contents` demonstrated a way to deal with a reactive list of frames. That is, put your predictions (step 3) into a list of frames, and when a new dataset is uploaded, switch to a now-empty frame for the datatable output. – r2evans Dec 02 '20 at 14:47
  • I think I might be missing the point, which would be due to my lack of experience with this, but I would think that there would be an eaiser method of dynamically changing the name of the `output$datatable18 <- DT::renderDataTable({ input$savebutton loadData()})` based on the `input$file1`. I'm also running into a problem when the same dataset is uploaded again, since this will create a new datatable and not the existing one. I'm looking at something similar to https://stackoverflow.com/questions/36681209/use-string-as-name-to-create-new-output-element which I'm currently trying – Sound Dec 03 '20 at 10:36
  • How many tables do you want to show at a time? If it's just one, then changing the name of the table is missing the point of the *architecture* of the UI and your app. Since I've seen nothing that suggests you show more than one table at a time, keep the name *unchanged*. The above works just as well with `DT::DTOutput` and DT::renderDT` as it does with the basic table variants, and when the data changes (e.g., number/names of columns), the datatable changes without problem. – r2evans Dec 03 '20 at 13:46