Different functions over a list of columns and generate new column names automatically with data.table

Question

I have a section in my Shiny app that generates a list.

names of the list are column names of the dataframe we will calculate on, list items contain the calculations we want

Looking to do this:
apply to all list names:
for listname (column) x calculate function n,m,o over df column x
and name the resulting column 'x.n' i.e. 'cyl.mean', 'mpg.sum'
to get a dataframe of summary statistics PER GROUP (mtcars$cyl) in this case as example

It is linked to this question, but there the example data used a separate list of column names, and apply the same functions to all those columns from another list. I'm looking to move forward to apply unique sets of functions to different columns

The list my app spits out looks like this:

mylist


$disp
[1] "sum"  "mean"

$hp
[1] "sd"

$drat
[1] "sum"  "mean"

$wt
[1] "max"

expected output:

cyl    disp.sum  hp.sd  drat.sum  drat.mean wt.max    
4        x ....  
6        x ....  
8        x  ....

The little Shiny app to create the list:

library(shiny)
library(data.table)
library(shinyjs)

Channels <- names(mtcars)[3:8]

ui <- fluidPage(
  shinyjs::useShinyjs(),
  h5('Channels', style = 'font-weight:bold'),
  uiOutput('ChannelCheckboxes'),
  h5('Statistics', style = 'font-weight:bold'),
  uiOutput('CalculationCheckboxes')

)


server <- function(input, output, session) {

  values <- reactiveValues(Statisticlist = list())
  ## build observer to deselect all sub category checkboxes if channel is deselected
  lapply(Channels, function(x) {
    observeEvent(input[[paste('Channel', x, sep = '')]], {
      if(!input[[paste('Channel', x, sep = '')]]) {
        shinyjs::disable(paste("Calculations", x, sep = ''))
        updateCheckboxGroupInput(session, inputId = paste("Calculations", x, sep = ''), selected=character(0))

      } else {
        shinyjs::enable(paste("Calculations", x, sep = ''))

        }
    })
  })

  output$ChannelCheckboxes <- renderUI({
    fluidRow(
    lapply(Channels, function(x) {
      column(2,
             checkboxInput(inputId = paste('Channel', x, sep = ''), label = x)
        )
    })
  )
  })

output$CalculationCheckboxes <- renderUI({
  fluidRow(
    lapply(Channels, function(x) {
      column(2,
             checkboxGroupInput(inputId = paste("Calculations", x, sep = ''),  label = NULL, c('sum', 'mean', 'length', 'max', 'min', 'sd')) ) })
  )
})


  lapply(Channels, function(x) {
    observe({
      req(input[[paste('Channel', x, sep = '')]])
      if(input[[paste('Channel', x, sep = '')]] & !is.null(input[[paste("Calculations", x, sep = '')]])){
     values$Statisticlist[[paste(x)]] <- input[[paste("Calculations", x, sep = "")]]

      }
    })
  })


  observeEvent(values$Statisticlist, { print(values$Statisticlist)
    mylist <<- values$Statisticlist
    })
}

shinyApp(ui, server)

score 2 · Accepted Answer · edited Jun 20 '20 at 09:12

2

If I understand correctly, the question is not about shiny in first place but about how to apply different aggregation functions to specific columns of a data.table.

The names of the columns and the functions which are to be applied on are given as list mylist which is created by the shiny app.

Among the various approaches my preferred option is to compute on the language, i.e., to create a complete expression from the contents of mylist and to evaluate it:

library(magrittr)
library(data.table)
mylist %>%
  names() %>% 
  lapply(
    function(.col) lapply(
      mylist[[.col]], 
      function(.fct) sprintf("%s.%s = %s(%s)", .col, .fct, .fct, .col))) %>% 
  unlist() %>% 
  paste(collapse = ", ") %>% 
  sprintf("as.data.table(mtcars)[, .(%s), by = cyl]", .) %>% 
  parse(text = .) %>% 
  eval()

which yields the expected result

   cyl disp.sum disp.mean    hp.sd drat.sum drat.mean wt.max
1:   6   1283.2  183.3143 24.26049    25.10  3.585714  3.460
2:   4   1156.5  105.1364 20.93453    44.78  4.070909  3.190
3:   8   4943.4  353.1000 50.97689    45.21  3.229286  5.424

The character string which is parsed is created by

mylist %>%
  names() %>% 
  lapply(
    function(.col) lapply(
      mylist[[.col]], 
      function(.fct) sprintf("%s.%s = %s(%s)", .col, .fct, .fct, .col))) %>% 
  unlist() %>% 
  paste(collapse = ", ") %>% 
  sprintf("as.data.table(mtcars)[, .(%s), by = cyl]", .)

and looks as if coded manually:

[1] "as.data.table(mtcars)[, .(disp.sum = sum(disp), disp.mean = mean(disp), hp.sd = sd(hp), drat.sum = sum(drat), drat.mean = mean(drat), wt.max = max(wt)), by = cyl]"

Data

For demonstration, mylist is provided "hard-coded":

mylist <- list(
  disp = c("sum", "mean"),
  hp = "sd",
  drat = c("sum", "mean"),
  wt = "max")

edited Jun 20 '20 at 09:12

Community

1
1

answered Mar 11 '19 at 23:24

Uwe

41,420
11
90
134

Hi Uwe, thanks for your answer! this is indeed what I was trying to build. Now trying to build it into my real App. One small follow-up question, would it be possible to turn the mylist %>% ... into a function? I'm looking to change the syntax so that mylist, mtcars and 'cyl' are replaced by input arguments like: MySummary <- function(statisticslist, dataframe, grouping) { statisticslist %>%..... etc } but not sure how that works for the sprintf line – Mark Mar 12 '19 at 08:31
Ah, seems that was easier than I thought. – Mark Mar 12 '19 at 08:37
@Mark. Great, you solved it your own. `sprintf()` seems not be as popular as `paste()` although it has advantages in cases like this one. Thanks for your feedback. – Uwe Mar 12 '19 at 11:12
Uwe, I would like to add my own functions like count, concentration and percentage to the list. For count I just add 'length' to the mylist$ID (the first column in my data, for percentage however I would like to divide group length by dataframe nrow, but I get errors that dataframe is not found. This is my formula attempt: Percentage <- function(x){ (length(x)/nrow(deparse(substitute(dataframe))))*100 } – Mark Mar 17 '19 at 23:29
@Mark, I was tempted to give a quick answer suggesting to define the function beforehand and include the name in `mylist` but I may miss your point. Computing on the language is pretty advanced stuff in R (at least for me). It is difficult for me to answer without the complete context. Therefore, I suggest that you post a new question, please. – Uwe Mar 18 '19 at 00:01
that's exactly what I tried to do, but I don't know how to access the number of rows of the entire dataframe within your answer. i.e. to get the function Percentage I define before hand divide the length of the group by the nrow of the dataframe. – Mark Mar 18 '19 at 01:52

Mark · Answer 2 · 2019-03-12T09:13:37.250

To turn Uwe's answer into a function I did this:

Summarystats <- function(statlist, dataframe, group) { 
    statlist %>%
        names() %>% 
        lapply(
            function(.col) lapply(
                statlist[[.col]], 
                function(.fct) sprintf("%s.%s = %s(%s)", .col, .fct, .fct, .col))) %>% 
        unlist() %>% 
        paste(collapse = ", ") %>% 
        sprintf("as.data.table(dataframe)[, .(%s), by = group]", .) %>% 
        parse(text = .) %>% 
        eval()
    }

Now I can call:

Summarystats(mylist, mtcars, 'cyl')

allowing me to call a summary table for whichever dataframe and grouping the user wants in my Shiny App.

Different functions over a list of columns and generate new column names automatically with data.table

2 Answers2

Data

Linked