
Starting point:

I have a dataset (tibble) that contains many variables of the same class (dbl). They belong to different settings. One variable (column in the tibble) is missing: the row sum of all variables belonging to one setting.

Aim:

My aim is to produce a sub-dataset for each setting, all with the same data structure, including the row-sum variable (I call it "s1").

Problem:

Each setting has a different number of variables (and of course they are named differently). Because the structure stays the same and only the variables change, this is a typical situation for a function.

Question:

How can I solve the problem using dplyr?

I wrote a function to

(1) subset the original dataset for the setting of interest (this works) and

(2) compute the rowSums of the setting's variables (this does not work; why?).

Because the function is written for a specially designed dataset, it includes two predefined variables:

day - any day of the investigation period

N - the number of cases investigated on that day

Thank you for any help.

mkr.sumsetting <- function(..., dataset){

  subvars <- rlang::enquos(...)
  #print(subvars)

  # Summarize the variables belonging to the setting of interest
  dfplot <- dataset %>%
    dplyr::select(day, N, !!! subvars) %>%
    dplyr::mutate(s1 = rowSums(!!! subvars, na.rm = TRUE))

  return(dfplot)
}
    Please include the data to [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – neilfws May 10 '19 at 05:41

1 Answer


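The original fails because !!! subvars splices the columns into rowSums() as separate arguments: x becomes a single vector instead of a data frame, and rowSums() needs an object with at least two dimensions. Instead, we can convert the quosures to strings with as_name and subset the dataset with [ inside rowSums.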

library(rlang)
library(purrr)
library(dplyr)

mkr.sumsetting <- function(..., dataset){

  subvars <- rlang::enquos(...)
  # convert the quosures to column names (character vector)
  v1 <- map_chr(subvars, as_name)

  # Summarize the variables belonging to the setting of interest
  dfplot <- dataset %>%
    dplyr::select(day, N, !!! subvars) %>%
    dplyr::mutate(s1 = rowSums(.[v1], na.rm = TRUE))

  return(dfplot)
}

out <- mkr.sumsetting(col1, col2, dataset = df1)
head(out, 3)
#   day  N       col1      col2          s1
#1   1 20 -0.5458808 0.4703824 -0.07549832
#2   2 20  0.5365853 0.3756872  0.91227249
#3   3 20  0.4196231 0.2725374  0.69216051
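A minimal check of why the question's version fails and why the string approach works, on a small made-up data frame (toy and the helper get_names() are hypothetical names for illustration). After !!! splicing, the question's mutate() effectively evaluates rowSums(col1, col2, na.rm = TRUE), which errors because rowSums() expects a matrix or data frame; subsetting with the names returned by as_name() supplies one.

```r
library(rlang)
library(purrr)

# Small made-up data frame: two variables of one setting, one NA
toy <- data.frame(col1 = c(1, 2, NA), col2 = c(3, 4, 5))

# What the spliced call boils down to: x is a plain vector (and col2 is
# matched to `dims`), so rowSums() stops with an error
failed <- try(rowSums(toy$col1, toy$col2, na.rm = TRUE), silent = TRUE)
print(inherits(failed, "try-error"))  # TRUE

# Hypothetical helper mirroring the answer's as_name() step
get_names <- function(...) map_chr(enquos(...), as_name)
v1 <- get_names(col1, col2)

# Subsetting by name hands rowSums() the two-dimensional object it needs
s1 <- rowSums(toy[v1], na.rm = TRUE)
print(unname(s1))  # 4 6 5
```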

Another option is to select the quosures again inside mutate() and pass the resulting data frame to rowSums()

mkr.sumsetting <- function(..., dataset){

  subvars <- rlang::enquos(...)

  # Summarize the variables belonging to the setting of interest
  dfplot <- dataset %>%
    dplyr::select(day, N, !!! subvars) %>%
    # select the same columns again and sum them row by row
    dplyr::mutate(s1 = dplyr::select(., !!! subvars) %>%
                    rowSums(na.rm = TRUE))

  return(dfplot)
}

mkr.sumsetting(col1, col2, dataset = df1)
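Both versions call rowSums() with na.rm = TRUE. A small sketch of what that means for s1, on made-up data (x and n_obs are illustrative names): missing values are simply skipped, so a row in which every variable of the setting is NA gets s1 = 0 rather than NA. If such rows should stay NA, one possible approach is to count the observed values per row and blank out the rows where that count is zero.

```r
# Made-up rows: in the third row every variable of the setting is NA
x <- data.frame(col1 = c(1, NA, NA), col2 = c(2, 3, NA))

with_rm    <- rowSums(x, na.rm = TRUE)  # NAs skipped; an all-NA row sums to 0
without_rm <- rowSums(x)                # any NA in a row makes its sum NA
print(unname(with_rm))     # 3 3 0
print(unname(without_rm))  # 3 NA NA

# Keep NA only where a row has no observed value at all
n_obs <- rowSums(!is.na(x))  # non-missing cells per row
s1 <- ifelse(n_obs == 0, NA_real_, with_rm)
print(unname(s1))          # 3 3 NA
```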

Data:

set.seed(24) 
df1 <- data.frame(day = 1:20, N = 20, col1 = rnorm(20),
    col2 = runif(20))
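The stated aim is one sub-dataset per setting. A sketch of how that could be done with the first option, assuming the mapping from setting to variable names is available as character vectors (settings, setting_a, setting_b and the extra column col3 are hypothetical): rlang::syms() turns each vector into symbols, and !!! splices them into the function's ... argument, so enquos() sees them as if they had been typed by hand.

```r
library(rlang)
library(purrr)
library(dplyr)

# First option from the answer, repeated so this block runs on its own
mkr.sumsetting <- function(..., dataset) {
  subvars <- rlang::enquos(...)
  v1 <- map_chr(subvars, as_name)
  dataset %>%
    dplyr::select(day, N, !!!subvars) %>%
    dplyr::mutate(s1 = rowSums(.[v1], na.rm = TRUE))
}

# Example data with an extra column col3 for a second setting (assumption)
set.seed(24)
df1 <- data.frame(day = 1:20, N = 20, col1 = rnorm(20),
                  col2 = runif(20), col3 = rnorm(20))

# Hypothetical lookup table: setting name -> its variable names
settings <- list(setting_a = c("col1", "col2"),
                 setting_b = "col3")

# syms() converts the strings to symbols; !!! splices them into `...`
subsets <- map(settings, function(vars) {
  mkr.sumsetting(!!!rlang::syms(vars), dataset = df1)
})

names(subsets)
head(subsets$setting_a, 3)
```

Because every call goes through the same function, each element of subsets has the same layout: day, N, the setting's variables, and s1.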
akrun