I recently started writing my own functions to speed up standard and repetitive tasks while analyzing data with R.

At the moment I'm working on a function with three arguments and have run into a problem I have not been able to solve. I would like to have an optional grouping argument. The function should check whether a grouping argument was supplied and then continue with either the without_group or the with_group subfunction.

But I always get the error "Object not found" if the grouping argument is not NA. How can I do this?

Edit: In my case the filter is usually used to keep or drop certain valid or invalid years. If there is a grouping argument, more steps follow in the pipe than if there is none.

require(tidyverse)

Data <- mpg

userfunction <- function(DF,Filter,Group) {
  
  without_group <- function(DF) {
    DF %>% 
      count(year)
  }
  
  with_group <- function(DF) {
    DF %>% 
      group_by({{Group}}) %>% 
      count(year) %>% 
      pivot_wider(names_from=year, values_from=n) %>%
      ungroup() %>% 
      mutate(across(.cols=2:ncol(.),.fns=~replace_na(.x, 0))) %>% 
      mutate(Mittelwert=round(rowMeans(.[,2:ncol(.)],na.rm=TRUE),2))
  }
  
  Obj <- DF %>% 
    ungroup() %>% 
    {if(Filter!=FALSE) filter(.,eval(rlang::parse_expr(Filter))) else filter(.,.$year==.$year)} %>%
    {if(is.na(Group)) without_group(.) else with_group(.)} 
  
  return(Obj)
    
}

For NA it already works:

> Data %>% 
+   userfunction(FALSE,NA)
# A tibble: 2 x 2
   year     n
  <int> <int>
1  1999   117
2  2008   117

With argument it does not work:

> Data %>% 
+   userfunction(FALSE,manufacturer)
 Error in DF %>% ungroup() %>% { : object 'manufacturer' not found

Edit: What I would expect from the above function would be the following output:

> Data %>% userfunction_exp(FALSE,manufacturer)
# A tibble: 15 x 4
   manufacturer `1999` `2008` Mittelwert
   <chr>         <dbl>  <dbl>      <dbl>
 1 audi              9      9        9  
 2 chevrolet         7     12        9.5
 3 dodge            16     21       18.5
 4 ford             15     10       12.5
 5 honda             5      4        4.5
 6 hyundai           6      8        7  
 7 jeep              2      6        4  
 8 land rover        2      2        2  
 9 lincoln           2      1        1.5
10 mercury           2      2        2  
11 nissan            6      7        6.5
12 pontiac           3      2        2.5
13 subaru            6      8        7  
14 toyota           20     14       17  
15 volkswagen       16     11       13.5

> Data %>% userfunction_exp("cyl==4",manufacturer)
# A tibble: 9 x 4
  manufacturer `1999` `2008`  mean
  <chr>         <dbl>  <dbl> <dbl>
1 audi              4      4   4  
2 chevrolet         1      1   1  
3 dodge             1      0   0.5
4 honda             5      4   4.5
5 hyundai           4      4   4  
6 nissan            2      2   2  
7 subaru            6      8   7  
8 toyota           11      7   9  
9 volkswagen       11      6   8.5

2021-04-01 14:55: edited to add some information and add some steps to the pipe for function with_group.

thuettel
    [Don’t use `require`, since it swallows errors. Always use `library` instead](https://stackoverflow.com/a/51263513/1968). – Konrad Rudolph Apr 01 '21 at 12:33
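
The difference this comment refers to can be sketched as follows. This is only an illustration: the package name is made up and assumed not to be installed.

```r
pkg <- "aPackageThatIsNotInstalled"  # hypothetical name, assumed absent

# require() only warns and returns FALSE, so a script would keep running
loaded <- suppressWarnings(require(pkg, character.only = TRUE))
print(loaded)

# library() stops with an error, which tryCatch() captures here
result <- tryCatch(
  {
    library(pkg, character.only = TRUE)
    "attached"
  },
  error = function(e) "error"
)
print(result)
```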

2 Answers

I don't know what the Filter argument is used for, so I'll leave it as it is for now.

group_by(A) %>% count(B) is the same as count(A, B), so you can change your function to:

library(tidyverse)

userfunction <- function(DF,Filter,Group = NULL) {
  DF %>% 
    count(year, {{Group}}) %>% 
    pivot_wider(names_from=year, values_from=n)
}

Data %>% userfunction(FALSE)

#   `1999` `2008`
#   <int>  <int>
#1    117    117

Data %>% userfunction(FALSE,manufacturer)
# A tibble: 15 x 3
#   manufacturer `1999` `2008`
#   <chr>         <int>  <int>
# 1 audi              9      9
# 2 chevrolet         7     12
# 3 dodge            16     21
# 4 ford             15     10
# 5 honda             5      4
# 6 hyundai           6      8
# 7 jeep              2      6
# 8 land rover        2      2
# 9 lincoln           2      1
#10 mercury           2      2
#11 nissan            6      7
#12 pontiac           3      2
#13 subaru            6      8
#14 toyota           20     14
#15 volkswagen       16     11

Note that I have set the default value of Group to NULL, so when you don't supply a grouping column the argument is simply ignored.
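
If you still need an if/else branch, here is a minimal sketch of one way to do it. Checks like is.na(Group) or is.null(Group) force R to evaluate the argument, so it looks for an object called manufacturer and fails. Capturing the argument with enquo() and testing the quosure with rlang::quo_is_null() avoids that. The function name count_years is made up for illustration, and the Filter argument is left out.

```r
library(tidyverse)

# Hypothetical variant of userfunction; count_years is an illustrative name.
count_years <- function(DF, Group = NULL) {
  grp <- enquo(Group)  # capture the argument without evaluating it

  if (rlang::quo_is_null(grp)) {
    # no grouping column supplied: plain count per year
    DF %>%
      count(year)
  } else {
    # grouping column supplied: count per group and year, then go wide
    DF %>%
      count(!!grp, year) %>%
      pivot_wider(names_from = year, values_from = n)
  }
}

mpg %>% count_years()
mpg %>% count_years(manufacturer)
```

Any extra steps that only apply to the grouped case can then be added to the else branch.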

Ronak Shah
  • Thanks, your solution would certainly work for my MWE. But in my use case, more steps follow in the pipe when there is a grouping argument than when there is none, so I need if/else there. And if I check for NULL with is.null(), I have the same problem as with is.na(): it returns "Object not found" when a column name is passed as the argument. – thuettel Apr 01 '21 at 12:40
  • The filter will be used to filter for years since my datasets usually contain more than two years and sometimes years will be filtered because of too much missing data. – thuettel Apr 01 '21 at 12:41
  • Can you update your post to include some of the use cases of the function showing how it will be called and what would be corresponding expected output for it? – Ronak Shah Apr 01 '21 at 12:45
Hi, this is a good question!

There are multiple ways to achieve this, as the previous answers pointed out. One way to do it in the tidyverse is tidy evaluation: capture the grouping argument with enquo() and check whether it is missing before using it.

I have omitted your filter step here (you could explain it in more detail):

my_summary <- function(df, grouping_var) {
  grp_var <- enquo(grouping_var) #capture group variable
  df %>% my_group_by(grp_var)
}


my_group_by <- function(df, grouping_var){
  # Check if group is supplied 
  if(rlang::quo_is_missing(grouping_var)) {
    df %>% without_group()
  } else {
    df %>% with_group(grouping_var)
  }
  
}


without_group <- function(df) {
  # do whatever without group
  df %>% 
    count(year)
}

with_group <- function(df, grouping_var) {
  # do whatever with group
  df %>% 
    group_by(!!grouping_var) %>% #Note the !!
    count(year) %>% 
    pivot_wider(names_from=year, values_from=n)
}

Called without a grouping argument, this gives:

> mpg %>% my_summary()
# A tibble: 2 x 2
   year     n
  <int> <int>
1  1999   117
2  2008   117

With a grouping column passed:

> mpg %>% my_summary(model)
# A tibble: 38 x 3
# Groups:   model [38]
   model              `1999` `2008`
   <chr>               <int>  <int>
 1 4runner 4wd             4      2
 2 a4                      4      3
 3 a4 quattro              4      4
 4 a6 quattro              1      2
 5 altima                  2      4
 6 c1500 suburban 2wd      1      4
 7 camry                   4      3
 8 camry solara            4      3
 9 caravan 2wd             6      5
10 civic                   5      4
# ... with 28 more rows
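
If you would rather keep everything in one function, the same idea can be sketched like this. This is only a sketch under assumptions: year_summary and filter_expr are made-up names, the filter is assumed to arrive as a string such as "cyl == 4", and the mean column follows the Mittelwert step from the question.

```r
library(tidyverse)

# Hypothetical single-function variant; year_summary and filter_expr are
# illustrative names, not taken from the original post.
year_summary <- function(df, grouping_var, filter_expr = NULL) {
  # check for a missing argument on the captured quosure, without evaluating it
  has_group <- !rlang::quo_is_missing(enquo(grouping_var))

  # optional filter, assumed to be a string that parses to one condition
  if (!is.null(filter_expr)) {
    df <- df %>% filter(!!rlang::parse_expr(filter_expr))
  }

  if (!has_group) {
    return(df %>% count(year))
  }

  wide <- df %>%
    count({{ grouping_var }}, year) %>%
    pivot_wider(names_from = year, values_from = n, values_fill = 0L)

  # row mean over all year columns (everything except the grouping column)
  wide$Mittelwert <- round(rowMeans(wide[-1]), 2)
  wide
}

mpg %>% year_summary()
mpg %>% year_summary(manufacturer)
mpg %>% year_summary(manufacturer, "cyl == 4")
```

Because the argument is only captured for the check and never evaluated directly, {{ grouping_var }} can still be used in the later steps.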
SEcker
  • Thanks a lot, this works! I still put it all in one function with subfunctions. I also only used the enquoted grp_var for the conditional check in the if else function and then used {{grouping_var}} in the follow up steps, since it is still available within the function. – thuettel Apr 01 '21 at 13:22