I have a dataframe in which I would like to suppress certain values when they are based on a limited number of observations.

My dataset looks something like this:

> GROUP <- c("A", "B", "C", "D", "E", "F")
> AVERAGE <- c(100, 5, 10, 10, 5, 5)
> N_AVERAGE <- c(53, 5, 12, 20, 50, 2)
> df_average <- data.frame(GROUP , AVERAGE, N_AVERAGE)
> df_average
  GROUP AVERAGE N_AVERAGE
1     A     100        53
2     B       5         5
3     C      10        12
4     D      10        20
5     E       5        50
6     F       5         2

I would like to create a new variable, AVERAGE_new, which takes the value of "AVERAGE" when "N_AVERAGE" is >= 10. When "N_AVERAGE" is < 10, I would like the new variable to be NA.

This was my first attempt:

funct_suppress <- function(dataset #input dataset
                           , var_goal # variable to suppress based on other variable
                           , var_N # variable used to determine whether to suppress
                           , lower_bound) # lower_bound for var_N, when value is below lower_bound, suppress var_goal
{
  dataset <- dataset %>% 
    mutate(paste0(var_goal,"_new") = ifelse((var_N < lower_bound),NA, var_goal))
}
df_average <- funct_suppress(df_average, AVERAGE, AVERAGE_nw,N_AVERAGE,10) # suppress all AVERAGE when N_AVERAGE  < 10

Obviously, this does not work. I understand that R cannot tell that var_goal and var_N refer to columns of the dataset. So I tried the following:

> funct_suppress <- function(dataset #input dataset
+                            , var_goal # variable to suppress based on other variable
+                            , var_goal_nw # suppresses value of var_goal
+                            , var_N # variable used to determine whether to suppress
+                            , lower_bound) # lower_bound for var_N, when value is below lower_bound, suppress var_goal
+ {
+   
+   var_goal= enquo(var_goal) 
+   var_goal_nw= enquo(var_goal_nw) 
+   var_N = enquo(var_N)
+   
+   dataset <- dataset %>% 
+     mutate(var_goal = !!var_goal,
+            var_goal_nw = var_goal,
+            var_N = !!var_N,) %>% 
+     mutate(var_goal_nw = ifelse((var_N < lower_bound),NA, var_goal)) %>% 
+     select(-var_goal, -var_N)
+ }
> df_average <- funct_suppress(df_average, AVERAGE, AVERAGE_nw, N_AVERAGE,10) # suppress all AVERAGE when N_AVERAGE  < 10
> df_average
  GROUP AVERAGE N_AVERAGE var_goal_nw
1     A     100        53         100
2     B       5         5          NA
3     C      10        12          10
4     D      10        20          10
5     E       5        50           5
6     F       5         2          NA

This does work, but my new variable does not have the name that I want it to have.

How would I do this? If a function is not the most efficient way to go about this I'm open to other suggestions. However, the input variables do need to be able to change, since I need to perform this task on a number of dataframes with differing variable names.

Thank you!

user10781624

2 Answers

You can copy all the values first and then set the ones where N_AVERAGE < 10 to NA:

df_average$AVERAGE_new <- df_average$AVERAGE
df_average$AVERAGE_new[df_average$N_AVERAGE < 10] <- NA


 df_average
  GROUP AVERAGE N_AVERAGE AVERAGE_new
1     A     100        53         100
2     B       5         5          NA
3     C      10        12          10
4     D      10        20          10
5     E       5        50           5
6     F       5         2          NA
Daniel O
  • Thank you, but I would like to have a function or something similar to make this change. The example I have included here is very simple, but in reality I need to make a number of adjustments to my data and I need to do this for a large number of datasets / variables. So I do not want to make the changes by hand each time. – user10781624 May 25 '20 at 14:11
  • If the only problem with your code is that your new column does not have the correct name, you can add something like this at the end of your function: `colnames(dataset)[ colnames(dataset) == "var_goal_nw"] <- paste0(var_goal_nw, "_new")` – Daniel O May 25 '20 at 14:32
  • That would be the easiest way, but that does not work. I get the error that var_goal_nw does not exist, and that the "number of items to replace is not a multiple of replacement length". I've also tried using rename_ (from https://stackoverflow.com/questions/35023375/r-renaming-passed-columns-in-functions), but it also does not work.. – user10781624 May 26 '20 at 09:15
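
The copy-then-blank approach from this answer can also be wrapped in a base R function, which avoids the renaming problem discussed in the comments. The following is only a minimal sketch: the helper name `suppress_small_n` and its `new_name` argument are hypothetical (they do not appear in the thread), and it assumes the column names are passed as strings rather than as bare names.

```r
# Hypothetical helper (not from the original posts); column names are strings.
suppress_small_n <- function(dataset, var_goal, var_n, lower_bound,
                             new_name = paste0(var_goal, "_new")) {
  # Start from a copy of the column that should be suppressed
  dataset[[new_name]] <- dataset[[var_goal]]
  # which() drops NA comparisons, so rows with a missing count are left untouched
  too_few <- which(dataset[[var_n]] < lower_bound)
  dataset[[new_name]][too_few] <- NA
  dataset
}

# Example data from the question
df_average <- data.frame(
  GROUP = c("A", "B", "C", "D", "E", "F"),
  AVERAGE = c(100, 5, 10, 10, 5, 5),
  N_AVERAGE = c(53, 5, 12, 20, 50, 2)
)

df_suppressed <- suppress_small_n(df_average, "AVERAGE", "N_AVERAGE", 10)
print(df_suppressed)
```

Because the new column name is built with paste0() from a string, no quoting or unquoting is needed; the trade-off is that callers must write "AVERAGE" instead of AVERAGE.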

You could modify your function as follows, provided your dplyr version is at least 0.7:

library(dplyr)

funct_suppress <- function(dataset        # input dataset
                           , var_goal     # variable to suppress based on other variable
                           , var_goal_nw  # name of the new column that holds the suppressed values
                           , var_N        # variable used to determine whether to suppress
                           , lower_bound) # lower bound for var_N; when var_N is below it, suppress var_goal
{
  var_goal <- enquo(var_goal)
  var_goal_nw <- enquo(var_goal_nw)
  var_N <- enquo(var_N)
  varname <- quo_name(var_goal_nw)  # new column name as a string

  dataset %>%
    mutate(!!varname := ifelse(!!var_N < lower_bound, NA, !!var_goal))
}

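The important parts are varname <- quo_name(var_goal_nw) and !!varname :=. quo_name() turns the quoted column name into a string, and !!varname := lets mutate() use that string as the name of the new column, which a plain = cannot do because its left-hand side must be a literal name. The other differences from your original function are just minor changes to make it more concise.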
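
As an illustration of the same idea, here is a sketch of a shorter variant that builds the new name from the input column, as the first attempt in the question tried to do with paste0(). It is not part of the answer above: the name `funct_suppress2` is hypothetical, and the sketch assumes a newer setup (rlang 0.4.3 or later, with a dplyr version built on it) in which `{{ }}` and glue-style names on the left-hand side of := are available.

```r
library(dplyr)

# Hypothetical variant (not from the original answer); assumes rlang >= 0.4.3.
funct_suppress2 <- function(dataset, var_goal, var_N, lower_bound) {
  dataset %>%
    # "{{ var_goal }}_new" becomes e.g. "AVERAGE_new" when var_goal is AVERAGE
    mutate("{{ var_goal }}_new" := ifelse({{ var_N }} < lower_bound, NA, {{ var_goal }}))
}

# Example data from the question
df_average <- data.frame(
  GROUP = c("A", "B", "C", "D", "E", "F"),
  AVERAGE = c(100, 5, 10, 10, 5, 5),
  N_AVERAGE = c(53, 5, 12, 20, 50, 2)
)

result <- funct_suppress2(df_average, AVERAGE, N_AVERAGE, 10)
print(result)
```

With this variant the caller no longer passes a separate name for the new column; if the suffix should vary, it could be added as another argument.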

englealuze