I have a data frame in which I would like to suppress certain values when they are based on a limited number of observations.
My dataset looks something like this:
> GROUP <- c("A", "B", "C", "D", "E", "F")
> AVERAGE <- c(100, 5, 10, 10, 5, 5)
> N_AVERAGE <- c(53, 5, 12, 20, 50, 2)
> df_average <- data.frame(GROUP , AVERAGE, N_AVERAGE)
> df_average
  GROUP AVERAGE N_AVERAGE
1     A     100        53
2     B       5         5
3     C      10        12
4     D      10        20
5     E       5        50
6     F       5         2
I would like to create a new variable, AVERAGE_new, which takes the value of "AVERAGE" when "N_AVERAGE" is >= 10. When "N_AVERAGE" is < 10, I would like the new variable to be NA.
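To make the target concrete: for this one data frame, the result I am after could be sketched in base R roughly as follows. The column names and the threshold of 10 are hard-coded here, so this is only an illustration of the desired output, not something that generalises to my other data frames.

```r
# Example data, same as above
df_average <- data.frame(
  GROUP     = c("A", "B", "C", "D", "E", "F"),
  AVERAGE   = c(100, 5, 10, 10, 5, 5),
  N_AVERAGE = c(53, 5, 12, 20, 50, 2)
)

# Keep AVERAGE where N_AVERAGE >= 10, otherwise set the new column to NA
df_average$AVERAGE_new <- ifelse(df_average$N_AVERAGE >= 10,
                                 df_average$AVERAGE,
                                 NA)
```

With this, groups B and F (N_AVERAGE of 5 and 2) get NA and the other groups keep their AVERAGE.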
This was my first attempt:
funct_suppress <- function(dataset #input dataset
, var_goal # variable to suppress based on other variable
, var_N # variable used to determine whether to suppress
, lower_bound) # lower_bound for var_N, when value is below lower_bound, suppress var_goal
{
dataset <- dataset %>%
mutate(paste0(var_goal,"_new") = ifelse((var_N < lower_bound),NA, var_goal))
}
df_average <- funct_suppress(df_average, AVERAGE, N_AVERAGE, 10) # suppress all AVERAGE when N_AVERAGE < 10
Obviously, this does not work. I understand that R cannot tell that var_goal and var_N refer to columns of the dataset. So I tried the following:
> funct_suppress <- function(dataset #input dataset
+ , var_goal # variable to suppress based on other variable
+ , var_goal_nw # suppresses value of var_goal
+ , var_N # variable used to determine whether to suppress
+ , lower_bound) # lower_bound for var_N, when value is below lower_bound, suppress var_goal
+ {
+
+ var_goal= enquo(var_goal)
+ var_goal_nw= enquo(var_goal_nw)
+ var_N = enquo(var_N)
+
+ dataset <- dataset %>%
+ mutate(var_goal = !!var_goal,
+ var_goal_nw = var_goal,
+ var_N = !!var_N,) %>%
+ mutate(var_goal_nw = ifelse((var_N < lower_bound),NA, var_goal)) %>%
+ select(-var_goal, -var_N)
+ }
> df_average <- funct_suppress(df_average, AVERAGE, AVERAGE_nw, N_AVERAGE,10) # suppress all AVERAGE when N_AVERAGE < 10
> df_average
  GROUP AVERAGE N_AVERAGE var_goal_nw
1     A     100        53         100
2     B       5         5          NA
3     C      10        12          10
4     D      10        20          10
5     E       5        50           5
6     F       5         2          NA
This does work, but the new variable does not get the name I want: the column is literally called var_goal_nw instead of the name I passed in (AVERAGE_nw).
How would I do this? If a function is not the most efficient way to go about this I'm open to other suggestions. However, the input variables do need to be able to change, since I need to perform this task on a number of dataframes with differing variable names.
Thank you!