R noob trying to learn how to clean data with dplyr. I wrote code that cleans up some data and then compares one variable of interest (level of education) against 4 other variables. I want a single function that cleans the data and returns a tibble comparing any one of those 4 variables to level of education.

My original code has no problem doing this but as soon as I try to generalize what I'm doing into a function, the select() function from the dplyr package doesn't work. It returns this error message:

Error in `dplyr::select()`:
! object 'B7_b' not found
Run `rlang::last_error()` to see where the error occurred.

Where B7_b is just one of the variables in the dataframe I'm trying to compare to level of education (ppeducat).

Here's the original code that works no problem:

# preparing df of var_4 and doing some renaming 

var_4 <- shed %>%
  select(B7_b,ppeducat) %>%
  filter_if(is.numeric, all_vars(. > 0)) %>%
  mutate(B7_b = replace(B7_b, B7_b == 1, '1 - Poor')) %>%
  mutate(B7_b = replace(B7_b, B7_b == 2, '2 - Only fair')) %>%
  mutate(B7_b = replace(B7_b, B7_b == 3, '3 - Good')) %>%
  mutate(B7_b = replace(B7_b, B7_b == 4, '4 - Excellent')) %>%
  mutate(ppeducat = replace(ppeducat, ppeducat == 1, '1 - less than high school')) %>%
  mutate(ppeducat = replace(ppeducat, ppeducat == 2, '2 - high school')) %>%
  mutate(ppeducat = replace(ppeducat, ppeducat == 3, '3 - some college')) %>%
  mutate(ppeducat = replace(ppeducat, ppeducat == 4, '4 - bachelors or better'))

var_4

ggplot(var_4, aes(x=ppeducat, fill=B7_b)) + geom_bar() + ggtitle("How would you rate the economic conditions of your country today?") + theme(plot.title=element_text(size=10))

# sort by rel. freq and rename the columns 
var_4_clean <- var_4 %>% 
  group_by(ppeducat, B7_b) %>%
  summarize(n = n()) %>%
  mutate(Perc_Freq = n / sum(n) * 100) %>%
  rename(Agreement_Level = B7_b,
         Education_Level = ppeducat,
         n_responses = n,
         Percent_Freq_of_Response = Perc_Freq)

# check the df
print(var_4_clean)

And here is my attempt to generalize it into a function. I got rid of the value relabeling (the mutate() calls) because not all 4 variables need the same adjustments there. I don't see why this should change anything.

New code as function that doesn't work:

summarize_var <- function(df, group_var) {
  clean_var <- df %>%
    dplyr::select(group_var,ppeducat) %>%
    filter_if(is.numeric, all_vars(. > 0)) %>%
    group_by(ppeducat, group_var) %>%
    summarize(n = n()) %>%
    mutate(Perc_Freq = n / sum(n) * 100) %>%
    rename(Agreement_Level = group_var,
           Education_Level = ppeducat,
           n_responses = n,
           Percent_Freq_of_Response = Perc_Freq)
  return(clean_var)
}

summarize_var(shed, B7_b)
shed

Shed is the name of the dataframe I'm working with. This is my first function in R so go easy!

Kyle

1 Answer

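I figured it out. When a column name is passed to a function as an argument, it needs to be wrapped in {{ }} everywhere it is used inside a dplyr verb: select(), group_by(), and rename() in my case. Without the braces, R tries to evaluate B7_b as an ordinary object instead of a column of the data frame, which is why the error says object 'B7_b' not found.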
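For anyone landing here, this is roughly what the fixed function looks like. Treat it as a sketch rather than the exact code I ended up with: shed_demo is a made-up stand-in for the real shed data, and the .groups = "drop_last" argument is an addition that only silences the grouping message from summarize().

```r
library(dplyr)

# Hypothetical stand-in for `shed`; the real data has many more columns.
shed_demo <- tibble(
  B7_b     = c(1, 2, 2, 3, -1, 4),
  ppeducat = c(1, 1, 2, 2, 3, 4)
)

summarize_var <- function(df, group_var) {
  df %>%
    # {{ }} forwards the column name the caller supplied
    dplyr::select({{ group_var }}, ppeducat) %>%
    # same filter as the original code: keep rows where all numeric columns are > 0
    filter_if(is.numeric, all_vars(. > 0)) %>%
    group_by(ppeducat, {{ group_var }}) %>%
    # drop_last keeps the result grouped by ppeducat, so percentages
    # are computed within each education level
    summarize(n = n(), .groups = "drop_last") %>%
    mutate(Perc_Freq = n / sum(n) * 100) %>%
    rename(Agreement_Level = {{ group_var }},
           Education_Level = ppeducat,
           n_responses = n,
           Percent_Freq_of_Response = Perc_Freq)
}

result <- summarize_var(shed_demo, B7_b)
print(result)
```

The call stays the same as before, summarize_var(shed, B7_b), with the column name unquoted.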

Kyle