R noob here, trying to learn how to clean data with dplyr. I wrote code that cleans up some data and then compares one particular variable of interest (level of education) against 4 other variables. I want a single function that cleans the data and returns a tibble showing how each of the 4 variables compares to level of education.
My original code has no problem doing this, but as soon as I try to generalize it into a function, select() from the dplyr package stops working. It returns this error message:
Error in `dplyr::select()`:
! object 'B7_b' not found
Run `rlang::last_error()` to see where the error occurred.
Here B7_b is just one of the variables in the dataframe that I'm trying to compare to level of education (ppeducat).
Here's the original code that works no problem:
library(dplyr)
library(ggplot2)

# prepare the df for var_4 and relabel the numeric codes
var_4 <- shed %>%
  select(B7_b, ppeducat) %>%
  # keep only rows where every numeric column is positive
  filter_if(is.numeric, all_vars(. > 0)) %>%
  mutate(B7_b = replace(B7_b, B7_b == 1, '1 - Poor')) %>%
  mutate(B7_b = replace(B7_b, B7_b == 2, '2 - Only fair')) %>%
  mutate(B7_b = replace(B7_b, B7_b == 3, '3 - Good')) %>%
  mutate(B7_b = replace(B7_b, B7_b == 4, '4 - Excellent')) %>%
  mutate(ppeducat = replace(ppeducat, ppeducat == 1, '1 - less than high school')) %>%
  mutate(ppeducat = replace(ppeducat, ppeducat == 2, '2 - high school')) %>%
  mutate(ppeducat = replace(ppeducat, ppeducat == 3, '3 - some college')) %>%
  mutate(ppeducat = replace(ppeducat, ppeducat == 4, '4 - bachelors or better'))
var_4

ggplot(var_4, aes(x = ppeducat, fill = B7_b)) +
  geom_bar() +
  ggtitle("How would you rate the economic conditions of your country today?") +
  theme(plot.title = element_text(size = 10))
# compute the relative frequency within each education level and rename the columns
var_4_clean <- var_4 %>%
  group_by(ppeducat, B7_b) %>%
  summarize(n = n()) %>%
  mutate(Perc_Freq = n / sum(n) * 100) %>%
  rename(Agreement_Level = B7_b,
         Education_Level = ppeducat,
         n_responses = n,
         Percent_Freq_of_Response = Perc_Freq)
# check the df
print(var_4_clean)
And here is my attempt to generalize it into a function. I got rid of the value relabeling (the mutate/replace steps) because not all 4 variables need the same adjustments there. I don't see why this should change anything.
The new code, as a function, which doesn't work:
summarize_var <- function(df, group_var) {
  clean_var <- df %>%
    dplyr::select(group_var, ppeducat) %>%
    filter_if(is.numeric, all_vars(. > 0)) %>%
    group_by(ppeducat, group_var) %>%
    summarize(n = n()) %>%
    mutate(Perc_Freq = n / sum(n) * 100) %>%
    rename(Agreement_Level = group_var,
           Education_Level = ppeducat,
           n_responses = n,
           Percent_Freq_of_Response = Perc_Freq)
  return(clean_var)
}

summarize_var(shed, B7_b)
shed
shed is the name of the dataframe I'm working with. This is my first function in R, so go easy!
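In case it helps anyone reproduce this without my data, here is a minimal sketch. toy_shed is a hypothetical stand-in for shed with invented values, and summarize_var_embraced is a hypothetical variant of my function that wraps the column argument in the embrace operator {{ }}, which, as far as I understand the dplyr programming documentation, is the documented way to pass an unquoted column name through a function. I'm treating it as an assumption rather than a confirmed fix for the real data.

```r
library(dplyr)

# Hypothetical stand-in for `shed`; the values are invented for illustration.
toy_shed <- tibble(
  B7_b     = c(1, 2, 2, 3, 4, -1, 3, 3),
  ppeducat = c(1, 1, 2, 2, 3, 3, 4, 4)
)

# Hypothetical variant of summarize_var: {{ group_var }} asks dplyr to look
# the column up inside `df` instead of in the calling environment.
summarize_var_embraced <- function(df, group_var) {
  df %>%
    select({{ group_var }}, ppeducat) %>%
    # keep only rows where every numeric column is positive
    filter_if(is.numeric, all_vars(. > 0)) %>%
    group_by(ppeducat, {{ group_var }}) %>%
    # keep the grouping by ppeducat so percentages are within education level
    summarize(n = n(), .groups = "drop_last") %>%
    mutate(Perc_Freq = n / sum(n) * 100) %>%
    rename(Agreement_Level = {{ group_var }},
           Education_Level = ppeducat,
           n_responses = n,
           Percent_Freq_of_Response = Perc_Freq)
}

result <- summarize_var_embraced(toy_shed, B7_b)
print(result)
```

With the toy data, the row containing the negative code is dropped before counting, and the percentages are computed within each education level, the same as in my original working code.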