
I was hoping somebody could help me with a problem I'm having creating a function. The dataset I'm using contains survey responses, with a column for each question (Q1, Q2, etc) and the responses on each row. The function has to be able to select the column (Q1, Q2, etc) and then filter from within that column for one particular response so that it can count it.

I'm trying to write a function that allows you to include the question number that you want to select as one of the arguments. Here is the code:

my_function <- function(survey, question_number) {
  selected_question <- survey %>%
    select(question_number)
  everyday_responses <- selected_question %>%
    filter(question_number == "Every day") %>%
    count()
}

This works for selecting the column but does not work for filtering within that column. I've worked out that this is because I have to input the question_number argument as "Q1" (with quotation marks around it). This is causing the filter(question_number == "Every day") line to not work properly, as this is expecting the column name without the " " (Q1 not "Q1").

Can anybody explain why this is happening and potentially suggest a fix? I'm fairly new to using R, so I may be missing something completely.

Many thanks in advance :D

2 Answers

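In general, select and pull work with both unquoted column names (Q1) and string column names ("Q1"), but filter, mutate, and similar verbs expect unquoted column names. Inside your function, question_number holds the string "Q1", so filter(question_number == "Every day") compares the string "Q1" with "Every day" rather than looking at the contents of the Q1 column. That comparison is FALSE for every row, so nothing is left to count.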
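To make that concrete, here is a small sketch of what filter actually sees. It assumes dplyr is installed and rebuilds the toy data from the Data section at the end of this answer; n_kept is just an illustrative name.

```r
library(dplyr)

# Same toy data as in the Data section of this answer
my_df <- data.frame(
  Q1 = c("Every week", "Every day", "Every week", "Every day"),
  Q2 = c("Every week", "Every week", "Every week", "Every day")
)

# Inside the function, the argument is just a string
question_number <- "Q1"

# This compares two strings and never touches the Q1 column
question_number == "Every day"
# [1] FALSE

# filter() recycles that single FALSE over all rows, so no row is kept
n_kept <- my_df %>%
  filter(question_number == "Every day") %>%
  nrow()
n_kept
# [1] 0
```

So the filter step runs without an error; it just filters on a condition that has nothing to do with the column.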

Assuming you are only interested in the number of "Every day" responses to a question, you can do this with base R:

my_function_base <- function(survey, question_number) {

  sum(survey[[question_number]] %in% "Every day")

}

my_function_base(my_df, "Q2")
# [1] 1
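A side note on why %in% is used here instead of ==: the two behave differently when the column contains missing answers. The sketch below uses a made-up vector (responses is a hypothetical name, not part of your data) to show the difference.

```r
# Hypothetical responses with one missing answer
responses <- c("Every day", NA, "Every week", "Every day")

# == propagates the NA, so the sum is NA as well
count_eq <- sum(responses == "Every day")
count_eq
# [1] NA

# %in% treats NA as "no match", so the count is still usable
count_in <- sum(responses %in% "Every day")
count_in
# [1] 2

# Equivalent alternative: keep == but drop the NAs when summing
count_narm <- sum(responses == "Every day", na.rm = TRUE)
count_narm
# [1] 2
```

If your survey columns never contain NA, both versions give the same count.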

There are several possibilities to fix your dplyr function, but here are two options.

library(dplyr)

Using string input

my_function_str <- function(survey, question_number) {

  survey %>%
    filter_at(question_number, ~ . == "Every day") %>%
    count()
}

my_function_str(my_df, "Q2")
# A tibble: 1 x 1
#       n
#   <int>
# 1     1

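filter_at accepts column names as strings and applies the condition to the specified columns. Depending on your dplyr version, the condition may have to be wrapped in all_vars() or any_vars(), for example filter_at(question_number, all_vars(. == "Every day")).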
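Another way to keep the string input, sketched here under the assumption that you have a reasonably recent dplyr, is the .data pronoun, which looks a column up by its name as a string. The function name count_everyday is only illustrative, and the data is the same toy data as in the Data section.

```r
library(dplyr)

# Same toy data as in the Data section of this answer
my_df <- data.frame(
  Q1 = c("Every week", "Every day", "Every week", "Every day"),
  Q2 = c("Every week", "Every week", "Every week", "Every day")
)

# question_number is a string such as "Q2";
# .data[[question_number]] refers to the column with that name
count_everyday <- function(survey, question_number) {
  survey %>%
    filter(.data[[question_number]] == "Every day") %>%
    count()
}

count_everyday(my_df, "Q2")  # one row, n is 1
count_everyday(my_df, "Q1")  # one row, n is 2
```

This sidesteps the all_vars() question entirely, since it uses plain filter rather than the scoped filter_at variant.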

Using non-standard evaluation (NSE); see also https://dplyr.tidyverse.org/articles/programming.html

my_function_nse <- function(survey, question_number) {
  question_number <- enquo(question_number)

  survey %>%
    filter(!!question_number == "Every day") %>%
    count()
}

my_function_nse(my_df, Q1) # No quotes around Q1

# A tibble: 1 x 1
#       n
#   <int>
# 1     2
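If your installed rlang is version 0.4.0 or later (an assumption about your setup), the enquo() and !! pair can be written more compactly with the embrace operator {{ }}. This is only a sketch of the same idea with an illustrative function name, count_everyday_nse, and the same toy data as in the Data section.

```r
library(dplyr)

# Same toy data as in the Data section of this answer
my_df <- data.frame(
  Q1 = c("Every week", "Every day", "Every week", "Every day"),
  Q2 = c("Every week", "Every week", "Every week", "Every day")
)

# {{ question }} captures the unquoted column name passed by the caller
# and inserts it into filter(), like enquo() followed by !!
count_everyday_nse <- function(survey, question) {
  survey %>%
    filter({{ question }} == "Every day") %>%
    count()
}

count_everyday_nse(my_df, Q1)  # no quotes around Q1; n is 2
```

As with the enquo() version, the column is passed without quotes, so this suits interactive use more than cases where the column name arrives as a string.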

Data

my_df <- data.frame(Q1 = c("Every week", "Every day", "Every week", "Every day"), 
                    Q2 = c("Every week", "Every week", "Every week", "Every day"))
kath
  • Thank you so much for getting back to me so quickly! When I run the first option (even with your data), I get the error message "Error: `.vars_predicate` must be a call to `all_vars()` or `any_vars()`, not formula". I think this is something to do with the syntax for filter_at(), but can't work out why it's happening. Any idea why this isn't working for me? – James Price Oct 03 '19 at 10:12
  • Hmm, do you get the error with my example, or yours? Which version are you working on? You should be able to fix this, by specifying `filter_at(question_number, all_vars(. == "Every day"))` – kath Oct 03 '19 at 10:21
  • That works! Thank you so much again for your help :) – James Price Oct 03 '19 at 11:11

The link shared by @zx8754 should help you fix the issue. Since you say you're new to R, here is how you can modify your function.

my_function <- function(df, col) {
  df %>%
    select(all_of(col)) %>%                        # col is a string such as "Q1"
    filter((!!as.symbol(col)) == "Every day") %>%  # turn the string into a column reference
    count()
}
# This is how you call your function
my_function(df, "Q1")

Here df is your data frame (I think yours is called survey) and col is the name of the column you want to filter on.

Hope that helps.

deepseefan