I'm trying to understand a little more about R and came across this really good script here on Kaggle: https://www.kaggle.com/msjgriffiths/d/kaggle/sf-salaries/explore-sf-salary-data/code

I'm a beginner in R and I'm struggling to understand a portion of the code the poster used which is summarised below:

data_csv <- read_csv("../Salaries.csv", na=c("Not Provided"))
data <- data_csv
glimpse(data_csv)
non_numeric_vars <- names(data)[!sapply(data, is.numeric)]
data %>%
  select(one_of(non_numeric_vars)) %>%
  summarise_each(funs(unique_vars = length(unique(.))))

The section I don't understand is the funs function in the code above. The R docs for dplyr say it needs a list of functions specified by the 3 arguments. Where are the three arguments, or have they been piped in as described in the thread "What does %>% mean in R?"

Also I tried to find the docs for unique_vars but came up with nothing. I'm not sure where I can read more about this variable?

funs {dplyr}    R Documentation

Create a list of functions calls.

Description

funs provides a flexible way to generate a named list of functions for input to other functions like summarise_each.

Usage

funs(...)

funs_(dots)

Arguments

dots, ...    A list of functions specified by:
  • Their name, "mean"
  • The function itself, mean
  • A call to the function with . as a dummy parameter, mean(., na.rm = TRUE)

Examples

funs(mean, "mean", mean(., na.rm = TRUE))

# Overide default names
funs(m1 = mean, m2 = "mean", m3 = mean(., na.rm = TRUE))

# If you have function names in a vector, use funs_
fs <- c("min", "max")
funs_(fs)

The result after running his code is below. I'm not sure where the unique_vars variable comes into his results:

## Source: local data frame [1 x 6]
## 
##   EmployeeName JobTitle Benefits Notes Agency Status
##          (int)    (int)    (int) (int)  (int)  (int)
## 1       110811     2159    98648     1      1      3
Simon
  • `unique_vars` is not a function; it's a parameter name the programmer is creating for a value. The programmer is just getting the number of unique values for each categorical variable. – IRTFM May 14 '16 at 18:10
  • @42, but the result doesn't show a row that says unique_vars, why name it if you cannot see it anywhere? – Simon May 14 '16 at 20:58

1 Answer

funs doesn't need 3 arguments. Where the documentation says

A list of functions specified by:
  • Their name, "mean"
  • The function itself, mean
  • A call to the function with . as a dummy parameter, mean(., na.rm = TRUE)

it is describing 3 different ways to write an argument to funs. Each argument to funs is interpreted as a function, and you can pass as many or as few as you like.
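To make the three forms concrete, here is a small base R sketch. It deliberately avoids calling funs itself, since funs is deprecated in recent dplyr releases. The vector x and the names by_name, by_object and by_call are made up for illustration, and the anonymous function only stands in for the mean(., na.rm = TRUE) form, where . is the placeholder for the column.

```r
# Hypothetical toy vector, not taken from the Salaries data.
x <- c(1, 2, NA, 4)

# 1. By name: look the function up from a string.
by_name <- match.fun("mean")

# 2. The function itself.
by_object <- mean

# 3. A call with a placeholder: `.` plays the role of the column.
by_call <- function(.) mean(., na.rm = TRUE)

res_name   <- by_name(x)    # NA, because x contains a missing value
res_object <- by_object(x)  # NA, it is the same function as above
res_call   <- by_call(x)    # mean of 1, 2 and 4, the NA is dropped
```

The first two forms behave identically; the third is the one to reach for when the function needs extra arguments, which is the situation in the question's length(unique(.)).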

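unique_vars = length(unique(.)) gives the function length(unique(.)) the name "unique_vars". It is a label chosen by the script's author, not an existing function or variable, which is why it has no documentation. summarise_each uses such names to tell the output columns apart when funs contains more than one function. With a single function, as here, the output shown in the question keeps the original column names, which is why unique_vars does not appear anywhere in the result.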
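As a rough illustration of what the pipeline in the question computes, the same count can be sketched in base R. The data frame toy below is invented stand-in data, not the real Salaries.csv, and sapply is swapped in for summarise_each, so this shows the idea and is not the original author's code.

```r
# Hypothetical toy data standing in for Salaries.csv.
toy <- data.frame(
  EmployeeName = c("A", "B", "C", "A"),
  JobTitle     = c("Clerk", "Clerk", "Nurse", "Clerk"),
  BasePay      = c(10, 20, 30, 10),
  stringsAsFactors = FALSE
)

# Names of the columns that are not numeric (same line as in the question).
non_numeric_vars <- names(toy)[!sapply(toy, is.numeric)]

# Apply length(unique(.)) to each of those columns.
unique_counts <- sapply(
  toy[non_numeric_vars],
  function(col) length(unique(col))
)

print(unique_counts)
```

The result is labelled by the original column names (EmployeeName, JobTitle), one count per column, which matches the shape of the output in the question.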

Synergist