The function of tilde (~) in dplyr conditional select

Question

Let's say I have a raw dataset like the one below. As part of tidying it, I tried to select only the columns without NA values (in other words, to remove every column that contains an NA), following an approach I found elsewhere.

raw_data

#>   data_name col_a col_b
#>   <chr>     <int> <int>
#> 1 data_a       30    NA
#> 2 data_b       20    75
#> 3 sum          50    NA

code for dropping NA columns

data_without_na <- raw_data %>% select_if(~ !any(is.na(.)))
data_without_na

output

#>   data_name col_a
#>   <chr>     <int>
#> 1 data_a       30
#> 2 data_b       20
#> 3 sum          50

The output is what I wanted, but I'm confused about why I need the tilde (~) at the beginning of the condition.

Here's what I understand so far:

tilde in R: separates the left-hand side of a formula from the right-hand side
!: negation
any(is.na(.)): TRUE or FALSE for each column, depending on whether it contains any NA value

How does the tilde work without a left-hand-side variable?

score 2 · Accepted Answer · answered Dec 16 '22 at 14:59

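In the tidyverse, ~ is shorthand for an anonymous function. A one-sided formula such as ~ !any(is.na(.)) is converted to a function, and inside it . (or .x) refers to the function's first argument, here the column being tested. No left-hand side is needed because R allows one-sided formulas. See below: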

library(dplyr)

df %>% 
  select_if(function(x) !any(is.na(x)))
#>   data_name col_a
#> 1    data_a    30
#> 2    data_b    20
#> 3       sum    50

df %>% 
  select_if(~ !any(is.na(.)))
#>   data_name col_a
#> 1    data_a    30
#> 2    data_b    20
#> 3       sum    50
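
To make the conversion visible, here is a minimal sketch that does the formula-to-function step by hand with rlang::as_function(). As far as I know this is the helper tidyverse functions rely on for ~ lambdas, but treat that as an assumption. The names f and has_no_na are only for illustration.

```r
# A one-sided formula: there is nothing to the left of ~
f <- ~ !any(is.na(.))

class(f)        # "formula"
length(f)       # 2: the ~ itself and the right-hand side
rlang::f_lhs(f) # NULL, because there is no left-hand side

# Convert the formula to an ordinary function; `.` becomes its first argument
has_no_na <- rlang::as_function(f)

has_no_na(c(30L, 20L, 50L)) # TRUE:  no NA, so the column would be kept
has_no_na(c(NA, 75L, NA))   # FALSE: contains NA, so the column would be dropped
```

So the formula is never evaluated as a model formula. It is just a compact way to write function(x) !any(is.na(x)).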

I can try to explain further, but there are multiple threads on Stack Overflow that explain this better, so I'd just refer to those.

Data:

# Recreate the example data; header = TRUE takes column names from the first line
df <- read.table(text = " data_name col_a  col_b
 data_a     30    NA
 data_b     20    75
 sum        50    NA", header = TRUE)
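
As a side note, select_if() is superseded in recent dplyr releases. A sketch of the same selection with select() and where() follows, assuming dplyr 1.0.0 or later. The ~ plays exactly the same role here, and anyNA(.x) is a base R shortcut for any(is.na(.x)). The name df_no_na is only for illustration.

```r
library(dplyr)

# Same example data as above
df <- read.table(text = " data_name col_a  col_b
 data_a     30    NA
 data_b     20    75
 sum        50    NA", header = TRUE)

# Keep only the columns for which the lambda returns TRUE (no NA present)
df_no_na <- df %>%
  select(where(~ !anyNA(.x)))

names(df_no_na)
```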
