
Let's say I have a raw dataset like the one below. As part of tidying it, I tried to keep only the columns without NA values (that is, to drop the columns that contain NA), referencing this

raw_data

#>   data_name col_a col_b
#>   <chr>     <int> <int>
#> 1 data_a       30    NA
#> 2 data_b       20    75
#> 3 sum          50    NA

Code for dropping the columns that contain NA:

data_without_na <- raw_data %>% select_if(~ !any(is.na(.)))
data_without_na

Output:

#>   data_name col_a
#>   <chr>     <int>
#> 1 data_a       30
#> 2 data_b       20
#> 3 sum          50

The output is what I wanted, but I'm confused about why I need the tilde (~) at the beginning of the condition.

Here's what I understand so far:

  • tilde in R: separates the left-hand side of a formula from the right-hand side
  • !: negation
  • any(is.na(.)): a single TRUE or FALSE per column, TRUE if the column contains any NA value

How does the tilde work without a left-hand-side variable?


1 Answer


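Here, ~ is shorthand for an anonymous function: ~ !any(is.na(.)) is equivalent to function(x) !any(is.na(x)). The tilde creates a one-sided formula (no left-hand side is needed), and select_if converts that formula into a function, with the dot (.) standing for the function's argument, which here is each column in turn. See below: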

library(dplyr)

df %>% 
  select_if(function(x) !any(is.na(x)))
#>   data_name col_a
#> 1    data_a    30
#> 2    data_b    20
#> 3       sum    50

df %>% 
  select_if(~ !any(is.na(.)))
#>   data_name col_a
#> 1    data_a    30
#> 2    data_b    20
#> 3       sum    50
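
To see the conversion itself, here is a minimal sketch. As far as I know, dplyr's scoped verbs do this step with rlang::as_function(); calling it directly is only for illustration, and the names f and no_na are made up for this example.

```r
library(rlang)

# A one-sided formula is a valid R object on its own; nothing is evaluated yet.
f <- ~ !any(is.na(.))
class(f)
#> [1] "formula"

# as_function() turns the formula into a function whose
# first argument is available as `.` (and also as `.x`).
no_na <- as_function(f)

no_na(c(30L, 20L, 50L))   # like col_a: no NA values
#> [1] TRUE
no_na(c(NA, 75L, NA))     # like col_b: contains NA
#> [1] FALSE
```

select_if keeps the columns for which this function returns TRUE, which is why col_a stays and col_b is dropped.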

I can try to explain further, but there are several threads on Stack Overflow that explain this better, so I'll just refer to those:

  1. Tilde Dot in R (~.)
  2. Meaning of tilde and dot notation in dplyr
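
As a side note, select_if() is superseded in dplyr 1.0.0 and later in favour of select() combined with where(). A sketch of the same selection in that style, assuming dplyr >= 1.0.0 is installed (the data frame is rebuilt here so the snippet runs on its own, and the name result is just for illustration):

```r
library(dplyr)

# Same example data as in this answer.
df <- read.table(text = "data_name col_a col_b
data_a 30 NA
data_b 20 75
sum 50 NA", header = TRUE)

# where() accepts the same formula shorthand; `.x` is the current column.
result <- df %>%
  select(where(~ !any(is.na(.x))))

result
#>   data_name col_a
#> 1    data_a    30
#> 2    data_b    20
#> 3       sum    50
```

The formula plays the same role as before: where() turns it into a function and applies it to each column.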

Data:

df <- read.table(text = " data_name col_a  col_b
 data_a     30    NA
 data_b     20    75
 sum        50    NA", header = TRUE)