
I am trying to figure out how to efficiently select columns using dplyr::select_if. The starwars data set in dplyr 0.7.0 is a good dataset to use for this:

> starwars
# A tibble: 87 x 13
                 name height  mass    hair_color  skin_color eye_color birth_year gender homeworld species     films  vehicles starships
                <chr>  <int> <dbl>         <chr>       <chr>     <chr>      <dbl>  <chr>     <chr>   <chr>    <list>    <list>    <list>
 1     Luke Skywalker    172    77         blond        fair      blue       19.0   male  Tatooine   Human <chr [5]> <chr [2]> <chr [2]>
 2              C-3PO    167    75          <NA>        gold    yellow      112.0   <NA>  Tatooine   Droid <chr [6]> <chr [0]> <chr [0]>
 3              R2-D2     96    32          <NA> white, blue       red       33.0   <NA>     Naboo   Droid <chr [7]> <chr [0]> <chr [0]>
 4        Darth Vader    202   136          none       white    yellow       41.9   male  Tatooine   Human <chr [4]> <chr [0]> <chr [1]>
 5        Leia Organa    150    49         brown       light     brown       19.0 female  Alderaan   Human <chr [5]> <chr [1]> <chr [0]>
 6          Owen Lars    178   120   brown, grey       light      blue       52.0   male  Tatooine   Human <chr [3]> <chr [0]> <chr [0]>
 7 Beru Whitesun lars    165    75         brown       light      blue       47.0 female  Tatooine   Human <chr [3]> <chr [0]> <chr [0]>
 8              R5-D4     97    32          <NA>  white, red       red         NA   <NA>  Tatooine   Droid <chr [1]> <chr [0]> <chr [0]>
 9  Biggs Darklighter    183    84         black       light     brown       24.0   male  Tatooine   Human <chr [1]> <chr [0]> <chr [1]>
10     Obi-Wan Kenobi    182    77 auburn, white        fair blue-gray       57.0   male   Stewjon   Human <chr [6]> <chr [1]> <chr [5]>

Now say that I would like to select only the numeric columns (integer or double). This works well:

library(dplyr)

starwars %>%
  select_if(is.numeric)

But what should I do if I want to select based on multiple criteria? For example, maybe I want both numeric and character columns:

starwars %>%
  select_if(c(is.numeric, is.character))

Or maybe I want all numeric AND the name column:

starwars %>%
  select_if(name, is.numeric)

Neither of the two examples above works, so I am wondering how I might accomplish what I've outlined here.

boshek
    Related question and answers [here](https://stackoverflow.com/questions/39592879/r-dpylr-select-if-with-multiple-conditions) – aosmith Jun 15 '17 at 21:52

5 Answers


For the first example:

starwars %>%
  select_if(function(col) {is.numeric(col) | is.character(col)})

This is taken directly from the RDocumentation page.

For the second:

# Logical vector, one entry per column: TRUE where the column is numeric
toKeep <- sapply(starwars, is.numeric)
starwars %>%
  select("name", names(toKeep)[toKeep])

I cannot come up with anything prettier at the moment, but I'm sure there is a better way :)

psychOle
  • Indeed right there on RDocumentation though this doesn't answer the all numeric AND `name` column question. Any thoughts there? – boshek Jun 15 '17 at 19:46
  • One option: starwars %>% group_by(name) %>% select_if(is.numeric). But that's a bit ugly. – boshek Jun 15 '17 at 19:59
  • Yeah, sorry, that took a little longer than I thought. See updated answer. – psychOle Jun 15 '17 at 21:07

From version 1.0.0, as mentioned in the news,

select() and rename() use the latest version of the tidyselect interface. Practically, this means that you can now combine selections using Boolean logic (i.e. !, & and |), and use predicate functions (e.g. is.character) to select variables by type (#4680).

### Install development version on GitHub first until CRAN version is available
# install.packages("devtools")
# devtools::install_github("tidyverse/dplyr")
library(dplyr, warn.conflicts = FALSE)

starwars %>% 
  as_tibble() %>% 
  glimpse()
#> Rows: 87
#> Columns: 14
#> $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180...
#> $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, ...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57....
#> $ sex        <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...
#> $ films      <list> [<"The Empire Strikes Back", "Revenge of the Sith", "Re...
#> $ vehicles   <list> [<"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <>, ...
#> $ starships  <list> [<"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanced ...

To select either numeric or character columns:

starwars %>%
  select(is.numeric | is.character) %>% 
  glimpse()
#> Rows: 87
#> Columns: 11
#> $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180...
#> $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, ...
#> $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57....
#> $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ sex        <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...

Or to select non-list columns:

starwars %>%
  select(!is.list) %>% 
  glimpse()
#> Rows: 87
#> Columns: 11
#> $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180...
#> $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, ...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57....
#> $ sex        <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...

To select the name column together with all character columns:

starwars %>%
  select(name | is.character) %>% 
  glimpse()
#> Rows: 87
#> Columns: 8
#> $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ sex        <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...

Created on 2020-02-17 by the reprex package (v0.3.0)
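
The same Boolean combinations can also be written with the predicates wrapped in where(), which is the form the tidyselect documentation recommends. The following is a minimal sketch, assuming dplyr 1.0.0 or later; the result names (num_or_chr, name_and_num, no_lists) are made up for illustration.

```r
library(dplyr)

# Numeric OR character columns
num_or_chr <- starwars %>% select(where(is.numeric) | where(is.character))

# The name column plus every numeric column (the second case in the question)
name_and_num <- starwars %>% select(name, where(is.numeric))

# Everything except list columns
no_lists <- starwars %>% select(!where(is.list))

names(name_and_num)
#> [1] "name"       "height"     "mass"       "birth_year"
```

Listing name first also puts it first in the result, which may be convenient when it acts as an identifier column.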

Tung
  • if you are going to do this you likely should include install instructions until the version is on CRAN. – boshek Feb 19 '20 at 17:03

You can either write your own function:

 to_keep <- function(x) is.numeric(x) | is.character(x)
 starwars %>% select_if(to_keep)

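or you can use "quosure-style lambda functions" (note that funs() has been deprecated since dplyr 0.8.0, where a formula such as ~ is.numeric(.) | is.character(.) is preferred):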

starwars %>% select_if(funs(is.numeric(.) | is.character(.)))

I don't know of a good way to combine different selection logic in a single select_if() call, so I'd use a hybrid approach (not very elegant, since you have to repeat the initial dataset):

 # Select name on its own, then bind the numeric columns next to it
 starwars %>%
    select("name") %>%
    bind_cols(select_if(starwars, is.numeric))
fmic_

The tidyverse formula shorthand, in which ~ introduces an anonymous function, works well with select_if():

require(tidyverse)

# numeric and character columns
starwars %>% select_if(~ is.numeric(.) | is.character(.)) 

# all numeric AND the name column
starwars %>% select(name, where(is.numeric))

Inside select(), predicate functions such as is.numeric should be wrapped in where(), as the tidyverse authors recommend; in recent tidyselect versions a bare predicate may trigger a deprecation warning.
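
where() also accepts any function that returns a single TRUE or FALSE per column, so several criteria can be folded into one predicate. A small sketch, assuming dplyr 1.0.0 or later; is_num_or_chr is a hypothetical helper name, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: TRUE for numeric or character columns
is_num_or_chr <- function(x) is.numeric(x) || is.character(x)

# Columns are returned in their original order
res <- starwars %>% select(where(is_num_or_chr))
```

Because the predicate is evaluated once per column, the scalar || operator is sufficient here.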

George Shimanovsky

For the second part (getting the numeric AND the name column):

to_keep <- c(starwars %>% select_if(is.numeric) %>% names,"name")
starwars %>% select(one_of(to_keep))  
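
In current tidyselect, one_of() is superseded by all_of() and any_of(). The same idea might be written as below; this is a sketch assuming dplyr 1.0.0 or later, and the name result is only illustrative.

```r
library(dplyr)

# Character vector of column names: "name" plus every numeric column
to_keep <- c("name", names(starwars)[sapply(starwars, is.numeric)])

# all_of() raises an error if any requested column is missing
result <- starwars %>% select(all_of(to_keep))
```

any_of() could be used instead when silently skipping missing columns is the desired behavior.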
PT2018