18

I would like to select all numeric variables as well as some variables by name. I have managed to use select_if to get the numeric variables and select to get the named ones, but I can't combine the two into one statement.

x = data.table(c(1,2,3),c(10,11,12),c('a','b','c'),c('x','y','z'), c('l', 'm','n'))

I want my result to be:

V1 V2 V4 V5
1 10  x l
2 11  y m
3 12  z n

I tried this, but it doesn't work:

y = x %>%
select_if(is.numeric, V4, V5)
David
  • str(x) shows that all columns are character. I think this is a potential reason for the error. I also wonder whether you can use column names directly. – jazzurro Sep 20 '16 at 11:42
  • 2
    What's the purpose of `cbind`? Just use `data.table`. Using `cbind`, everything is coerced to a `character` matrix and so every column is `character`. If you define `x` as `data.table(c(1,2,3),c(10,11,12),c('a','b','c'),c('x','y','z'), c('l', 'm','n'))`, then `x %>% select_if(is.numeric)` works (what's the purpose of `V4` and `V5` in `select_if`?) – nicola Sep 20 '16 at 11:58
  • 3
    @nicola What makes you think OP made a mistake in his packages? `dtplyr` encompasses `data.table` as far as I can tell. Your edit conflicts with the author's intent IMO. – Tensibai Sep 20 '16 at 12:17
  • @Tensibai Oops, I didn't know `dtplyr` and thought OP made a typo. – nicola Sep 20 '16 at 13:44
  • @nicola No problem for me. – Tensibai Sep 20 '16 at 13:58

3 Answers

22

If we have a data frame, x:

x = data.frame(V1=c(1,2,3),V2=c(10,11,12),V3=c('a','b','c'),V4=c('x','y','z'),V5=c('l', 'm','n'), stringsAsFactors=FALSE)
##  V1 V2 V3 V4 V5
##1  1 10  a  x  l
##2  2 11  b  y  m
##3  3 12  c  z  n

where V1 and V2 are actually numeric and the rest of the columns are not factors, then we can do:

library(dplyr)
y <- x %>% select_if(function(col) is.numeric(col) | 
                                   all(col == .$V4) | 
                                   all(col == .$V5))
##  V1 V2 V4 V5
##1  1 10  x  l
##2  2 11  y  m
##3  3 12  z  n

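I am not saying this is the best approach, but it does what you want. The issue is that select_if applies its predicate to each column in turn and expects a single TRUE or FALSE per column, so extra column names such as V4 and V5 are not interpreted as columns to keep. Note that the equality checks above would also keep any other column whose values happen to match V4 or V5.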

Another way is to use select:

# is.numeric also catches integer columns, whose class is "integer" rather than "numeric"
y <- x %>% select(which(sapply(., is.numeric)), V4, V5)
##  V1 V2 V4 V5
##1  1 10  x  l
##2  2 11  y  m
##3  3 12  z  n

which is probably better.
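In more recent dplyr releases (1.0.0 and later, as far as I know), select_if is superseded and the same selection can be sketched with where(). The snippet below assumes such a version is installed; the variable name extra is only illustrative.

```r
library(dplyr)

x <- data.frame(V1 = c(1, 2, 3), V2 = c(10, 11, 12),
                V3 = c("a", "b", "c"), V4 = c("x", "y", "z"),
                V5 = c("l", "m", "n"), stringsAsFactors = FALSE)

# Columns to keep by name, in addition to the numeric ones (illustrative)
extra <- c("V4", "V5")

# where() applies the predicate to each column; all_of() selects by name
y <- x %>% select(where(is.numeric), all_of(extra))

names(y)
```

This keeps the predicate and the by-name selection in a single select call, which is what the question was after.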

aichao
5

One option is map2 (from purrr):

library(purrr)
library(dplyr)  # for bind_cols
x %>%
     map2(names(x), ~.[is.numeric(.x)|.y != "V3"])  %>%
     Filter(length, .) %>% 
     bind_cols
 #     V1    V2    V4    V5
 #  <dbl> <dbl> <chr> <chr>
 #1     1    10     x     l
 #2     2    11     y     m
 #3     3    12     z     n

Or, as @RoyalTS suggested, with imap:

x %>% 
    imap( ~ .[is.numeric(.x)|.y != "V3"]) %>%
    keep(~length(.x) > 0) %>%
    bind_cols

Since the dataset is a data.table, it can also be subset directly with data.table syntax:

x[, sapply(x, is.numeric) | colnames(x) != "V3", with = FALSE]
#   V1 V2 V4 V5
#1:  1 10  x  l
#2:  2 11  y  m
#3:  3 12  z  n
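A variant that names the wanted columns explicitly, rather than excluding V3, may be easier to maintain when the table has more columns. This is only a sketch, assuming data.table is installed; keep is an illustrative variable name.

```r
library(data.table)

x <- data.table(c(1, 2, 3), c(10, 11, 12), c("a", "b", "c"),
                c("x", "y", "z"), c("l", "m", "n"))

# Logical vector: TRUE for numeric columns or for columns named V4 / V5
keep <- sapply(x, is.numeric) | colnames(x) %in% c("V4", "V5")

# with = FALSE tells data.table to treat `keep` as a column selector
res <- x[, keep, with = FALSE]

names(res)
```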

data

x <- data.table(c(1,2,3),c(10,11,12),c('a','b','c'),c('x','y','z'), 
              c('l', 'm','n')) 

NOTE: @nicola already explained why cbind is not required, so that issue is not repeated here.

akrun
  • @RonakShah You are right. I could have used `colnames(x) %in% c("V4", "V5")` but I thought `"V3"` would make it shorter – akrun Sep 20 '16 at 12:40
  • @RonakShah Yes sir, you are right. In that case `%in% c("V4", "V5")` should be the way to go. – akrun Sep 20 '16 at 14:31
  • 1
    These days you could replace the `map2(names(x), ~.[is.numeric(.x)|.y != "V3"])` with `imap(~.[is.numeric(.x)|.y != "V3"])`. – RoyalTS Mar 19 '18 at 10:40
-1

Use the data.frame function:

x = data.frame(V1=c(1,2,3),V2=c(10,11,12),V3=c('a','b','c'),V4=c('x','y','z'),V5=c('l', 'm','n'))

Then x %>% select_if(is.numeric) works.
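To see why the way the table is built matters, the following base R sketch compares column types when the vectors go through cbind first versus when they are passed to data.frame directly. The names m, from_matrix and direct are illustrative only.

```r
# cbind on mixed vectors builds a matrix, and a matrix has a single type,
# so the numbers are coerced to character
m <- cbind(c(1, 2, 3), c("a", "b", "c"))
typeof(m)

# Converting that matrix to a data frame keeps every column non-numeric
from_matrix <- as.data.frame(m, stringsAsFactors = FALSE)
sapply(from_matrix, is.numeric)

# Passing the vectors directly preserves each column's own type
direct <- data.frame(V1 = c(1, 2, 3), V3 = c("a", "b", "c"),
                     stringsAsFactors = FALSE)
sapply(direct, is.numeric)
```

With the direct construction, a predicate such as is.numeric has something to match, which is why select_if(is.numeric) returns the numeric columns in that case.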

ooolllooo