
There are some similar questions (like here or here), but none with quite the answer I am looking for.

The question:

How to use select() only on columns of a certain type?

The select helper functions used in select_if() or select_at() can only reference a column's name or index. In this particular case, I want to select the columns of a certain type (numeric), then keep a subset of them based on their column sums, without losing the columns of other types (character).

What I would like to do:

library(dplyr)

tibbly = tibble(x = c(1,2,3,4),
                y = c("a", "b","c","d"),
                z = c(9,8,7,6))

# A tibble: 4 x 3
      x y         z
  <dbl> <chr> <dbl>
1     1 a         9
2     2 b         8
3     3 c         7
4     4 d         6

tibbly %>%
     select_at(is.numeric, colSums(.) > 12)

Error: `.vars` must be a character/numeric vector or a `vars()` object, not primitive

This doesn't work because select_at() expects column names, positions, or a vars() object, so it does not accept is.numeric as a way to select columns.
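The difference comes down to what each verb expects: select_if() takes a predicate, applies it to every column in turn, and keeps the columns for which it returns TRUE. The base-R sketch below mimics that per-column step to show the idea. The helper name `keep_cols` is hypothetical and only for illustration; it is not part of dplyr.

```r
library(tibble)

tibbly <- tibble(x = c(1, 2, 3, 4),
                 y = c("a", "b", "c", "d"),
                 z = c(9, 8, 7, 6))

# Hypothetical helper: evaluate a predicate on every column and keep the
# columns where it returns TRUE, similar in spirit to select_if()
keep_cols <- function(df, pred) {
  keep <- vapply(df, pred, logical(1))  # one TRUE/FALSE per column, named
  df[keep]                              # logical column subsetting
}

keep_cols(tibbly, is.numeric)  # keeps x and z
```

Because the predicate sees a whole column at a time, it can test the type and the column sum together, which is what the answer below relies on.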

If I do something like:

tibbly %>%
     select_if(is.numeric) %>%             
     select_if(colSums(.) > 12)

This selects only the columns with a sum > 12, but I also lose the character columns. I would like to avoid having to reattach the lost columns afterwards.

Is there a better way to select columns in a dplyr fashion, based on some properties other than their names / index?

Thank you!

Mojoesque

1 Answer


One option is to write your own custom function and use it as the predicate in select_if(). Something like this:

# TRUE for character columns, and for numeric columns whose sum exceeds 12
check_cond <- function(x) is.character(x) || (is.numeric(x) && sum(x) > 12)

tibbly %>% 
  select_if(check_cond)

  y         z
  <chr> <dbl>
1 a         9
2 b         8
3 c         7
4 d         6
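In dplyr 1.0.0 and later, select_if() is superseded by select() combined with the where() selection helper, which takes a predicate function (or a purrr-style formula) and can be combined with other selections using `|`. A sketch of the same selection, assuming dplyr >= 1.0.0 is installed:

```r
library(dplyr)

tibbly <- tibble(x = c(1, 2, 3, 4),
                 y = c("a", "b", "c", "d"),
                 z = c(9, 8, 7, 6))

# Keep every character column, plus numeric columns whose sum exceeds 12.
# && short-circuits, so sum() is never called on the character column.
tibbly %>%
  select(where(is.character) | where(~ is.numeric(.x) && sum(.x) > 12))
```

This keeps the type check and the sum check as two separate, readable selections instead of folding them into a single custom predicate.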
Lennyy
  • You could also point out that this only works for columns that are specifically character or numeric. For example, some columns might be factors, which can look like character columns but will not be selected, since they are not character vectors. – Onyambu Aug 13 '18 at 08:44
  • Perfect, thanks! I put everything inside the select function and it works exactly as intended: tibbly %>% select_if(~ is.numeric(.) && sum(.) > 12 | is.character(.)) @Onyambu you are right that this is now specific to numeric and character columns. However, I am trying to avoid factors at the moment, and thanks to the tidyverse's policies no unintended factors should appear without my consent. – Mojoesque Aug 13 '18 at 08:49
  • That's true, Onyambu. If desired, one could extend the function so that it also selects factors, or integer columns with column sums > 12. Great that it works for you, Mojoesque! – Lennyy Aug 13 '18 at 08:51
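Following up on the comments, here is a sketch of a predicate that also keeps factor columns. Integer columns already pass is.numeric(), so they fall under the same sum check as doubles, while is.numeric() returns FALSE for factors. The name `check_cond2` and the extra columns `f` and `i` are hypothetical, added only to illustrate the behavior.

```r
library(dplyr)

tibbly2 <- tibble(x = c(1, 2, 3, 4),
                  y = c("a", "b", "c", "d"),
                  z = c(9, 8, 7, 6),
                  f = factor(c("p", "q", "p", "q")),  # hypothetical factor column
                  i = 1:4)                            # hypothetical integer column, sum 10

# Hypothetical predicate: keep character and factor columns unconditionally,
# and keep numeric (double or integer) columns only when their sum exceeds 12
check_cond2 <- function(x) {
  is.character(x) || is.factor(x) || (is.numeric(x) && sum(x) > 12)
}

tibbly2 %>% select_if(check_cond2)  # keeps y, z and f
```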