I have working code that excludes columns based on one parameter and mutates certain columns based on other parameters. There is an SO question, "Can dplyr package be used for conditional mutating?", but it does not address conditional select.

Is there a way to have pure dplyr code without the if statements?

Working R Code:

# Loading
library(dplyr)    # %>%, select(), mutate_at()
library(ggplot2)  # provides the diamonds dataset
diamonds_tbl <- diamonds
head(diamonds_tbl)

# parameters
initialColumnDrop <-  c('x','y','z')
forceCategoricalColumns <- c('carat','cut', 'color')
forceNumericalColumns <- c('')

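# Main Code
diamonds_tbl_clean <- diamonds_tbl  # Start from the full table so later steps work even if nothing is dropped
if(length(which(colnames(diamonds_tbl_clean) %in% initialColumnDrop))>=1){
    diamonds_tbl_clean <- diamonds_tbl_clean %>%
    select(-one_of(initialColumnDrop))  #Drop specific columns in initialColumnDrop
}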

if(length(which(colnames(diamonds_tbl_clean) %in% forceCategoricalColumns))>=1){
    diamonds_tbl_clean <- diamonds_tbl_clean %>%
    mutate_at(forceCategoricalColumns,funs(as.character)) #Force columns to be categorical
}

if(length(which(colnames(diamonds_tbl_clean) %in% forceNumericalColumns))>=1){
    diamonds_tbl_clean <- diamonds_tbl_clean %>%
    mutate_at(forceNumericalColumns,funs(as.numeric)) #Force columns to be numeric
}
dww
amitkb3
  • What would be the advantages of "pure dplyr"? dplyr is meant to help with transformation, but you seem to be using the `if` statement for control flow, which is very different and exactly what `if` statements are meant for. – MrFlick Feb 08 '19 at 21:10
  • 1> I am not after pure dplyr, but I want to check whether `if` statements are the most efficient option. 2> The `if` statements protect against parameters like "forceCategoricalColumns" being empty, in which case the code would fail. – amitkb3 Feb 08 '19 at 22:29
  • Why the downvoting? Downvoting is fine, but at least explain your rationale. – amitkb3 Feb 08 '19 at 23:05

1 Answer

I don't really understand the desire for a "pure dplyr" solution, but you can make any problem easier with helper functions. For example, you could write a function that runs a transformation only if certain columns are found:

run_if_cols_match <- function(data, cols, expr) {
  if (any(names(data) %in% cols)) {
    expr(data)
  } else {
    data
  }
}

Then you could use that in a pipe:

diamonds_tbl_clean  <- diamonds_tbl %>% 
  run_if_cols_match(initialColumnDrop, 
        . %>% select(-one_of(initialColumnDrop))) %>% 
  run_if_cols_match(forceCategoricalColumns, 
        . %>% mutate_at(forceCategoricalColumns,funs(as.character))) %>% 
  run_if_cols_match(forceNumericalColumns, 
        . %>% mutate_at(forceNumericalColumns,funs(as.numeric)))

which would do the same thing as your code. Here we just conditionally run different anonymous pipes.
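If the goal is only to guard against parameters that name no existing columns, another option may be to let the selection helpers do the guarding. The following is a minimal sketch, assuming dplyr 1.0.0 or later (where `any_of()` and `across()` are available) and assuming that empty parameters are written as `character(0)` rather than `c('')`. `any_of()` silently ignores names that are not present, so no `if` statements or helper function are needed in this variant.

```r
library(dplyr)
library(ggplot2)  # used only for the diamonds dataset

# Parameters (same meaning as in the question; the empty one is
# written as character(0), which is an assumption of this sketch)
initialColumnDrop       <- c("x", "y", "z")
forceCategoricalColumns <- c("carat", "cut", "color")
forceNumericalColumns   <- character(0)

diamonds_tbl_clean <- diamonds %>%
  # any_of() skips names that do not exist instead of raising an error
  select(-any_of(initialColumnDrop)) %>%
  # across() over an empty selection leaves the data unchanged
  mutate(across(any_of(forceCategoricalColumns), as.character)) %>%
  mutate(across(any_of(forceNumericalColumns), as.numeric))

str(diamonds_tbl_clean)
```

This trades the explicit control flow for selection semantics: each step becomes a no-op when none of the named columns exist, which is what the `if` checks were protecting against.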

MrFlick