2

I have a dataset with many columns with similar names. Some of the columns have values in cents and others in dollars, e.g:

library (tidyverse) 

data<- tribble(
          ~col1_cents, ~col1,~col2_cents, ~col2,
          1000,           NA,       3000,    NA,
          NA,             20,         NA,  25.2,
          2000,           NA,       2030,    NA,
        )

For one variable, it's easy to divide the value by 100 and then assign it to the dollar variable, and delete the cent variable e.g.:

data %>% mutate( if_else(is.na(col1),
                         col1_cents/100,
                         col1) %>% 
         select(-col1_cents)

Is there a generalisable way to do this for all variables in the dataset that end in _cents? I tried this with mutate_at and ends_with but could not get it to rename to the original variable without _cents...

Thanks!

1 Answers1

4

You can use mutate_at

library(dplyr)
data %>%  mutate_at(vars(ends_with("cents")), ~./100)

# A tibble: 3 x 4
#   col1_cents  col1 col2_cents  col2
#       <dbl> <dbl>      <dbl> <dbl>
#1         10    NA       30    NA  
#2         NA    20       NA    25.2
#3         20    NA       20.3  NA  

If you then want to combine the two columns, we can use split.default to split columns based on similarity of the names, use imap_dfc from purrr along with coalesce to combine them together.

df1 <- data %>%  mutate_at(vars(ends_with("cents")), ~./100)

purrr::imap_dfc(split.default(df1, sub("_.*", "", names(df1))), 
 ~.x %>% mutate(!!.y := coalesce(.x[[2]], .x[[1]])) %>% select(.y))

#  col1  col2
#  <dbl> <dbl>
#1    10  30  
#2    20  25.2
#3    20  20.3
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • That's fantastic! The bit I struggle with most is merging the two values. so once you have divided cents/100, how do you say col1 = if_else(is.na(col1),col1_cents,col1) ? Thanks. rename_all won't work because the dataframe has many column names. – Jonathan Nolan Dec 10 '19 at 09:47
  • @JonathanNolan you can use `coalesce`. See the updated answer. – Ronak Shah Dec 10 '19 at 09:52
  • That is also handy! The problem is I'm trying to write this in a function where the name and number of variables is unknown. I want to replace all of the variables that end in _cents in this way regardless of their name or the number. in this example it's col1,col1_cents,col2,col2_cents but I want it to work if the dataset happens to have a hundred variables with this pattern. – Jonathan Nolan Dec 10 '19 at 09:55
  • @JonathanNolan I see.. Can you check the updated answer now ? – Ronak Shah Dec 10 '19 at 10:10
  • Thanks! your hints helped me find this solution: https://stackoverflow.com/questions/54971918/dplyr-mutate-at-coalesce-dynamic-names-of-columns – Jonathan Nolan Dec 10 '19 at 10:18