I have repetitive code in dplyr that cleans data.

df1_final$sumaryczna_kwota_zobowiązań <-
  df1_final$sumaryczna_kwota_zobowiązań %>%
  str_replace(",", ".") %>% str_replace_all("\\s", "") %>% as.numeric()

df3_final$sumaryczna_liczba_kontraktu_dla_produktu <-
  df3_final$sumaryczna_liczba_kontraktu_dla_produktu %>%
  str_replace(",", ".") %>% str_replace_all("\\s", "") %>% as.numeric()

df3_final$sumaryczna_kwota_kontraktu_dla_produktu <-
  df3_final$sumaryczna_kwota_kontraktu_dla_produktu %>%
  str_replace(",", ".") %>% str_replace_all("\\s", "") %>% as.numeric()

df3_final$średnia_cena_produktu <-
  df3_final$średnia_cena_produktu %>%
  str_replace(",", ".") %>% str_replace_all("\\s", "") %>% as.numeric()

It is one column in one df, three columns in another df, but the process is the same.

How can I turn this into a function that takes one or, better, several columns of a data frame and cleans them, without repeating the code?

TO MODERATOR, EXPLANATION: my question is unique in that it asks how to apply several piped operations to several columns. The answers in the comments deserve promoting. From them I learned this syntax:

myfun = . %>% str_replace(",", ".") %>% str_replace_all("\\s", "") %>% as.numeric()

# and then use it on the columns named "a" and "b"

df %<>% mutate_at(c("a","b"), .funs=myfun)
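
For anyone who wants to try this end to end, here is a minimal sketch of the approach from the comments. The data frame `df`, its columns `a`, `b` and `label`, and the name `clean_number` are hypothetical stand-ins for my real data, and I strip whitespace before replacing the comma, which should give the same result as the original order.

```r
library(dplyr)
library(stringr)
library(magrittr)

# A pipeline that starts with `.` is a magrittr "functional sequence":
# it defines a reusable function instead of running immediately.
clean_number <- . %>%
  str_replace_all("\\s", "") %>%  # drop whitespace used as thousands separator
  str_replace(",", ".") %>%       # decimal comma -> decimal point (first match only)
  as.numeric()

# Hypothetical toy data (not my real columns)
df <- tibble(
  a     = c("1 234,5", "10,25"),
  b     = c("7", "2 000"),
  label = c("x", "y")
)

# Apply the same cleaning to the named columns; `label` is left untouched.
df %<>% mutate_at(c("a", "b"), .funs = clean_number)

# With dplyr >= 1.0.0 the same thing can be written with across():
# df <- df %>% mutate(across(c(a, b), clean_number))
```

Because the columns are passed as a character vector, the same call can be reused on another data frame by changing only the names.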
Jacek Kotowski
  • 1
    Sounds like you need to apply a function to multiple columns in your dataset. Have a look at the `mutate_at` function in the `dplyr` package. An example is here: https://stackoverflow.com/questions/39209987/using-functions-of-multiple-columns-in-a-dplyr-mutate-at-call.... It would also be good to post a small example of your dataset in order to get an answer that you can apply easily. – AntoniosK Dec 14 '17 at 16:01
  • 3
    First, you could do `myfun = . %>% str_replace(",", ".") %>% str_replace_all("\\s", "")%>% as.numeric()`. Second, with the magrittr package, you can use `%<>%` instead of writing the column name twice. – Frank Dec 14 '17 at 16:02
  • 1
    You can also check out how `dplyr` uses `lazyeval` and the underscore functions (e.g. `select_()`) in order to pass in strings to functions to evaluate properly in `dplyr`. I've tried this and there can be a learning curve to get used to it. – Eric Leung Dec 14 '17 at 16:37
  • 3
    Dear @Frank, do you mean `myfun = . %>% str_replace(",", ".") %>% str_replace_all("\\s", "") %>% as.numeric()` and then `df %<>% mutate_at(c("a","b"), .funs=myfun)`? I figured it out by reading the help files and your comments. – Jacek Kotowski Dec 14 '17 at 16:46
  • 1
    Yep. I don't use dplyr (only magrittr), so I'd have multiple lines like `df$a %<>% myfun; df$b %<>% myfun` instead. – Frank Dec 14 '17 at 16:52
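
The magrittr-only variant from the last comment can be sketched as follows. This is an illustration under assumptions, not Frank's exact code: the toy data frame and the name `clean_number` are hypothetical, and base R `gsub()`/`sub()` stand in for the stringr calls so that only magrittr has to be loaded.

```r
library(magrittr)

# Functional sequence built from base R string functions.
clean_number <- . %>%
  gsub("\\s", "", .) %>%             # remove all whitespace
  sub(",", ".", ., fixed = TRUE) %>% # first decimal comma -> decimal point
  as.numeric()

# Hypothetical toy data
df <- data.frame(
  a = c("1 234,5", "10,25"),
  b = c("7", "2 000"),
  stringsAsFactors = FALSE
)

# One line per column, using the compound assignment pipe
df$a %<>% clean_number
df$b %<>% clean_number

# The same idea for many columns at once, on a fresh copy of the raw data
df2 <- data.frame(
  a = c("1 234,5", "10,25"),
  b = c("7", "2 000"),
  stringsAsFactors = FALSE
)
cols <- c("a", "b")
df2[cols] <- lapply(df2[cols], clean_number)
```

The `lapply()` form avoids one line per column when the list of columns grows.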

0 Answers