I'm just getting into sparklyr (and R, actually) and slowly getting the hang of what works and what doesn't. I'm converting a plain R script to work with sparklyr. So far I've managed to replace grepl with regexp_replace and work out the differences in the regex formats.
I'm a bit stuck on this one, though. I am loading files with spark_read_json that can contain non-UTF-8 characters, and I want to remove them.
The code that removes these characters in plain R is:
fileline <- fileline %>% mutate(text = iconv(text, "", "UTF-8", sub=" "))
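To make the behaviour I'm after reproducible, here is a minimal plain R sketch of what that call does. The sample strings and the names `x` and `cleaned` are made up for illustration, and I'm passing "UTF-8" as the source encoding explicitly (instead of "", which means the current locale) so the result shouldn't depend on my machine's settings:

```r
# Made-up sample data: the second string contains a raw 0xFF byte,
# which can never occur in valid UTF-8.
x <- c("clean line", "bad\xff byte")

# iconv() replaces every byte it cannot convert with the `sub` string.
cleaned <- iconv(x, from = "UTF-8", to = "UTF-8", sub = " ")

# validUTF8() reports, per element, whether the string is valid UTF-8.
validUTF8(x)
validUTF8(cleaned)
```

The second element of `x` should be flagged as invalid before the call, and everything in `cleaned` should come back as valid afterwards. That is the result I'd like to get on a Spark DataFrame column.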
This doesn't work with sparklyr; it looks like iconv isn't available there.
I'm not sure what alternative to use. Hive doesn't appear to have an equivalent, and spark_read_json doesn't have an option like read_csv has.
There's a possible regex approach in the question "Remove non-utf8 characters from string".
But I was wondering whether there is something a little less involved already available.
Thanks