
I'm just getting into sparklyr (and R, actually) and am slowly getting the hang of what works and what doesn't. I'm converting a plain R script to work with sparklyr. I've managed to replace grepl with regexp_replace and work out the differences in the regex formats.

I'm a bit stuck on this one, though. I am loading files (spark_read_json) that can contain non-UTF-8 characters, and I want to remove them.

The code that removes these characters in plain R is:

fileline <- fileline %>% mutate(text = iconv(text, "", "UTF-8", sub=" "))

...but this doesn't work with sparklyr; it looks like iconv isn't available.

I'm not sure what alternative to use. Hive doesn't appear to have an equivalent, and spark_read_json doesn't have an option like the one read_csv has.

There's a possible regex approach in the question "Remove non-utf8 characters from string".

But I was wondering whether something a little less involved is already available.
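For reference, a minimal sketch of the regex route, assuming it is acceptable to replace every character outside printable ASCII. That rule is stricter than iconv(..., sub = " "): it also replaces valid accented letters, tabs and newlines. The pattern [^ -~] and the helper strip_non_ascii() are illustrative choices, not taken from the linked question. The block checks the pattern locally with base R gsub(); the final comment shows, untested, how the same backslash-free pattern might be handed to Spark's regexp_replace() inside a sparklyr mutate().

```r
# Hypothetical pattern: anything NOT in the printable ASCII range
# space (0x20) through tilde (0x7E). Avoiding backslashes sidesteps
# escaping differences between R strings and Spark SQL literals.
ascii_pattern <- "[^ -~]"

# Hypothetical helper: replace each non-printable-ASCII character with a space.
strip_non_ascii <- function(x) {
  # perl = TRUE uses PCRE, where character-class ranges compare code points,
  # similar to the java.util.regex engine behind Spark's regexp_replace.
  gsub(ascii_pattern, " ", x, perl = TRUE)
}

sample_text <- c(
  "plain text",
  "caf\u00e9",        # valid UTF-8 accented letter (also replaced by this rule)
  "bad\ufffdbyte"     # U+FFFD, the replacement character some decoders emit
)
cleaned <- strip_non_ascii(sample_text)
print(cleaned)

# Untested sparklyr sketch: dbplyr passes functions it does not know,
# such as regexp_replace(), through to Spark SQL as written.
# fileline <- fileline %>% mutate(text = regexp_replace(text, "[^ -~]", " "))
```

Because the replacement is one space per character, string lengths are preserved, which mirrors what sub = " " does for each unconvertible character in iconv.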

Thanks

pjatderi