I have the following CSV content:
123,Out,true,2014-09-30,東芝ライフスタイル 株式会社,null,Web,1234,false,2014-09-21T22:48:28.000+0000
I loaded the CSV using spark-csv:
> df <- read.df(sqlContext, "japanese_char_file.csv", source = "com.databricks.spark.csv", inferSchema = "true")
I am trying to convert this SparkR DataFrame into an R data.frame using:
temp2bd <- SparkR::collect(df)
It gives me the following error:
Error in rawToChar(string) : embedded nul in string: 'q\x9d\xe9\xa4չ\xbf\xa4\xeb\0*\017\032>'
Below is the showDF output:
'+---+---+----+----------+--------------+----+---+----+-----+--------------------+
| C0| C1| C2| C3| C4| C5| C6| C7| C8| C9|
+---+---+----+----------+--------------+----+---+----+-----+--------------------+
|123|Out|true|2014-09-30|q\x9d\xe9\xa4չ\xbf\xa4\xeb\0*\017\032>|null|Web|1234|false|2014-09-21T22:48:...|
+---+---+----+----------+--------------+----+---+----+-----+--------------------+\n'
The Japanese characters are converted into q\x9d\xe9\xa4չ\xbf\xa4\xeb\0*\017\032>, which seems to be causing this issue.
I came across the question "'Embedded nul in string' error when importing csv with fread", but none of the suggestions there worked for me.
Do I need to change the way I am reading the CSV content?
Or is there a way to run sed on a Spark DataFrame?
Or is it an issue in SparkR?
I am using Spark 1.5.2 in standalone mode.