
Hi, I'm working with SparkR in YARN mode.

I need to write a SparkR DataFrame to a CSV/TXT file.

I saw that there is write.df, but it writes Parquet files.

I tried this:

RdataFrame <- collect(SparkRDF)
write.table(RdataFrame, ...)

But I got many WARN messages and some ERRORs from the ContextCleaner.

Is there any way to do this?

DanieleO

1 Answer


Spark 2.0+

You can use the write.text function:

Save the content of the SparkDataFrame in a text file at the specified path. The SparkDataFrame must have only one column of string type with the name "value". Each row becomes a new line in the output file.

write.text(df, path)
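
For example (a minimal sketch; df and its columns a and b are hypothetical), you can collapse a multi-column SparkDataFrame into the required single "value" column with concat_ws before writing:

# Hypothetical columns a and b: build one string column named "value",
# which is what write.text expects, then write one line per row.
txt <- select(df, alias(concat_ws(",", df$a, df$b), "value"))
write.text(txt, "/tmp/df_as_text")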

or write.df with the built-in SparkR CSV writer:

write.df(df, path, source="csv")
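
Extra data-source options are passed through .... A sketch (the path is hypothetical): writing a header row keeps the column names in the output, which also addresses the last comment below:

# header = "true" writes the column names as the first row;
# mode = "overwrite" replaces any existing output at the path.
write.df(df, path = "/tmp/df_csv", source = "csv",
         mode = "overwrite", header = "true")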

Spark 1.x

You can use the spark-csv package:

write.df(SparkRDF, "foo.csv", "com.databricks.spark.csv", ...)
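
spark-csv takes its options the same way, as extra named arguments (a sketch; the path and option values are illustrative):

# With spark-csv, header = "true" keeps the column names and
# delimiter controls the field separator.
write.df(SparkRDF, "foo.csv", "com.databricks.spark.csv",
         "overwrite", header = "true", delimiter = ",")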

It can be added, for example, with the --packages argument to sparkR / spark-submit:

sparkR --packages com.databricks:spark-csv_2.10:1.3.0 # For Scala 2.10
sparkR --packages com.databricks:spark-csv_2.11:1.3.0 # For Scala 2.11

For other options, see the official documentation.

zero323
  • Hey zero, is there a way to write it as one file instead of part-xxx files? I tried `repartition(A, 1)` and then `write.df`, but it doesn't work. – DanieleO Feb 20 '16 at 16:01
  • `repartition(..., 1)` should work, but really, don't use it. If the output is small enough, just collect and write locally. If not, you're passing everything at least twice through a single machine. – zero323 Feb 20 '16 at 16:15
  • Well, the output is about 2~3 GB * 30 files, and they would become too many part-xxx files. I'll try collect and `write.table` in R; hopefully it won't take too long. Thanks. – DanieleO Feb 20 '16 at 16:22
  • Is it normal that we lose column names when writing with write.df? – Orhan Yazar Mar 08 '18 at 13:27