
I am trying to save my DataFrame to S3 like below:

myDF.write.format("com.databricks.spark.csv").options(codec="org.apache.hadoop.io.compress.GzipCodec").save("s3n://myPath/myData.csv")

Then I got errors:

<console>:132: error: overloaded method value options with alternatives:
  (options: java.util.Map[String,String])org.apache.spark.sql.DataFrameWriter <and>
  (options: scala.collection.Map[String,String])org.apache.spark.sql.DataFrameWriter
 cannot be applied to (codec: String)

Does anyone know what I missed? Thanks!

Edamame

1 Answer


Scala is not Python. It doesn't have **kwargs. You have to provide a Map:

myDF.write.format("com.databricks.spark.csv")
  .options(Map("codec" -> "org.apache.hadoop.io.compress.GzipCodec"))
  .save("s3n://myPath/myData.csv")
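When only one option is set, `DataFrameWriter.option` (singular) also works and takes a plain key/value pair, so no `Map` is needed — a sketch assuming the same spark-csv package:

```scala
// Equivalent single-option form: option(key, value) accepts the pair
// directly, avoiding the Map required by the plural options(...).
myDF.write
  .format("com.databricks.spark.csv")
  .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
  .save("s3n://myPath/myData.csv")
```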
5ba86145
  • Instead of saving to one myData.csv file, I actually got a myData.csv "folder", where multiple csv.gz files are stored under the folder. Is there a way just to save it to a csv file. Thanks! – Edamame May 24 '16 at 03:58
  • @Edamame You cannot have a single file [without coalescing to a single partition](http://stackoverflow.com/a/31675351/1560062) and this is basically useless unless the size of the output is negligible. – zero323 May 24 '16 at 04:06
  • @zero323: Thanks! Assuming I coalesce to a single partition, how do I save it to one csv file? Thanks! – Edamame May 24 '16 at 04:16
  • use repartition as mentioned in zero323's comment – Rahul Jan 21 '21 at 15:10
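Following the comments above, a minimal sketch of coalescing before writing: Spark still produces a directory at the target path, but with a single part file inside it, which would then need to be renamed or moved (e.g. via the Hadoop FileSystem API) if a bare `myData.csv` file is required:

```scala
// Sketch: coalesce to one partition so the output directory contains
// a single part file. Note Spark writes a directory, not a bare file;
// the part-00000* file inside must be moved/renamed afterwards.
myDF.coalesce(1)
  .write
  .format("com.databricks.spark.csv")
  .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
  .save("s3n://myPath/myData.csv")
```

Note that `coalesce(1)` funnels all data through a single task, so it is only practical when the output is small.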