I use Spark 1.6.1.

We are trying to write an ORC file to HDFS using HiveContext and DataFrameWriter. While we can use

df.write().orc(<path>)

we would rather do something like

df.write().options(Map("format" -> "orc", "path" -> "/some_path"))

This is so that we have the flexibility to change the format or root path depending on the application that uses this helper library. Where can we find a reference to the options that can be passed into the DataFrameWriter? I found nothing in the docs here:

https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/sql/DataFrameWriter.html#options(java.util.Map)

Jacek Laskowski
Satyam

1 Answer

Where can we find a reference to the options that can be passed into the DataFrameWriter?

The most definitive and authoritative answer is the source code itself.

You may find some descriptions in the docs, but there is no single page that lists all the options (one that could be auto-generated from the sources to stay up to date).

The reason is that the options are separated from the format implementation on purpose, to give you the flexibility you want to offer per use case (as you duly noted):

This is so that we have the flexibility to change the format or root path depending on the application that uses this helper library.
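
A minimal sketch of how a helper library could keep the format, the path and the per-format options configurable. `SinkSettings` and `SinkWriter` are hypothetical names made up for this illustration; only `write`, `format`, `options` and `save` come from Spark. It assumes the format is selected with `format(...)` rather than with a "format" key inside the options map, and that the option keys are whatever the chosen format's source code accepts:

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical settings holder; the name and fields are illustrative only.
case class SinkSettings(
    format: String,                           // e.g. "orc", "parquet", "json"
    path: String,                             // root output path
    options: Map[String, String] = Map.empty  // format-specific options, passed through untouched
)

// Hypothetical helper, not part of Spark.
object SinkWriter {
  // The format is set through format(); everything in settings.options
  // is handed to the data source as-is, so each format decides what it understands.
  def write(df: DataFrame, settings: SinkSettings): Unit =
    df.write
      .format(settings.format)
      .options(settings.options)
      .save(settings.path)
}
```

With that, a call such as `SinkWriter.write(df, SinkSettings("orc", "/some_path"))` would let each application pick its own format and root path without changing the helper.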


Your question seems similar to "How to know the file formats supported by Databricks?", where I said:

Where can I get the list of options supported for each file format?

That's not possible, as there is no API to follow (like in Spark MLlib) to define options. Unfortunately, every format does this on its own, and your best bet is to read the documentation or (more authoritative) the source code.

combinatorist
Jacek Laskowski
  • I'm trying to understand how Spark works. Could you please tell me where TextOptions is called for reading text files when using sc.read.text ? I only found reference for writing text files in TextFileFormat called from FileFormatWriter – error Sep 25 '18 at 14:56
  • new link: https://github.com/apache/spark/tree/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources – combinatorist Aug 09 '21 at 19:04
  • @combinatorist What is this link for? I'd like to add it to the answer but have no idea what it is for. Thanks. – Jacek Laskowski Aug 10 '21 at 08:47
  • @JacekLaskowski, in the first part of your answer, you have a sources link, which is the top of the repo, and then a list of individual formats. The links for format sources are broken. My link is sort of in between those two levels. It shows where to find all the formats in the repo today. – combinatorist Aug 10 '21 at 17:08
  • NVM! Your format links aren't broken, just the one I care about (ORC). So I'll edit that only ... – combinatorist Aug 10 '21 at 17:09