
I may have missed this, but where in the documentation (for the latest Spark version, 3.1.2 at the time of posting) can we find the options that can be passed to the .option() method:

df.write.format('parquet').option(???).mode(saveMode='overwrite').saveAsTable('tmp')

For example, I have seen that maxRecordsPerFile is probably a valid option (https://mungingdata.com/apache-spark/partitionby/). Where can we find the complete list of options and what each one means? Do we still need to go through the source code (cf. "Where is the reference for options for writing or reading per format?")?

Also, is it better to pass options through .option() or within .saveAsTable()?

karpan
  • Does this answer your question? [Spark: what options can be passed with DataFrame.saveAsTable or DataFrameWriter.options?](https://stackoverflow.com/questions/31487254/spark-what-options-can-be-passed-with-dataframe-saveastable-or-dataframewriter) – Ric S Jul 29 '21 at 12:36
  • Thanks @Ric. I found a similar answer, but as you point out, it requires looking at the code. I was hoping for some sort of documentation with an explanation. – karpan Jul 29 '21 at 13:00
  • Unfortunately there's no clear documentation about it, only format-specific pages, as explained in the question above – Ric S Jul 29 '21 at 13:02
  • Thanks @Ric. I was hoping for some progress, but I understand the difficulty of creating documentation for something like this. By the way, have you found the maxRecordsPerFile option mentioned in the link in my question? – karpan Jul 29 '21 at 13:41

0 Answers