I may have missed this, but where in the documentation (for the latest Spark version, 3.1.2 at the time of posting) can we find the options that can be passed to the option method:
df.write.format('parquet').option(???).mode(saveMode='overwrite').saveAsTable('tmp')
For example, I have seen that maxRecordsPerFile is probably a valid option (https://mungingdata.com/apache-spark/partitionby/). Where can we find the full list of options and their meanings? Do we still need to go through the source, as suggested in "Where is the reference for options for writing or reading per format?"
Also, is it better to pass options through .option() or within .saveAsTable()?
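To make the comparison concrete, here is a minimal sketch of the two forms I mean. It assumes df is an existing PySpark DataFrame and uses maxRecordsPerFile purely as an illustration; the helper function names and the value 1000 are made up for this example.

```python
# Sketch only: `df` is assumed to be an existing pyspark.sql.DataFrame.
# The function names and the value 1000 are hypothetical, chosen for illustration.

def write_with_option(df, table_name="tmp"):
    """Form 1: pass the option through DataFrameWriter.option()."""
    (df.write
       .format("parquet")
       .option("maxRecordsPerFile", 1000)
       .mode("overwrite")
       .saveAsTable(table_name))


def write_with_kwargs(df, table_name="tmp"):
    """Form 2: pass the same option as a keyword argument to saveAsTable()."""
    df.write.saveAsTable(
        table_name,
        format="parquet",
        mode="overwrite",
        maxRecordsPerFile=1000,
    )
```

As far as I can tell, the second form works because saveAsTable accepts extra keyword arguments as options, but I do not know whether one form is preferred over the other.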