I don't know exactly what Databricks offers out of the box (pre-installed), but you can do some reverse-engineering using the org.apache.spark.sql.execution.datasources.DataSource object, which is (quoting the scaladoc):
The main class responsible for representing a pluggable Data Source in Spark SQL
All data sources usually register themselves using the DataSourceRegister interface (and use shortName to provide their alias):
Data sources should implement this trait so that they can register an alias to their data source.
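For reference, a minimal sketch of what such a registration looks like (the class name and the "myformat" alias are hypothetical; a real source would also implement the relation logic and list the class in META-INF/services/org.apache.spark.sql.sources.DataSourceRegister so ServiceLoader can find it):

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}

// hypothetical data source that registers the alias "myformat"
class MyFormatProvider extends RelationProvider with DataSourceRegister {
  override def shortName(): String = "myformat"

  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    // a real implementation would build a BaseRelation from the parameters
    ???
  }
}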
Reading further in the scaladoc of DataSourceRegister, you'll find that:
This allows users to give the data source alias as the format type over the fully qualified class name.
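In other words, both of the following should resolve to the same built-in data source; the alias is just shorter (a sketch only; "people.csv" is a made-up path and the fully-qualified class name is Spark's internal CSV implementation, which may change between releases):

spark.read.format("csv").load("people.csv")
spark.read.format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat").load("people.csv")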
So, YMMV.
Unless you find an authoritative answer on Databricks, you may want to (follow DataSource.lookupDataSource and) use Java's ServiceLoader.load method to find all registered implementations of the DataSourceRegister interface.
// start a Spark application with an external module that brings its own data source
$ ./bin/spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0-SNAPSHOT

import java.util.ServiceLoader
import scala.collection.JavaConverters._
import org.apache.spark.sql.sources.DataSourceRegister

// load every DataSourceRegister implementation available on the classpath
val formats = ServiceLoader.load(classOf[DataSourceRegister])

// print the alias (shortName) of every registered data source
formats.asScala.map(_.shortName).foreach(println)
orc
hive
libsvm
csv
jdbc
json
parquet
text
console
socket
kafka
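Once you know an alias, you simply pass it to format. A sketch using the kafka source loaded above (the broker address and topic name are made up):

val records = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // hypothetical broker
  .option("subscribe", "events")                       // hypothetical topic
  .load()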
Where can I get the list of options supported for each file format?
That's not possible as there is no API to follow (like in Spark MLlib) to define options. Every format defines its options on its own, unfortunately, so your best bet is to read the documentation or (more authoritatively) the source code.
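For example, for the built-in csv format the available options are listed in the scaladoc of DataFrameReader.csv and ultimately parsed by the (internal) CSVOptions class. A sketch of passing a couple of them (the file path is made up):

val people = spark.read
  .format("csv")
  .option("header", "true")      // use the first line as column names
  .option("inferSchema", "true") // infer column types from the data
  .load("people.csv")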