I am confused about the difference between the following two snippets in Databricks:

spark.readStream.format('json')

vs

spark.readStream.format('cloudfiles').option('cloudFiles.format', 'json')

I know that using cloudfiles as the format means Databricks Auto Loader. In terms of performance and functionality, which one is better? Does anyone have experience with this?

Thanks

Alex Ott
mytabi

1 Answer

There are multiple differences between the two. With Auto Loader you get at least the following, and there is more (see the documentation for all details):
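For reference, here is a minimal sketch of the two readers from the question written out in full. The function names, the `path` and `schema_location` arguments, and the use of the `cloudFiles.schemaLocation` option are assumptions added for illustration; `spark` is assumed to be the session that a Databricks notebook already provides. As far as I know, the plain file source needs an explicit schema unless streaming schema inference is enabled, while Auto Loader can infer the schema and track it at the given location.

```python
def json_stream(spark, path, schema):
    """Plain Structured Streaming file source for JSON.

    `schema` can be a DDL string such as "id INT, name STRING".
    """
    return (
        spark.readStream
        .format("json")
        .schema(schema)  # explicit schema for the file source
        .load(path)
    )


def autoloader_stream(spark, path, schema_location):
    """Auto Loader (cloudFiles) source reading the same JSON files.

    `schema_location` is a hypothetical directory where Auto Loader
    keeps the inferred schema between runs.
    """
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", schema_location)
        .load(path)
    )
```

Both functions return a streaming DataFrame, so the downstream `writeStream` code can stay the same when switching from one to the other.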

Alex Ott
  • Just for clarity, both of these queries achieve the same thing, i.e. they read new files as they become available in the data source, yes? However, with Auto Loader, one gets the benefits that you have mentioned, such as better performance, scalability and so on? – Minura Punchihewa Jun 29 '22 at 17:19
  • Yes, they achieve the same functionality; Auto Loader is just more optimized. – Alex Ott Jun 29 '22 at 17:56