4

In databricks, what's the difference between the two methods.

%sql

CREATE TEMPORARY VIEW diamonds
USING CSV
OPTIONS (path "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", 
         header "true")
OK

==================================================================

dataFrame = "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv"
val df = spark.read.format("csv").option("header","true")\
  .option("inferSchema", "true").load(dataFrame)

df.createOrReplaceTempView("diamonds")
Alex Ott
  • 80,552
  • 8
  • 87
  • 132
Shubham
  • 41
  • 1
  • 1
  • 2

1 Answers1

0

The difference is that the first (SQL version) won't work because views could be created only from other tables or views (see docs), and couldn't be created from files - to create them that you need to either use CREATE TABLE USING, like this:

create table test.test using csv 
options ('header' = 'true') 
location 'databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv'

and then do CREATE TEMPORARY VIEW diamonds_view AS SELECT ....

Or you can wrap it around select from files (although it may not work for all file types, or options may not be supported):

CREATE TEMPORARY VIEW diamonds_view AS 
SELECT * FROM csv.`databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv`

Everything is really depends on if you're doing processing using only SQL, or not. If not, the second option will give you the possibility to "export" dataframe as table for data processing.

Alex Ott
  • 80,552
  • 8
  • 87
  • 132