
I need to prepare several DataFrames that will later be joined.

Each DataFrame is built by selecting a few columns from a source. The source files are in Parquet format, and an external Hive table is defined over each Parquet folder.

My question is: which of the two approaches below performs better?

val frame1: DataFrame = spark.read.format("parquet").load("<parquet-location>").select(<few columns here>)

val frame2: DataFrame = spark.sql("SELECT <few columns here> FROM HIVEDB.Table_upon_parquet_files")

Which DataFrame would be built faster, frame1 or frame2? If one is better than the other, why? Please explain.
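
For reference, here is a minimal sketch of how the two approaches could be compared by printing their query plans instead of guessing. It assumes an existing SparkSession with access to the table; the object name `ComparePlans`, the method `comparePlans`, and its parameters are hypothetical names introduced only for illustration, and the path, table, and column names have to be supplied by the caller.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper: prints the plans of both approaches for comparison.
object ComparePlans {
  def comparePlans(
      spark: SparkSession,
      parquetPath: String,   // folder that holds the Parquet files
      tableName: String,     // external table defined over that folder
      columns: Seq[String]   // the few columns to select
  ): Unit = {
    // Approach 1: read the Parquet folder directly and select the columns.
    val fromFiles: DataFrame =
      spark.read.format("parquet").load(parquetPath).select(columns.map(col): _*)

    // Approach 2: select the same columns through the external table.
    val fromTable: DataFrame =
      spark.sql(s"SELECT ${columns.mkString(", ")} FROM $tableName")

    // Both DataFrames are lazy; explain(true) prints the parsed, analyzed,
    // optimized, and physical plans without running a job.
    fromFiles.explain(true)
    fromTable.explain(true)
  }
}
```

Calling `ComparePlans.comparePlans(spark, "<parquet-location>", "HIVEDB.Table_upon_parquet_files", Seq(...))` with real column names would show whether the two physical plans differ in the scan node or in the columns that are read.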

thebluephantom
Lokesh Raju
