
I am trying to compare Spark SQL and HiveContext. What is the difference? Does HiveContext's sql run a Hive query, while Spark SQL runs a Spark query?

Below is my code:

import pyspark
from pyspark.sql import HiveContext

sc = pyspark.SparkContext.getOrCreate(conf=conf)
sqlContext = HiveContext(sc)
sqlContext.sql('select * from table')

While with Spark SQL:

spark.sql('select * from table')

What is the difference between these two?

  • Possible duplicate of [What is the difference between Apache Spark SQLContext vs HiveContext?](https://stackoverflow.com/questions/33666545/what-is-the-difference-between-apache-spark-sqlcontext-vs-hivecontext) – chrisaycock Aug 21 '18 at 02:29

1 Answer


SparkSession provides a single point of entry to interact with underlying Spark functionality and allows programming Spark with DataFrame and Dataset APIs. Most importantly, it curbs the number of concepts and constructs a developer has to juggle while interacting with Spark.

SparkSession encapsulates SparkConf, SparkContext, and SQLContext, so you do not need to create them explicitly.

SparkSession merged SQLContext and HiveContext into one object as of Spark 2.0.
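For example, in PySpark (a minimal sketch, assuming Spark 2.0+), the older entry points are reachable from the session object itself:

from pyspark.sql import SparkSession

# SparkSession wraps the SparkContext and SparkConf internally
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext   # the encapsulated SparkContext
conf = sc.getConf()       # the encapsulated SparkConf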

When building a session object, for example:

val spark = SparkSession
  .builder()
  .appName("SparkSessionExample")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()

.enableHiveSupport() provides HiveContext functionality, so you will be able to access Hive tables because the SparkSession is initialized with Hive support.
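A PySpark equivalent (a sketch to match the question's language; the warehouse path is a placeholder you would replace with your own):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("SparkSessionExample")
         .config("spark.sql.warehouse.dir", "/path/to/warehouse")  # placeholder path
         .enableHiveSupport()  # enables Hive metastore access, like HiveContext
         .getOrCreate())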

So there is no difference between sqlContext.sql and spark.sql, but it is advised to use spark.sql, since SparkSession is the single point of entry for all Spark APIs.
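To make that concrete (a sketch assuming the sqlContext from the question, a session built with Hive support as above, and a Hive table named table), both calls go through the same engine:

# legacy HiveContext API and the SparkSession API return equivalent DataFrames
df_legacy = sqlContext.sql('select * from table')
df_new = spark.sql('select * from table')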

Lakshman Battini