
I only know the version difference, but not the difference in functionality or anything else. I.e., a SparkSession internally contains a SparkContext and a configuration.

gopal kulkarni

2 Answers


In older versions of Spark there were different contexts that served as entry points to the different APIs (SparkContext for the core API, SQLContext for the Spark SQL API, StreamingContext for the DStream API, etc.). This was a source of confusion for developers and a point of optimization for the Spark team, so in the most recent versions of Spark there is only one entry point (the SparkSession), and from it you can get the various other entry points (the SparkContext, the StreamingContext, etc.), as in the sketch below.
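A minimal sketch of what that looks like in practice (the app name and master are placeholder values):

import org.apache.spark.sql.SparkSession

// Build (or reuse) the single unified entry point.
val spark = SparkSession.builder()
  .appName("example")   // placeholder app name
  .master("local[*]")   // placeholder master; set this per your cluster
  .getOrCreate()

// The older entry points are reachable from the session.
val sc = spark.sparkContext   // core API (RDDs)
val conf = spark.conf         // runtime configuration

// Spark SQL is available directly on the session.
val df = spark.sql("SELECT 1 AS id")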

  • So basically, if I need to use Spark SQL for data wrangling, I will not need to use SparkContext and SparkSession is enough, right? Pardon my ignorance, I am only a data analyst and a total newbie in distributed computing. – Vivek Puurkayastha Oct 10 '20 at 07:16

Here is an example (note that upperBound must also be passed to the jdbc call; the URL, table name, and bounds are placeholders):

import java.util.Properties

val url = "jdbc:postgresql://host:5432/db"  // placeholder JDBC URL
val tablename = "my_table"                  // placeholder table name
val props = new Properties()                // connection properties (user, password, ...)
val colName = "id"        // name of the numeric column on which you want to partition the query
val lowerBound = 0L       // minimum value of colName
val upperBound = 100000L  // placeholder: the max value of colName in our database
val numPartitions = 5     // example
spark.read.jdbc(url, tablename, colName, lowerBound, upperBound, numPartitions, props).count() // a count, but it can be any query

The read is executed in parallel: Spark splits the range between lowerBound and upperBound into 5 partitions and issues one JDBC query per partition, so the count runs across the 5 partitions of the underlying RDD.
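A quick way to verify the partitioning (a minimal sketch; df stands for the DataFrame returned by the jdbc call above):

val df = spark.read.jdbc(url, tablename, colName, lowerBound, upperBound, numPartitions, props)
println(df.rdd.getNumPartitions) // prints 5: one JDBC query per partition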