
I am learning Spark and Scala. I am well versed in Java, but not so much in Scala. I am going through a tutorial on Spark and came across the following lines of code, which have not been explained:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

(sc is the SparkContext instance)

I know the concepts behind Scala implicits (at least I think I know). Could somebody explain to me what exactly is meant by the import statement above? What implicits are bound to the sqlContext instance when it is instantiated, and how? Are these implicits defined inside the SQLContext class?

EDIT: The following seems to work for me as well (fresh code):

val sqlc = new SQLContext(sc)
import sqlContext.implicits._

In the code just above, what exactly is sqlContext and where is it defined?

Ankit Khettry

1 Answer


From the ScalaDoc: `sqlContext.implicits` contains "(Scala-specific) Implicit methods available in Scala for converting common Scala objects into DataFrames."

It is also explained in the Spark programming guide:

// this is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._

For example, in the code below, `.toDF()` won't compile unless you import `sqlContext.implicits._`:

// Assumes `import scala.io.Source` and a case class `Airport` with six
// String fields, both defined elsewhere in the tutorial.
val airports = sc.makeRDD(Source.fromFile(airportsPath).getLines().drop(1).toSeq, 1)
  .map(s => s.replaceAll("\"", "").split(","))
  .map(a => Airport(a(0), a(1), a(2), a(3), a(4), a(5), a(6)))
  .toDF() // provided by the implicit conversion from sqlContext.implicits._

What implicits are bound to the sqlContext instance when it is instantiated and how? Are these implicits defined inside the SQLContext class?

Yes, they are defined in an `object implicits` nested inside the `SQLContext` class, and that object extends `SQLImplicits`. It looks like there are two types of implicit conversions defined there:

  1. An RDD-to-DataFrameHolder conversion, which enables the above-mentioned `rdd.toDF()`.
  2. Various instances of Encoder, which are "Used to convert a JVM object of type T to and from the internal Spark SQL representation."
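To see why the import needs an instance rather than the class name, here is a minimal plain-Scala sketch (no Spark; `Context`, `RichInt`, and `doubled` are made-up names for illustration) that mirrors how `SQLContext` nests its implicits inside the class:

```scala
// An implicits object nested inside a class, as in SQLContext: the
// implicits are members of each *instance*, so you must construct one
// before you can import them.
class Context {
  object implicits {
    // Extension method made available once the implicits are imported.
    implicit class RichInt(private val i: Int) {
      def doubled: Int = i * 2
    }
  }
}

object Demo extends App {
  val ctx = new Context
  import ctx.implicits._        // works: imports members of the instance
  // import Context.implicits._ // would not compile: no such companion object
  println(3.doubled)            // prints 6
}
```

This mirrors the comment thread below: `import SQLContext.implicits._` fails because `implicits` belongs to the class, not to a companion object.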
vitalii
  • Yes, but is `sqlContext` predefined? I just created an object `sqlContext` of the class `SQLContext` above. – Ankit Khettry Mar 09 '16 at 08:00
  • What do you mean by sqlContext predefined? – vitalii Mar 09 '16 at 08:18
  • Is there a version of `sqlContext` already defined in the Spark API, that is different from the `sqlContext` that is being defined above? – Ankit Khettry Mar 09 '16 at 08:57
  • Well, there's also HiveContext, which extends SqlContext. If you are talking about actual instances, then no, there's no other instance, and you'll have to create your own. – vitalii Mar 09 '16 at 09:04
  • I got it. The SQLContext is available as `sqlContext` in the Spark shell, just like the SparkContext instance is available as `sc`. – Ankit Khettry Mar 09 '16 at 09:26
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/105779/discussion-between-ankit-khettry-and-vitalii). – Ankit Khettry Mar 09 '16 at 09:27
  • On a related note, since I don't want to post another question separately, why is it that `import SQLContext.implicits._` throws an error, but `import sqlContext.implicits._` does not? – Ankit Khettry Mar 09 '16 at 09:33
  • Because `implicits` is defined in the *class* SQLContext, not in an object, and in Scala you need to create an instance of a class before using its members. Also, the SQLContext class requires an existing SparkContext, so it cannot be defined as an object. – vitalii Mar 09 '16 at 10:40