
I am learning Spark Datasets and checking how we can convert an RDD to a Dataset.

For this, I have the following code:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession
      .builder
      .appName("SparkSQL")
      .master("local[*]")
      .getOrCreate()

    val lines = spark.sparkContext.textFile("../myfile.csv")
    val structuredData = lines.map(mapperToConvertToStructureData) // user-defined mapper, not shown here

    import spark.implicits._
    val someDataset = structuredData.toDS

Here, if we want to convert an RDD to a Dataset, we need import spark.implicits._ just before the conversion.

Why is this written just before the conversion? Can we place this import at the top of the file, as we do with regular imports?

KayV

2 Answers


Here spark is an instance of the class org.apache.spark.sql.SparkSession, so the instance must exist before you can import from it.
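
This is ordinary Scala scoping rather than anything Spark-specific: an import from a value is only legal once that value is a stable identifier in scope. A minimal sketch (Container, Demo, and greeting are made-up names for illustration):

    class Container {
      implicit val greeting: String = "hello"
    }

    object Demo extends App {
      val c = new Container       // the instance must exist first...
      import c._                  // ...before its members can be imported
      println(implicitly[String]) // resolved via the imported implicit greeting
    }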

ollik1

Spark implicits are required to work with Datasets because that is where all the implicit functions and classes needed for the Encoders are found. Encoders are required for every transformation to a Dataset: take a look at the documentation and you will see that each Dataset transformation has an A : Encoder bound or an implicit Encoder parameter.
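
As a concrete illustration of that bound, a helper that maps over a Dataset must get an Encoder from somewhere, either from import spark.implicits._ at the call site or threaded through explicitly. A small sketch (doubleAll is a made-up helper):

    import org.apache.spark.sql.{Dataset, Encoder}

    // Dataset.map[U : Encoder] desugars to an implicit Encoder[U] parameter,
    // so we pass one along explicitly here instead of importing implicits:
    def doubleAll(ds: Dataset[Int])(implicit enc: Encoder[Int]): Dataset[Int] =
      ds.map(_ * 2)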

In Scala these implicits would normally live in an object, but in Spark they are inside the SparkSession class, so until you have an instance, you can't import them.
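
So the import does not have to sit immediately before the conversion; it only has to appear after the SparkSession instance has been created. A minimal sketch of hoisting it to the top of an object body (Example is a made-up name):

    import org.apache.spark.sql.SparkSession

    object Example {
      val spark: SparkSession = SparkSession.builder
        .appName("SparkSQL")
        .master("local[*]")
        .getOrCreate()

      // Legal here, well above any conversion: `spark` is a stable
      // identifier that already exists at this point.
      import spark.implicits._

      def main(args: Array[String]): Unit = {
        val ds = Seq(1, 2, 3).toDS()
        ds.show()
      }
    }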

Alfilercio