I'm trying to use the SQLContext
instance created in one module from another module. The first module binds it to an implicit sqlContext,
and I had (erroneously) thought that I could then use an implicit parameter in the second module, but the compiler informs me that:
could not find implicit value for parameter sqlCtxt: org.apache.spark.sql.SQLContext
Here's the skeletal setup I have (I have elided imports and details):
-----
// Application.scala
-----
package apps

object Application extends App {
  val env = new SparkEnvironment("My app", ...)
  try {
    // Call methods from various packages that internally use code from DFExtensions.scala
  }
}
-----
// SparkEnvironment.scala
-----
package common

class SparkEnvironment(val app: String, ...) {
  @transient lazy val conf: SparkConf = new SparkConf().setAppName(app)
  @transient implicit lazy val sc: SparkContext = new SparkContext(conf)
  @transient implicit lazy val sqlContext: SQLContext = new SQLContext(sc)
  ...
}
-----
// DFExtensions.scala
-----
package util

object DFExtensions {
  private def myFun(...)(implicit sqlCtxt: SQLContext) = { ... }

  implicit final class DFExt(val df: DataFrame) extends AnyVal {
    // Extension methods for DataFrame where myFun is supposed to be used -- fails to compile!
  }
}
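For concreteness, here is roughly what such an extension method looks like; the method name, myFun's concrete parameter list, and its body are invented for illustration:
-----
// DFExtensions.scala (sketch of the failing call site; names are invented)
-----
package util

import org.apache.spark.sql.{DataFrame, SQLContext}

object DFExtensions {
  // Hypothetical concrete signature: takes the DataFrame, needs an implicit SQLContext.
  private def myFun(df: DataFrame)(implicit sqlCtxt: SQLContext): DataFrame =
    sqlCtxt.sql("...") // body invented, e.g. run a query

  implicit final class DFExt(val df: DataFrame) extends AnyVal {
    // Implicit resolution happens here, at compile time, inside util,
    // where no SQLContext is in scope:
    def flattened: DataFrame = myFun(df)
    // error: could not find implicit value for parameter sqlCtxt
  }
}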
Since it's a multi-project sbt setup, I don't want to pass the env instance around to all related objects, because the stuff in util is really a shared library. Each sub-project (i.e. each app) creates its own instance in its main method.
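The layout is roughly like this; the project names are illustrative:
-----
// build.sbt (sketch)
-----
lazy val common = project                          // SparkEnvironment lives here
lazy val util   = project.dependsOn(common)        // shared library incl. DFExtensions
lazy val app1   = project.dependsOn(common, util)  // each app has its own main
lazy val app2   = project.dependsOn(common, util)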
Because myFun is only called from the implicit class DFExt, I thought about creating an implicit just before each call, à la implicit val sqlCtxt = df.sqlContext. That compiles, but it's kind of ugly, and I would no longer need the implicit in SparkEnvironment at all.
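Concretely, that workaround looks like this (the extension method name is invented):
-----
// DFExtensions.scala (workaround sketch)
-----
implicit final class DFExt(val df: DataFrame) extends AnyVal {
  def flattened: DataFrame = {
    // Take the SQLContext from the DataFrame itself and make it implicit,
    // so myFun's implicit parameter resolves locally.
    implicit val sqlCtxt: SQLContext = df.sqlContext
    myFun(df)
  }
}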
According to this discussion, the implicit sqlContext instance is not in scope, hence compilation fails. I'm not sure a package object would work, because the implicit value and the implicit parameter live in different packages.
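What I had in mind is something like the following, but there is no app-independent way to construct the instance there, since it lives in each app's SparkEnvironment:
-----
// util/package.scala (idea only)
-----
package object util {
  // Would put an implicit SQLContext in scope throughout util -- but with what value?
  implicit lazy val sqlCtxt: org.apache.spark.sql.SQLContext = ???
}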
Is what I'm trying to achieve even possible? Is there a better alternative?
The idea is to have several sub-projects that use the same libraries and core functions live in the same project. They are typically updated together, so it's nice to have them in one place. Most of the library functions work directly on data frames and other Spark structures, but occasionally I need an instance of SparkContext or SQLContext, for instance to write a query with sqlContext.sql, as some syntax is not yet natively supported (e.g. flattening with outer lateral views).
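As an example of the latter, the dependency comes in through functions along these lines (the function, table, and column names are invented):
-----
// Sketch of a library function that needs the SQLContext directly
-----
def flattenTags(df: DataFrame)(implicit sqlCtxt: SQLContext): DataFrame = {
  df.registerTempTable("events")
  // An outer lateral view isn't expressible via the DataFrame API in 1.4,
  // so drop down to SQL (assuming a Hive-compatible dialect):
  sqlCtxt.sql("SELECT id, tag FROM events LATERAL VIEW OUTER explode(tags) t AS tag")
}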
Each sub-project has its own main method that creates an implicit instance. Obviously the libraries do not 'know' about this, as they are in different packages and I don't pass the instances around. I had thought that implicits are somehow looked up at runtime, so that when an application runs there is an instance of SQLContext defined as an implicit. It's possible that a) it's not in scope because it's in a different package, or b) what I'm trying to do is just a bad idea.
Currently there is only one main method, because I first have to split the application into multiple components, which I have not done yet.
Just in case it helps:
- Spark 1.4.1
- Scala 2.10
- sbt 0.13.8