I have a Scala app with a trait that implements some function(s) and a class that extends that trait.

The class mentioned above also has a function which calls the function defined in the parent trait, passing a lambda as its argument.

I observed this in a Spark + Kafka implementation using Scala. I'm guessing this is some kind of design pattern but I don't know which one. Is it the Cake Pattern? Dependency Injection? Or something else?

Below is the code I'm referring to:

trait SparkApplication {
  def sparkConfig: Map[String, String]
  def withSparkContext(f: SparkContext => Unit): Unit = {
    val conf = new SparkConf()
    sparkConfig.foreach { case (k, v) => conf.setIfMissing(k, v) }
    val sc = new SparkContext(conf)
    f(sc)
  }
}

trait SparkStreamingApplication extends SparkApplication {
  def streamingBatchDuration: FiniteDuration // abstract, provided by the concrete class
  def streamingCheckpointDir: String         // abstract, provided by the concrete class
  def withSparkStreamingContext(f: (SparkContext, StreamingContext) => Unit): Unit = {
    withSparkContext { sc =>
      val ssc = new StreamingContext(sc, Seconds(streamingBatchDuration.toSeconds))
      ssc.checkpoint(streamingCheckpointDir)
      f(sc, ssc)
      ssc.start()
      ssc.awaitTermination()
    }
  }
}
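For reference, the trait-plus-class structure described above can be sketched with no Spark at all (all names below are made up for illustration):

```scala
trait Application {
  // abstract configuration, like sparkConfig in the question
  def config: Map[String, String]

  // higher-order method, like withSparkContext: builds a value from the
  // config and loans it to the caller's function
  def withName(f: String => Unit): Unit =
    f(config.getOrElse("app.name", "default"))
}

// the class extends the trait and calls the inherited
// higher-order method with a lambda
class MyApp extends Application {
  def config: Map[String, String] = Map("app.name" -> "my-app")
  var lastUsed: String = ""
  def run(): Unit = withName { name => lastUsed = name }
}
```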
philantrovert
Felipe
  • Inheritance pattern? – Jasper-M Jan 09 '18 at 13:32
  • And... why use a higher-order function in the parent trait? And... why override the function inside another function? – Felipe Jan 09 '18 at 13:45
  • @FelipeOliveiraGutierrez it's not overriding the function, it's making a call with a lambda argument. You can rewrite it like this: `withSparkContext(sc => { ... })`; this works the same. – Ömer Erden Jan 09 '18 at 14:52
  • you are right. I have to get used to these lambdas... thanks – Felipe Jan 09 '18 at 15:07
  • you're welcome, check this [topic](https://stackoverflow.com/questions/4386127/what-is-the-formal-difference-in-scala-between-braces-and-parentheses-and-when) to see how it works – Ömer Erden Jan 09 '18 at 15:11
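To illustrate the braces-versus-parentheses point from the comments with plain standard-library code (`twice` is a made-up example function):

```scala
// A one-parameter method can be called with parentheses or with braces;
// both forms pass the same lambda.
def twice(f: Int => Int): Int = f(f(1))

val withParens = twice(x => x + 1)    // parentheses
val withBraces = twice { x => x + 1 } // braces, reads like withSparkContext { sc => ... }
// both evaluate to 3
```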

1 Answer
What is being used here (albeit with a possible error) is the so-called Loan Pattern, named that way because it is useful when you want to manage the lifecycle of a resource (in your case a SparkContext) while letting the user define how the resource is going to be used.

A classical example of this is files: you want to open a file, read its content, and then close it as soon as you are done, without letting the user forget to close the resource. You may implement this as follows:

import scala.io.Source

// Read a file at `path` and allow to pass a function that iterates over lines
def consume[A](path: String)(f: Iterator[String] => A): A = {
  val source = Source.fromFile(path)
  try {
    f(source.getLines())
  } finally {
    source.close()
  }
}

Then you'd use this as follows (in the example, to just print all the lines paired with their numbers):

consume("/path/to/some/file")(_.zipWithIndex.foreach(println))

As you may notice, there is something very close to this going on in your code, with the only difference that the resource whose lifecycle you are managing is a SparkContext.

Regarding the possible error I mentioned initially: you are loaning a SparkContext that you never stop. That may be fine for a long-running application, but the main point of the Loan Pattern is precisely to minimize the error surface when managing resources. You may be interested in doing something like the following (note the try/finally around the call to f):

def withSparkContext(f: SparkContext => Unit): Unit = {
  val conf = new SparkConf()
  sparkConfig.foreach { case (k, v) => conf.setIfMissing(k, v) }
  val sc = new SparkContext(conf)
  try f(sc)
  finally sc.stop() // shut down the context even if `f` throws
}
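The cleanup-on-loan idea generalizes to any resource. Here is a minimal generic helper (a sketch over the standard AutoCloseable interface):

```scala
// Generic loan helper (a sketch): acquire a resource, loan it to `f`,
// and guarantee cleanup even if `f` throws.
def withResource[R <: AutoCloseable, A](acquire: => R)(f: R => A): A = {
  val resource = acquire
  try f(resource)
  finally resource.close() // runs on both the success and failure paths
}
```

Scala 2.13's `scala.util.Using.resource` implements essentially this, with richer error handling.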

You may read more regarding this pattern here.

As a side note, you may be interested in this project that creates a very nice and idiomatic interface around managed resources.

stefanobaghino