
I have trained a Spark Multilayer Perceptron Classifier to detect spam messages and would like to use it in a web service built with the Play Framework.

My solution (see below) spawns an embedded local Spark cluster, loads the model and classifies messages. Is there a way to use the model without an embedded Spark cluster?

Spark has some dependencies that clash with the Play Framework's dependencies. I thought there might be a way to run the model in classification mode without starting an embedded Spark cluster.

My second question is whether I can classify a single message without putting it into a DataFrame first.

Application Loader:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global // for the Future in the stop hook

import org.apache.spark.SparkConf
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.tuning.CrossValidatorModel
import org.apache.spark.sql.SparkSession

lazy val sparkSession: SparkSession = {
  val conf: SparkConf = new SparkConf()
    .setMaster("local[*]")
    .setAppName("Classifier")
    .set("spark.ui.enabled", "false")

  val session = SparkSession.builder()
    .config(conf)
    .getOrCreate()

  // shut Spark down when Play stops
  applicationLifecycle.addStopHook { () ⇒
    Future { session.stop() }
  }

  session
}

lazy val model: PipelineModel = {
  sparkSession // force the lazy session to initialise before the model is loaded
  CrossValidatorModel.load("mpc-model").bestModel.asInstanceOf[PipelineModel]
}

Classification service (the model and the Spark session are injected):

import sparkSession.implicits._ // needed for .toDS() on the Seq below

// MessageSparkDto is a case class whose fields match the columns the
// pipeline was trained on, so Spark can derive an Encoder for it
val messageDto = Seq(MessageSparkDto(
  sender        = message.sender.email.value,
  text          = featureTransformer.cleanText(text).value,
  messagelength = text.value.length,
  isMultimail   = featureTransformer.isMultimail(message.sender.email)
))

val messageDf = messageDto.toDS()

model.transform(messageDf).head().getAs[Double]("prediction") match {
  case 1.0 ⇒ MessageEvaluationResult(MessageClass.Spam)
  case _   ⇒ MessageEvaluationResult(MessageClass.NonSpam)
}

Edit: As pointed out in the comments, one solution could be to convert the model to PMML and then use another engine to load it and run the classification. That sounds like a lot of overhead to me as well. Does anyone have experience with running Spark in local mode with minimal overhead and dependencies, just to use the ML classifiers?
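For context, a rough sketch of what that PMML route could look like, assuming the JPMML projects (jpmml-sparkml for the export, jpmml-evaluator for scoring). The class and method names below come from those libraries but differ between versions, and the helper names exportToPmml/classify are made up for the sketch:

import java.io.File

import scala.collection.JavaConverters._

import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.types.StructType
import org.jpmml.sparkml.PMMLBuilder                                                 // jpmml-sparkml
import org.jpmml.evaluator.{Evaluator, EvaluatorUtil, LoadingModelEvaluatorBuilder}  // jpmml-evaluator

// One-off export step, run wherever Spark is available (e.g. right after training)
def exportToPmml(schema: StructType, model: PipelineModel, target: File): Unit =
  new PMMLBuilder(schema, model).buildFile(target)

// Inside the Play service: no SparkSession required, only jpmml-evaluator on the classpath
lazy val evaluator: Evaluator =
  new LoadingModelEvaluatorBuilder()
    .load(new File("mpc-model.pmml"))
    .build()

// Feed the raw feature values in by input-field name and decode the results
def classify(features: Map[String, AnyRef]): Map[String, _] = {
  val arguments = evaluator.getInputFields.asScala.map { field ⇒
    field.getName -> field.prepare(features(field.getName.getValue))
  }.toMap.asJava
  EvaluatorUtil.decodeAll(evaluator.evaluate(arguments)).asScala.toMap
}

The export would run once, offline, next to the training job; the web service itself would then never start a SparkSession.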

MeiSign
1 Answer


Although I like the solution proposed in the linked post, the following might also be possible. You could of course copy the model to the server on which you will deploy the web service, install a one-machine Spark "cluster" on it and put spark-jobserver on top of it, which handles the requests and talks to Spark. That would be the no-brainer solution and should work as long as your model does not need a lot of computational power.
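For illustration, a rough sketch of such a job using the classic spark-jobserver SparkJob trait. The job name, the config keys "sender"/"text" and the simplified feature set are made up, and the trait itself has changed between jobserver versions, so treat this as a sketch:

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.tuning.CrossValidatorModel
import org.apache.spark.sql.SparkSession
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}

object ClassifyMessageJob extends SparkJob {

  // loaded once per jobserver JVM and reused across requests
  private lazy val model: PipelineModel =
    CrossValidatorModel.load("mpc-model").bestModel.asInstanceOf[PipelineModel]

  override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any = {
    val spark = SparkSession.builder().config(sc.getConf).getOrCreate()
    import spark.implicits._

    // simplified: the real pipeline expects all feature columns of MessageSparkDto
    val df = Seq((config.getString("sender"), config.getString("text")))
      .toDF("sender", "text")

    model.transform(df).head().getAs[Double]("prediction")
  }
}

The web service would then just make an HTTP call against the job server, along the lines of the examples in the jobserver README (appName refers to the uploaded jar):

curl -d 'sender = "a@example.com", text = "cheap pills"' \
  'http://localhost:8090/jobs?appName=spam-classifier&classPath=ClassifyMessageJob&sync=true'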

Elmar Macek