I have a Spark DataFrame containing Strings that I'm matching to numeric scores using a Likert scale; different question IDs map to different scores. I'm trying to pattern match on a range in Scala within an Apache Spark UDF, using this question as a guide:
How can I pattern match on a range in Scala?
But I'm getting a compilation error when I use a range rather than a simple OR pattern, i.e.
31 | 32 | 33 | 34
works fine, but
31 to 35
doesn't compile. Any ideas where I'm going wrong with the syntax, please?
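For reference, the guard form from the linked answer does compile for me in plain Scala outside the UDF; here's a minimal sketch (the describe function and its values are just hypothetical):

// A range can't be used directly as a pattern, but a pattern guard
// on a bound variable works fine in plain Scala.
def describe(n: Int): String = n match {
  case x if 41 until 55 contains x => "in range"
  case _ => "out of range"
}
describe(42) // returns "in range"

So it seems to be the way I'm embedding the guard inside the tuple pattern that breaks.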
Also, in the final case _, I'd like to return a String rather than an Int,
case _ => "None"
but this gives an error:
java.lang.UnsupportedOperationException: Schema for type Any is not supported
Presumably this issue is specific to Spark, since it's perfectly possible to return Any in native Scala?
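To illustrate what I mean, a minimal sketch in plain Scala (the scoreAnswer name and values are hypothetical; I'm assuming the mixed branch types are what trips Spark up):

// Mixing an Int branch with a String branch compiles in plain Scala;
// the result type is simply inferred as Any.
val scoreAnswer = (questionId: Int) => questionId match {
  case 31 => 1       // Int
  case _  => "None"  // String
}
// scoreAnswer: Int => Any -- presumably Spark can't derive a schema for Any.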
Here's my code:
def calculateScore = udf((questionId: Int, answerText: String) => (questionId, answerText) match {
  case ((31 | 32 | 33 | 34 | 35), "Rarely /<br>Never") => 4 // this is fine
  case ((31 | 32 | 33 | 34 | 35), "Occasionally") => 3
  case ((31 | 32 | 33 | 34 | 35), "Often") => 2
  case ((31 | 32 | 33 | 34 | 35), "Almost always /<br>Always") => 1
  case ((x if 41 until 55 contains x), "None of the time") => 1 // this line won't compile
  case _ => 0 // would like to map to "None"
})
The UDF is then applied to a Spark DataFrame as follows:
val df3 = df.withColumn("NumericScore", calculateScore(df("QuestionId"), df("AnswerText")))
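and I'm inspecting the new column like this (just a hypothetical check, not part of the problem):

// Show the input columns alongside the computed score.
df3.select("QuestionId", "AnswerText", "NumericScore").show()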