I'd like to build a structure that links a regex pattern to a description of a feature within some text.
Example: "^.* horses .*$" maps to 'horses'; "^.* pigs .*$" maps to 'pigs', and so on.
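To make the mapping concrete, here is a minimal Spark-free sketch of the pairing I have in mind. The object name `RegexLookupSketch`, the helper `describe`, and the sample sentences are hypothetical and exist only for illustration; `RegexMetadata` mirrors the case class in my actual code, and the two patterns are the ones from the example above.

```scala
import scala.util.matching.Regex

// Illustrative sketch only: RegexLookupSketch and describe are made-up names.
object RegexLookupSketch {
  // Pairs a human-readable description with its compiled pattern.
  final case class RegexMetadata(regexName: String, pattern: Regex)

  // Each pattern is compiled once, up front, rather than once per input text.
  val catalog: Seq[RegexMetadata] = Seq(
    RegexMetadata("horses", "^.* horses .*$".r),
    RegexMetadata("pigs", "^.* pigs .*$".r)
  )

  // Returns the description of every catalog entry whose pattern matches the whole text.
  def describe(text: String): Seq[String] =
    catalog.collect {
      case RegexMetadata(name, regex) if regex.pattern.matcher(text).matches() => name
    }

  def main(args: Array[String]): Unit = {
    println(describe("three horses graze"))
    println(describe("no animals here"))
  }
}
```

This works as plain Scala; the problem only appears once I try to get a `RegexMetadata` into a Spark Dataset, which is where the encoder comes in.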
There are thousands of possible descriptions for this text, so grouping each compiled regex pattern with its description would let me search efficiently. Below is the key part of my code:
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.{GlueArgParser, Job}
import org.apache.spark.SparkContext
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.{Encoder, Encoders}
import scala.collection.JavaConverters._
import scala.util.matching.Regex

object GlueApp {
  // Pairs a feature description with its compiled pattern.
  case class RegexMetadata(regexName: String, pattern: scala.util.matching.Regex)

  def main(sysArgs: Array[String]): Unit = {
    val args = GlueArgParser.getResolvedOptions(sysArgs, Seq("JOB_NAME").toArray)
    val sc: SparkContext = new SparkContext()
    val glueContext: GlueContext = new GlueContext(sc)
    val spark = glueContext.getSparkSession
    import spark.implicits._

    Job.init(args("JOB_NAME"), glueContext, args.asJava)

    implicit val regexEncoder = Encoders.kryo[scala.util.matching.Regex]
    // The error goes away when the following line is removed.
    implicit val regexMetadataEncoder = Encoders.product[RegexMetadata]

    Job.commit()
  }
}
When I run this, I get the following error:
java.lang.UnsupportedOperationException: No Encoder found for scala.util.matching.Regex
It compiles and runs fine when I don't have the "implicit val regexMetadataEncoder" line. This seems to work on Databricks, but not on AWS Glue.
Searching turned up similar questions, but I couldn't solve my problem with them:
scala generic encoder for spark case class
Thank you for your help!