Using Spark 1.6.0. Say I have a class like this:
```scala
case class MyClass(date: java.util.Date, oid: org.bson.types.ObjectId)
```
If I have:

```scala
// rdd: RDD[MyClass]
rdd.toDF("date", "oid")
```
I get:

```
java.lang.UnsupportedOperationException: Schema for type java.util.Date/org.bson.types.ObjectId is not supported
```
Now, I know I can make the date a java.sql.Date, but let's say MyClass is depended upon in too many other places to make that change everywhere, and it still wouldn't solve the ObjectId problem.
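(For reference, here is a minimal sketch of the convert-at-the-boundary workaround, assuming it's acceptable to store the ObjectId as its hex string. It keeps MyClass untouched, but it has to be repeated at every toDF call site:)

```scala
// assumes: import sqlContext.implicits._ is in scope
val df = rdd
  .map(m => (new java.sql.Date(m.date.getTime), m.oid.toHexString))
  .toDF("date", "oid")
```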
I am also aware of the UserDefinedType option, but it seems like that only works if you also create a new class to go with it (and again, the signature of MyClass needs to stay the same).
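(To illustrate why: here is roughly what the UDT route looks like in 1.6. This is an untested sketch, and the wrapper class MyObjectId and ObjectIdUDT are hypothetical names, but the @SQLUserDefinedType annotation has to go on a class you own, which is exactly why this route forces a new type:)

```scala
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.UTF8String
import org.bson.types.ObjectId

// Hypothetical wrapper: the annotation can't be put on the third-party
// org.bson.types.ObjectId itself, so a new class is unavoidable.
@SQLUserDefinedType(udt = classOf[ObjectIdUDT])
case class MyObjectId(underlying: ObjectId)

class ObjectIdUDT extends UserDefinedType[MyObjectId] {
  // Represent the ObjectId as its 24-character hex string in SQL
  override def sqlType: DataType = StringType
  override def serialize(obj: Any): Any = obj match {
    case MyObjectId(oid) => UTF8String.fromString(oid.toHexString)
  }
  override def deserialize(datum: Any): MyObjectId = datum match {
    case s: UTF8String => MyObjectId(new ObjectId(s.toString))
  }
  override def userClass: Class[MyObjectId] = classOf[MyObjectId]
}
```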
Is there no way to just register a serializer/deserializer for java.util.Date and org.bson.types.ObjectId, so that I can call toDF on the RDD[MyClass] and it will just work?
UPDATE
This doesn't exactly answer my question, but it unblocked us, so I'll share it here in the hope that it helps someone else. Most JSON libraries do support this use case, and spark-sql has a built-in

```scala
sqlContext.read.json(stringRdd).write.parquet("/path/to/output")
```

So you can just define the (de)serialization for the class using the JSON library of your choice, serialize each record to a string, and spark-sql can handle the rest.
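(A rough sketch of what that looks like with json4s-jackson, which ships with Spark. The ObjectIdSerializer below and the hex-string encoding are my own assumptions; any JSON library with custom-serializer support works the same way:)

```scala
import org.bson.types.ObjectId
import org.json4s._
import org.json4s.jackson.Serialization
import org.json4s.jackson.Serialization.write

// Custom (de)serializer for the unsupported type; json4s already handles
// java.util.Date via its default date format.
object ObjectIdSerializer extends CustomSerializer[ObjectId](_ => (
  { case JString(s) => new ObjectId(s) },
  { case oid: ObjectId => JString(oid.toHexString) }
))

// Serialize every record to a JSON string, then let spark-sql infer the schema.
val stringRdd = rdd.mapPartitions { it =>
  implicit val formats = Serialization.formats(NoTypeHints) + ObjectIdSerializer
  it.map(write(_))
}
sqlContext.read.json(stringRdd).write.parquet("/path/to/output")
```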