Spark convert RDD to DataFrame - Enumeration is not supported

Question

I have a case class which contains an enumeration field "PersonType". I would like to insert this record into a Hive table.

object PersonType extends Enumeration {
  type PersonType = Value
  val BOSS = Value
  val REGULAR = Value
}

case class Person(firstname: String, lastname: String)
case class Holder(personType: PersonType.Value, person: Person)

And:

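import org.apache.spark.rdd.RDD
import org.apache.spark.sql.hive.HiveContext

// sc is an existing SparkContext
val hiveContext = new HiveContext(sc)
import hiveContext.implicits._

val item = Holder(PersonType.REGULAR, Person("tom", "smith"))
val content: Seq[Holder] = Seq(item)

val data: RDD[Holder] = sc.parallelize(content)
val df = data.toDF() // throws the exception shown below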

When I try to convert the corresponding RDD to a DataFrame, I get the following exception:

Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type com.test.PersonType.Value is not supported
        ...
        at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:691)
        at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
        at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:630)
        at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
        at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:414)
        at org.apache.spark.sql.SQLImplicits.rddToDataFrameHolder(SQLImplicits.scala:94)

I'd like to convert PersonType to a String before inserting into Hive. Is it possible to extend the implicit conversions to handle PersonType as well? I tried something like this, but it didn't work:

object PersonTypeConversions {
  implicit def toString(personType: PersonType.Value): String = personType.toString()
}
import PersonTypeConversions._

Spark: 1.6.0

Possible duplicate of [How to define schema for custom type in Spark SQL?](http://stackoverflow.com/questions/32440461/how-to-define-schema-for-custom-type-in-spark-sql) — zero323, Jun 24 '16 at 18:51
Thanks @zero323. I already checked your answer to that question, but I'm wondering whether there's another option, as I'm trying to avoid internal APIs that may change in the future (or have already changed in 2.0). — Bruckwald, Jun 24 '16 at 18:55
As far as I am aware nothing that will give you a human readable output. :/ And this is already private in 2.0.0-preview — zero323, Jun 24 '16 at 19:04
Thanks. I'll try to get rid of the enumeration in my case class then; I think this is the easiest solution as far as I can see. — Bruckwald, Jun 25 '16 at 08:26
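
The workaround settled on in the last comment, dropping the enumeration from the case class that Spark sees, could be sketched roughly as follows. `HolderRow`, `fromHolder` and `toHolder` are hypothetical names that do not appear in the question. The sketch assumes it is acceptable to store the enumeration value by name as a String, and it only covers the plain Scala conversion; it has not been run against Spark 1.6.0.

```scala
object PersonType extends Enumeration {
  type PersonType = Value
  val BOSS = Value
  val REGULAR = Value
}

case class Person(firstname: String, lastname: String)
case class Holder(personType: PersonType.Value, person: Person)

// Hypothetical mirror of Holder: the enumeration is replaced by a String,
// so every field is a type Spark SQL can derive a schema for.
case class HolderRow(personType: String, person: Person)

object HolderRow {
  // An Enumeration value's toString yields the name of its val, e.g. "REGULAR".
  def fromHolder(h: Holder): HolderRow =
    HolderRow(h.personType.toString, h.person)

  // Reverse mapping for reading rows back; withName throws
  // NoSuchElementException if the stored name is not a known value.
  def toHolder(r: HolderRow): Holder =
    Holder(PersonType.withName(r.personType), r.person)
}
```

With this in place, the conversion from the question would presumably become `data.map(HolderRow.fromHolder).toDF()`, which keeps the enumeration in application code and exposes only the String to Spark's schema reflection.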
