1

I have a DataFrame with various types in the columns. For clarity's sake, let's say it is structured like below, with a column of Ints, a column of Strings, and a column of Floats.

+-------+-------+-------+
|column1|column2|column3|
+-------+-------+-------+
|      1|      a|    0.1|
|      2|      b|    0.2|
|      3|      c|    0.3|
+-------+-------+-------+

I was attempting to apply a UDF to all three of these columns to change each entry to a case class like that below:

case class Annotation(lastUpdate: String, value: Any)

By applying the below code:

val columns = df.columns
val myUDF= udf { in: Any => Annotation("dummy", in) }
val finalDF = columns.foldLeft(df){ (tempDF, colName) =>
    tempDF.withColumn(colName, myUDF(col(colName)))
}

Note that in this first pass, I don't care what the Annotation.lastUpdate value is. However, when attempting to run this,I receive the following error:

Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type scala.Any is not supported
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:762)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:704)
    at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:809)
    at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:703)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1$$anonfun$apply$6.apply(ScalaReflection.scala:758)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1$$anonfun$apply$6.apply(ScalaReflection.scala:757)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.immutable.List.foreach(List.scala:381)

I have been looking into custom encoders for resolving this issue, but am unsure of how to apply one in this scenario.

zero323
  • 322,348
  • 103
  • 959
  • 935
mongolol
  • 941
  • 1
  • 13
  • 31

0 Answers0