I have a DataFrame with various types in its columns. For clarity's sake, let's say it is structured as below, with a column of Ints, a column of Strings, and a column of Floats.
+-------+-------+-------+
|column1|column2|column3|
+-------+-------+-------+
| 1| a| 0.1|
| 2| b| 0.2|
| 3| c| 0.3|
+-------+-------+-------+
I was attempting to apply a UDF to all three of these columns to wrap each entry in a case class like the one below:
case class Annotation(lastUpdate: String, value: Any)
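For concreteness, the goal is for every cell to end up wrapped like this (the values here are just illustrative):

Annotation("dummy", 1)     // from column1
Annotation("dummy", "a")   // from column2
Annotation("dummy", 0.1f)  // from column3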
I attempted this with the following code:
import org.apache.spark.sql.functions.{col, udf}

val columns = df.columns
val myUDF = udf { in: Any => Annotation("dummy", in) }
val finalDF = columns.foldLeft(df) { (tempDF, colName) =>
  tempDF.withColumn(colName, myUDF(col(colName)))
}
Note that in this first pass, I don't care what the Annotation.lastUpdate value is. However, when attempting to run this, I receive the following error:
Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type scala.Any is not supported
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:762)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:704)
at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:809)
at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:703)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1$$anonfun$apply$6.apply(ScalaReflection.scala:758)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1$$anonfun$apply$6.apply(ScalaReflection.scala:757)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
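For what it's worth, if I swap Any for a concrete type, the UDF seems fine, which makes me think the Any field is what Catalyst can't derive a schema for. A minimal sketch of what I mean (StringAnnotation is just a throwaway name for this test):

case class StringAnnotation(lastUpdate: String, value: String)
val stringUDF = udf { in: String => StringAnnotation("dummy", in) }
val testDF = df.withColumn("column2", stringUDF(col("column2")))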
I have been looking into custom encoders to resolve this issue, but I'm unsure how to apply one in this scenario.
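The closest thing I have found is a Kryo-based encoder, along the lines of the sketch below, but it isn't clear to me whether udf consults implicit encoders at all:

import org.apache.spark.sql.{Encoder, Encoders}
implicit val annotationEncoder: Encoder[Annotation] = Encoders.kryo[Annotation]

Judging by the stack trace, udf seems to derive the schema through ScalaReflection.schemaFor at definition time rather than through encoders, so I'm not sure this is even the right direction.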