I have some JSON data which I'd like to store in parsed form (meaning a Map[String, Any]) within a Spark DataFrame.
Is there any way to do this? I think it involves an Encoder, but I'm not sure where to start.
Not in a useful way. You can use a strongly typed Dataset with a Kryo encoder:
import org.apache.spark.sql.{Encoder, Encoders}
import spark.implicits._ // already imported for you in spark-shell

// Spark has no built-in encoder for Any, so fall back to opaque Kryo serialization
implicit val mapStrAnyEnc: Encoder[Map[String, Any]] = Encoders.kryo

Seq(Map("foo" -> 1.0, "bar" -> "foo")).toDS.show
// +--------------------+
// | value|
// +--------------------+
// |[35 01 02 40 01 0...|
// +--------------------+
but the value of a representation like this is close to none if you want to work with the DataFrame API.
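You can see why from the schema: the whole map is serialized into a single opaque binary column, so there are no fields left for DataFrame operations to act on (reusing the mapStrAnyEnc encoder defined above):

Seq(Map("foo" -> 1.0, "bar" -> "foo")).toDS.printSchema
// root
//  |-- value: binary (nullable = true)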
The natural mapping for a heterogeneous object is a struct, but if the number of fields is large or unbounded, your best option is to go with a Map[String, String] and parse individual values only when needed.
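A minimal sketch of both approaches using from_json, assuming a SparkSession named spark is in scope (as in spark-shell); the column name json and the sample payload are illustrative, not from the question:

import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._
import spark.implicits._

val df = Seq("""{"foo": 1.0, "bar": "baz"}""").toDF("json")

// Known, fixed set of fields: parse into a struct and access fields by name
val structSchema = StructType(Seq(
  StructField("foo", DoubleType),
  StructField("bar", StringType)
))
df.select(from_json($"json", structSchema).as("parsed"))
  .select($"parsed.foo", $"parsed.bar")
  .show

// Large or unbounded set of fields: keep every value as a string
// and convert at the point of use
df.select(from_json($"json", MapType(StringType, StringType)).as("map"))
  .select($"map"("foo").cast("double").as("foo"))
  .show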