
I have some JSON data which I'd like to store in parsed form (meaning a Map[String, Any]) within a Spark DataFrame.

Is there any way to do this? I think it involves an Encoder, but I'm not sure where to start.

rongenre

1 Answer


Not in a useful way. You can use a strongly typed Dataset with a Kryo encoder:

import org.apache.spark.sql.{Encoder, Encoders}
import spark.implicits._  // for toDS; assumes a SparkSession named `spark`

// Kryo serializes the whole map into a single binary column
implicit val mapStrAnyEnc: Encoder[Map[String, Any]] = Encoders.kryo[Map[String, Any]]

Seq(Map[String, Any]("foo" -> 1.0, "bar" -> "foo")).toDS.show

// +--------------------+
// |               value|
// +--------------------+
// |[35 01 02 40 01 0...|
// +--------------------+

but the value of a representation like this is close to none if you want to use DataFrames: the payload is just an opaque binary blob that SQL functions and DataFrame operations cannot look into.
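You can still reach the values through typed Dataset operations, which deserialize the blob on the Scala side. A minimal sketch, reusing the mapStrAnyEnc encoder above:

// Typed transformations deserialize the Kryo payload back into a Scala Map;
// the cast is needed because the values are only statically known as Any
Seq(Map[String, Any]("foo" -> 1.0, "bar" -> "foo")).toDS
  .map(m => m("foo").asInstanceOf[Double])
  .show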

A natural mapping for a heterogeneous object is a struct, but if the number of fields is large or unbounded, your best option is to go with Map[String, String] and parse values only when needed, as sketched below.
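A minimal sketch of that approach, assuming the values arrive as strings (the keys and parsing logic here are illustrative):

import spark.implicits._  // assumes a SparkSession named `spark`

// Map[String, String] has a native Spark encoder, so this stays a real
// queryable map column rather than an opaque Kryo blob
val ds = Seq(Map("foo" -> "1.0", "bar" -> "foo")).toDS

// Parse a value into its real type only at the point of use
ds.map(m => m.get("foo").map(_.toDouble).getOrElse(0.0)).show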

Alper t. Turker
  • Actually I'd like to have the value field be a payload which I can process on the Scala side, so it's more like `Seq((1, Map("foo" -> 1.0, "bar" -> "foo"))).toDS.show` – rongenre Apr 11 '18 at 20:48
  • It is not an issue (see https://stackoverflow.com/q/36648128/8371915) but usability is still rather low. And that is on top of handling `Anys`, which is a mess on its own. – Alper t. Turker Apr 11 '18 at 20:56
  • Huh, and is there any way to encode a `case class Foo(id: Int, map: Map[String, Any])` class? I really just want to look up a record by id and then look at the map in client code. – rongenre Apr 12 '18 at 03:28
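A minimal sketch of one way to do what the last comment asks, following the pattern from the linked answer: keep id as a regular column and Kryo-serialize only the map, so lookups by id stay cheap. The names here are illustrative, and a SparkSession named `spark` is assumed:

import org.apache.spark.sql.{Encoder, Encoders}

case class Foo(id: Int, map: Map[String, Any])

// Combine a plain Int encoder with a Kryo encoder for the payload, so
// filtering on id does not force deserialization of the map
implicit val pairEnc: Encoder[(Int, Map[String, Any])] =
  Encoders.tuple(Encoders.scalaInt, Encoders.kryo[Map[String, Any]])

val ds = spark.createDataset(Seq(
  (1, Map[String, Any]("foo" -> 1.0, "bar" -> "foo"))
))

// Look up a record by id, then rebuild the case class on the client side
val foos = ds.filter(_._1 == 1).collect().map { case (id, m) => Foo(id, m) }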