I have a Dataset[(Long, String)] that contains an id and a JSON string. It's built more or less like this:

val ids: Dataset[Long] = ...
val results = ids.mapPartitions { ids =>
   // Create http client
   .
   .
   ids.map(id => (id, getJsonById(id)))
}

If I run results.toDF, it creates a DataFrame with the id and a string column holding the JSON, but what I want is a DataFrame with the id and all the columns that are in the JSON.

How can I achieve that?

Edit: I want to load the whole JSON as a DataFrame, not a particular field of it. Something like what spark.read.json(jsonRDD: RDD[String]) would do.
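To make the goal concrete, here is a minimal sketch of one possible approach, assuming Spark 2.1 or later (for from_json) and assuming every JSON string shares roughly the same structure. It infers the schema from the JSON strings alone with the RDD[String] reader, then applies that schema to the string column so the id stays attached to each row. The names JsonColumnsSketch and expandJson and the sample records are hypothetical, not part of the original code.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}

object JsonColumnsSketch {

  // Hypothetical helper: turns (id, jsonString) pairs into a DataFrame
  // with the id plus one column per top-level JSON field.
  def expandJson(spark: SparkSession, results: Dataset[(Long, String)]): DataFrame = {
    import spark.implicits._

    // The data is scanned twice (schema inference, then parsing), so cache it
    // to avoid repeating whatever produced the JSON strings (e.g. HTTP calls).
    val cached = results.cache()

    // Infer the schema from the JSON strings only.
    // On Spark 2.2+ the Dataset[String] overload can be used instead of .rdd.
    val schema = spark.read.json(cached.map(_._2).rdd).schema

    cached.toDF("id", "json")
      .select(col("id"), from_json(col("json"), schema).as("data"))
      .select("id", "data.*") // flatten the parsed struct into columns
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("json-columns-sketch")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data standing in for the real results Dataset.
    val results = Seq(
      (1L, """{"name":"a","value":10}"""),
      (2L, """{"name":"b","value":20}""")
    ).toDS()

    expandJson(spark, results).printSchema()
    spark.stop()
  }
}
```

Note that if the JSON itself contains a top-level field called id, the flattened result would have two id columns, so one of them would need to be renamed first.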

Thanks

Nexaspx
Javier S
  • Something like http://stackoverflow.com/questions/39238367/how-to-extract-values-from-json-string-in-spark? – Josemy Mar 30 '17 at 15:22
  • If I'm not wrong, that lets me create a new column from a single value inside the JSON, but in my case I want the whole JSON structure in the DataFrame – Javier S Mar 30 '17 at 15:58
