
I have a JSON file in the format below that I am trying to read/query from a Spark job.

    {
      "dimensions": [
        {"id1": "val1"},
        {"id2": "val2"},
        {"id3": "val"}
        ...
      ]
    }


    import org.apache.spark.sql.types._

    val schema = new StructType()
      .add("dimensions", ArrayType(MapType(StringType, StringType), true))

    val df = spark.read.schema(schema).json(file)

'dimensions' is a JSON array whose elements are key-value pairs. I want to read it as a single key-value map so that it's easy to query in Spark SQL.

I tried the above schema, but it gives me an array with each item being of MapType. Is there a way to read the above JSON array as a single MapType?

In the end, I want to be able to write Spark SQL like the query below, where I can select individual keys:

    val result = spark.sql("SELECT dimensions.id1, dimensions.id3 FROM table")

Thanks!

vijay
  • Can you explain more? It's unclear. – Akash Sethi Apr 29 '17 at 12:00
  • Added more explanation to the original question. Summary: how can I convert an array (where each element is a key-value pair) to a dictionary? – vijay Apr 29 '17 at 16:29
  • You can use the `explode()` function to convert the array of maps into rows of maps (a sketch follows after these comments). It's not fully what you want, but your select query would work (returning 3 rows instead of 1). Another option may be to make the individual key/value pairs into columns, like in this answer: http://stackoverflow.com/questions/43347098/pyspark-cast-array-with-nested-struct-to-string – Garren S May 01 '17 at 02:00
  • Ended up writing a UDF to convert the array of maps to a map (see the second sketch below). – vijay May 06 '17 at 08:08
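
A minimal sketch of the `explode()` route suggested in the comments, assuming the `df` read above and a `SparkSession` named `spark`:

    import org.apache.spark.sql.functions.explode

    // Each array element becomes its own row holding one single-entry map,
    // so dimensions.id1 resolves per row (null where that key is absent).
    val exploded = df.select(explode(df("dimensions")).as("dimensions"))
    exploded.createOrReplaceTempView("table")
    spark.sql("SELECT dimensions.id1, dimensions.id3 FROM table").show()

As noted in the comment, this returns one row per array element rather than a single row.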

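And a sketch of the UDF route the asker settled on. The original UDF was not posted, so `toMap` here is a hypothetical stand-in that merges the single-entry maps into one `map<string,string>`:

    import org.apache.spark.sql.functions.udf

    // Hypothetical helper: flatten array<map<string,string>> into one map.
    // Assumes dimensions is non-null; later duplicates of a key win.
    val toMap = udf((dims: Seq[Map[String, String]]) => dims.flatten.toMap)

    val flat = df.select(toMap(df("dimensions")).as("dimensions"))
    flat.createOrReplaceTempView("table")
    val result = spark.sql("SELECT dimensions.id1, dimensions.id3 FROM table")

With `dimensions` as a single map column, the query from the question returns one row containing the selected values.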
0 Answers