org.apache.spark.sql.AnalysisException: cannot resolve, while reading data from nested JSON

Question

Can anyone please guide me on how to access amt1, amt2, and total from this JSON schema? After loading the JSON file, I try to select the data using

     df.select($"b2b.bill.amt1")

and I get the error message below.

     org.apache.spark.sql.AnalysisException: cannot resolve '`b2b`.`bill`['amt1']' due to data type 
     mismatch: argument 2 requires integral type, however, ''amt1'' is of string type.;;

JSON schema:

    |-- b2b: array (nullable = true)
    |    |-- element: struct (containsNull = true)
    |    |    |-- transid: string (nullable = true)
    |    |    |-- bill: array (nullable = true)
    |    |    |    |-- element: struct (containsNull = true)
    |    |    |    |    |-- amt1: double (nullable = true)
    |    |    |    |    |-- amt2: string (nullable = true)
    |    |    |    |    |-- total: string (nullable = true)

Does this answer your question? [SPARK: How to parse a Array of JSON object using Spark](https://stackoverflow.com/questions/57970480/spark-how-to-parse-a-array-of-json-object-using-spark) — mazaneicha, Jul 10 '20 at 17:14

Srinivas · Accepted Answer · 2020-07-10T17:18:44.847

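The reason is that amt1 is a field of a struct nested inside two arrays, b2b and bill. Because b2b.bill resolves to an array of arrays, Spark treats 'amt1' as an array index and expects an integral value, which is the type mismatch reported in the error. You need to explode twice to reach the amt1 field.

Check the code below.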

    scala> adf.printSchema
    root
     |-- b2b: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- bill: array (nullable = true)
     |    |    |    |-- element: struct (containsNull = true)
     |    |    |    |    |-- amt1: double (nullable = true)
     |    |    |    |    |-- amt2: string (nullable = true)
     |    |    |    |    |-- total: string (nullable = true)
     |    |    |-- transid: string (nullable = true)

    scala> adf.select(explode($"b2b.bill").as("bill")).withColumn("bill",explode($"bill")).select("bill.*").show(false)
    +----+----+-----+
    |amt1|amt2|total|
    +----+----+-----+
    |10.0|20  |ttl  |
    +----+----+-----+
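
If you also need transid next to each bill, one option is to explode b2b first and then bill. The following is a minimal sketch under stated assumptions, not the only way to do it: the object name, the helper `billsWithTransId`, and the sample record are hypothetical and were made up to match the schema in the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object ExplodeNestedBills {
  // Hypothetical helper: explode b2b first, then bill, so that each
  // output row is a single bill together with its parent transid.
  def billsWithTransId(adf: DataFrame): DataFrame =
    adf
      .select(explode(col("b2b")).as("b2b"))               // one row per b2b element
      .select(col("b2b.transid").as("transid"),
              explode(col("b2b.bill")).as("bill"))         // one row per bill
      .select(col("transid"), col("bill.amt1"), col("bill.amt2"), col("bill.total"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-nested-bills")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample record, invented to match the schema in the question.
    val sample = Seq(
      """{"b2b":[{"transid":"t1","bill":[{"amt1":10.0,"amt2":"20","total":"ttl"}]}]}"""
    ).toDS()
    val adf = spark.read.json(sample)

    billsWithTransId(adf).show(false)
    spark.stop()
  }
}
```

Exploding one array level per `select` keeps the parent fields available, which the single-column `explode($"b2b.bill")` form drops.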

Another way, but it only returns amt1 values from the first element of the b2b array (index 0).

    scala> adf.select(explode($"b2b.bill"(0)("amt1")).as("amt1")).show(false)
    +----+
    |amt1|
    +----+
    |10.0|
    +----+

    scala> adf.selectExpr("explode(b2b.bill[0].amt1) as amt1").show(false)
    +----+
    |amt1|
    +----+
    |10.0|
    +----+
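
If you want the bills from every b2b element rather than only index 0, a possible variant is to merge the nested arrays with `flatten` before exploding once. This is a sketch that assumes Spark 2.4 or later, where `flatten` is available in `org.apache.spark.sql.functions`; the object and method names are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode, flatten}

object FlattenBills {
  // b2b.bill is an array of arrays of structs. flatten removes one level of
  // nesting, so a single explode then yields one row per bill struct.
  def allBills(adf: DataFrame): DataFrame =
    adf
      .select(explode(flatten(col("b2b.bill"))).as("bill"))
      .select("bill.*")
}
```

Calling `FlattenBills.allBills(adf).show(false)` should list amt1, amt2, and total for every bill, at the cost of losing the link to transid.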
