1

can anyone please guide me how should i access amt1,amt2,total from this json schema. after loading json file when i am trying to select data using

     df.select($"b2b.bill.amt1").

I am getting below error message.

     org.apache.spark.sql.AnalysisException: cannot resolve '`b2b`.`bill`['amt1']' due to data type 
     mismatch: argument 2 requires integral type, however, ''amt1'' is of string type.;;

    Json Schema:

    |-- b2b: array (nullable = true)
    |    |-- element: struct (containsNull = true)
    |    |    |-- transid: string (nullable = true)
    |    |    |-- bill: array (nullable = true)
    |    |    |    |-- element: struct (containsNull = true)
    |    |    |    |    |-- amt1: double (nullable = true)
    |    |    |    |    |-- amt2: string (nullable = true)
    |    |    |    |    |-- total: string (nullable = true)
Rahul Patidar
  • 189
  • 1
  • 1
  • 14

1 Answers1

0

Reason is amt1 is an property of an object which is inside of two array types i.e b2b and bill. you need to explode twice to access amt1 field.

Check below code.

scala> adf.printSchema
root
 |-- b2b: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- bill: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- amt1: double (nullable = true)
 |    |    |    |    |-- amt2: string (nullable = true)
 |    |    |    |    |-- total: string (nullable = true)
 |    |    |-- transid: string (nullable = true)


scala> adf.select(explode($"b2b.bill").as("bill")).withColumn("bill",explode($"bill")).select("bill.*").show(false)
+----+----+-----+
|amt1|amt2|total|
+----+----+-----+
|10.0|20  |ttl  |
+----+----+-----+

Another way .. but its only give first value from the array.

scala> adf.select(explode($"b2b.bill"(0)("amt1")).as("amt1")).show(false)
+----+
|amt1|
+----+
|10.0|
+----+
scala> adf.selectExpr("explode(b2b.bill[0].amt1) as amt1").show(false)
+----+
|amt1|
+----+
|10.0|
+----+
Srinivas
  • 8,957
  • 2
  • 12
  • 26