0

How can I define the schema for a json array so that I can explode it into rows?

I have a UDF which returns a string (json array), I want to explode the item in array into rows and then save it.

sample json

[{"name":"name1","id":"1"}, {"name":"name2","id":"2"}]

I defined by Schema as

// schema of each item in the array
StructType arrayItemDataType= new StructType(new structfiled("name"), new structfiled(id));
var rootType = new ArrayType(itemDataType); // array of items

var field = new StructField(name: "MyJson", dataType: rootType, isNullable: false);
StructType schema = new StructType(new structField(field));


df.select(from_json('mycol', schema))

after explode tt creates schema like

root
 |-- col: struct (nullable = true)
 |    |-- name: string
 |    |-- id: string

I am running on local cluster and writing to csv, I was expecting after explode it should have dataframe with 2 columns name and id and i can write all rows in csv. When I run it is not creating df schema as name,id and fails to write to csv with message "csv doesn't support struct<"

user9297554
  • 347
  • 4
  • 17

1 Answers1

-2

Sample DataFrame:

%python

list=[['[{"name":"name1","id":"1"}, {"name":"name2","id":"2"}]']]
df=spark.createDataFrame(list,['mycol'])
df.show(truncate=False)

#df:pyspark.sql.dataframe.DataFrame = [mycol: string]
#+------------------------------------------------------+
#|mycol                                                 |
#+------------------------------------------------------+
#|[{"name":"name1","id":"1"}, {"name":"name2","id":"2"}]|
#+------------------------------------------------------+

You can select all fields of a struct using the name of column and .*

from pyspark.sql.types import *
from pyspark.sql import functions as F
json_schema=ArrayType(StructType([ StructField("name", StringType()), StructField("id", StringType())]))

df.withColumn("json",F.explode(F.from_json("mycol",json_schema)))\
  .select("json.*").show()

#+-----+---+
#| name| id|
#+-----+---+
#|name1|  1|
#|name2|  2|
#+-----+---+
murtihash
  • 8,030
  • 1
  • 14
  • 26