I have a schema with nested fields. When I try to convert it with:
import json
from pyspark.sql.types import StructType

jtopy = json.dumps(schema_message['SchemaDefinition'])  # json.dumps takes a Python object and returns a JSON string
print(jtopy)
dict_json = json.loads(jtopy)  # json.loads takes a JSON string and returns the decoded Python object
print(dict_json)
new_schema = StructType.fromJson(dict_json)
print(new_schema)
It fails with the following error:
return StructType([StructField.fromJson(f) for f in json["fields"]])
TypeError: string indices must be integers
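One possible reading of this error is that `json["fields"]` is being evaluated on a `str`, not a `dict`. The sketch below shows how that could happen, assuming `schema_message['SchemaDefinition']` is already a JSON string (an assumption; the variable `schema_definition` is a hypothetical stand-in for it). Under that assumption, the `json.dumps` / `json.loads` round trip simply returns the same string.

```python
import json

# Hypothetical stand-in for schema_message['SchemaDefinition'], assumed here
# to already be a JSON string rather than a dict.
schema_definition = '{"type": "record", "name": "tags", "fields": []}'

# The round trip from the question: dumps() on a str produces a quoted JSON
# string literal, and loads() on that literal gives back the original str.
round_tripped = json.loads(json.dumps(schema_definition))
print(type(round_tripped).__name__)

# Indexing a str with a string key raises a TypeError, which matches the
# kind of error reported from inside StructType.fromJson.
try:
    round_tripped["fields"]
except TypeError as exc:
    print("TypeError:", exc)

# Parsing the string once is enough to obtain a dict.
parsed = json.loads(schema_definition)
print(type(parsed).__name__, list(parsed))
```

If `SchemaDefinition` turns out to be a dict already, this explanation does not apply and the cause lies elsewhere.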
The schema definition I am passing is shown below:
{
  "type": "record",
  "name": "tags",
  "namespace": "com.tigertext.data.events.tags",
  "doc": "Schema for tags association to accounts (role,etc..)",
  "fields": [
    {
      "name": "header",
      "type": {
        "type": "record",
        "name": "eventHeader",
        "namespace": "com.tigertext.data.events",
        "doc": "Metadata about the event record.",
        "fields": [
          {
            "name": "topic",
            "type": "string",
            "doc": "The topic this record belongs to. e.g. messages"
          },
          {
            "name": "server",
            "type": "string",
            "doc": "The server that generated this event. e.g. xmpp-07"
          },
          {
            "name": "service",
            "type": "string",
            "doc": "The service that generated this event. e.g. erlang-producer"
          },
          {
            "name": "environment",
            "type": "string",
            "doc": "The environment this record belongs to. e.g. dev, prod"
          },
          {
            "name": "time",
            "type": "long",
            "doc": "The time in epoch this record was produced."
          }
        ]
      }
    },
    {
      "name": "eventType",
      "type": {
        "type": "enum",
        "name": "eventType",
        "symbols": [
          "CREATE",
          "UPDATE",
          "DELETE",
          "INIT"
        ]
      },
      "doc": "event type"
    },
    {
      "name": "tagId",
      "type": "string",
      "doc": "Tag ID for the tag"
    },
    {
      "name": "orgToken",
      "type": "string",
      "doc": "org ID"
    },
    {
      "name": "tagName",
      "type": "string",
      "doc": "name of the tag"
    },
    {
      "name": "colorId",
      "type": "string",
      "doc": "color id"
    },
    {
      "name": "colorName",
      "type": "string",
      "doc": "color name"
    },
    {
      "name": "colorValue",
      "type": "string",
      "doc": "color value e.g. #C8C8C8"
    },
    {
      "name": "entities",
      "type": [
        "null",
        {
          "type": "array",
          "items": {
            "type": "record",
            "name": "entity",
            "fields": [
              {
                "name": "entityToken",
                "type": "string"
              },
              {
                "name": "entityType",
                "type": "string"
              }
            ]
          }
        }
      ],
      "default": null
    }
  ]
}
Above is the schema of the Kafka topic that I want to parse into a PySpark schema.
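Note that the definition above is an Avro schema ("record", "enum", union lists), while `StructType.fromJson` expects Spark's own JSON schema layout ("type": "struct", with each field carrying "name", "type", "nullable" and "metadata"). A minimal sketch of one way to translate between the two is given here. The helper `avro_to_spark` and the `PRIMITIVES` table are hypothetical names introduced for illustration. The sketch assumes unions are only of the form null plus one other type, maps enums to strings, and does not handle Avro maps, fixed types, logical types, or references to named types.

```python
# Avro primitive name -> Spark JSON type name (assumed mapping for this sketch).
PRIMITIVES = {
    "string": "string", "long": "long", "int": "integer", "boolean": "boolean",
    "float": "float", "double": "double", "bytes": "binary",
}

def avro_to_spark(avro_type):
    """Return (spark_json_type, nullable) for an Avro type (hypothetical helper)."""
    if isinstance(avro_type, list):  # union, e.g. ["null", {...}]
        non_null = [t for t in avro_type if t != "null"]
        if len(non_null) != 1:
            raise ValueError("only unions of null and one other type are handled")
        spark_type, _ = avro_to_spark(non_null[0])
        return spark_type, len(non_null) < len(avro_type)
    if isinstance(avro_type, str):  # primitive such as "string" or "long"
        return PRIMITIVES[avro_type], False
    kind = avro_type["type"]
    if kind == "record":
        fields = []
        for f in avro_type["fields"]:
            spark_type, nullable = avro_to_spark(f["type"])
            fields.append({"name": f["name"], "type": spark_type,
                           "nullable": nullable, "metadata": {}})
        return {"type": "struct", "fields": fields}, False
    if kind == "array":
        element, contains_null = avro_to_spark(avro_type["items"])
        return {"type": "array", "elementType": element,
                "containsNull": contains_null}, False
    if kind == "enum":
        return "string", False  # enum symbols are represented as plain strings
    return avro_to_spark(kind)  # wrapped primitive, e.g. {"type": "string"}

# Trimmed version of the schema from the question, written as a Python dict.
avro_schema = {
    "type": "record", "name": "tags",
    "fields": [
        {"name": "header", "type": {
            "type": "record", "name": "eventHeader",
            "fields": [{"name": "time", "type": "long"}]}},
        {"name": "eventType", "type": {
            "type": "enum", "name": "eventType",
            "symbols": ["CREATE", "UPDATE", "DELETE", "INIT"]}},
        {"name": "tagId", "type": "string"},
        {"name": "entities", "default": None, "type": ["null", {
            "type": "array", "items": {
                "type": "record", "name": "entity",
                "fields": [{"name": "entityToken", "type": "string"},
                           {"name": "entityType", "type": "string"}]}}]},
    ],
}

spark_json, _ = avro_to_spark(avro_schema)
print([f["name"] for f in spark_json["fields"]])
```

With PySpark installed, the resulting `spark_json` dict could then be passed to `StructType.fromJson(spark_json)`. That last step is not exercised in the sketch itself.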