
I have a schema with nested fields. When I try to convert it with:

import json
from pyspark.sql.types import StructType

# json.dumps takes a Python object (e.g. a dict) and returns a JSON string.
jtopy = json.dumps(schema_message['SchemaDefinition'])
print(jtopy)
# json.loads takes a JSON string and returns the corresponding Python object.
dict_json = json.loads(jtopy)
print(dict_json)
new_schema = StructType.fromJson(dict_json)
print(new_schema)

It fails with this error:

return StructType([StructField.fromJson(f) for f in json["fields"]])
TypeError: string indices must be integers
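One plausible cause, assuming `schema_message['SchemaDefinition']` is already a JSON *string* rather than a dict (the key name suggests a schema-registry response, but that is an assumption): `json.dumps` on a string just quotes it, `json.loads` unquotes it again, and `fromJson` ends up indexing a `str` with `"fields"`. The sketch below reproduces that with the standard library only; `schema_definition` is a hypothetical stand-in for the real value.

```python
import json

# Hypothetical stand-in for schema_message['SchemaDefinition'], assuming the
# schema arrives as a JSON *string*, not as a dict.
schema_definition = '{"type": "record", "name": "tags", "fields": []}'

# dumps() wraps the string in quotes and loads() unwraps it again,
# so the result is still a str, not a dict.
round_tripped = json.loads(json.dumps(schema_definition))
print(type(round_tripped).__name__)

# fromJson() does json["fields"] internally; on a str that raises TypeError.
try:
    round_tripped["fields"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Parsing the string once yields the dict that fromJson() indexes into.
parsed = json.loads(schema_definition)
print(type(parsed).__name__, list(parsed))
```

Parsing once removes this particular `TypeError`, but `StructType.fromJson` expects Spark's own JSON schema layout, so an Avro definition like the one below will most likely still be rejected.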

The schema definition I'm passing is shown below:

{
    "type": "record",
    "name": "tags",
    "namespace": "com.tigertext.data.events.tags",
    "doc": "Schema for tags association to accounts (role,etc..)",
    "fields": [
        {
            "name": "header",
            "type": {
                "type": "record",
                "name": "eventHeader",
                "namespace": "com.tigertext.data.events",
                "doc": "Metadata about the event record.",
                "fields": [
                    {
                        "name": "topic",
                        "type": "string",
                        "doc": "The topic this record belongs to. e.g. messages"
                    },
                    {
                        "name": "server",
                        "type": "string",
                        "doc": "The server that generated this event. e.g. xmpp-07"
                    },
                    {
                        "name": "service",
                        "type": "string",
                        "doc": "The service that generated this event. e.g. erlang-producer"
                    },
                    {
                        "name": "environment",
                        "type": "string",
                        "doc": "The environment this record belongs to. e.g. dev, prod"
                    },
                    {
                        "name": "time",
                        "type": "long",
                        "doc": "The time in epoch this record was produced."
                    }
                ]
            }
        },
        {
            "name": "eventType",
            "type": {
                "type": "enum",
                "name": "eventType",
                "symbols": [
                    "CREATE",
                    "UPDATE",
                    "DELETE",
                    "INIT"
                ]
            },
            "doc": "event type"
        },
        {
            "name": "tagId",
            "type": "string",
            "doc": "Tag ID for the tag"
        },
        {
            "name": "orgToken",
            "type": "string",
            "doc": "org ID"
        },
        {
            "name": "tagName",
            "type": "string",
            "doc": "name of the tag"
        },
        {
            "name": "colorId",
            "type": "string",
            "doc": "color id"
        },
        {
            "name": "colorName",
            "type": "string",
            "doc": "color name"
        },
        {
            "name": "colorValue",
            "type": "string",
            "doc": "color value e.g. #C8C8C8"
        },
        {
            "name": "entities",
            "type": [
                "null",
                {
                    "type": "array",
                    "items": {
                        "type": "record",
                        "name": "entity",
                        "fields": [
                            {
                                "name": "entityToken",
                                "type": "string"
                            },
                            {
                                "name": "entityType",
                                "type": "string"
                            }
                        ]
                    }
                }
            ],
            "default": null
        }
    ]
}

This is the schema of the Kafka topic, which I want to convert into a PySpark schema.
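Because this is an Avro schema rather than Spark's JSON schema layout (`{"type": "struct", "fields": [{"name": ..., "type": ..., "nullable": ..., "metadata": {}}]}`), one workaround is to translate it by hand before calling `StructType.fromJson`. This is a minimal sketch, not a complete Avro implementation: the helper `avro_to_spark` and its type mapping are hypothetical, and only records, enums (kept as strings), arrays, `["null", X]` unions and a few primitives are handled, which covers the types used in the tags schema above. References to named types, maps, `fixed` and logical types raise `NotImplementedError`. It builds plain dicts, so it runs without Spark.

```python
import json
from typing import Any, Tuple

# Hypothetical mapping from Avro primitive names to Spark's JSON type names.
_PRIMITIVES = {
    "string": "string",
    "int": "integer",
    "long": "long",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
    "bytes": "binary",
}


def avro_to_spark(avro_type: Any) -> Tuple[Any, bool]:
    """Translate one Avro type into (Spark JSON type, nullable)."""
    if isinstance(avro_type, str):  # primitive such as "string" or "long"
        if avro_type not in _PRIMITIVES:
            # Also catches references to named types, e.g. "eventHeader".
            raise NotImplementedError(f"unsupported Avro type: {avro_type}")
        return _PRIMITIVES[avro_type], False

    if isinstance(avro_type, list):  # union: only ["null", X] is supported
        branches = [t for t in avro_type if t != "null"]
        if len(branches) != 1:
            raise NotImplementedError(f"unsupported union: {avro_type}")
        inner, _ = avro_to_spark(branches[0])
        return inner, "null" in avro_type

    kind = avro_type["type"]
    if kind == "record":  # becomes a Spark struct
        fields = []
        for field in avro_type["fields"]:
            spark_type, nullable = avro_to_spark(field["type"])
            fields.append({"name": field["name"], "type": spark_type,
                           "nullable": nullable, "metadata": {}})
        return {"type": "struct", "fields": fields}, False
    if kind == "enum":  # no enum type in Spark; keep the symbol as a string
        return "string", False
    if kind == "array":
        element, element_nullable = avro_to_spark(avro_type["items"])
        return {"type": "array", "elementType": element,
                "containsNull": element_nullable}, False
    if isinstance(kind, str) and kind in _PRIMITIVES:  # e.g. {"type": "long"}
        return _PRIMITIVES[kind], False
    raise NotImplementedError(f"unsupported Avro type: {kind}")


# A trimmed-down version of the tags schema, for illustration.
example = {
    "type": "record",
    "name": "tags",
    "fields": [
        {"name": "tagId", "type": "string"},
        {"name": "eventType",
         "type": {"type": "enum", "name": "eventType",
                  "symbols": ["CREATE", "UPDATE", "DELETE", "INIT"]}},
        {"name": "entities",
         "type": ["null", {"type": "array",
                           "items": {"type": "record", "name": "entity",
                                     "fields": [{"name": "entityToken",
                                                 "type": "string"}]}}],
         "default": None},
    ],
}
spark_json, _ = avro_to_spark(example)
print(json.dumps(spark_json, indent=2))
```

With PySpark available, and assuming the definition arrives as a JSON string, the result could then be passed on as `StructType.fromJson(avro_to_spark(json.loads(schema_message['SchemaDefinition']))[0])`. Non-union fields come out non-nullable and `["null", X]` unions come out nullable; adjust that if your data needs otherwise.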

  • This isn't related to Spark. It's a Python error telling you that you are indexing a string as if it were a dict. Share the schema definition. – Benny Elgazar Mar 10 '22 at 21:11
  • Actually, this error comes from inside the `fromJson` function, so I guess `dict_json` is not in the shape that `fromJson` expects. @user3082928, if you can add the output of `dict_json`, it would help. – Emma Mar 10 '22 at 21:15
  • I have added the schema definition of the data. – user3082928 Mar 10 '22 at 22:00
  • This looks like an Avro schema, which is not directly compatible with a PySpark schema. Try looking for how to read Avro into a `StructType`. I don't work with Avro, so I'm not sure, but maybe this helps? https://stackoverflow.com/questions/40789153/how-to-convert-avro-schema-object-into-structtype-in-spark – Emma Mar 10 '22 at 22:16
