
I'm trying to get Python to parse Avro schemas such as the following...

from avro import schema

mySchema = """
{
    "name": "person",
    "type": "record",
    "fields": [
        {"name": "firstname", "type": "string"},
        {"name": "lastname", "type": "string"},
        {
            "name": "address",
            "type": "record",
            "fields": [
                {"name": "streetaddress", "type": "string"},
                {"name": "city", "type": "string"}
            ]
        }
    ]
}"""

parsedSchema = schema.parse(mySchema)

...and I get the following exception:

avro.schema.SchemaParseException: Type property "record" not a valid Avro schema: Could not make an Avro Schema object from record.

What am I doing wrong?

Jorge Aranda

2 Answers


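According to other sources on the web, a nested record has to be given as the value of the field's "type" and needs a name of its own, so I would rewrite your nested address definition like this: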

mySchema = """
{
    "name": "person",
    "type": "record",
    "fields": [
        {"name": "firstname", "type": "string"},
        {"name": "lastname", "type": "string"},
        {
            "name": "address",
            "type": {
                        "type" : "record",
                        "name" : "AddressUSRecord",
                        "fields" : [
                            {"name": "streetaddress", "type": "string"},
                            {"name": "city", "type": "string"}
                        ]
                    }
        }
    ]
}"""
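
To make the difference between the two schemas concrete, here is a minimal sketch that walks a schema's JSON and reports fields that use a complex-type keyword as a bare string, which is the pattern that triggered the exception in the question. It uses only the standard library; `find_inline_complex_fields` and `COMPLEX_TYPES` are hypothetical names invented for this illustration, not part of the avro package, and the check is much cruder than what the real parser does.

```python
import json

# Keywords that need a full definition, so they cannot stand alone as a
# field's "type" string (assumption: simplified list for illustration).
COMPLEX_TYPES = {"record", "enum", "fixed", "array", "map"}


def find_inline_complex_fields(schema, path=""):
    """Return dotted paths of fields whose "type" is a bare complex keyword.

    Hypothetical helper; it only mimics the check that fails in the question.
    """
    problems = []
    if isinstance(schema, list):  # union: inspect every branch
        for branch in schema:
            problems += find_inline_complex_fields(branch, path)
    elif isinstance(schema, dict):
        for field in schema.get("fields", []):
            field_path = f"{path}.{field['name']}" if path else field["name"]
            if isinstance(field["type"], str):
                if field["type"] in COMPLEX_TYPES:
                    problems.append(field_path)
            else:
                # Nested definition: descend into it.
                problems += find_inline_complex_fields(field["type"], field_path)
    return problems


BROKEN = json.loads("""
{"name": "person", "type": "record", "fields": [
    {"name": "firstname", "type": "string"},
    {"name": "address", "type": "record", "fields": [
        {"name": "city", "type": "string"}]}
]}""")

FIXED = json.loads("""
{"name": "person", "type": "record", "fields": [
    {"name": "firstname", "type": "string"},
    {"name": "address", "type": {
        "type": "record", "name": "AddressUSRecord", "fields": [
            {"name": "city", "type": "string"}]}}
]}""")

print(find_inline_complex_fields(BROKEN))  # ['address']
print(find_inline_complex_fields(FIXED))   # []
```

In the broken version the string "record" sits directly on the field, so the parser has to treat it as the name of an already defined type, and no such type exists.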
Mauricio Scheffer
Marco de Wit
  • Thanks, Marco, that worked. The second declaration of the address name (the one where you wrote "AddressUSRecord") seems to be necessary to parse the schema, but ignored when working with data that adheres to the schema. – Jorge Aranda Aug 01 '12 at 19:06
  • This makes little sense. Why can `person` have a `type` of `record`, but `address` cannot? – Tianxiang Xiong Nov 28 '16 at 18:44
  • Where in the avro spec does it allow a `type` to be expanded like this? – user239558 Feb 13 '19 at 07:30
  • Check out the Parsing Canonical Form part of the spec: https://avro.apache.org/docs/current/spec.html#Parsing+Canonical+Form+for+Schemas As far as I understand it, ALL types are expanded, even primitives, and the single word we usually see is the Parsing Canonical Form of the schema. So when we write {"type": "string"}, it's the same as writing {"type": {"type": "string"}}. – Eurospoofer Oct 30 '19 at 17:49
  • This answer would have saved me a day's worth of debugging if I had found it earlier. – Susheel Javadi May 01 '20 at 07:33

Whenever a field's type is a named type, the definition of that type has to be nested inside the field's "type" property:

"name":"some_name",
"type": {
          "name":"CodeClassName",
           "type":"record/enum/array"
 } 

However, if the field's type is a union, no extra "type" wrapper is needed. The named types go directly into the union's array:

"name":"some_name",
"type": [{
          "name":"CodeClassName1",
           "type":"record",
           "fields": ...
          },
          {
           "name":"CodeClassName2",
            "type":"record",
            "fields": ...
}]

Hope this clarifies further!
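
As a rough sketch of both shapes, the field definitions can be built as plain Python dicts and dumped to JSON. The helpers `record` and `field` and the names `AddressUS` and `AddressIntl` are made up for this example; nothing here comes from the avro package.

```python
import json


def record(name, fields):
    """Build a named record definition as a plain dict (hypothetical helper)."""
    return {"type": "record", "name": name, "fields": fields}


def field(name, type_):
    """Build a field; type_ may be a string, a dict, or a list (a union)."""
    return {"name": name, "type": type_}


# Two illustrative named types (assumed names, not from the original schema).
address_us = record("AddressUS", [field("city", "string")])
address_intl = record("AddressIntl", [field("country", "string")])

# Single named type: the definition is nested under the field's "type".
single = field("address", address_us)

# Union: the list of definitions goes directly under "type".
union = field("address", [address_us, address_intl])

print(json.dumps(single, indent=2))
print(json.dumps(union, indent=2))
```

The resulting dict can be placed in a record's "fields" list, serialized with `json.dumps`, and handed to the same `schema.parse` call used in the question.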

Sampada
Ani