52

Does anyone know how to create an Avro schema that contains a list of objects of some class?

I want my generated classes to look like this:

class Child {
    String name;
}

class Parent {
    List<Child> children;
}

I have written part of the schema file, but I do not know how to tell Avro to create a list of objects of type Child.

My schema file looks like this:

{
    "name": "Parent",
    "type":"record",
    "fields":[
        {
            "name":"children",
            "type":{
                "name":"Child",
                "type":"record",
                "fields":[
                    {"name":"name", "type":"string"}
                ]
            }
        }
    ] 
}

The problem is that I can mark the field children as either the Child type or an array, but I do not know how to mark it as an array of objects of type Child.

Can anyone please help?

Shekhar

3 Answers

99

You need to use the array type to create the list. The following updated schema handles your use case:

{
    "name": "Parent",
    "type":"record",
    "fields":[
        {
            "name":"children",
            "type":{
                "type": "array",  
                "items":{
                    "name":"Child",
                    "type":"record",
                    "fields":[
                        {"name":"name", "type":"string"}
                    ]
                }
            }
        }
    ] 
}
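
To make the nesting concrete, here is a minimal Python sketch that walks this schema and checks a sample datum against it. It uses only the standard json module, not the Avro library. The helper `conforms` and the sample data are hypothetical, and the helper only handles the three types that appear here; real validation should be left to an Avro implementation.

```python
import json

# The schema from this answer, embedded as text.
SCHEMA_JSON = """
{
    "name": "Parent",
    "type": "record",
    "fields": [
        {
            "name": "children",
            "type": {
                "type": "array",
                "items": {
                    "name": "Child",
                    "type": "record",
                    "fields": [
                        {"name": "name", "type": "string"}
                    ]
                }
            }
        }
    ]
}
"""


def conforms(schema, datum):
    """Hypothetical helper: structural check for string, record and array only."""
    if isinstance(schema, str):
        if schema == "string":
            return isinstance(datum, str)
        raise ValueError(f"type not handled in this sketch: {schema!r}")
    kind = schema["type"]
    if kind == "array":
        # Every element must match the single "items" schema.
        return isinstance(datum, list) and all(
            conforms(schema["items"], element) for element in datum
        )
    if kind == "record":
        # Every declared field must be present and match its own schema.
        return isinstance(datum, dict) and all(
            field["name"] in datum and conforms(field["type"], datum[field["name"]])
            for field in schema["fields"]
        )
    raise ValueError(f"type not handled in this sketch: {kind!r}")


parent_schema = json.loads(SCHEMA_JSON)
children_type = parent_schema["fields"][0]["type"]

# "children" is an array whose items are Child records.
print(children_type["type"], "of", children_type["items"]["name"])

good = {"children": [{"name": "Alice"}, {"name": "Bob"}]}
bad = {"children": {"name": "Alice"}}  # a single Child, not a list
print(conforms(parent_schema, good), conforms(parent_schema, bad))
```

If the field also needs a default value, an Avro array default is written as a JSON array, for example `"default": []` next to `"name"` and `"type"` on the children field.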
qwetty
Kapil Balagi
  • You might want to consider using List instead since I've encountered NPE's when deserializing String arrays. – Ravindranath Akila Oct 27 '14 at 03:04
  • @qwetty Does the data need to be nested like this? What if we had to have multiple fields of the same type? The definition of the child type would need to be repeated? Seems cleaner to separate the types – emirhosseini Jan 10 '19 at 18:55
  • @Kapil what will be default value for this? – Ashish Mittal Apr 11 '20 at 10:53
  • The Avro-to-Hive documentation lists the Avro type as "list" instead of "array". https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#AvroSerDe-AvrotoHivetypeconversion Is it possible that the Hive documentation is incorrect? – Susheel Javadi Apr 28 '20 at 03:54
10

I had the following classes, and the Avro Maven plugin generated two classes accordingly:

public class Employees{
    String accountNumber;
    String address;
    List<Account> accountList;    
}

public class Account {
    String accountNumber;
    String id;
}

Avro schema file:

{
    "type": "record",
    "namespace": "com.mypackage",
    "name": "AccountEvent",
    "fields": [
        {
            "name": "accountNumber",
            "type": "string"
        },
        {
            "name": "address",
            "type": "string"
        },
        {
            "name": "accountList",
            "type": {
                "type": "array",
                "items":{
                    "name": "Account",
                    "type": "record",
                    "fields":[
                        {   "name": "accountNumber",
                            "type": "string"
                        },
                        {   "name": "id",
                            "type": "string"
                        }
                    ]
                }
            }
        }
    ]
}
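
A record defined inline like this is a named type, and the Avro specification lets later fields in the same schema refer to it by name instead of repeating the definition. The sketch below builds such a schema in Python with the standard json module only. The second field, `closedAccounts`, is a hypothetical addition for illustration and is not part of the schema above.

```python
import json

# Inline definition of Account, as in the schema above.
account_record = {
    "name": "Account",
    "type": "record",
    "fields": [
        {"name": "accountNumber", "type": "string"},
        {"name": "id", "type": "string"},
    ],
}

# Top-level accountNumber and address fields are omitted here for brevity.
schema = {
    "type": "record",
    "namespace": "com.mypackage",
    "name": "AccountEvent",
    "fields": [
        # First use: the full definition of Account appears here.
        {"name": "accountList", "type": {"type": "array", "items": account_record}},
        # Hypothetical second field: refers to Account by name only.
        {"name": "closedAccounts", "type": {"type": "array", "items": "Account"}},
    ],
}

schema_text = json.dumps(schema, indent=4)
print(schema_text)
```

The reference has to come after the definition in the schema text, which is why `accountList` is listed first.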
Smart Coder
  • I'm quite dumb / literal in my readings so I have to ask maybe obvious questions: In the Avro file format, does `"name": "AccountEvent"` correspond to the `Employees` class? In other words, would it be more accurate to say `"name": "Employees"`? – youngrrrr Jul 18 '23 at 03:57
2

Use an array as the field type, with a record as its items:

{
    "type": "record",
    "name": "jamesMedice",
    "fields": [{
        "name": "columns",
        "type": {
            "type": "array",
            "items": {
                "type": "record",
                "name": "columnValues",
                "fields": [{
                        "name": "personId",
                        "type": "string",
                        "default": "null"
                    },
                    {
                        "name": "email",
                        "type": "string",
                        "default": "null"
                    }
                ]
            }
        }
    }]
}
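
One thing to be aware of in this schema: `"default": "null"` is the four-character string "null", not a JSON null. If the intent is an optional field, the usual Avro pattern is a union with "null" listed first and an unquoted null default. The Python sketch below (standard json module only, no Avro library) shows how the two spellings parse. The second field definition is an assumption about the intent, not something stated in this answer.

```python
import json

# Field as written above: type string, default is the text "null".
string_default = json.loads(
    '{"name": "email", "type": "string", "default": "null"}'
)

# Assumed intent: an optional field, expressed as a union with a real null default.
nullable = json.loads(
    '{"name": "email", "type": ["null", "string"], "default": null}'
)

print(repr(string_default["default"]))  # a Python str
print(repr(nullable["default"]))        # Python None, which is JSON null
```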
Tiago Medici