39

I can define a nested record inline like this:

{
    "type": "record",
    "name": "Foo",
    "fields": [
        {"name": "bar", "type": {
            "type": "record",
            "name": "Bar",
            "fields": [ ]
        }}
    ]
}

That works fine, but suppose I want to split the schema into two files, such as:

{
    "type": "record",
    "name": "Foo",
    "fields": [
        {"name": "bar", "type": "Bar"}
    ]
}

{
    "type": "record",
    "name": "Bar",
    "fields": [ ]
}

Does Avro have the capability to do this?

Owen

6 Answers

47

Yes, it's possible.

I've done that in my Java project by listing the common schema files as imports in the avro-maven-plugin configuration. Example:

search_result.avro:

{
    "namespace": "com.myorg.other",
    "type": "record",
    "name": "SearchResult",
    "fields": [
        {"name": "type", "type": "SearchResultType"},
        {"name": "keyWord",  "type": "string"},
        {"name": "searchEngine", "type": "string"},
        {"name": "position", "type": "int"},
        {"name": "userAction", "type": "UserAction"}
    ]
}

search_suggest.avro:

{
    "namespace": "com.myorg.other",
    "type": "record",
    "name": "SearchSuggest",
    "fields": [
        {"name": "suggest", "type": "string"},
        {"name": "request",  "type": "string"},
        {"name": "searchEngine", "type": "string"},
        {"name": "position", "type": "int"},
        {"name": "userAction", "type": "UserAction"},
        {"name": "timestamp", "type": "long"}
    ]
}

user_action.avro:

{
    "namespace": "com.myorg.other",
    "type": "enum",
    "name": "UserAction",
    "symbols": ["S", "V", "C"]
}

search_result_type.avro:

{
    "namespace": "com.myorg.other",
    "type": "enum",
    "name": "SearchResultType",
    "symbols": ["O", "S", "A"]
}

avro-maven-plugin configuration:

<plugin>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-maven-plugin</artifactId>
    <version>1.7.4</version>
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>schema</goal>
            </goals>
            <configuration>
                <sourceDirectory>${project.basedir}/src/main/resources/avro</sourceDirectory>
                <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
                <includes>
                    <include>**/*.avro</include>
                </includes>
                <imports>
                    <import>${project.basedir}/src/main/resources/avro/user_action.avro</import>
                    <import>${project.basedir}/src/main/resources/avro/search_result_type.avro</import>
                </imports>
            </configuration>
        </execution>
    </executions>
</plugin>
Diego Magdaleno
AlexTiunov
  • Why do enum types need to be imported explicitly? – Rites Jun 29 '17 at 06:54
  • I don't think it's about the fact that it is an enum. I think the avro-maven-plugin needs to know that there are some dependent types. Looking at the `plugin.xml` for Maven, it says the following about the imports tag: "A list of files or directories that should be compiled first thus making them importable by subsequently compiled schemas. Note that imported files should not reference each other." – The Code Pimp Jun 21 '18 at 15:08
  • Setting ${project.basedir}/src/main/resources/avro/*.avro frees me from manually specifying each schema to be imported. – rsn86 Jul 17 '18 at 19:58
  • @rsn86 It wouldn't work like that. The schema definitions you plan to reuse need to be imported. Otherwise you'll end up with an error: `Execution schemas of goal org.apache.avro:avro-maven-plugin:1.8.2:schema failed: Undefined name: "DisplayAvro"`... – Romeo Sierra Aug 05 '21 at 04:35
27

You can also define multiple schemas inside of one file:

schemas.avsc:

[
{
    "type": "record",
    "name": "Bar",
    "fields": [ ]
},
{
    "type": "record",
    "name": "Foo",
    "fields": [
        {"name": "bar", "type": "Bar"}
    ]
}
]

If you want to reuse the schemas in multiple places, this is not super nice, but in my opinion it improves readability and maintainability a lot.

Michael
  • This is much better if you are using Python as it makes life a lot simpler. – JARC Feb 09 '17 at 14:01
  • In your example you've actually defined a single `union` type that has two different member types. This could have implications depending on which kind of system is processing the schema. – teabot Jun 10 '19 at 18:15
5

I assume your motivation (like mine) is to structure your schema definitions and avoid copy-and-paste errors.

To achieve that, you can also use Avro IDL. It lets you define Avro schemas at a higher level. Types can be reused within the same file and across multiple files.

To generate the .avsc files, run

$ java -jar avro-tools-1.7.7.jar idl2schemata my-protocol.avdl

The resulting .avsc files will look much like your initial example, but because they are generated from the .avdl file, you won't get lost in the verbose JSON format.

Fabian Braun
3

The order of the imports in pom.xml matters: the referenced subtypes must be imported first, before the schemas that use them.

<imports>
    <import>${project.basedir}/src/main/resources/avro/Bar.avro</import>
    <import>${project.basedir}/src/main/resources/avro/Foo.avro</import>
</imports>

With this order, the code generator no longer fails with an undefined-name error for Bar.
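If there are many schema files, working out a valid import order by hand gets tedious. The following is only a sketch of one way to compute such an order. It is a hypothetical standalone helper, not part of Avro or the Maven plugin, and it assumes type names are unique once the namespace is ignored and that the files do not reference each other in a cycle. The names `referenced_names` and `import_order` are made up for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def simple_name(name):
    """Drop the namespace part (assumption: simple names are unique)."""
    return name.rsplit(".", 1)[-1]


def referenced_names(node, defined):
    """Return the named types referenced by `node`; add types it defines to `defined`."""
    refs = set()
    if isinstance(node, str):
        if node not in PRIMITIVES:
            refs.add(simple_name(node))
    elif isinstance(node, list):  # a JSON array is a union
        for branch in node:
            refs |= referenced_names(branch, defined)
    elif isinstance(node, dict):
        kind = node.get("type")
        if kind in ("record", "error"):
            defined.add(simple_name(node["name"]))
            for field in node.get("fields", []):
                refs |= referenced_names(field["type"], defined)
        elif kind in ("enum", "fixed"):
            defined.add(simple_name(node["name"]))
        elif kind == "array":
            refs |= referenced_names(node["items"], defined)
        elif kind == "map":
            refs |= referenced_names(node["values"], defined)
        else:  # e.g. {"type": "string"} or {"type": "Bar"}
            refs |= referenced_names(kind, defined)
    return refs


def import_order(schemas):
    """schemas maps file name -> parsed schema JSON; returns file names, dependencies first."""
    defined_in, refs_of = {}, {}
    for file_name, schema in schemas.items():
        defined = set()
        refs_of[file_name] = referenced_names(schema, defined) - defined
        for name in defined:
            defined_in[name] = file_name
    # Each file depends on the files that define the types it references.
    graph = {
        file_name: {defined_in[ref] for ref in refs if ref in defined_in}
        for file_name, refs in refs_of.items()
    }
    return list(TopologicalSorter(graph).static_order())


foo = {"type": "record", "name": "Foo", "fields": [{"name": "bar", "type": "Bar"}]}
bar = {"type": "record", "name": "Bar", "fields": []}
print(import_order({"Foo.avro": foo, "Bar.avro": bar}))  # ['Bar.avro', 'Foo.avro']
```

The resulting list can then be copied into the imports section in that order. In practice the schemas would be loaded from the files with `json.load`.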

Jack Yeh
0

From what I have been able to figure out so far, no.

There is a good write up about someone who coded their own method for doing this here:

http://www.infoq.com/articles/ApacheAvro
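To give an idea of what a hand-rolled approach can look like, here is a minimal sketch that merges separately stored schemas into one self-contained schema before handing it to Avro. It is not the method from the linked article, just an illustration: `inline_named_types` is a hypothetical helper, it works on plain parsed JSON, and it ignores namespaces. Because Avro allows a name to be defined only once per schema, only the first reference to each type is expanded and later references stay as names.

```python
import copy


def inline_named_types(node, definitions, seen=None):
    """Return a copy of `node` where the first reference to each type in
    `definitions` (name -> schema dict) is replaced by its full definition."""
    seen = set() if seen is None else seen
    if isinstance(node, str):
        if node in definitions and node not in seen:
            seen.add(node)  # mark first, so recursive types stay as names
            return inline_named_types(copy.deepcopy(definitions[node]), definitions, seen)
        return node
    if isinstance(node, list):  # union branches
        return [inline_named_types(branch, definitions, seen) for branch in node]
    if isinstance(node, dict):
        result = dict(node)
        if result.get("type") in ("record", "enum", "fixed") and "name" in result:
            seen.add(result["name"])  # already defined here, never expand again
        for key in ("type", "items", "values"):
            if key in result:
                result[key] = inline_named_types(result[key], definitions, seen)
        if "fields" in result:
            result["fields"] = [
                inline_named_types(field, definitions, seen) for field in result["fields"]
            ]
        return result
    return node


# The two files from the question, as parsed JSON.
foo = {"type": "record", "name": "Foo", "fields": [{"name": "bar", "type": "Bar"}]}
bar = {"type": "record", "name": "Bar", "fields": []}

merged = inline_named_types(foo, {"Bar": bar})
print(merged["fields"][0]["type"])  # {'type': 'record', 'name': 'Bar', 'fields': []}
```

For the two files in the question, `merged` is the same structure as the single nested schema that already works.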

Cargo23
0

In the avro-maven-plugin configuration, you need to import the .avsc file that defines the object schema you want to reuse, so that it is compiled before the schemas that reference it:

<plugin>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-maven-plugin</artifactId>
    <version>${avro.maven.plugin.version}</version>
    <configuration>
        <stringType>String</stringType>
    </configuration>
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>schema</goal>
            </goals>
            <configuration>
                <!-- Avro directory -->
                <sourceDirectory>src/main/java/com/xyz/avro</sourceDirectory>
                <imports>
                    <!-- Import the schema that defines the reused type here -->
                    <import>src/main/java/com/xyz/avro/file.avsc</import>
                </imports>
            </configuration>
        </execution>
    </executions>
</plugin>

blueberry