
I wrote an Avro schema in which some of the fields need to be of type String, but Avro has generated those fields as type CharSequence.

I am not able to find any way to tell Avro to make those fields of type String.

I tried to use

"fields": [
    {
        "name":"startTime",
        "type":"string",
        "avro.java.stringImpl":"String"
    },
    {
        "name":"endTime",
        "type":"string",
        "avro.java.string":"String"
    }
]

but for both fields Avro still generates the type CharSequence.

Is there any other way to make those fields of type String?

Shekhar
  • The `String` class implements the `CharSequence` interface. – Aug 04 '14 at 12:29
  • CharSequence is an interface. By default Avro uses its own Utf8 class as the CharSequence implementation. Utf8 is nothing more than a byte buffer that can be converted into a String using `toString`. Utf8 is convenient when you don't care about the string, as in benchmarks... but most often you want to use the CharSequence, and then you have to convert it into a String. This is cumbersome, and it is a 100% memory footprint overhead because the string is now stored both as a Utf8 and as a String. That's why a lot of people want String, not CharSequence. Mixing both could be useful too. – Clément MATHIEU Aug 14 '14 at 18:52
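
To make the comment above concrete, here is a minimal sketch using only the standard library of why a CharSequence-typed field is awkward to work with. `Utf8Like` is a hypothetical stand-in for Avro's Utf8 (it is not the real class), and the helper method names are made up for illustration. The only point it demonstrates is that `String.equals` returns false for any CharSequence that is not a String, so callers end up converting with `toString()`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CharSequenceDemo {

    // Hypothetical stand-in for Avro's Utf8: a CharSequence backed by UTF-8 bytes.
    static final class Utf8Like implements CharSequence {
        private final byte[] bytes;

        Utf8Like(String s) {
            this.bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        @Override public int length() { return toString().length(); }
        @Override public char charAt(int index) { return toString().charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return toString().subSequence(start, end);
        }

        // Every conversion decodes the bytes into a new String.
        @Override public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }

        @Override public boolean equals(Object o) {
            return o instanceof Utf8Like && Arrays.equals(bytes, ((Utf8Like) o).bytes);
        }

        @Override public int hashCode() { return Arrays.hashCode(bytes); }
    }

    // Comparing the raw field: String.equals only accepts another String.
    static boolean equalsDirectly(CharSequence field, String expected) {
        return expected.equals(field);
    }

    // Comparing after an explicit conversion to String.
    static boolean equalsAfterConversion(CharSequence field, String expected) {
        return expected.equals(field.toString());
    }

    public static void main(String[] args) {
        CharSequence startTime = new Utf8Like("2014-08-04T12:29");
        System.out.println(equalsDirectly(startTime, "2014-08-04T12:29"));        // prints false
        System.out.println(equalsAfterConversion(startTime, "2014-08-04T12:29")); // prints true
    }
}
```

With fields generated as java.lang.String, the first comparison would behave as expected and no conversion would be needed.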

2 Answers


If you want all your string fields to be instances of java.lang.String, then you only have to configure the compiler:

java -jar /path/to/avro-tools-1.7.7.jar compile -string schema <schema file> <output directory>

or if you are using the Maven plugin

<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.7.7</version>
  <configuration>
    <stringType>String</stringType>
  </configuration>
  [...]
</plugin>        

If you want one specific field to be of type java.lang.String then... you can't. It is not supported by the compiler. You can use "java-class" with the reflect API but the compiler does not care.

If you want to learn more, you can set a breakpoint at SpecificCompiler line 372 in Avro 1.7.7. You can see that before the call to addStringType() the schema has the required information in its props field. If you passed this schema to SpecificCompiler.javaType(), it would do what you want. But addStringType() then replaces your schema with a static one. I will most likely ask about this on the mailing list, since I don't see the point.

Clément MATHIEU
  • Are there any docs on this? I am looking for how I can specify the array type. I want it to be a Set rather than `java.util.List`, but I can't find any docs :/ – eddyP23 Oct 10 '17 at 12:18
  • it works with 1.8.2 as well. Cheers @Clément MATHIEU – aurelius Feb 01 '18 at 13:37

You can set it at the field level: change the field's type to an object that includes "type": "string" and "avro.java.string": "String".

See below for example:

{
    "type": "record",
    "name": "test",
    "fields": [
        {
            "name": "name",
            "type": {
                "type": "string",
                "avro.java.string": "String"
            }
        }
    ]
}
Belphegor
mnouh1
  • That is not working. We have our Avro file defined that way, and we still get CharSequence instead of String. Mathieu's solution with the Maven configuration works fine in our case. – Son May 07 '20 at 10:31