
I am trying to write a very simple Avro schema (simple because it only illustrates my current issue) to write an Avro data file from data stored in JSON format. The catch is that one field is optional, and either avro-tools or I am not handling it correctly.

The goal is not to write my own serialiser; the end goal is to have this running in Flume, but I am in the early stages.

The data (works), in a file named so.log:

{
  "valid":  {"boolean":true}
, "source": {"bytes":"live"}
}

The schema, in a file named so.avsc:

{
  "type":"record",
  "name":"Event",
  "fields":[
      {"name":"valid", "type": ["null", "boolean"],"default":null}
    , {"name":"source","type": ["null", "bytes"],"default":null}
  ]
}

I can easily generate an avro file with the following command:

java -jar avro-tools-1.7.6.jar fromjson --schema-file so.avsc so.log

So far so good. The thing is that "source" is optional, so I would expect the following data to be valid as well:

{
  "valid": {"boolean":true}
}

But running the same command gives me the error:

Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got END_OBJECT
at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441)
at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:99)
at org.apache.avro.tool.Main.run(Main.java:84)
at org.apache.avro.tool.Main.main(Main.java:73)

I tried many variations of the schema, even ones that do not follow the Avro spec. The schema I show here is, as far as I know, what the spec says it should be.

Would anybody know what I am doing wrong, and how I can actually have optional elements without writing my own serialiser?

Thanks,

Guillaume
  • Seeing the same issue here, did you find any way to work with the AvroTools to convert a JSON with optional fields to Avro? The only workaround I can think of at the moment is to write a wrapper that would insert default values in the JSON before conversion, but that's a shame... – snooze92 Sep 08 '14 at 12:28
  • Sadly I had not much luck. In short JSON is for convenience only, and although a schema can have a default value, a JSON document missing this value is actually not valid. I documented some [problems and solutions we had with avro](http://thisdwhguy.com/2014/10/27/avro-end-to-end-in-hdfs-part-4-problems-and-solutions/), hope it can help. – Guillaume Nov 03 '14 at 08:38
  • Thanks for the answer, I will read your blog article! We have given up on JSON > Avro conversion for now. – snooze92 Nov 04 '14 at 09:41
  • The Avro documentation says that using "a builder requires setting all fields, even if they are null". Maybe it is related to your case. Check https://avro.apache.org/docs/1.7.7/gettingstartedjava.html – haltunbay Aug 20 '15 at 11:17
  • See http://stackoverflow.com/questions/27485580/how-to-fix-expected-start-union-got-value-number-int-when-converting-json-to-av – Pavel Bernshtam Aug 08 '16 at 07:13

1 Answer

According to the documentation of the Java API:

using a builder requires setting all fields, even if they are null

The Python API, on the other hand, seems to allow null fields to be truly optional:

Since the field favorite_color has type ["string", "null"], we are not required to specify this field

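In short, as most tools are written in Java, null fields must usually be given explicitly. For the data in the question, that means writing "source": null rather than leaving the key out.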
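The wrapper idea raised in the comments (inserting default values into the JSON before conversion) could look roughly like the sketch below. It is an illustration only, not part of avro-tools: the function name fill_missing_fields is hypothetical, it handles a single record, and it assumes every missing field has a null default, as in so.avsc. A non-null default on a union field would also need wrapping in Avro's JSON union encoding, which this sketch does not attempt.

```python
import json


def fill_missing_fields(schema, record):
    """Return a copy of `record` in which every field of `schema` is present.

    Hypothetical helper: a field missing from the record receives the
    schema's "default" value. Only null defaults are supported here.
    """
    filled = dict(record)  # shallow copy, the input record is left untouched
    for field in schema["fields"]:
        name = field["name"]
        if name in filled:
            continue
        if "default" not in field:
            raise ValueError("field %r is missing and has no default" % name)
        if field["default"] is not None:
            # Assumption of this sketch: non-null defaults are out of scope.
            raise NotImplementedError("non-null default for field %r" % name)
        filled[name] = None  # serialised as an explicit JSON null
    return filled


# The schema from so.avsc and the record that fails in the question.
schema = json.loads("""
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "valid",  "type": ["null", "boolean"], "default": null},
    {"name": "source", "type": ["null", "bytes"],   "default": null}
  ]
}
""")
record = json.loads('{"valid": {"boolean": true}}')

print(json.dumps(fill_missing_fields(schema, record)))
```

The printed record contains "source": null explicitly, so it can be saved to a file and passed to the same fromjson command as before.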

Guillaume