I'm trying to understand how Avro's logical types are supposed to be used. First, an example of what I'm trying to achieve: I want to write a new logical type (RegExLogicalType) that validates an input string and either accepts it or raises an exception.

Or, to take one of Avro's existing logical types (decimal), I was expecting it to behave like this:

  1. If an invalid value is given for a decimal logical type, an exception must be raised, similar to what happens when a mandatory field is expected but nothing has been provided: org.apache.avro.AvroRuntimeException: Field test_decimal type:BYTES pos:2 not set and has no default value
  2. If a valid value is given for a decimal logical type, no exception should be raised.
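
To make the expectation concrete, here is a rough plain-Java sketch of the kind of check I have in mind. `DecimalCheck` and `requireValid` are names I made up for illustration; they are not part of Avro. The sketch assumes the bytes hold a two's-complement, big-endian unscaled value, which is how the Avro specification describes the decimal encoding.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not Avro API): decodes decimal bytes and enforces the precision.
final class DecimalCheck {

    // Decodes the buffer as a two's-complement big-endian unscaled value and
    // rejects values that have more digits than the declared precision.
    static BigDecimal requireValid(ByteBuffer bytes, int precision, int scale) {
        byte[] raw = new byte[bytes.remaining()];
        bytes.duplicate().get(raw); // duplicate() leaves the caller's position untouched
        if (raw.length == 0) {
            throw new IllegalArgumentException("empty decimal value");
        }
        BigDecimal value = new BigDecimal(new BigInteger(raw), scale);
        if (value.precision() > precision) {
            throw new IllegalArgumentException(
                "decimal " + value + " has precision " + value.precision()
                + ", but the schema allows at most " + precision);
        }
        return value;
    }

    public static void main(String[] args) {
        // Unscaled value 1234 with scale 2 is 12.34, which fits precision 4.
        ByteBuffer good = ByteBuffer.wrap(BigInteger.valueOf(1234).toByteArray());
        System.out.println(requireValid(good, 4, 2));

        // Arbitrary text bytes still decode to a number, but one with too many digits.
        ByteBuffer bad = ByteBuffer.wrap("bad".getBytes(StandardCharsets.US_ASCII));
        try {
            requireValid(bad, 4, 2);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that arbitrary bytes always decode to some integer, so the only thing a check like this can really reject is a value that exceeds the declared precision.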

What I have found in the documentation only covers reading/deserialization; it says nothing about writing/serialization:

Language implementations must ignore unknown logical types when reading, and should use the underlying Avro type. If a logical type is invalid, for example a decimal with scale greater than its precision, then implementations should ignore the logical type and use the underlying Avro type.

I don't want the behavior described above for serialization/deserialization. I need something equivalent to XSD restrictions (patterns), which are used to validate data against the schema.

Here in Avro, if the schema is as follows:
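
In plain Java, the pattern restriction I'm after would behave roughly like the sketch below. `PatternCheck` and `requireMatch` are hypothetical names used only for illustration, not Avro API, and the JWT-like pattern is just an example.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not Avro API): enforces a regex restriction on a string value.
final class PatternCheck {
    private final Pattern pattern;

    PatternCheck(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // Returns the value unchanged if the whole string matches, otherwise throws.
    String requireMatch(String fieldName, String value) {
        if (value == null || !pattern.matcher(value).matches()) {
            throw new IllegalArgumentException(
                "Field " + fieldName + " does not match pattern " + pattern.pattern());
        }
        return value;
    }

    public static void main(String[] args) {
        PatternCheck jwtLike =
            new PatternCheck("[a-zA-Z0-9]*\\.[a-zA-Z0-9]*\\.[a-zA-Z0-9]*");

        // Three dot-separated alphanumeric parts: accepted.
        System.out.println(jwtLike.requireMatch("caller_jwt", "abc.def.ghi"));

        // No dots at all: rejected with an exception.
        try {
            jwtLike.requireMatch("caller_jwt", "qsdsqdqsd");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

What I would like is for the schema itself to drive a check like this at write time, instead of having to call it by hand before building each record.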

{"namespace": "com.stackoverflow.avro",
 "type": "record",
 "name": "Request",
 "fields": [
     {"name": "caller_jwt",  "type": "string", "logicalType": "regular-expression", "pattern": "[a-zA-Z0-9]*\\.[a-zA-Z0-9]*\\.[a-zA-Z0-9]*"},
     {"name": "test_decimal", "type": "bytes", "logicalType": "decimal",  "precision": 4,  "scale": 2}
 ]
}

and if I try to build an object and serialize it like this:

DatumWriter<Request> userDatumWriter = new SpecificDatumWriter<>(Request.class);
DataFileWriter<Request> dataFileWriter = new DataFileWriter<>(userDatumWriter);

ByteBuffer badDecimal = ByteBuffer.wrap("bad".getBytes());

Request request = Request.newBuilder()
            .setTestDecimal(badDecimal) // bad decimal
            .setCallerJwt("qsdsqdqsd").build(); // bad value according to regEx
dataFileWriter.create(request.getSchema(), new File("users.avro"));
dataFileWriter.append(request);
dataFileWriter.close();

No exception is thrown, and the object is serialized to the users.avro file.

So can Avro's logical types be used to validate input data, or is there something else that could be used for that?

Ali Abdel-Aziz
  • To specify your own `logical type` you'll need to derive it from the base logical type. Have a **[look here](http://stackoverflow.com/questions/37279096/data-validation-in-avro)** for some great resources – JSteward May 10 '17 at 16:25

0 Answers