I am new to Avro, so please excuse me if this is a simple question. I have a use case where I am using an Avro schema for record calls.

Let's say I have this Avro schema:

{
    "name": "abc",
    "namespace": "xyz",
    "type": "record",
    "fields": [
        {"name": "CustId", "type": "string"},
        {"name": "SessionId", "type": "string"}
    ]
}

Now suppose the input looks like this:

{
    "CustId": "abc1234",
    "SessionId": "000-0000-00000"
}

I want to apply regex validation to these fields and accept the input only if it arrives in the particular format shown above. Is there any way to include a regular expression in the Avro schema itself?

Are there any other data serialization formats that support something like this?

user2166328

1 Answer

You should be able to use a custom logical type for this. You would then include the regular expressions directly in the schema.

For example, here's how you would implement one in JavaScript:

var avro = require('avsc'),
    util = require('util');

/**
 * Sample logical type that validates strings using a regular expression.
 *
 */
function ValidatedString(attrs, opts) {
  avro.types.LogicalType.call(this, attrs, opts);
  this._pattern = new RegExp(attrs.pattern);
}
util.inherits(ValidatedString, avro.types.LogicalType);

ValidatedString.prototype._fromValue = function (val) {
  if (!this._pattern.test(val)) {
    throw new Error('invalid string: ' + val);
  }
  return val;
};

ValidatedString.prototype._toValue = ValidatedString.prototype._fromValue;

And how you would use it:

var type = avro.parse({
  name: 'Example',
  type: 'record',
  fields: [
    {
      name: 'custId',
      type: 'string' // Normal (free-form) string.
    },
    {
      name: 'sessionId',
      type: {
        type: 'string',
        logicalType: 'validated-string',
        pattern: '^\\d{3}-\\d{4}-\\d{5}$' // Validation pattern.
      }
    },
  ]
}, {logicalTypes: {'validated-string': ValidatedString}});

type.isValid({custId: 'abc', sessionId: '123-1234-12345'}); // true
type.isValid({custId: 'abc', sessionId: 'foobar'}); // false
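
Before wiring a pattern into the schema, it can help to check it in isolation. The snippet below is a minimal sketch in plain JavaScript (no avsc required) and makes two points: the backslashes in the pattern are doubled both in a JavaScript string literal and in the JSON text of a schema file, and the `^`/`$` anchors matter because `RegExp.prototype.test` otherwise accepts a partial match. The `fieldType` object mirrors the `sessionId` type above; the padded sample input and the unanchored variant are made up for illustration.

```javascript
// Field type as used in the schema above. In JavaScript source, each
// backslash of the regular expression has to be written twice.
var fieldType = {
  type: 'string',
  logicalType: 'validated-string',
  pattern: '^\\d{3}-\\d{4}-\\d{5}$'
};

// Serializing to JSON (e.g. for a schema file) doubles the backslashes
// again in the text; parsing it back yields the original pattern string.
var json = JSON.stringify(fieldType);
var pattern = new RegExp(JSON.parse(json).pattern);

// Hypothetical unanchored variant, to show why the anchors are needed.
var unanchored = new RegExp('\\d{3}-\\d{4}-\\d{5}');

var results = {
  questionSample: pattern.test('000-0000-00000'),       // sample from the question
  freeForm: pattern.test('foobar'),                      // not in the expected format
  paddedAnchored: pattern.test('x000-0000-00000x'),      // rejected: anchors apply
  paddedUnanchored: unanchored.test('x000-0000-00000x')  // accepted: partial match
};

console.log(json);
console.log(results);
```

Because `ValidatedString` passes `attrs.pattern` straight to `new RegExp`, any pattern meant to constrain the whole string should carry its own anchors.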

You can read more about implementing and using logical types in the avsc documentation.

Edit: For the Java implementation, I believe you will want to look at the following classes:

mtth
  • Is this a feature unique to the JavaScript library? – sksamuel May 19 '16 at 11:08
  • The Java implementation also already supports logical types. They were introduced relatively recently in the spec but should hopefully be in most implementations soon. – mtth May 19 '16 at 12:42
  • That is an awesome example. Is the Java implementation released now? Can you point me to the javadocs if possible? Thanks in advance. – user2166328 May 20 '16 at 00:36
  • Sure @user2166328; I edited my answer with a few links. The Java implementation has been released (`1.8.0+`). – mtth May 20 '16 at 15:28
  • Thank you for the links. That answered my questions. :) – user2166328 May 20 '16 at 21:40
  • Can you please show this approach in Java, i.e. validation with a regular expression via a custom LogicalType? I didn't find any resources or relevant information. – Santhosh Feb 22 '17 at 15:00