Is there a way to send and receive a Date type with Apache Avro? I have not been able to find anything on this. The only advice I found was to use an int with a logicalType of date in the schema, but that yields another int on the receiver side, which I still have to convert to a date.

I am trying to send a date from an Apache Kafka producer and receive it in a Kafka consumer.

If there is no other way, do I always have to convert the date to an int and then back again at the consumer? This article shows how to do it:

Get the number of days, weeks, and months, since Epoch in Java

Serializer code:-

    @Override
    public byte[] serialize(String topic, T data) {
        try {
            byte[] result = null;

            if (data != null) {
                // Pass the value as an argument so the {} placeholder is actually filled in.
                logger.debug("data='{}'", data);

                ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
                BinaryEncoder binaryEncoder =
                        EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);

                DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(data.getSchema());
                datumWriter.write(data, binaryEncoder);

                // Flush the encoder before reading the bytes; the stream is closed once.
                binaryEncoder.flush();
                byteArrayOutputStream.close();

                result = byteArrayOutputStream.toByteArray();
                logger.debug("serialized data='{}'", DatatypeConverter.printHexBinary(result));
            }
            return result;
        } catch (IOException ex) {
            throw new SerializationException(
                    "Can't serialize data='" + data + "' for topic='" + topic + "'", ex);
        }
    }

Deserializer code:-

    @Override
    public T deserialize(String topic, byte[] data) {
        try {
            T result = null;

            if (data != null) {
                logger.debug("data='{}'", DatatypeConverter.printHexBinary(data));

                DatumReader<GenericRecord> datumReader =
                        new SpecificDatumReader<>(targetType.newInstance().getSchema());
                Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);

                result = (T) datumReader.read(null, decoder);
                logger.debug("deserialized data='{}'", result);
            }
            return result;
        } catch (Exception ex) {
            throw new SerializationException(
                    "Can't deserialize data '" + Arrays.toString(data) + "' from topic '" + topic + "'", ex);
        }
    }

Schema file:-

{"namespace": "com.test",
  "type": "record",
  "name": "Measures",
  "fields": [  
    {"name": "transactionDate", "type": ["int", "null"], "logicalType" : "date" }
   ]
}

These two are simply configured as the serializer and deserializer classes in the producer and consumer configuration.

OneCricketeer
Atul Ojha
  • What do you mean by "_an `int` on the receiver side_"? Your Java type that you deserialize into should have a `Date` field that Avro can populate. I would also strongly recommend against using `Date` - use an `Instant` if you need a point in time. – Boris the Spider Mar 02 '18 at 20:42
  • Your schema is wrong - the logical type goes on the type not on the field. `{ "type": "long", "logicalType": "date" }` – Boris the Spider Mar 02 '18 at 21:58
  • Any reason you're using your own decoders? Confluent provides their own https://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html – OneCricketeer Mar 03 '18 at 03:03
  • Thank you @BoristheSpider, my schema was wrong. After correcting it and using the adapter mentioned in Basil's response below, I got it working as a Joda LocalDate. I want to avoid Confluent for as long as I can, for no particular reason. – Atul Ojha Mar 06 '18 at 10:02

1 Answer

I have not used Apache Avro or Apache Kafka, but perhaps this will help…

Is there a way I can send and receive Date type with Apache Avro

Looking at the Wikipedia page, there is no Date type defined in Avro:

Avro schemas are defined using JSON. Schemas are composed of primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed).

JSON also lacks date-time types.

ISO 8601

In such a case where no date-time support is offered, I suggest serializing date-time values to text using the standard ISO 8601 formats. These formats are designed to be practical: easy to parse by machine, and easy to read by humans across cultures while avoiding ambiguity.

For a date-only value, the format would be YYYY-MM-DD. January 23rd, 2018 would be 2018-01-23.

java.time

The java.time classes use ISO 8601 formats by default when parsing/generating strings.

The LocalDate class represents a date-only value without time-of-day and without time zone.

LocalDate.of( 2018 , Month.JANUARY , 23 )
         .toString()                              // Generating a string in standard format.

2018-01-23

LocalDate ld = LocalDate.parse( "2018-01-23" ) ;  // Parsing a string in standard format.
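The two snippets above can be combined into a small round trip. This is only a sketch of the ISO 8601 text approach: the class name `IsoDateRoundTrip` and the helper names `toWire` and `fromWire` are made up for illustration, and it assumes the corresponding Avro field is declared as a plain string.

```java
import java.time.LocalDate;
import java.time.Month;

public class IsoDateRoundTrip {

    // Producer side: render the date as ISO 8601 text (YYYY-MM-DD).
    static String toWire(LocalDate date) {
        return date.toString();
    }

    // Consumer side: parse the ISO 8601 text back into a LocalDate.
    static LocalDate fromWire(String text) {
        return LocalDate.parse(text);
    }

    public static void main(String[] args) {
        LocalDate sent = LocalDate.of(2018, Month.JANUARY, 23);
        String wire = toWire(sent);           // "2018-01-23"
        LocalDate received = fromWire(wire);
        System.out.println(wire + " round-trips: " + received.equals(sent));
    }
}
```

The value on the wire stays human-readable, at the cost of a few more bytes per record than an integer count.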

Count-from-epoch

I do not recommend tracking date-time values as a count from an epoch reference. But if you decide to go that way, the java.time classes can assist.

The epoch reference date of 1970-01-01 is defined as the constant LocalDate.EPOCH.

Get the number of days since that epoch reference.

long daysSinceEpoch = ld.toEpochDay() ;

17554

Parse the number of days since epoch. Adding 17,554 days to 1970-01-01 results in 2018-01-23.

LocalDate ld = LocalDate.ofEpochDay( 17_554L ) ;  // 1970-01-01 + 17,554 days = 2018-01-23

You can see why I do not recommend this count-from-epoch approach: Reading and debugging 2018-01-23 is much easier than deciphering 17554.
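If you do go with the count, the Avro specification defines the date logical type as an int holding the number of days since 1970-01-01, which matches the int field in the question's schema. A minimal sketch of converting in both directions follows; the class name `EpochDayRoundTrip` and its helpers are hypothetical, and this is manual conversion code, not something Avro provides.

```java
import java.time.LocalDate;

public class EpochDayRoundTrip {

    // Producer side: days since 1970-01-01, narrowed to int.
    // Math.toIntExact throws ArithmeticException rather than silently overflowing.
    static int toEpochDays(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }

    // Consumer side: rebuild the LocalDate from the day count.
    static LocalDate fromEpochDays(int days) {
        return LocalDate.ofEpochDay(days);
    }

    public static void main(String[] args) {
        int days = toEpochDays(LocalDate.of(2018, 1, 23));
        System.out.println(days);                 // 17554
        System.out.println(fromEpochDays(days));  // 2018-01-23
    }
}
```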

Joda-Time

Apache Avro includes an adapter class for Joda-Time types (ticket AVRO-1672). I do not know if such an adapter is built for the java.time types yet.

The Joda-Time project was the precursor to the java.time framework built into Java. The project is now in maintenance-mode, with the authors advising migration to java.time classes.


About java.time

The java.time framework is built into Java 8 and later. These classes supplant the troublesome old legacy date-time classes such as java.util.Date, Calendar, & SimpleDateFormat.
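If existing code still hands you a java.util.Date, it can be bridged to java.time with the conversion methods added to the legacy class in Java 8. The sketch below assumes UTC is the zone in which the date should be interpreted, which may not suit your data; the class name `LegacyDateBridge` and its helpers are illustrative only.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;

public class LegacyDateBridge {

    // Legacy -> modern: a Date is a moment in time, so a zone is needed
    // to decide which calendar date it falls on. UTC is an assumption here.
    static LocalDate toLocalDate(Date legacy) {
        return legacy.toInstant().atZone(ZoneOffset.UTC).toLocalDate();
    }

    // Modern -> legacy: use the first moment of the day in UTC.
    static Date toLegacyDate(LocalDate date) {
        return Date.from(date.atStartOfDay(ZoneOffset.UTC).toInstant());
    }

    public static void main(String[] args) {
        LocalDate ld = LocalDate.of(2018, 1, 23);
        Date legacy = toLegacyDate(ld);
        System.out.println(toLocalDate(legacy).equals(ld));
    }
}
```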

The Joda-Time project, now in maintenance mode, advises migration to the java.time classes.

To learn more, see the Oracle Tutorial. And search Stack Overflow for many examples and explanations. Specification is JSR 310.

You may exchange java.time objects directly with your database. Use a JDBC driver compliant with JDBC 4.2 or later. No need for strings, no need for java.sql.* classes.

Where to obtain the java.time classes?

The ThreeTen-Extra project extends java.time with additional classes. This project is a proving ground for possible future additions to java.time. You may find some useful classes here such as Interval, YearWeek, YearQuarter, and more.

Basil Bourque
  • Not quite right - an Avro 'logicalType' adds a conversion layer between the Java type and the wire protocol. It basically says "this `long` is a date". – Boris the Spider Mar 02 '18 at 20:40
  • It does say so, that is what the documentation says. But how do you suggest it deserializes to a date? It still maps to the same long field as a long, so it would need another conversion to get a date out of the long. – Atul Ojha Mar 02 '18 at 20:53
  • Avro does that conversion for you @AtulOjha - the schema tells it that it needs to. It's not clear to me why/how you are accessing the underlying data in the Avro protocol. – Boris the Spider Mar 02 '18 at 21:02
  • I have a field transactionDate. I convert the date using joda date time library to days since epoch on the Kafka producer side and then when on the consumer side I do, getTransactionDate(), I get the same number back(days from epoch instead of a date). In schema file it is defined as:- {"name": "transactionDate", "type": ["int", "null"], "logicalType" : "date" } – Atul Ojha Mar 02 '18 at 21:04
  • Why are you doing _any_ conversion at _either_ end? Your Java representation should have a `Date` and you should let Avro do conversion. I really don’t understand your question @AtulOjha. I would suggest adding an MVCE to the question with a very minimal reader and writer schema as well as supporting objects. – Boris the Spider Mar 02 '18 at 21:40
  • And the domain class Measure (in the schema) is auto-generated using the Maven Avro plugin. – Atul Ojha Mar 02 '18 at 21:56
  • @AtulOjha did you find a solution for this? I'm facing the same case: when autogenerating the DTO from the Avro schema, the getter returns an int instead of a Date. – codependent Jul 16 '19 at 06:13