
I have a schema written in JSON format, and I get a string from a Kafka server which looks like:

\0\0\0\u00032H45d71580-9781-4d9c-8535-a233ff7c3122\nPLANTH45d71580-9781-4d9c-8535-a233ff7c3122\nPLANT,2017-12-12T16:34:15GMT\u001020171212\u0018201712121034\nthertH1AB5297A-9D28-4742-A95C-4A4CEED7037D\nfalse\nfalse\ncross\u00021\u00025

Now I want to deserialize the string and turn it into an object based on my schema file. How can I do that in C#? Is there a library I can use?

I tried Microsoft.Hadoop.Avro (https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-dotnet-avro-serialization#Scenario1). Once the code runs to:

var actual = avroSerializer.Deserialize(buffer);

it throws an exception: "array dimensions exceeded supported range"

I get the string from Kafka. Another app produces it and my app consumes it. The producing app is written in Swift and uses a Node.js library to do the serialization. So I wonder whether the string's format matters.

The Kafka message is produced by a JavaScript app. They serialize the string using a library called AVSC (Avro for JavaScript). Once I get the message (a string) I convert it into a byte stream, and I found these bytes are a little different from the original ones generated by the AVSC library. But why?

Meng Tim
  • Found with Google: [Serializing data with the Microsoft .NET Library for Avro](https://msdn.microsoft.com/en-us/library/dn749865.aspx) and also https://www.nuget.org/packages/Microsoft.Hadoop.Avro. However, for better or worse, questions asking for tool or library recommendations are [off topic](https://meta.stackoverflow.com/q/282983/3744182) on StackOverflow. – dbc Dec 13 '17 at 20:56
  • Also check [Deserialize an Avro file with C#](https://stackoverflow.com/q/39846833/3744182). – dbc Dec 13 '17 at 21:02
  • I checked all of these before I posted the question. Neither of them works; Microsoft.Hadoop.Avro does not really work... – Meng Tim Dec 14 '17 at 22:12

3 Answers


Confluent's Java library (which I suspect is what the Swift app is using to write to Kafka) writes a magic byte when they serialize to Avro's binary encoding. See this article: https://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html#wire-format

They use it for versioning and backwards compatibility, which is detailed here: https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Messagesets

However, the Microsoft.Hadoop.Avro library you are using does not use a magic byte when it de/serializes. Try removing the first byte from the stream before calling Deserialize().
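Assuming the Confluent wire format described above (one 0x00 magic byte followed by a 4-byte schema ID, 5 bytes in total), a small helper can drop the framing before deserialization. This is a hedged sketch: the helper name and the strict magic-byte check are my own illustration, not part of the original answer.

```csharp
using System;

// Strips the 5-byte Confluent framing (magic byte + int32 schema ID)
// and returns only the raw Avro payload.
static byte[] StripConfluentHeader(byte[] message)
{
    const int HeaderLength = 5; // 1 magic byte + 4-byte schema ID

    if (message.Length < HeaderLength || message[0] != 0x00)
        throw new InvalidOperationException("Not a Confluent-framed Avro message.");

    var payload = new byte[message.Length - HeaderLength];
    Array.Copy(message, HeaderLength, payload, 0, payload.Length);
    return payload;
}
```

The resulting payload is what you would then hand to the deserializer instead of the full buffer.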

ngprice
  • Adding to this, if you're using schema registry it also appends 4 bytes after the magic byte, which are the int32 of the schema ID. – ngprice Feb 08 '18 at 20:36
  • I tried removing the first byte, and also the first 5 bytes, but it still gives me the same error. I checked the byte array; it seems the bytes posted by Swift are shown in hexadecimal, but the bytes we get from Kafka are shown in decimal. – Meng Tim Feb 12 '18 at 17:26
  • Hex and decimal are just representations of a byte array; the underlying bytes are the same no matter how they are displayed. Kafka is merely a datastore, so if your Swift app is writing a hex-encoded string, you must convert that back to a byte array. Try this: [How do you convert a byte array to a hexadecimal string, and vice versa](http://stackoverflow.com/questions/321370/how-can-i-convert-a-hex-string-to-a-byte-array) – ngprice Feb 13 '18 at 18:54
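If the message really does arrive as a hex-encoded string, it has to be decoded back into raw bytes before any Avro deserialization can work. A minimal sketch of the conversion the comment above links to (the helper name is invented):

```csharp
using System;

// Converts a hex string like "2a0b" into the byte array { 0x2A, 0x0B }.
// Assumes an even-length string of valid hex digits.
static byte[] HexToBytes(string hex)
{
    var bytes = new byte[hex.Length / 2];
    for (int i = 0; i < bytes.Length; i++)
        bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
    return bytes;
}
```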

You could try https://github.com/AdrianStrugala/AvroConvert. If the message from Kafka contains only the data (no Avro file header), use:

var actual = AvroConvert.DeserializeHeadless<TheModel>(buffer, schema);

You need to be sure that your model and schema are correct.

Avro is a data format (just like JSON); serializer implementations in different languages should be compatible with one another.
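For illustration, here is a hedged sketch of a model and schema that line up field-for-field, which is what "model and schema are correct" means above. The `User` type, its fields, and the schema string are invented for this example; the commented-out call assumes the `DeserializeHeadless<T>(byte[], string)` shape shown in the answer.

```csharp
// A hypothetical Avro record schema (not from the original post).
const string Schema = @"{
  ""type"": ""record"",
  ""name"": ""User"",
  ""fields"": [
    { ""name"": ""name"", ""type"": ""string"" },
    { ""name"": ""age"",  ""type"": ""int"" }
  ]
}";

// With a matching model, the call from the answer would look like:
// var actual = AvroConvert.DeserializeHeadless<User>(buffer, Schema);

// C# model whose property names and types mirror the schema fields.
public class User
{
    public string name { get; set; }
    public int age { get; set; }
}
```

A mismatch between the model's properties and the schema's fields is a common cause of deserialization failures.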

סטנלי גרונן
Adrian

You should try Microsoft.Hadoop.Avro.Container.AvroContainer, which has a CreateGenericReader method. Something like:

using (var reader = AvroContainer.CreateGenericReader(buffer))
{
    while (reader.MoveNext())
    {
        foreach (dynamic record in reader.Current.Objects)
        {
            // Take a look at what you get in the record
        }
    }
}

The NuGet package is Microsoft.Avro.Tools (v0.1.0 in .NET Core).

acart
  • I just tried this. It does not work; it throws an exception: "Invalid Avro object container in a stream." I am now guessing the string I get from Kafka is not in the proper format... – Meng Tim Dec 21 '17 at 22:03