10

I'm writing code to decode messages from a binary protocol. Each message type is assigned a 1-byte type identifier, and each message carries this type id. All messages start with a common header consisting of 5 fields. My API is simple:

decoder:decode(Bin :: binary()) -> my_message_type() | {error, binary()}

My first instinct is to lean heavily on pattern matching by writing one decode function clause for each message type and decoding that message type completely in the function head:

decode(<<Hdr1:8, ?MESSAGE_TYPE_ID_X:8, Hdr3:8, Hdr4:8, Hdr5:32, 
         TypeXField1:32, TypeXFld2:32, TypeXFld3:32>>) ->
    #message_x{hdr1=Hdr1, hdr3=Hdr3 ... fld4=TypeXFld3};

decode(<<Hdr1:8, ?MESSAGE_TYPE_ID_Y:8, Hdr3:8, Hdr4:8, Hdr5:32, 
         TypeYField1:32, TypeYFld2:16, TypeYFld3:4, TypeYFld4:32,
         TypeYFld5:64>>) ->
    #message_y{hdr1=Hdr1, hdr3=Hdr3 ... fld5=TypeYFld5}.

Note that while the first 5 fields of the messages are structurally identical, the fields after that vary for each message type.

I have roughly 20 message types and thus 20 function clauses similar to the above. Am I decoding the full message multiple times with this structure? Is it idiomatic? Would I be better off decoding just the message type field in the function head and then decoding the full message in the body of the function?

mpm
  • Meta question: I get nice erlang code coloring in the preview when I edit this post, but not in the rendered page after I post. Help? – mpm Apr 28 '11 at 02:40
  • noticed this too when posting erlang code, might be a question for meta – Peer Stritzinger Apr 28 '11 at 11:08
  • Posted a question on meta regarding the syntax coloring http://meta.stackexchange.com/questions/89117/why-do-i-get-nice-erlang-syntax-coloring-in-preiviev-but-not-in-the-rendered-page – Peer Stritzinger Apr 28 '11 at 14:11

2 Answers

8

Just to agree that your style is very idiomatic Erlang. Don't split the decoding into separate parts unless you feel it makes your code clearer. Sometimes it can be more logical to do that type of grouping.

The compiler is smart and compiles pattern matching in such a way that it will not decode the message more than once. It will first decode the first two fields (bytes) and then use the value of the second field, the message type, to determine how it is going to handle the rest of the message. This works irrespective of how long the common part of the binary is.

So there is no need to try to "help" the compiler by splitting the decoding into separate parts; it will not make it more efficient. Again, only do it if it makes your code clearer.
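For comparison, here is a minimal sketch of the two styles side by side. The module name, type id values, record definitions, and field sizes are all assumptions made up for illustration; they are not the asker's real protocol. Both functions should return the same results, so the choice between them comes down to readability.

```erlang
-module(decode_sketch).
-export([decode/1, decode_split/1]).

%% Hypothetical type ids and records, for illustration only.
-define(MESSAGE_TYPE_ID_X, 1).
-define(MESSAGE_TYPE_ID_Y, 2).

-record(message_x, {hdr1, hdr3, hdr4, hdr5, fld1, fld2}).
-record(message_y, {hdr1, hdr3, hdr4, hdr5, fld1}).

%% Style 1: one clause per message type, fully matched in the function head.
decode(<<H1:8, ?MESSAGE_TYPE_ID_X:8, H3:8, H4:8, H5:32, XF1:32, XF2:32>>) ->
    #message_x{hdr1 = H1, hdr3 = H3, hdr4 = H4, hdr5 = H5,
               fld1 = XF1, fld2 = XF2};
decode(<<H1:8, ?MESSAGE_TYPE_ID_Y:8, H3:8, H4:8, H5:32, YF1:16>>) ->
    #message_y{hdr1 = H1, hdr3 = H3, hdr4 = H4, hdr5 = H5, fld1 = YF1};
decode(Bin) when is_binary(Bin) ->
    {error, Bin}.

%% Style 2: match the common header first, then dispatch on the type id
%% and decode the type-specific part in the function body.
decode_split(<<H1:8, Type:8, H3:8, H4:8, H5:32, Body/binary>> = Bin) ->
    case {Type, Body} of
        {?MESSAGE_TYPE_ID_X, <<XF1:32, XF2:32>>} ->
            #message_x{hdr1 = H1, hdr3 = H3, hdr4 = H4, hdr5 = H5,
                       fld1 = XF1, fld2 = XF2};
        {?MESSAGE_TYPE_ID_Y, <<YF1:16>>} ->
            #message_y{hdr1 = H1, hdr3 = H3, hdr4 = H4, hdr5 = H5,
                       fld1 = YF1};
        _ ->
            {error, Bin}
    end;
decode_split(Bin) when is_binary(Bin) ->
    {error, Bin}.
```

In the first style each clause reads as a complete description of one message layout, which is the property the answer recommends keeping.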

rvirding
  • Just what I needed to hear in re: compiler optimizations. The code is indeed clearer as it is currently structured. Thanks, Rob. – mpm Apr 29 '11 at 00:00
7

Your current approach is idiomatic Erlang, so keep going in this direction. Don't worry about performance; the Erlang compiler does good work here. If your messages really have exactly the same format, you could write a macro for it, but it should generate the same code under the hood, and using a macro usually hurts maintainability. Out of curiosity, why are you generating different record types when they all have exactly the same fields? An alternative approach is to translate the message type from a constant to an Erlang atom and store it in a single record type.
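The atom-based alternative might look roughly like the sketch below. It only applies if every message type really has the same field layout; the module name, the type id values, the atoms, and the `message` record are hypothetical and chosen purely for illustration.

```erlang
-module(decode_atom_sketch).
-export([decode/1]).

%% A single record shared by all message types (assumed layout).
-record(message, {type, hdr1, hdr3, hdr4, hdr5, fld1, fld2, fld3}).

%% Translate the numeric type id into an atom (hypothetical ids).
type_atom(1) -> {ok, type_x};
type_atom(2) -> {ok, type_y};
type_atom(_) -> error.

%% One clause decodes every message, since the layout is identical.
decode(<<H1:8, TypeId:8, H3:8, H4:8, H5:32, F1:32, F2:32, F3:32>> = Bin) ->
    case type_atom(TypeId) of
        {ok, Type} ->
            #message{type = Type, hdr1 = H1, hdr3 = H3, hdr4 = H4,
                     hdr5 = H5, fld1 = F1, fld2 = F2, fld3 = F3};
        error ->
            {error, Bin}
    end;
decode(Bin) when is_binary(Bin) ->
    {error, Bin}.
```

Callers can then match on the `type` field of the record instead of on the record name.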

Hynek -Pichi- Vychodil
  • I second that. If you didn't simplify the structure for the post and the records all have the same format, use atoms to differentiate between the record-types - one function and one erlang-record definition. – Tom Regner Apr 28 '11 at 14:24
  • I'm sorry - I over-simplified in my code. After the first 5 fields, the messages are dissimilar. I have updated the code example to indicate that the two messages, TypeX and TypeY, have a different number of fields and different field structures. Sorry for the confusion. – mpm Apr 28 '11 at 14:58
  • @mpm: So then your code matches Erlang idioms perfectly. For a reader, it is usually harder to follow a flow of "match this, then match this, then this" than to see "here is one type of message, and here is another". It is also better for long-term maintenance. After the very early stage of a project you will be reading more and more patches, and initial assumptions can change drastically. What is common to all messages today may not be in months or years, and each such change will force more code changes in cascaded matching than in your current approach. – Hynek -Pichi- Vychodil Apr 28 '11 at 20:29