
I need to serialize a (possibly complex *) object so that I can calculate the object's MAC**.

If your messages are strings you can simply do tag := MAC(key, string). With very high probability, s1 != s2 implies MAC(key, s1) != MAC(key, s2). Moreover, without the key it is computationally hard to find s1, s2 such that s1 != s2 and MAC(key, s1) == MAC(key, s2).
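As a minimal sketch of the string case, assuming HMAC-SHA256 as the MAC and UTF-8 as the character encoding (both are illustrative choices, and the key and the helper name `mac` are hypothetical):

```python
import hashlib
import hmac


def mac(key: bytes, message: str) -> str:
    """Return the hex HMAC-SHA256 tag of a string encoded as UTF-8."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"demo-key-not-for-production"  # hypothetical key, for illustration only

tag1 = mac(key, "hello")
tag2 = mac(key, "hellp")

# Different strings give different tags (with overwhelming probability).
print(tag1 != tag2)

# Verification should use a constant-time comparison.
print(hmac.compare_digest(tag1, mac(key, "hello")))
```

Note that the string has to be turned into bytes before it can be MACed, so the encoding is part of the scheme.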

Now my question is: what happens if, instead of strings, you need to MAC a very complex object that can contain arrays of objects and nested objects?

JSON
Initially I thought that just using JSON serialization could do the trick, but it turns out that JSON serializers do not guarantee key order, so for example {b:2,a:1} can be serialized to either {"b":2,"a":1} or {"a":1,"b":2}.
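The effect on the MAC can be sketched as follows. This assumes Python's standard `json` module, which emits keys in dict insertion order, and HMAC-SHA256 with a hypothetical key; other serializers may order keys differently again.

```python
import hashlib
import hmac
import json

key = b"demo-key-not-for-production"  # hypothetical key, for illustration only


def tag(serialized: str) -> str:
    """HMAC-SHA256 over the UTF-8 bytes of an already serialized string."""
    return hmac.new(key, serialized.encode("utf-8"), hashlib.sha256).hexdigest()


# Two dicts that are equal as objects but were built in a different key order.
obj1 = {"b": 2, "a": 1}
obj2 = {"a": 1, "b": 2}

s1 = json.dumps(obj1)
s2 = json.dumps(obj2)

print(obj1 == obj2)        # the objects compare equal
print(s1 == s2)            # but the serialized strings differ
print(tag(s1) == tag(s2))  # and so do the tags
```

So a receiver that re-serializes the same object with a different key order would fail to verify the tag.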

URL Params
You can convert the object to a list of URL query params after sorting the keys, so for example {c:1,b:2} can be serialized to b=2&c=1. The problem is that as the object gets more complex, the serialization becomes difficult to understand. Example: {c:1, b:{d:2}}
1. First we serialize the nested object: {c:1, b:{d=2}}
2. Then we URL-encode the = sign: {c:1, b:{d%3D2}}
3. The final serialization is: b=d%3D2&c=1

As you can see, the serialization quickly becomes unreadable. Although I have not proved it yet, I also have the feeling that it is not very secure (i.e. it is possible to find two different messages that serialize, and therefore MAC, to the same value).
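To make that concern concrete, here is a minimal sketch of the URL-params scheme. The function name `naive_serialize` and the decision to percent-encode every value are assumptions about how the steps above would be implemented, not a fixed specification.

```python
from urllib.parse import quote


def naive_serialize(obj: dict) -> str:
    """Sort keys, serialize nested dicts recursively, percent-encode values."""
    parts = []
    for k in sorted(obj):
        v = obj[k]
        if isinstance(v, dict):
            # Nested object: serialize it first, then escape the result.
            v = quote(naive_serialize(v), safe="")
        else:
            v = quote(str(v), safe="")
        parts.append(f"{k}={v}")
    return "&".join(parts)


nested = {"c": 1, "b": {"d": 2}}  # b is an object
flat = {"c": 1, "b": "d=2"}       # b is a plain string

print(naive_serialize(nested))
print(naive_serialize(flat))
print(naive_serialize(nested) == naive_serialize(flat))
```

Under these assumptions the two different objects serialize to the same string, so they would also get the same MAC. The scheme also loses type information: the number 1 and the string "1" serialize identically.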

Can anyone show me a good secure*** algorithm for serializing objects?

[*]: The object can have nested objects and nested arrays of objects. No circular references allowed. Example: {a:'a', b:'b', c:{d:{e:{f:[1,2,3,4,5]}}, g:[{h:'h'},{i:'i'}]}}

[**]: This MAC will then be sent over the wire. I cannot know what languages/frameworks are supported by the servers, so language-specific solutions like Java Object Serialization are not possible.

[***]: Secure in this context means that given messages a,b: serialize(a) = serialize(b) implies that a = b

EDIT: I just found out about the SignedObject through this link. Is there a language agnostic equivalent?

fernandohur
  • If you want a unique serialization to calculate MACs, why do you care whether its readability is good? (And no, you can't calculate MACs from strings unless you specify a charset too) – deviantfan Mar 18 '15 at 16:09
  • And, if the MAC algorithm works fine and is secure for strings, why shouldn't it work for strings with a specific content (JSON etc.)? – deviantfan Mar 18 '15 at 16:11
  • @deviantfan you are right, readability is not important – fernandohur Mar 18 '15 at 18:41

1 Answer


What you are looking for is a canonical representation, either for the data storage itself or as a pre-processing step before applying the MAC algorithm. One fairly well-known format is the canonicalization used for XML signatures. The draft 2.0 version of XML Signature also seems to include HMAC. Be warned that verifying XML signatures securely is fraught with dangers: don't let yourself be tricked into trusting the signed document itself.

As for JSON, there seems to be a canonical JSON draft, but I cannot determine its status or whether there are any compliant implementations. Here is a Q/A where the same issue comes up (for hashing instead of a MAC). So there does not seem to be a fully standardized way of doing it.
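In the absence of a standard, a commonly used approximation is to fix the key order and whitespace yourself. The sketch below assumes Python's `json` module and HMAC-SHA256; the helper name `canonical_json_bytes` and the key are hypothetical. It is not an implementation of the canonical JSON draft, and it does not settle number formatting, Unicode normalization or agreement between different languages' serializers.

```python
import hashlib
import hmac
import json


def canonical_json_bytes(obj) -> bytes:
    """Serialize with sorted keys, no optional whitespace, as UTF-8 bytes."""
    text = json.dumps(
        obj,
        sort_keys=True,         # fixed key order at every nesting level
        separators=(",", ":"),  # no spaces after ',' or ':'
        ensure_ascii=False,     # emit non-ASCII characters directly
    )
    return text.encode("utf-8")


key = b"demo-key-not-for-production"  # hypothetical key, for illustration only


def mac_object(obj) -> str:
    return hmac.new(key, canonical_json_bytes(obj), hashlib.sha256).hexdigest()


# Same object, different insertion order: same bytes, same tag.
print(canonical_json_bytes({"b": 2, "a": 1}))
print(mac_object({"b": 2, "a": 1}) == mac_object({"a": 1, "b": 2}))
```

Both sides would have to produce byte-identical output for the tag to verify, which is exactly where a real standard would be needed.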

In binary there's ASN.1 DER encoding, but you may not want to go into that as it is highly complex.


Of course you can always define your own binary or textual representation, as long as there is exactly one representation for all data sets that are semantically identical, and different data sets never share a representation. In the case of a textual representation, you will still need to define a specific character encoding (UTF-8 is recommended) to convert the representation to bytes, as HMAC takes binary input only.
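As a rough illustration of such a home-grown binary representation, the sketch below uses a type tag plus a 4-byte big-endian length prefix for every value, and sorts object keys by their UTF-8 bytes. The tags, the function names `encode` and `_framed`, and the supported types (None, bool, int, str, list, dict with string keys) are all assumptions made for this example, not an existing format.

```python
import struct


def _framed(tag: bytes, payload: bytes) -> bytes:
    """Type tag, then 4-byte big-endian payload length, then the payload."""
    return tag + struct.pack(">I", len(payload)) + payload


def encode(obj) -> bytes:
    """Encode a JSON-like value so that distinct values give distinct bytes."""
    if obj is None:
        return b"N"
    if isinstance(obj, bool):  # must be checked before int: bool is an int
        return b"T" if obj else b"F"
    if isinstance(obj, int):
        return _framed(b"I", str(obj).encode("ascii"))
    if isinstance(obj, str):
        return _framed(b"S", obj.encode("utf-8"))
    if isinstance(obj, list):
        return _framed(b"L", b"".join(encode(item) for item in obj))
    if isinstance(obj, dict):
        if not all(isinstance(k, str) for k in obj):
            raise TypeError("only string keys are supported in this sketch")
        # Sort keys by their UTF-8 bytes so insertion order does not matter.
        items = sorted(obj.items(), key=lambda kv: kv[0].encode("utf-8"))
        return _framed(b"D", b"".join(encode(k) + encode(v) for k, v in items))
    raise TypeError(f"unsupported type: {type(obj).__name__}")


example = {"a": "a", "b": "b",
           "c": {"d": {"e": {"f": [1, 2, 3, 4, 5]}}, "g": [{"h": "h"}, {"i": "i"}]}}

print(encode({"b": 2, "a": 1}) == encode({"a": 1, "b": 2}))  # order-independent
print(encode({"b": {"d": 2}}) != encode({"b": "d=2"}))        # types kept apart
print(len(encode(example)))
```

The resulting bytes can be fed directly to HMAC. Because every value carries its type and length, a nested object can no longer be confused with a string that happens to look like its serialization.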

Maarten Bodewes