2

I've just read this great article about boost::serialization and it seems to be very clever and easy to use.

I'm searching for a way to calculate the hash-value of an object graph in a similar/easy way.

Does anybody know how this can be achieved?

keyboard
  • 2,137
  • 1
  • 20
  • 32
  • A lot depends on the graph library you're using. If you're able to query the graph and get a consistently ordered traversal through any same-valued/wired graph AND some representation of the edges/connection, then a combination of hash functions and `boost::hash_combine` can produce an overall hash. For example, lets call your objects in ordered traversal O0, O1, O2 and O3, and edges 0-1, O-3, 1-2: you can `hash_combine` `{ hash(O0), hash(O1), hash(O2), hash(O3), hash(0), hash(1), hash(0), hash(3), hash(1), hash(2) }` (or even skip the hashing of indices if you trust `hash_combine` enough). – Tony Delroy Apr 01 '16 at 11:17
  • @TonyD that's the tedious approach which also hardcodes the hash algorithm into the implementation. I don't think that's what he's after. The "smart" he detects would be the "Types Don't Know #" idea I linked in my answer, I think – sehe Apr 01 '16 at 12:06
  • @sehe: I can't imagine why you think doing the above requires any hardcoding of the hash algorithm... can trivially provide say `template size_t hash(const Graph& g)` that invokes a la `Hash()(*graph_iterator)` and `Hash()(size_t)`, defaulting to `std::hash`-es if convenient. Not sure why you call it tedious either: pretty damned simple... might well be around 10 lines of simple code, depending on the graph library. – Tony Delroy Apr 01 '16 at 12:15
  • I'm not using a graph library, I'm speaking about an object graph. Hence, I have an object which has a couple of primitive members and a couple of pointers to other objects. All the classes/objects used here are custom. I think what @sehe is after is closer to what I'm looking for, though I have still some trouble understanding his answer. ;) – keyboard Apr 01 '16 at 13:11

1 Answers1

1

I did exactly that once:

Also required reading: Types Don't Know # with the talk on YT: https://www.youtube.com/watch?v=Njjp_MJsgt8

Community
  • 1
  • 1
sehe
  • 374,641
  • 47
  • 450
  • 633
  • I have to admit, I'm quite new to boost and also have only three months of experience with C++. Your answer looks very smart indeed, but I have trouble deciphering it. If you could explain it in an easier way, it'd be appreciated. In the meantime, I'm going to keep trying to understand. – keyboard Apr 01 '16 at 13:15
  • 1
    It goes half-way by just serializing to a binary-archive, which is written to a fake stream (you could use `boost::iostreams::tee` to have the ahs _and_ the serialized data). The fake stream just hashes. Of course you could plug MD5, SAH1, Blake2, Fletcher2, Murmur or anything there. – sehe Apr 01 '16 at 13:57
  • I call it half-way because it would be cleaner to have a custom archive. Oor you can not "abuse" Boost Serialization to do the dispatching and roll your own, e.g.: https://stackoverflow.com/questions/27523771/how-to-pass-class-template-argument-to-boostvariant/27526407#27526407 I'd recommend against it if you need polymorphism and support for many things that Boost Serialization comes with (containers, strings, multi-index containers, multiprecision numbers, smart pointers etc.) – sehe Apr 01 '16 at 14:01
  • Actually my basic needs are pointers (hash the objects behind them), strings, containers. So to get this right: You kind of suggest just serializing everything with basic boost::serialization and then calculating the hash value of the binary result? (Probably a bit smarter, with a stream (?), but that's what I understood) – keyboard Apr 01 '16 at 14:03
  • Wouldn't that be quite slow? I suppose serialization to take quite some more time than hashing. – keyboard Apr 01 '16 at 14:04
  • Yes that would be slow, which is why I avoid that. By using binary serialization you get bitwise copies for 90% of things with a tiny bit of overhead. By using a "fake stream" (that doesn't actually allocate any buffer, just feeds the bytes to a hash) you eliminate the cost of the stream. Since this uses compile-time template instantiations, the compiler can see through the templates and after inlining the code will typically be similarly efficient as handwritten code. (Profile, profile, profile) – sehe Apr 01 '16 at 14:06
  • Regarding "tiny overhead" I refer to things like length bytes or type ids in the case of _polymorphic_ pointers. These are actually important to avoid predictable hash collisions on unrelated graphs with identical binary representations (e.g. the classic hash("aa", "bb") != hash("bb", "aa") or hash("", "aabb")) – sehe Apr 01 '16 at 14:09
  • Okay, I understand that. Now about the stuff I'm not getting yet: You say that I can just prepare my classes for boost::serialization and without adding anything extra to them, I can directly hash the serialization results? I'm starting to kind of understand the code in the other thread. I've also read and understood the whole paper in the meantime. But I'm still struggling with your code ;) Maybe the whole multiprecision stuff is confusing me. – keyboard Apr 01 '16 at 14:10
  • Yeah, the multiprecision stuff is /just/ a sample object graph in that question. If you make a SSCCE, I'm happy to make a small demo (later) – sehe Apr 01 '16 at 14:12
  • 1
    Okay, thanks a lot. I will now try to apply what I learned and make the SSCCE. Maybe I can even get it to work myself. You've brought me a whole lot further. Thanks a ton! And maybe see you later :) – keyboard Apr 01 '16 at 14:13
  • Heyho, it's me again. I'm still unable to even get a demo project with boost to work. I was hoping that serialization could be a header-only include and "just works", but it seems not to be the case. Having to compile boost for all possible platforms and having the dependency on it, is kind of overkill for my app. Do you know a way how I could get boost-style serialization to work without having to compile a library for it? – keyboard Apr 01 '16 at 17:16
  • There's Cereal (I think it has a library component too). You can link statically. And [there's the linked example](http://stackoverflow.com/questions/36353643/c-hash-value-of-object-graph-similar-to-boostserialization/36355910?noredirect=1#comment60336281_36355910). – sehe Apr 01 '16 at 18:35
  • I'm actually just looking at it, but it doesn't offer the cool polymorphic pointer solution for raw pointers. I'm just using an object-tree (so no cycles or anything). Do you have an idea how to "get this back" in cereal? Do you think that this technique of abusing the serialization would work for cereal just the same? – keyboard Apr 01 '16 at 18:38
  • That's what you will write yourself. Since you don't require deserialization it'll be much simpler. Good luck – sehe Apr 01 '16 at 19:15