I am looking at the Wikipedia article on hash trees, and I am slightly confused by its diagram.

A leaf node obviously contains the hash of the underlying data.

Are leaf nodes in hash trees different from non-leaf nodes? Do non-leaf nodes contain hashes of data, or hashes of hashes?

Given this diagram:

Hash tree diagram

Which of these is Hash 1 a hash of?

  1. Hash 1-0 + Hash 1-1
  2. Data block 002 + Data block 003

Or are hash trees fundamentally different depending on the application (rsync, P2P networks, Git, etc)?

Merlyn Morgan-Graham

1 Answer


This is what the Wikipedia article says:

Nodes further up in the tree are the hashes of their respective children. For example, in the picture hash 0 is the result of hashing hash 0-0 and then hash 0-1. That is, hash 0 = hash( hash 0-0 || hash 0-1 ) where || denotes concatenation.

That said, I believe a developer is free to customize the tree and the algorithm, for example by choosing a different hash function, to optimize for a particular kind of data, for speed, for memory use, and so on.
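To make the quoted rule concrete, here is a minimal sketch of the four-block tree from the diagram. It assumes SHA-256 as the hash function and uses made-up block contents; neither comes from the article, and a real application (rsync, Git, a P2P client) may make different choices.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash used throughout this sketch. SHA-256 is an assumption, not a requirement."""
    return hashlib.sha256(data).digest()


# Hypothetical contents standing in for data blocks 000..003.
blocks = [b"block 000", b"block 001", b"block 002", b"block 003"]

# Leaf nodes: each one is the hash of a data block.
hash_0_0, hash_0_1, hash_1_0, hash_1_1 = (h(b) for b in blocks)

# Interior nodes: each one is the hash of its children's hashes, concatenated.
hash_0 = h(hash_0_0 + hash_0_1)
hash_1 = h(hash_1_0 + hash_1_1)
top_hash = h(hash_0 + hash_1)

# Under this scheme Hash 1 is a hash of hashes, not of the raw data blocks.
assert hash_1 != h(blocks[2] + blocks[3])

print(top_hash.hex())
```

So under the scheme the article describes, Hash 1 is option 1: a hash of Hash 1-0 followed by Hash 1-1. With SHA-256, each interior node hashes only 64 bytes (two 32-byte digests), no matter how large the data blocks are.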

Lyth

+1 for noticing the part of the article that I missed. I'd like to figure out how it is most commonly implemented, and that gives me a push in the right direction. I don't know much about the choices of hash algorithms or the sizes of their output hash data, but I have a feeling that what the article describes will generally iterate over less data, so it will have better perf. I'm going to hold out for a more authoritative answer, mentioning specific math, perf, and/or the specific choices made by popular software, but I'll come back to this answer if no one else pipes up :) – Merlyn Morgan-Graham Dec 07 '11 at 07:22