How do we know a Merkle Proof contains legitimate hashes?

Question

I'm trying to understand merkle trees in the context of blockchain and am not understanding how it helps with transaction verification (verifying if a transaction is inside a block).

The typical explanation is that merkle trees enable a SPV (simple payment verification) client to verify that a transaction is contained inside a block via a Merkle Proof (a list of hashes of the transaction branch). This reduces the number of hashes needed to be verified to Log N (where N = # transactions in block). Without a Merkle Tree, one would need to download all N transactions inside the block to compute and verify the block hash.

But how do we know the hashes inside the Merkle Proof are legitimate? Wouldn't it be simple to just create a bunch of fake hashes that hash to the same root hash? Since the SPV client / verifier doesn't have the transaction data and can't verify whether the hashes contained within the Merkle Proof are correct, I don't see how anything is actually being verified here.

score 0 · Answer 1 · answered Mar 24 '23 at 04:59

Merkle Tree is a binary tree in which every node has at most 2 child nodes. Inputs of Merkle tree is placed at the leaves (from bottom)

Starting from leaves, all the transactions in the block get hashed two at a time and the new hash is stored as the parent of those 2 transaction hashes. This process is carried out till the root hash. Merkle root is stored in the block header. (although image has even number of hashes, in case of odd number transactions, last transaction hash is duplicated)

Wouldn't it be simple to just create a bunch of fake hashes that hash to the same root hash?

If you change any transaction hash or if you add a fake transaction hash to the tree, you will get a different Merkle root from the one stored in the block header.

Since the SPV client / verifier doesn't have the transaction data and can't verify whether the hashes contained within the Merkle Proof are correct, I don't see how anything is actually being verified here.

I think this is the main reason why you posted this question. if SPV client does not have transaction data, how come it can verify the transaction?

It uses bloom filters. It first establishes a bloom filter. I explained bloom filters here. From spvs-and-bloom-filters-in-bitcoin-protocol/

As mentioned, SPV nodes use bloom filters to filter transactions and block an SPV node receives from peers by selecting only transactions of interest to an SPV node without revealing addresses or keys the SPV is interested in. An SPV node initializes a bloom filter as 'empty' meaning the filter won't match any patterns. The SPV node makes a list of keys, addresses, and hashes it is interested in by extracting the public key hash, script hash, and transaction IDs from UTXOs controlled by a wallet. Each of these pieces of data is added to a bloom filter which matches the patterns present in a transaction without revealing the patterns themselves.

The SPV node sends the bloom filter to a full node with the search parameters that checks several parts of a transaction against the loom filter looking for matches. After a filter is established, the peer tests each transaction output against the bloom filter, and only matching transactions are sent back to the SPV node. The SPV node discards false positives and uses the correct transaction to update its UTXO set. Additionally, it modifies the bloom filter to match future transactions that reference the found UTXO. A full node uses the new bloom filter to match new transaction references in the search parameters and the process repeats itself.

SPVs use Merkle trees extensively because they cannot store the full copy of the blockchain. For an SPV to verify a transaction, it uses an authentication path or Merkle path from the Merkle tree. For example, an SPV node that is receiving incoming funds establishes a bloom filter on all its connections to limit the number of transactions received to only those that have the set filter. When a peer fids the transaction matching the filter, it sends the block which has the merkle block message containing the block header and the Merkle path that links the transaction the SPV is interested in the root of the merkle tree inside the block. The SPV node uses the merkle tree to connect the transaction to the block and verify that it is included in the block. This SPV node also uses the block header to link the block to the blockchain. These two links(transaction-block, block-blockchain) when combined prove that the transaction exists in the blockchain. In conclusion, the SPV node will receive less than a kilobyte of data for the block header and merkle path compared to the full copy of the blockchain.

The steps can be summarized algorithmically as follows;

1- The SPV client request Merkle Branch from the full node for transactions.

2- SPV client checks the Merkle Root against the block header.

3- SPV client checks that the block is at the correct depth.

-When wallets only request transactions to the addresses it owns, full nodes will become aware of the owner, we mitigate this by using bloom filters

4- We start with an empty array of m bits.

5- Hash the data to filter on(public key) using k hash functions to a value greater than 0 but less than m.

6- Set the bits in the array.

7- We repeat for any other filter criteria.

8- For a full node to relay a block, it uses k hash functions and checks for an intersection on the filter.

9- If there are no hits, the transaction is removed from the Merkle tree and is not forwarded to an SPV client, otherwise, if there is a hit, the pruned Merkle tree is forwarded to the client.

Note that there may be false positives meaning the SPV need to do some work to remove them. This also serves to hide the request from full nodes.

How do we know a Merkle Proof contains legitimate hashes?

1 Answers1