Difference between revisions of "Hashed Trie"

From Snowblossom Wiki
Jump to: navigation, search
(Created page with "==Hashed Trie== ===Definition=== ===Advantages=== ===Disadvantages=== ===Why it is awesome for blockchains===")
 
 
(4 intermediate revisions by the same user not shown)
Line 1: Line 1:
==Hashed Trie==
+
=Hashed Trie=
  
===Definition===
+
I think this is novel, but if I am wrong, please let me know.
  
===Advantages===
+
==Implementation==
  
===Disadvantages===
+
* Source [https://github.com/snowblossomcoin/snowblossom/tree/master/lib/src/trie Snowblossom Trie]
 +
* Tests [https://github.com/snowblossomcoin/snowblossom/tree/master/lib/test/trie Snowblossom Trie Tests]
  
===Why it is awesome for blockchains===
+
 
 +
 
 +
 
 +
==Definition==
 +
 
 +
A Hashed Trie is a structure that is a [https://en.wikipedia.org/wiki/Trie Trie] where each node has a hash value based on its contents.  In my implementation there is an underlying map of hashs to nodes: Map<Hash,Node>.  The tree state is saved as a hash, which simply points to the root node for that state.  The children of each node are referenced by their hash.  No node is every overwritten (except as possibly rewritten as the exact same data), if it is changed the hash changes and it is saved under the new hash.  This makes it a space efficient Copy-on-Write (CoW) structure.
 +
 
 +
Each operation is given the root hash as a parameter.  Any modify operation returns a new root hash for the new tree.
 +
 
 +
The root hash represents the contents of the tree.  Since there are fixed rules for the structure of the tree, the same keys and values will always result in the same root hash.
 +
 
 +
==Advantages==
 +
 
 +
* Space efficient.  Even though we end up storing many variations of the same tree, we only duplicate the nodes that are different.
 +
* Previous versions of the tree are readable as long as you have the previous root hash.
 +
* Any mutation can be done from any previous root node.  No need to ever back out any changes, just use a previous root hash.
 +
* No need to lock for writes.  Reads and writes can happen simultaneously.
 +
 
 +
==Disadvantages==
 +
 
 +
* Any write will involve rewriting all the nodes from the root on down to the leaf node in question log(n) operations.  Can be helped by batching updates.
 +
* Reads involve reading all the nodes along the path, log(n) operations (likely helped greatly with cache).
 +
* Since each mutation involves taking the old root hash and returning a new root hash, if there are two writes that you want in the tree you need to do them in a batch or in sequence.
 +
* All previous nodes are always kept.  There is no pruning.
 +
 
 +
==Why it is awesome for blockchains==
 +
 
 +
* The root hash can be shared in consensus to insure that nodes have the same data in the tree.  For example, in Snowblossom the UXTO root hash is from the UTXO hashed trie.
 +
** Then the shared root hash and the intermediate nodes can be used to prove the existence and completeness of any data in the tree to light clients or header only nodes.
 +
** As the structure has fixed rules, the proof can also prove that data isn't there.  Example: by showing the parent node of where the data would be, if it existed.
 +
* If there is a reorg, or potential reorg the new root hashes can be updated based on the root hashes of the previous blocks.  No need to back out changes or pick a chain, just apply all reasonable blocks to the hashed trie.  Use whatever ends up winning.  For example, lets say the trie is storing the transactions that have been confirmed as of the most recent block.  Lets say we are on block 10000.  Suppose there is a reorg so a new block 9997 comes in.  We build the transaction trie for that simply taking the root hash from block 9996 and mutating from there.  Then the new chain fork can be imported independently of the existing fork.
 +
* If we save the root hash for previous blocks, we can query the tree for the state of things at any previous block we want.  This way, a client that is doing a long read of a bunch of stuff can pick and block and query everything relative to that even as new blocks are coming in.
 +
* Since for a blockchain ledger, we generally don't want to throw data away, the fact that nothing is ever pruned is fine.

Latest revision as of 17:40, 4 September 2019

Hashed Trie

I think this is novel, but if I am wrong, please let me know.

Implementation



Definition

A Hashed Trie is a structure that is a Trie where each node has a hash value based on its contents. In my implementation there is an underlying map of hashs to nodes: Map<Hash,Node>. The tree state is saved as a hash, which simply points to the root node for that state. The children of each node are referenced by their hash. No node is every overwritten (except as possibly rewritten as the exact same data), if it is changed the hash changes and it is saved under the new hash. This makes it a space efficient Copy-on-Write (CoW) structure.

Each operation is given the root hash as a parameter. Any modify operation returns a new root hash for the new tree.

The root hash represents the contents of the tree. Since there are fixed rules for the structure of the tree, the same keys and values will always result in the same root hash.

Advantages

  • Space efficient. Even though we end up storing many variations of the same tree, we only duplicate the nodes that are different.
  • Previous versions of the tree are readable as long as you have the previous root hash.
  • Any mutation can be done from any previous root node. No need to ever back out any changes, just use a previous root hash.
  • No need to lock for writes. Reads and writes can happen simultaneously.

Disadvantages

  • Any write will involve rewriting all the nodes from the root on down to the leaf node in question log(n) operations. Can be helped by batching updates.
  • Reads involve reading all the nodes along the path, log(n) operations (likely helped greatly with cache).
  • Since each mutation involves taking the old root hash and returning a new root hash, if there are two writes that you want in the tree you need to do them in a batch or in sequence.
  • All previous nodes are always kept. There is no pruning.

Why it is awesome for blockchains

  • The root hash can be shared in consensus to insure that nodes have the same data in the tree. For example, in Snowblossom the UXTO root hash is from the UTXO hashed trie.
    • Then the shared root hash and the intermediate nodes can be used to prove the existence and completeness of any data in the tree to light clients or header only nodes.
    • As the structure has fixed rules, the proof can also prove that data isn't there. Example: by showing the parent node of where the data would be, if it existed.
  • If there is a reorg, or potential reorg the new root hashes can be updated based on the root hashes of the previous blocks. No need to back out changes or pick a chain, just apply all reasonable blocks to the hashed trie. Use whatever ends up winning. For example, lets say the trie is storing the transactions that have been confirmed as of the most recent block. Lets say we are on block 10000. Suppose there is a reorg so a new block 9997 comes in. We build the transaction trie for that simply taking the root hash from block 9996 and mutating from there. Then the new chain fork can be imported independently of the existing fork.
  • If we save the root hash for previous blocks, we can query the tree for the state of things at any previous block we want. This way, a client that is doing a long read of a bunch of stuff can pick and block and query everything relative to that even as new blocks are coming in.
  • Since for a blockchain ledger, we generally don't want to throw data away, the fact that nothing is ever pruned is fine.