Merkle trees are important in blockchain technology because they play a key role in making these networks both efficient and reliable. By understanding how a Merkle tree works, we can gain a better understanding of how a blockchain functions.

A Merkle tree makes it possible to verify large volumes of data and preserve its integrity while using little storage space on the network and minimal computing power.

Merkle Tree Functions

In simple terms, the beauty of the Merkle tree is its ability to take large amounts of input data and compress it into an output of fixed length. This output is a single string of characters called the Merkle root. The Merkle root acts as a compact fingerprint of the data it represents: if any part of that data changes, the root changes too, so the root can serve as proof of the data's validity. Because it is short, it can also easily be shared with others on the network. This means that not every user needs to maintain a copy of the entire database, and if an error occurs, we do not need to search the entire database line by line.

However, before we can understand how a Merkle tree works, we need to understand the basics of hash functions. This is because hash functions are the key technology underlying a Merkle tree.

Hash Functions Explained

A hash function takes input data of any size and produces an output of fixed length, called a hash value. If we change only one minor detail of a transaction and apply the same hash function again, the output is completely different.

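The resulting hash value is like a fingerprint of the input data. For a secure hash function such as SHA-256, it is computationally infeasible to find two different inputs with the same hash, so in practice each hash value is unique to its input. Because even a small change in the input produces a very different output, comparing hash values makes it easy to determine when a file has been altered or corrupted.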
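To make the fingerprint property concrete, here is a small sketch using Python's standard `hashlib` module. It assumes SHA-256 as the hash function (the article does not name one), and the two transaction strings are hypothetical examples that differ in a single character.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of a UTF-8 string as a 64-character hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Two hypothetical transactions that differ only in the amount (one character).
tx_original = "Alice pays Bob 10 BTC"
tx_modified = "Alice pays Bob 11 BTC"

h1 = sha256_hex(tx_original)
h2 = sha256_hex(tx_modified)
print(h1)
print(h2)

# Count how many of the 64 hex characters differ between the two hashes.
differing = sum(a != b for a, b in zip(h1, h2))
print(f"{differing} of {len(h1)} hex characters differ")
```

Running it prints two unrelated-looking 64-character strings; typically the large majority of their characters differ, even though the inputs differ by a single character.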

Saving Space

Hashing also saves storage. Instead of every participant storing huge amounts of input data, a participant can keep just the hash value and use it to check data held elsewhere on the network. With a distributed network, different sources can store smaller subsets of the data, which does away with the need for central storage. This contributes to the efficiency of a blockchain.
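The storage saving comes from the fixed output size. The sketch below, again assuming SHA-256 from Python's `hashlib`, shows that a 15-byte input and a 1 MB input both produce a 32-byte (256-bit) digest.

```python
import hashlib

# A small input and a large (1 MB) input; the digest has the same size for both.
small_input = b"one transaction"
large_input = b"x" * 1_000_000

small_digest = hashlib.sha256(small_input).digest()
large_digest = hashlib.sha256(large_input).digest()

print(len(small_input), "bytes in ->", len(small_digest), "bytes out")
print(len(large_input), "bytes in ->", len(large_digest), "bytes out")
```

A hash cannot be reversed, so the underlying data still has to be kept somewhere; the hash only lets a participant check data that is held elsewhere, which is why the distributed storage described above matters.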

The Merkle Tree Concept

The hash function is the key technology underlying a Merkle tree. It fingerprints the data and compresses it into a manageable size; note that hashing is not encryption, since a hash cannot be reversed to recover the original data. The basic structure of a Merkle tree resembles an upside-down tree, with the root at the top and the leaves at the bottom, as shown in the following diagram:

As with any tree, there are leaves, branches, and a root.  The leaves are the transactions (or blocks of transactions) that make up the input data.  A hash function is applied to each block of transactions (T1, T2, T3, T4) to create what are called leaf nodes (W, X, Y, Z).  The leaf nodes are the start of the Merkle tree.
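The leaf level can be sketched as follows. SHA-256 is an assumed choice of hash, and both the transaction contents and the helper name `h` are hypothetical; W, X, Y and Z match the labels used above.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Hypothetical helper: hash bytes with SHA-256 (an assumed choice of hash)."""
    return hashlib.sha256(data).digest()

# Hypothetical transaction blocks T1..T4, encoded as bytes.
T1, T2, T3, T4 = b"Alice pays Bob 5", b"Bob pays Carol 2", b"Carol pays Dan 1", b"Dan pays Eve 3"

# Hashing each block gives the leaf nodes W, X, Y, Z.
W, X, Y, Z = h(T1), h(T2), h(T3), h(T4)

for name, leaf in zip("WXYZ", (W, X, Y, Z)):
    print(name, leaf.hex())
```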

In this example, we concatenate (join together) two leaf nodes (W+X) and apply the hash function again to create a parent node (A). We do the same with leaf nodes (Y+Z) to create a parent node (B). We then concatenate parent nodes A+B and hash them to get what is called the Merkle root. The result is a string of characters of fixed length. Now, instead of storing a large set of data, we can represent that entire dataset with just the Merkle root.
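Continuing the four-transaction example, the parent nodes and the root are built by concatenating child hashes and hashing again. This is a minimal sketch assuming single SHA-256 and plain byte concatenation; real systems such as Bitcoin use their own encodings and double SHA-256. The function name `root_of_four` and the transactions are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Hypothetical helper: hash bytes with SHA-256 (an assumed choice of hash)."""
    return hashlib.sha256(data).digest()

def root_of_four(t1: bytes, t2: bytes, t3: bytes, t4: bytes) -> bytes:
    """Build the four-leaf tree from the example and return its Merkle root."""
    W, X, Y, Z = h(t1), h(t2), h(t3), h(t4)   # leaf nodes
    A = h(W + X)                              # parent of W and X
    B = h(Y + Z)                              # parent of Y and Z
    return h(A + B)                           # Merkle root

# Hypothetical transactions T1..T4.
txs = [b"Alice pays Bob 5", b"Bob pays Carol 2", b"Carol pays Dan 1", b"Dan pays Eve 3"]
merkle_root = root_of_four(*txs)
print(merkle_root.hex())

# Editing a single transaction produces a different root.
tampered = list(txs)
tampered[2] = b"Carol pays Dan 100"
print(root_of_four(*tampered) == merkle_root)
```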

A More Complicated Tree

The Merkle tree depicted above is a simple example. In reality, a Merkle tree can be much taller (with more levels) and much wider (covering more data). The process is the same: the hash function is applied to pairs of nodes, level by level, until we are left with just the Merkle root.
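For a tree of any size, the same pairing step can be repeated in a loop. The sketch below (hypothetical function `merkle_root`, SHA-256 assumed) has to decide what to do when a level has an odd number of nodes; it duplicates the last node, which is the convention Bitcoin uses, but other designs handle this case differently.

```python
import hashlib
from typing import List

def h(data: bytes) -> bytes:
    """Hypothetical helper: hash bytes with SHA-256 (an assumed choice of hash)."""
    return hashlib.sha256(data).digest()

def merkle_root(items: List[bytes]) -> bytes:
    """Hash pairs of nodes level by level until a single hash (the root) remains.

    Assumption: on a level with an odd number of nodes, the last node is
    paired with itself (Bitcoin-style duplication).
    """
    if not items:
        raise ValueError("need at least one item")
    level = [h(item) for item in items]        # leaf nodes
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])            # duplicate the last node
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# A wider tree: 1,000 hypothetical transactions still yield one 32-byte root.
txs = [f"tx-{i}".encode() for i in range(1000)]
root = merkle_root(txs)
print(root.hex())
```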

On a blockchain network, we can store the Merkle root in a secure place at a trusted source and distribute the responsibility of storing the rest of the information among other sources on the network. Without revealing the rest of the data or requiring the contents of the entire database, we can verify a subset of the data very quickly: we hash the subset, combine it with the sibling hashes along its path up the tree, and check whether the result matches the Merkle root. Errors can be detected quickly by comparing the hashes of progressively smaller subtrees until the corrupted block is found. This increases efficiency because the entire database does not need to be searched for a single error.
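Both ideas in this paragraph can be sketched in a few functions: `make_proof` collects the sibling hashes that connect one item to the root, `verify` recomputes that path and compares it with the root, and `find_mismatch` walks down from the root, comparing child hashes, to locate a corrupted leaf. All names are hypothetical; the sketch assumes SHA-256 with last-node duplication on odd levels, and `find_mismatch` assumes both copies hold the same number of items.

```python
import hashlib
from typing import List, Tuple

def h(data: bytes) -> bytes:
    """Hypothetical helper: hash bytes with SHA-256 (an assumed choice of hash)."""
    return hashlib.sha256(data).digest()

def build_levels(items: List[bytes]) -> List[List[bytes]]:
    """Return every level of the tree, leaf nodes first and the root level last."""
    level = [h(item) for item in items]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]   # assumption: duplicate the last node
            levels[-1] = level            # keep the padded level for proofs
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def make_proof(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Collect the sibling hash at each level below the root.

    The bool records whether the sibling sits to the right of the current node.
    """
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1               # the other node in the same pair
        proof.append((level[sibling], sibling > index))
        index //= 2                       # move up to the parent's position
    return proof

def verify(item: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path from one item to the root using only sibling hashes."""
    node = h(item)
    for sibling, sibling_on_right in proof:
        node = h(node + sibling) if sibling_on_right else h(sibling + node)
    return node == root

def find_mismatch(trusted: List[List[bytes]], received: List[List[bytes]]) -> int:
    """Walk down from the root, always following a differing child, to a bad leaf.

    Returns the leaf index, or -1 if the roots match.
    """
    if trusted[-1][0] == received[-1][0]:
        return -1
    index = 0
    for depth in range(len(trusted) - 2, -1, -1):
        left = 2 * index
        # If the left child matches, the difference must be in the right child.
        index = left if trusted[depth][left] != received[depth][left] else left + 1
    return index

# Usage with five hypothetical transactions.
txs = [f"tx-{i}".encode() for i in range(5)]
levels = build_levels(txs)
root = levels[-1][0]

proof = make_proof(levels, 3)
print(verify(b"tx-3", proof, root))            # genuine item
print(verify(b"tx-3 (forged)", proof, root))   # altered item

corrupted = list(txs)
corrupted[2] = b"tx-2 (corrupted)"
print(find_mismatch(levels, build_levels(corrupted)))
```

A proof holds one sibling hash per level, so checking one item in a tree of n items takes roughly log2(n) hash operations instead of a pass over the whole dataset; `find_mismatch` likewise compares only about two hashes per level.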