Blockchain Security Mechanisms

Cryptographic hashing and Merkle trees protect the integrity of data on public and private blockchains.

Cryptographic Hashing

Hashing functions are an essential part of cybersecurity and of several major cryptocurrency protocols such as Bitcoin.

What is hashing?

Hashing is a cryptographic technique that converts data of any form into a fixed-length string of characters called a hash (or digest). Any piece of data can be hashed, no matter its size or type, and a given hash function always produces output of the same length. A hash is designed to act as a one-way function: you can put data into a hashing algorithm and get its hash, but if you come upon a new hash, you cannot feasibly work out the input data it represents. The same input always produces the same hash, and for a secure hash function it is computationally infeasible to find two different inputs that produce the same hash.

How does it work?

Hashing is a mathematical operation that is easy to perform, but extremely difficult to reverse. (The difference between hashing and encryption is that encryption can be reversed, or decrypted, using a specific key.) Widely used hash functions include MD5, SHA-1 and SHA-256. Some are significantly harder to crack than others. For example, a password hashed with SHA-1 is easier to crack than one hashed with bcrypt, because SHA-1 is fast to compute while bcrypt is deliberately slow. MD5 and SHA-1 are also no longer considered collision-resistant.

Examples of data run through the SHA-1 hash function. The same data always produces the same SHA-1 hash.
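As a rough illustration of these properties, the sketch below uses Python's standard-library hashlib module. The sample inputs and the helper names sha1_hex and sha256_hex are arbitrary choices for this example, not data from the original figure.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of `data` as 40 hexadecimal characters."""
    return hashlib.sha1(data).hexdigest()


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as 64 hexadecimal characters."""
    return hashlib.sha256(data).hexdigest()


# Inputs of very different sizes (illustrative values only).
samples = [b"", b"abc", b"a much longer piece of data " * 1000]

for data in samples:
    # The digest length depends only on the hash function, not on the input size.
    print(len(data), sha1_hex(data), sha256_hex(data))

# Hashing the same input twice gives the same result (determinism).
print(sha1_hex(b"abc") == sha1_hex(b"abc"))

# A one-character change in the input gives a completely different hash.
print(sha256_hex(b"abc") != sha256_hex(b"abd"))
```

Note that nothing in the module offers a way back from a digest to its input. That is the one-way property described above.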

Who uses hashing?

The average user encounters hashing daily in the context of passwords. For example, when you create an email address and password, your email provider likely does not save your password. Rather, the provider runs the password through a hashing algorithm and saves the hash of your password. Every time you attempt to sign in to your email, the email provider hashes the password you enter and compares this hash to the hash it has saved. Only when the two hashes match are you authorized to access your email.
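The sign-in flow described above might be sketched as follows. This is a simplified illustration, not how any particular email provider works: the in-memory dictionary, the function names and the example address are all hypothetical. Plain SHA-256 is used only to keep the example short, and a real system would typically use a salted, deliberately slow function such as bcrypt.

```python
import hashlib
import hmac

# Hypothetical in-memory "database": email address -> stored password hash.
stored_hashes: dict[str, str] = {}


def hash_password(password: str) -> str:
    """Hash the password (plain SHA-256, for illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(email: str, password: str) -> None:
    """Store only the hash of the password, never the password itself."""
    stored_hashes[email] = hash_password(password)


def sign_in(email: str, password: str) -> bool:
    """Hash the entered password and compare it with the stored hash."""
    stored = stored_hashes.get(email)
    if stored is None:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(stored, hash_password(password))


register("alice@example.com", "correct horse battery staple")
print(sign_in("alice@example.com", "correct horse battery staple"))  # True
print(sign_in("alice@example.com", "wrong guess"))                   # False
```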

Hashing in Cryptocurrencies

In the Bitcoin blockchain, ‘mining’ essentially consists of repeatedly running SHA-256 hash functions. In cryptocurrency blockchains today, hashing is used to write new transactions, timestamp them, and ultimately link each new block to the previous one by including the previous block’s hash. Once a block of transactions has been added to the blockchain and the operators of the different nodes have reached consensus (validating that all of them hold the same, true version of the entire ledger), it is nearly impossible to reverse a transaction. Anyone attempting to tamper with the blockchain would need enormous computing power, and the hashing itself is one-way. Hashing is therefore crucial to maintaining the cryptographic integrity of the blockchain.
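The toy sketch below illustrates how hashing links blocks together and why tampering is detectable. It is not Bitcoin's actual block format or mining algorithm: the JSON encoding, the field names and the "leading zero hex digits" difficulty rule are simplifying assumptions made for this example.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block (a simplification)."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def mine(prev_hash: str, transactions: list, difficulty: int = 3) -> dict:
    """Try nonces until the block hash starts with `difficulty` zero hex digits."""
    nonce = 0
    while True:
        block = {"prev_hash": prev_hash, "transactions": transactions, "nonce": nonce}
        if block_hash(block).startswith("0" * difficulty):
            return block
        nonce += 1


def chain_is_valid(chain: list, difficulty: int = 3) -> bool:
    """Check each block's proof of work and its link to the previous block."""
    for i, block in enumerate(chain):
        if not block_hash(block).startswith("0" * difficulty):
            return False
        if i > 0 and block["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True


genesis = mine("0" * 64, ["coinbase -> alice"])
second = mine(block_hash(genesis), ["alice -> bob: 1"])
chain = [genesis, second]
print(chain_is_valid(chain))  # True

# Tampering with an old transaction changes that block's hash, which breaks
# the link stored in the next block.
genesis["transactions"][0] = "coinbase -> mallory"
print(chain_is_valid(chain))  # False
```

To hide the change, an attacker would have to re-mine the altered block and every block after it, which is where the computing-power requirement comes from.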

Hashing and Cybersecurity

When an organization discovers that a platform’s passwords have been compromised, it usually means that hackers have acquired the hashes that represent the passwords. Hackers then hash common words and combinations of common words and numbers, and compare the results with the stolen hashes to recover some of the passwords that users have saved. To counter this, the cybersecurity industry uses a mechanism called ‘salting’. Salting involves adding random data to a password before hashing it, and then storing that ‘salt value’ with the hash. This makes it harder for hackers to use pre-computation techniques to crack the password hashes they have acquired.
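A small sketch can show why salting defeats pre-computation. The password list, the function names and the iteration count of 100,000 are illustrative assumptions. PBKDF2 from Python's hashlib is used here as one example of a salted key-derivation function, not as a claim about what any particular platform uses.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Tiny illustrative list; real attackers use lists with millions of entries.
COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein"]

# Pre-computed table: unsalted SHA-256 hash -> password.
precomputed = {hashlib.sha256(p.encode()).hexdigest(): p for p in COMMON_PASSWORDS}


def hash_unsalted(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()


def hash_salted(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, key


def verify_salted(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_salted(password, salt)
    return hmac.compare_digest(key, expected_key)


# An unsalted hash of a common password is found directly in the table.
print(precomputed.get(hash_unsalted("letmein")))

# The salted hash of the same password is not in the table.
salt, key = hash_salted("letmein")
print(precomputed.get(key.hex()))

# The legitimate check still works because the salt is stored with the hash.
print(verify_salted("letmein", salt, key))
```

Because every user gets a different salt, two users with the same password end up with different stored values. An attacker has to attack each hash separately instead of reusing one table.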

The Merkle Root is derived from hashing each transaction pair in a block until it is down to a single hash.

Merkle Trees

Merkle trees are a fundamental part of blockchain technology. A Merkle tree is a structure that allows for efficient and secure verification of content in a large body of data. This structure helps verify the consistency and content of the data. Merkle trees are used by both Bitcoin and Ethereum.

How do Merkle trees work?

A Merkle tree summarizes all the transactions in a block by producing a digital fingerprint of the entire set of transactions, thereby enabling a user to verify whether or not a transaction is included in a block.

Merkle trees are created by repeatedly hashing pairs of nodes until there is only one hash left (this hash is called the Root Hash, or the Merkle Root). They are constructed from the bottom up, from hashes of individual transactions (known as Transaction IDs).

Each leaf node is a hash of transactional data, and each non-leaf node is a hash of the hashes of its two child nodes. Merkle trees are binary, so the hashes on each level must pair up. If the number of transactions is odd, the last hash is duplicated once to create an even number of leaf nodes. In Bitcoin, the same rule applies on any higher level that has an odd number of hashes.

The Merkle Tree of transactions A, B, C & D.

Let’s look at an example of four transactions in a block: A, B, C, and D. Each of these is hashed, and the hash stored in each leaf node, resulting in Hash A, B, C, and D. Consecutive pairs of leaf nodes are then summarized in a parent node by hashing Hash A and Hash B, resulting in Hash AB, and separately hashing Hash C and Hash D, resulting in Hash CD. The two hashes (Hash AB and Hash CD) are then hashed again to produce the Root Hash (the Merkle Root).
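The four-transaction example can be reproduced in a few lines of Python. This is a minimal sketch assuming single SHA-256 and simple byte concatenation of child hashes. Bitcoin itself applies SHA-256 twice and has its own serialization rules, so the values here will not match real Bitcoin Merkle roots. The function names are chosen for this example.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions: list) -> bytes:
    """Hash pairs of nodes, level by level, until one hash remains."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    level = [sha256(tx) for tx in transactions]  # leaf nodes
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on an odd level
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# The same computation written out by hand for transactions A, B, C and D.
hash_a, hash_b, hash_c, hash_d = (sha256(tx) for tx in (b"A", b"B", b"C", b"D"))
hash_ab = sha256(hash_a + hash_b)
hash_cd = sha256(hash_c + hash_d)
root_by_hand = sha256(hash_ab + hash_cd)

print(root_by_hand == merkle_root([b"A", b"B", b"C", b"D"]))  # True
```

With only three transactions A, B and C, Hash C would be paired with a copy of itself to form the right-hand parent.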

This process can be conducted on larger data sets, too: consecutive pairs of nodes are hashed, level by level, until there is only one node at the top. Hashing is usually conducted using a function from the SHA-2 family (Bitcoin, for example, uses SHA-256), though other functions can also be used.

The Merkle Root summarizes all of the data in the related transactions, and is stored in the block header. It maintains the integrity of the data. If a single detail in any of the transactions or the order of the transactions changes, so does the Merkle Root. Using a Merkle tree allows for a quick and simple test of whether a specific transaction is included in the set or not.
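The sensitivity of the Merkle Root to both content and order can be checked directly. The sketch below assumes single SHA-256 and plain concatenation of child hashes, and the transaction strings are made up for the example.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions: list) -> str:
    """Return the Merkle Root of the transactions as a hex string."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    level = [sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on an odd level
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


original = [b"A pays B 5", b"B pays C 2", b"C pays D 1", b"D pays A 3"]

# Change a single detail in one transaction.
altered = list(original)
altered[2] = b"C pays D 100"

# Keep the same transactions but swap the first two.
reordered = [original[1], original[0]] + original[2:]

print(merkle_root(original))
print(merkle_root(altered) != merkle_root(original))    # True: content changed
print(merkle_root(reordered) != merkle_root(original))  # True: order changed
```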

The entire dataset doesn’t need to be downloaded to verify the integrity of Transaction 5.

A Merkle tree differs from a hash-list in that with a Merkle tree, one branch can be downloaded at a time and the integrity of each branch can be immediately verified, even if the rest of the tree is not yet available. This is advantageous because files can be split up into very small data blocks, such that only small blocks need to be downloaded again if the original version is damaged.
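Verifying a single branch might look like the following sketch. It builds a tree over eight made-up transactions and then checks Transaction 5 using only its sibling hashes and the root. The function names, the single-SHA-256 hashing and the (hash, is-right-sibling) proof format are assumptions for this example, not a standard wire format.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(transactions: list) -> list:
    """Return every level of the tree, from the leaf hashes up to [root]."""
    level = [sha256(tx) for tx in transactions]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # duplicate the last hash
            levels[-1] = level           # keep the padded level for sibling lookups
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_proof(levels: list, index: int) -> list:
    """Collect the sibling hash on each level, from the leaf up to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1  # the other node of the pair
        proof.append((level[sibling], sibling > index))
        index //= 2
    return proof


def verify_proof(transaction: bytes, proof: list, root: bytes) -> bool:
    """Recompute the path to the root using only the transaction and its proof."""
    current = sha256(transaction)
    for sibling, sibling_is_right in proof:
        if sibling_is_right:
            current = sha256(current + sibling)
        else:
            current = sha256(sibling + current)
    return current == root


transactions = [f"Transaction {n}".encode() for n in range(1, 9)]
levels = build_levels(transactions)
root = levels[-1][0]

proof = merkle_proof(levels, 4)  # index 4 is Transaction 5
print(len(proof))                                       # 3 sibling hashes
print(verify_proof(b"Transaction 5", proof, root))      # True
print(verify_proof(b"Transaction 5 (forged)", proof, root))  # False
```

The verifier needs three hashes and the root, not the other seven transactions.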

Uses

Using a Merkle tree can significantly reduce the amount of data that a trusted authority has to maintain for verification purposes. It separates the validation of the data from the data itself. A Merkle tree can reside locally, or on a distributed system.

Merkle trees have three major benefits:

1. They provide a means to prove the integrity and validity of data

2. They require little memory or disk space as the proofs are computationally easy and fast

3. Their proofs and management only require tiny amounts of information to be transmitted across networks
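To give a sense of scale for the third point, the short calculation below estimates how many hashes a Merkle proof contains for a given number of transactions. It assumes a binary tree with 32-byte SHA-256 hashes. The function name and the sample sizes are illustrative.

```python
HASH_SIZE_BYTES = 32  # size of one SHA-256 digest


def proof_length(num_transactions: int) -> int:
    """Number of sibling hashes needed to prove one transaction is in the tree."""
    if num_transactions < 1:
        raise ValueError("the tree needs at least one transaction")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, i.e. the tree depth.
    return (num_transactions - 1).bit_length()


for n in (4, 1_000, 1_000_000):
    hashes = proof_length(n)
    print(f"{n} transactions: {hashes} hashes, {hashes * HASH_SIZE_BYTES} bytes")
```

The proof grows with the logarithm of the number of transactions: a million transactions need only 20 hashes, or 640 bytes.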

The ability to prove that a log is complete and consistent is essential to blockchain technology and the general ledger concept. Merkle trees help verify that later versions of a log include everything from an earlier version and that all data is recorded and presented in chronological order. Proving that a log is consistent requires showing that no previous records have been added, altered or tampered with, and that the log has never been branched or forked.

Merkle trees benefit miners and users on the blockchain. A miner can calculate hashes progressively, as the miner receives transactions from peers. A user can verify parts of blocks individually and can check individual transactions using hashes of other branches of the tree.

Simplified Payment Verification (SPV)

Simplified Payment Verification (SPV) is a method of verifying whether particular transactions are included in a block without downloading the entire block. Merkle trees are used extensively by SPV nodes.

SPV nodes do not have data from all transactions in a block. They only download block headers. Merkle trees enable SPV nodes on the blockchain to check whether a transaction is included in a block that miners have verified, without downloading all the transactions in that block. This method is currently used by some lightweight Bitcoin clients.

Ethereum

Ethereum uses three different Merkle Roots in each block:

1. The first root is of the transactions in the block

2. The second root represents the state

3. The third root is for transaction receipts

Ethereum uses a special type of hash tree called the ‘Merkle Patricia Tree’.

Indispensable Tools on the Blockchain

Innovations such as permissioned blockchains will further increase data security and accessibility. In permissioned blockchains, the participants of the network can restrict who takes part in the blockchain’s consensus mechanism. Additionally, permissioned blockchains allow users to set permissions for who can access the data of their digital identity.

Cryptographic hashing has long played a role in cybersecurity, and is now poised to power the coming wave of blockchain applications. Merkle trees are powerful and indispensable tools for miners and users on the blockchain, and are at the heart of several peer-to-peer systems such as BitTorrent, Git, Bitcoin, and Ethereum.

Resources:

Online hash generators. You can run an MD5, SHA1, SHA-256 and other hashing functions on your data here:

The Merkle Patricia Tree: https://github.com/ethereum/wiki/wiki/Patricia-Tree

Blockchain Security Mechanisms was originally published in Towards Data Science on Medium.

Publication date

02/27/2018 - 01:00

Author

Shaan Ray
