DPoS- and dBFT-based blockchains have gained tremendous popularity in the past year. Compared with PoW blockchains, DPoS and dBFT alternatives offer advantages such as high performance, energy efficiency, fast finality, and a near absence of hard forks.
However, a notable and much less understood trade-off is the occasional service disruption and unavailability that can occur with those blockchains. Extended service disruptions have been recorded for EOS, Steemit, Stellar, and others, and it is just a matter of time before all DPoS / dBFT blockchains experience such issues.
In this article, we will use a postmortem analysis of a recent service disruption of the CyberMiles blockchain as an example to illustrate the problem and present potential solutions.
On May 26th 2019, the CyberMiles public blockchain suffered a technical failure that paused blockchain operations for 17 hours. During that time, no transactions were accepted or recorded on the blockchain. There was no data loss or transaction rollback.
As a decentralized public blockchain, CyberMiles is maintained by 24 validators across the globe. Any software change must be approved and then implemented by 2/3 of validators in order to take effect. With a diverse group of independent validators, the CyberMiles blockchain downtime, while regrettable, serves the purpose of maintaining security and integrity of transactions through consensus.
Here is what happened.
At around 5am EST (US Eastern Time) on May 26th 2019, a user submitted a transaction to create a smart contract on the CyberMiles blockchain. The transaction was included in block height 1724748.
When the transaction was executed by the CyberMiles Virtual Machine (CVM) on each blockchain node, it triggered a bug that crashed the CVM.
Specifically, when a user requests to execute a transaction without paying the gas fee, the CVM's "free gas" rules check the transaction's target address. However, in the case of a smart contract creation transaction, the target address is null.
That caused the software to throw a null pointer exception, crashing the CVM.
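The failure mode can be illustrated with a minimal Python sketch. The names `Transaction`, `FREE_GAS_TARGETS`, `qualifies_for_free_gas_buggy`, and `qualifies_for_free_gas_fixed` are hypothetical and are not taken from the CVM source. The sketch only assumes, as described above, that a contract creation transaction carries no target address, and the guard shown is one possible fix, not necessarily what the hotfix does.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allow-list of target addresses that qualify for free gas.
FREE_GAS_TARGETS = {"0xabc"}


@dataclass
class Transaction:
    sender: str
    to: Optional[str]  # None for a contract creation transaction
    gas_price: int


def qualifies_for_free_gas_buggy(tx: Transaction) -> bool:
    # Dereferences tx.to without checking for None. On a zero-fee contract
    # creation this raises AttributeError, Python's analogue of a null
    # pointer exception.
    return tx.gas_price == 0 and tx.to.lower() in FREE_GAS_TARGETS


def qualifies_for_free_gas_fixed(tx: Transaction) -> bool:
    # A contract creation transaction has no target, so it never qualifies.
    if tx.to is None:
        return False
    return tx.gas_price == 0 and tx.to.lower() in FREE_GAS_TARGETS
```

Calling `qualifies_for_free_gas_buggy(Transaction("0x1", None, 0))` raises, while the fixed version returns `False` and lets the node keep processing blocks.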
All CyberMiles nodes stopped at block height 1724747, unable to process block height 1724748. The problem was reported and escalated to CyberMiles core developers.
The CVM developers based in Taipei immediately worked on the problem and produced a software patch.
The blockchain engineering team in Beijing and the USA then tested it and released a new software version, v0.1.8-beta-hotfix. The core developers were ready to notify validators to review, approve, and implement v0.1.8-beta-hotfix.
However, that was when a second problem surfaced. Because the null pointer exception crashed the CVM, the virtual machine shut down abnormally.
The operations team based in Beijing, Los Angeles, and Austin discovered that some nodes could have corrupted database files after the crash, so a straightforward restart with the new binary was out of the question.
Out of an abundance of caution, the engineering team decided to recommend that all nodes restart from an uncorrupted snapshot at block height 1724748, following CyberMiles' snapshot fast sync procedure.
An uncorrupted node with all validator signatures on block height 1724748 was found in the community. Its data was 18GB after compression. The data snapshot from this node was reviewed, verified, and validated before it was made available to the community.
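The article does not describe how the snapshot was verified. One common approach, shown here purely as an assumption, is for each operator to compare the SHA-256 digest of the downloaded archive against a digest published alongside it. The function names and the idea of a published digest are hypothetical.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so that a multi-gigabyte
    snapshot never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot_matches(path: str, expected_hex: str) -> bool:
    # Normalize the published digest before comparing (assumed hex string).
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A digest check only shows that the file matches what was published. It does not replace checking the validator signatures on the block itself.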
At 5pm EST, the first validator nodes in North America came online with the updated software v0.1.8-beta-hotfix. However, without a quorum of 17 active validators (that is, more than 2/3 of the 24 validators at block height 1724748), the blockchain could not produce the next block. The quorum was reached at 10pm EST, when Asian validators woke up on Monday morning. The blockchain services, such as CMT Wallet, CMT Cube, and CMT Tracking, were fully restored soon after.
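The figure of 17 follows from reading the quorum as strictly more than two thirds of the validator set. The sketch below is a simplification that assumes every validator carries equal voting power. In a stake-weighted system the same rule would apply to voting power rather than to a head count. The function names are hypothetical.

```python
def min_quorum(total_validators: int) -> int:
    """Smallest validator count strictly greater than 2/3 of the total."""
    return (2 * total_validators) // 3 + 1


def can_produce_block(online: int, total_validators: int) -> bool:
    # The chain stalls until enough validators are back online.
    return online >= min_quorum(total_validators)


print(min_quorum(24))             # 17
print(can_produce_block(16, 24))  # False
print(can_produce_block(17, 24))  # True
```

With 24 validators, exactly two thirds is 16, so the chain stayed halted until the 17th validator restarted on the patched software.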
Key lessons
- The CyberMiles development team must increase software testing coverage.
- Incident response coordination between core developers and validator operators could be improved. In this case, the root cause was identified and patched quickly, but coordination across validators took a very long time.
- A diverse group of validators improves blockchain security but could prolong service disruption by making it difficult to reach a quorum.
Postmortem: CyberMiles Blockchain Service Disruption Incident was originally published in Hacker Noon on Medium.