How DVT Helps: Staking-as-a-Service Providers
This is part 4 of a series exploring how DVs help different types of validators in the Ethereum ecosystem. In this article, we visit Staking-as-a-service (StaaS) providers.
Dissecting the role of staking-as-a-service providers
While Ethereum is a permissionless and decentralised blockchain that allows anyone to participate in securing the network, operating staking infrastructure is non-trivial. Recognizing the complexity involved, a distinct niche has emerged in the staking ecosystem, paving the way for Staking-as-a-Service (StaaS) providers. These entities cater to the ETH staking market by offering an option for ETH holders seeking to stake without the intricacies of operating nodes.
Today, the majority of staked ether is staked through professional node operators, with more than 66% of staked ETH held by the top 50 node operators. The largest staking entities are the centralised exchanges Coinbase and Binance, with 14.9% and 4.5% of staked ETH, respectively. Many of the top 50 are among the 37 node operators who stake the ETH in Lido’s pool, each staking around 1% of all staked ETH.
While this has created concern in the community about Ethereum’s decentralisation, one thing is for sure: in order to maintain the performance and resilience of the entire Ethereum blockchain, it is important that these professional node operators are able to effectively tackle the challenge of running hundreds, thousands, or even tens of thousands of validators. These node operators approach the staking challenge differently than solo stakers, and have very different infrastructure setups as a result.
A StaaS provider responsible for thousands of validators will typically run them from multiple nodes, spread across several data centres, with a healthy mix of execution and consensus clients. As an example, a StaaS provider running 10,000 validators may use 10 machines with 1,000 validators running on each machine. If a single machine goes offline, 1,000 validators go offline. Running backup systems creates a significant risk of slashing, which occurs if the same validator key is run from two nodes at the same time, for example in the event that the primary machine comes back online unexpectedly. Therefore, active-active setups are virtually impossible to safely run in Ethereum.
How DVT helps StaaS providers
Increased fault tolerance
Obol’s Distributed Validator Technology (DVT) is a technology primitive that allows validators to be run by a cluster of nodes. As long as at least two-thirds of the nodes are active, the cluster as a whole remains active. (This could be 3-of-4, 5-of-7, or 7-of-10, for example.) This builds in a level of fault tolerance. Let’s imagine our example node operator uses their 10 machines to create a DVT cluster. Now, as long as 7 out of 10 of the machines remain online, the entire cluster stays online, and all 10,000 validators remain active. As before, the node operator can run the machines in different data centres and with different client combinations to increase the robustness of the entire cluster, so that an issue with a single client or machine no longer takes a validator (or group of validators) offline.
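The threshold arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, assuming the threshold is simply "two-thirds of the nodes, rounded up", which reproduces the 3-of-4, 5-of-7, and 7-of-10 examples; the function names are hypothetical and not part of any Obol tooling.

```python
def signing_threshold(n: int) -> int:
    """Smallest node count that is at least two-thirds of n.

    Assumption: the cluster threshold is ceil(2n / 3), computed with
    integer arithmetic to avoid floating-point rounding.
    """
    return (2 * n + 2) // 3


def tolerated_offline(n: int) -> int:
    """Number of nodes that can be offline while the cluster stays active."""
    return n - signing_threshold(n)


if __name__ == "__main__":
    for n in (4, 7, 10):
        t = signing_threshold(n)
        print(f"{t}-of-{n}: up to {tolerated_offline(n)} node(s) can be offline")
```

For the 10-machine example, this gives a threshold of 7, so up to 3 machines can be offline while all 10,000 validators stay active.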
Reduced slashing risk
The use of DVT represents the first time that node operators can safely run active-active setups for Ethereum validation without raising their risk of slashing. This is because each node holds only one key shard of the validator private key, and the full key never exists in one location at any point in time. Signatures from at least two-thirds of the key shards are enough to complete the validator’s duties, similar to a multi-sig. Importantly, not all key shards are needed to construct a valid signature for the associated public key. Downtime of a single node can be dealt with by simply moving the key shard to a new node, or by investigating and fixing the issue with the original node at a relaxed pace. (The entire cluster and its validators remain online as long as the required two-thirds of nodes are online.) Even if the same key shard ends up running on two nodes at once, there is no slashing risk.
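To make the "not all key shards are needed" property concrete, here is a toy Shamir secret-sharing sketch. It is an illustration of threshold math only, under loud assumptions: the prime field, the function names, and the 3-of-4 parameters are all chosen for the example, and the sketch reconstructs the secret in one place, which a real DV cluster never does (there, nodes sign with their shards and only the signatures are combined).

```python
import secrets

# Assumption: a toy prime field (the Mersenne prime 2**127 - 1).
# This is NOT the field used for Ethereum validator keys.
PRIME = 2**127 - 1


def split_secret(secret, threshold, shares):
    """Split `secret` into `shares` points on a random polynomial of
    degree threshold - 1 whose constant term is the secret."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        # Horner's rule, reducing modulo PRIME at each step.
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Recover the constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    key = secrets.randbelow(PRIME)         # stand-in for a validator key
    shards = split_secret(key, threshold=3, shares=4)
    assert recover_secret(shards[:3]) == key   # any 3 of the 4 shards suffice
    assert recover_secret(shards[1:]) == key
    print("3-of-4 recovery succeeded")
```

Any three of the four shards recover the same value, while a single shard on its own reveals nothing about it. This is why losing one node, or one shard, does not stop the cluster.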
In a traditional, non-DVT setup, a node operator with an active-passive setup can move an entire validator key from one node to another, but must be absolutely sure that the validator key has been deleted from the original node. Once a validator key is uploaded to the secondary machine, all it takes is for the primary machine to come back online and begin validating with the same key for a slashing event to occur. Some solutions do exist, such as the remote key signers Dirk and Web3Signer, as well as doppelganger protection built into some clients, but these haven’t prevented 4 out of the 5 largest slashing events on Ethereum from happening to enterprise node operators.
A DV setup is also more resilient to attacks, as described in the “threat model” and “risks FAQ” pages of the documentation. The loss or theft of a minority of the key shards, while not ideal, does not pose the same immediate threat as the theft of an entire validator key would. (A malicious actor would need to gain access to the two-thirds majority of key shards in order to be able to slash or exit the validator.) This perhaps could have prevented the November 2023 issue where a Lido node operator had to shut down their entire set of 9,003 validators after a security incident as a precaution, in order to generate new keys.
Reduced cost of hardware and devops staffing
Significant cost reductions may be possible for node operators running validators across a large number of nodes. Due to the increased fault tolerance and reduced slashing risk described above, a node operator may determine that a 7-machine DV cluster, or even a 4-machine DV cluster, is safer than a current 10-machine non-DV setup. (In the 7-machine cluster, up to two nodes can go offline without affecting the cluster, and in the 4-machine cluster, one node can go offline.) Depending on how cost-conscious the node operator is, a large reduction in the number of required nodes could create significant cost savings.
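One rough way to reason about the comparison above is to estimate how often a cluster has enough nodes online. The sketch below assumes each node is up independently with the same probability, which real deployments only approximate (shared data centres or clients correlate failures). The 99% per-node uptime figure and the function name are hypothetical.

```python
from math import comb


def cluster_availability(n: int, threshold: int, node_uptime: float) -> float:
    """Probability that at least `threshold` of `n` nodes are online,
    assuming independent nodes with identical uptime (binomial model)."""
    return sum(
        comb(n, k) * node_uptime**k * (1 - node_uptime) ** (n - k)
        for k in range(threshold, n + 1)
    )


if __name__ == "__main__":
    p = 0.99  # hypothetical per-node uptime, for illustration only

    # Non-DV setup: each validator lives on exactly one machine,
    # so its availability is just that machine's uptime.
    print(f"single machine : {cluster_availability(1, 1, p):.6f}")

    # DV clusters with the thresholds used in the text.
    print(f"3-of-4 cluster : {cluster_availability(4, 3, p):.6f}")
    print(f"5-of-7 cluster : {cluster_availability(7, 5, p):.6f}")
```

Under these assumptions, a 3-of-4 cluster is available roughly 99.94% of the time, compared with 99% for a validator tied to a single machine. This is the intuition behind running fewer machines with DVT without giving up robustness.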
Integrating Obol’s DVT into an existing validator client stack can be done as-is, alongside existing clients. This is due to the design of Obol’s Charon DVT client as a middleware, which sits in between the validator client and consensus client, and intercepts the messages between them. To migrate existing validators to a DVT cluster, it is possible to split the validator key to distribute the validator across multiple nodes.
Running a DV cluster with failsafes also means that a problem with a single node no longer requires immediate attention from devops engineers, who can take their time to troubleshoot issues. This significantly reduces staffing requirements and stress for devops engineers, and makes hurried mistakes much less likely.
“One of the key challenges of running ETH validators is the need to reduce single points of failure and maintain uptime. DVT significantly reduces these risks, and also makes key management safer, since there is no sole ownership of the private key. With less downtime penalties and easier risk management, we enjoy cost savings and can focus on improving the experience and staking rewards for our users.” - Calvin Zhou, Head of Staking at RockX
Increased performance
Aside from the boost to robustness and uptime that a DVT validator receives, validators are able to enjoy above-average effectiveness as well. Testing is being carried out by many node operators, on both testnet and mainnet, and the results are very promising.
As part of our Alpha phase, for example, Stakely is running a mainnet validator on a solo cluster. Their cluster consisted of 4 nodes on bare metal machines distributed across Europe, and they were able to achieve 99.9% uptime, with 97.3% effectiveness as per Rated Network, and a 1.02 second average inclusion delay. A similar cluster from ClayStack achieved similar results: 99.9% uptime, 97.8% effectiveness, and a 1.02 second average inclusion delay.
There is some evidence to suggest that DVT may introduce some additional latency for clusters with nodes spread across different regions of the globe, but clusters with well-connected nodes running well-spec'd hardware are able to achieve performance levels on par with (or better than) their non-DVT counterparts.
As node operators become more comfortable with DVT, and build a track record over time, they can more confidently issue uptime SLAs and performance guarantees to their customers. Additionally, we should see improvements in the overall performance of liquid staking pools using DVT, and more generally improvements in the underlying resilience of the Ethereum blockchain.
Improved access to staking insurance
Node operators who are early to adopt DVT may gain a competitive advantage by acquiring insurance at a lower rate than competitors who are not using DVT. In today’s staking industry, insurance is still expensive and limited in the size of its coverage. (Notably, this caused Lido to choose to do away with insurance in mid-2022, instead using their own Treasury to compensate stakers for losses.)
As more StaaS providers adopt DVT, reduced downtime and slashing risk should increase the confidence of insurers. For StaaS providers who use DVT, insurers can be expected to charge lower premiums against these events and to increase the size of coverage in policies. As DVT builds a more significant track record, the reduced frequency of downtime, slashing events, and associated insurance claims should provide enough data to increase confidence in DVT and further reduce premiums.
Squad staking allows providers to create multi-org clusters
“Dream team” clusters can offer maximum risk diversification to large clients
DVT is well-suited to multi-org “squad staking” setups, whereby multiple StaaS providers come together to run a cluster. For example, this could be a 4-node cluster where two nodes are run by one provider, and two by the other. Another example could be a 7-node cluster, where each node is run by a different provider.
A multi-org “dream team” cluster, made up of high quality staking providers, is ideal to provide services to large entities like centralised exchanges or large staking pools, who already stake ETH across multiple StaaS providers in order to reduce risk.
For small providers, squad staking and “Techne” provide an avenue for growth
DVT enables a collaborative way for node operators to work together. Especially for new node operators with the skills to run validators, but lacking the experience and reputation, taking part in “squad staking” DVT clusters alongside established operators or as part of staking pools can be a great way to build experience and showcase competence in a lower-risk environment. Lido’s new “Simple DVT” module, for example, serves as a way to onboard operators from the community, while Ether.fi’s “Operation Solo Staker” provides a similar opportunity for home stakers to run as part of DVT clusters. Stakewise V3 likewise has enabled DVT to be employed in “vaults” (pools) of staked ETH run by multiple node operators, opening the possibility for collaboration between operators.
As previously announced, Obol will begin to issue non-transferable “Techne” Credential NFTs to node operators who complete a series of DV-related trainings and tasks. Thus, by holding a Techne credential, operators can prove their experience level and competence to staking pools and other collaborators.