SVM capacity scaling
Unlocking Hardware Potential: Dynamic Thread Scaling in X1 Blockchain's Execution Scheduler
Once a user signs a transaction in their wallet, the wallet sends it to an X1 RPC server. RPC servers can be run by any validator. Upon receiving the transaction, the RPC server checks the leader schedule (determined once per epoch, which lasts about two days) and forwards the transaction to the current leader as well as the next two leaders. The leader is in charge of producing a block for the current slot and is assigned four consecutive slots. Slots usually last around 400 milliseconds.
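A minimal sketch of this forwarding rule, assuming a hypothetical `LeaderSchedule` type; the names below are illustrative, not the actual client API. Only the rules that a leader holds four consecutive slots and that a transaction goes to the current leader plus the next two come from the text above:

```rust
use std::net::SocketAddr;

// Illustrative only: `LeaderSchedule`, `leader_at`, and `forward_targets`
// are invented names, not the real X1/Solana client API.
struct LeaderSchedule {
    // One TPU address per leader rotation, recomputed once per epoch.
    leaders: Vec<SocketAddr>,
}

impl LeaderSchedule {
    fn leader_at(&self, slot: u64) -> SocketAddr {
        // Each leader holds four consecutive slots.
        self.leaders[(slot / 4) as usize % self.leaders.len()]
    }
}

// Forward to the current leader plus the next two leaders, so the
// transaction survives an imminent leader rotation.
fn forward_targets(schedule: &LeaderSchedule, current_slot: u64) -> Vec<SocketAddr> {
    (0..3).map(|i| schedule.leader_at(current_slot + i * 4)).collect()
}
```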
Once the signed transaction reaches the current leader, the leader validates the transaction's signature and performs other pre-processing steps before scheduling the transaction for execution.
Whereas the EVM is a "single-threaded" runtime environment, meaning it can only process one contract at a time, the SVM is multi-threaded and can process many more transactions in significantly less time. The default scheduler implementation reflects this: each thread maintains a queue of transactions waiting for execution, and each incoming transaction is randomly assigned to a single thread's queue. Each queue is ordered by priority fee (denominated in fee paid per compute unit requested) and arrival time.
Note that there is no global ordering of transactions queued for execution; there is just a local ordering in each thread’s queue.
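As a rough illustration, each thread's queue behaves like a max-heap keyed on fee per compute unit, with arrival order breaking ties. The `QueuedTx` type and its fields below are assumptions made for the sketch, not the validator's real structures:

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

// Illustrative per-thread queue entry; field names are assumptions.
#[derive(PartialEq, Eq)]
struct QueuedTx {
    fee_per_cu: u64, // priority fee divided by compute units requested
    arrival: u64,    // monotonically increasing arrival sequence number
}

impl Ord for QueuedTx {
    fn cmp(&self, other: &Self) -> Ordering {
        // Higher fee per compute unit first; earlier arrival breaks ties.
        self.fee_per_cu
            .cmp(&other.fee_per_cu)
            .then_with(|| other.arrival.cmp(&self.arrival))
    }
}

impl PartialOrd for QueuedTx {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Each banking thread pops from its own heap; there is no global order
// across threads, only this local ordering within each queue.
fn next_tx(queue: &mut BinaryHeap<QueuedTx>) -> Option<QueuedTx> {
    queue.pop()
}
```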
Many blockchain networks construct entire blocks before broadcasting them, an approach known as discrete block building. X1 and Solana, in contrast, employ continuous block building: blocks are assembled and streamed out dynamically as they are produced during the allocated time slot, significantly reducing latency.
Each slot lasts 400 milliseconds, and each leader is assigned four consecutive slots (1.6 seconds) before rotation to the next leader. For a block to gain acceptance, all transactions within it must be valid and reproducible by other nodes.
Two slots before assuming leadership, a validator halts transaction forwarding to prepare for its upcoming workload. During this interval, inbound traffic spikes, reaching over a gigabyte per second as the entire network directs packets to the incoming leader.
Upon receipt, transaction messages enter the Transaction Processing Unit (TPU), the validator's core logic responsible for block production. The processing sequence begins with the Fetch Stage, where transactions are received via QUIC. Transactions then progress to the SigVerify Stage, where the validator verifies that signatures are valid, checks that each transaction carries the correct number of signatures, and eliminates duplicates.
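A hedged sketch of the SigVerify filters just described; the `Tx` type and its `message_hash` field are invented placeholders, and the actual ed25519 signature verification (done in batches in the real validator) is elided:

```rust
use std::collections::HashSet;

// Invented stand-in for a received transaction.
struct Tx {
    signatures: Vec<[u8; 64]>,
    required_signatures: usize,
    message_hash: [u8; 32], // placeholder for the transaction's identity
}

fn sig_verify(batch: Vec<Tx>, seen: &mut HashSet<[u8; 32]>) -> Vec<Tx> {
    batch
        .into_iter()
        // Keep only transactions carrying the required number of signatures.
        .filter(|tx| tx.signatures.len() == tx.required_signatures)
        // Drop duplicates already seen (insert returns false if present).
        .filter(|tx| seen.insert(tx.message_hash))
        // Per-signature ed25519 verification would happen here as well.
        .collect()
}
```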
The banking stage can be described as the block-building stage. It is the most important stage of the TPU and takes its name from the "bank": a bank is simply the state at a given block. For every block, X1 has a bank that is used to access state at that block. When a block is finalized after enough validators vote on it, validators flush account updates from the bank to disk, making them permanent. The final state of the chain is the result of all confirmed transactions, and this state can always be recreated deterministically from the blockchain history.
Transactions are processed in parallel and packaged into ledger “entries,” which are batches of 64 non-conflicting transactions. Parallel transaction processing on X1 is made easy because each transaction must include a complete list of all the accounts it will read and write to. This design choice places a burden on developers but allows the validator to avoid race conditions by easily selecting only non-conflicting transactions for execution within each entry. Transactions conflict if they both attempt to write to the same account (two writes) or if one attempts to read from and the other writes to the same account (read + write). Thus conflicting transactions go into different entries and are executed sequentially, while non-conflicting transactions are executed in parallel.
In the above diagram, each box represents a single transaction. Each transaction is labeled with the accounts it locks. Execution thread 1 locks accounts [a,b,c], [d], fails to lock [c,j], and [f,g]. Execution thread 2 locks accounts [w], [x,y,z], fails to lock [c], and [v]. The remaining transactions are re-scheduled for future execution.
This is one way X1 and Solana achieve higher performance than competing chains. When multiple transactions don't need to touch the same state, they can be executed in parallel, which improves the throughput of the chain. However, this imposes a cost on developers: any piece of state a transaction may require must be specified up front.
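The conflict rule and entry packing can be sketched as follows. The types and the greedy `pack_entry` helper are illustrative assumptions; only the write-write / read-write conflict rule and the 64-transaction entry size come from the text above:

```rust
use std::collections::HashSet;

type Account = [u8; 32]; // account address (pubkey)

// The read and write sets each transaction must declare up front.
struct DeclaredAccounts {
    reads: HashSet<Account>,
    writes: HashSet<Account>,
}

/// Two transactions conflict on a write-write or read-write overlap;
/// read-read sharing is always safe.
fn conflicts(a: &DeclaredAccounts, b: &DeclaredAccounts) -> bool {
    a.writes
        .iter()
        .any(|acct| b.writes.contains(acct) || b.reads.contains(acct))
        || b.writes.iter().any(|acct| a.reads.contains(acct))
}

/// Greedily pack up to 64 mutually non-conflicting transactions into one
/// entry; deferred transactions wait for a later entry and run sequentially.
fn pack_entry(
    pending: Vec<DeclaredAccounts>,
) -> (Vec<DeclaredAccounts>, Vec<DeclaredAccounts>) {
    let (mut entry, mut deferred) = (Vec::new(), Vec::new());
    for tx in pending {
        if entry.len() < 64 && !entry.iter().any(|e| conflicts(e, &tx)) {
            entry.push(tx);
        } else {
            deferred.push(tx);
        }
    }
    (entry, deferred)
}
```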
There are six threads processing transactions in parallel: four dedicated to normal transactions and two exclusively handling vote transactions, which are integral to X1 and Solana's consensus mechanism. All parallelization is achieved through multiple CPU cores; validators have no GPU requirements.
Once transactions have been grouped into entries, they are ready to be executed by the Solana Virtual Machine (SVM). The accounts required by each transaction are locked, and checks are run to confirm that the transaction is recent but has not already been processed. The accounts are then loaded and the transaction logic is executed, updating the account states. A hash of the entry is sent to the Proof of History service to be recorded (more on this in the next section). If the recording succeeds, all changes are committed to the bank and the locks placed on each account in the first step are lifted.
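The flow above, condensed into a Rust sketch. Every type here (`Entry`, `Bank`, `PohRecorder`) is a stub invented for illustration; the real pipeline is considerably more involved:

```rust
// Invented stand-ins; bodies are placeholders for the steps described above.
struct Entry { txs: Vec<String>, hash: u64 }
struct Bank;
struct PohRecorder;

impl Bank {
    // Placeholder: drop transactions with stale blockhashes or ones
    // that have already been processed.
    fn check_age_and_dedupe(&self, txs: Vec<String>) -> Vec<String> { txs }
    // Placeholder: load the declared accounts and run each transaction
    // in the SVM, producing updated account states.
    fn load_and_execute(&self, txs: Vec<String>) -> Vec<String> { txs }
    // Placeholder: make the resulting account updates part of this
    // block's state.
    fn commit(&mut self, _results: Vec<String>) {}
}

impl PohRecorder {
    // Placeholder: mix the entry hash into the Proof of History stream.
    fn record(&mut self, _hash: u64) -> Result<(), ()> { Ok(()) }
}

fn execute_entry(bank: &mut Bank, poh: &mut PohRecorder, entry: Entry) {
    // 1. Account locks for the entry's transactions are assumed held here.
    // 2. Confirm the transactions are recent but not already processed.
    let checked = bank.check_age_and_dedupe(entry.txs);
    // 3. Load accounts and execute the transaction logic.
    let results = bank.load_and_execute(checked);
    // 4. Record the entry hash with Proof of History; commit only on success.
    if poh.record(entry.hash).is_ok() {
        bank.commit(results); // 5. Changes land in the bank; locks are released.
    }
}
```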
Solana's high throughput and rapid transaction processing capabilities are largely attributed to its parallel processing architecture. However, a significant limitation in this architecture is the fixed number of banking threads allocated for scheduling transaction execution. Currently, Solana limits the number of banking threads to just four, irrespective of the underlying hardware's capabilities. This constraint results in underutilization of modern multi-core processors, which are increasingly common in node environments.
Banking threads in Solana are responsible for executing transactions, managing state changes, and processing smart contracts. While the parallelism in Solana's architecture theoretically supports high throughput, the artificial limitation of four banking threads leads to several inefficiencies:
1. Underutilization of Multi-Core Processors: Contemporary processors frequently feature 16, 32, or more CPU cores. The restriction to four banking threads on such hardware fails to harness the full computational potential, resulting in significant idle processing capacity.
2. Execution Bottlenecks: The limited number of threads introduces a bottleneck in the transaction processing pipeline, constraining the network's ability to handle peak transaction loads. This results in increased latency and reduced throughput.
3. Suboptimal Parallelism: The effectiveness of Solana's parallel processing is curtailed by the thread limitation. As a result, the full benefits of concurrent transaction processing are not realized, diminishing the overall efficiency of the network.
To address these limitations, X1 Blockchain introduces a dynamically scaling execution scheduler. This innovation allows the number of banking threads to scale in accordance with the CPU core count available on the node, optimizing the utilization of modern hardware.
Adaptive Thread Allocation: X1 Blockchain’s execution scheduler dynamically adjusts the number of banking threads based on the detected CPU core count of the node. For instance, on a node with a 32-core processor, the scheduler could allocate up to 32 banking threads, significantly enhancing transaction processing capacity.
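As a sketch of how such sizing might work, Rust's standard library exposes the host's available parallelism. The two-core reservation and the fallback default below are assumptions for illustration, not X1's published configuration:

```rust
use std::thread;

// Size the banking-thread pool from the host's core count.
fn banking_thread_count() -> usize {
    // available_parallelism() reports the number of hardware threads
    // the process can use (e.g. 32 on a 32-core node).
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(4); // assumed fallback, matching the historical default

    // Assumption: reserve a couple of cores for networking, PoH, and
    // vote processing, but never drop below four banking threads.
    cores.saturating_sub(2).max(4)
}

fn main() {
    println!("banking threads: {}", banking_thread_count());
}
```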
Enhanced Parallelism: By leveraging additional threads, X1 Blockchain maximizes parallel transaction processing. This approach minimizes the bottlenecks associated with a limited thread count, leading to a more efficient execution pipeline.
Increased Throughput: The ability to process more transactions concurrently directly correlates with improved network throughput. As more threads are utilized, transaction confirmation times are reduced, particularly under high network load conditions.
Scalability and Future-Proofing: X1 Blockchain's dynamic thread scaling mechanism is designed to scale with advancements in hardware technology. As multi-core processors become more powerful and prevalent, the blockchain can seamlessly scale its execution capabilities, ensuring long-term viability and performance.
By aligning thread allocation with available hardware resources, X1 Blockchain aims to eliminate the bottlenecks that hinder transaction throughput in current blockchain architectures. This approach enhances network efficiency and future-proofs the blockchain against the limitations of present-day systems, ensuring that it can scale effectively as hardware technology evolves.