This post was originally a talk delivered at Common Knowledge Conference (CKCON) 2024 in Chiang Mai, Thailand. Yukang is an open-source enthusiast and a versatile technologist who currently contributes to the CKB core team and the Fiber team. Explore more about his work on his personal site: CatCoding.
As CKB continues to stabilize, we are focusing on making the codebase easier to maintain while ensuring that CKB performs well across a wider range of scenarios. To achieve this, we made three key updates to the CKB core:
Refactored the transaction pool using MultiIndexMap.
Implemented the Replace-by-Fee (RBF) feature.
Designed a new approach for Script verification.
Refactoring the Transaction Pool with MultiIndexMap
What exactly is a transaction pool? Transaction pools often don't get much attention, and there isn't much documentation about them online. A transaction pool is a cache that stores transactions received from the local node or from network peers. When a chain reorganization occurs, transactions may be resubmitted to the pool.
The transaction pool stores numerous transactions, some of which have parent-child relationships with each other. There are two critical operations for managing the transaction pool:
Packaging determines how transactions are selected for inclusion in blocks. We consider several factors: the fee, the weight, the ancestors_fee, and the ancestors_weight (see the sketch after this list).
Eviction limits the transaction pool size by setting a maximum capacity. When the pool reaches this limit, certain transactions are selected for removal based on the EvictKey.
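To make the packaging criterion more concrete, here is a minimal sketch of how such an ancestor fee-rate comparison could work. It is not the actual CKB implementation; the key name and the tie-breaking rules are illustrative. The point is that a package rate of ancestors_fee / ancestors_weight can be compared without floating-point arithmetic, and that a transaction packaged with a high-fee descendant outranks the same transaction considered alone.

```rust
use std::cmp::Ordering;

/// Hypothetical packaging sort key: ranks entries by their ancestor fee rate,
/// i.e. ancestors_fee / ancestors_weight, using the factors listed above.
#[derive(Eq, PartialEq)]
struct AncestorsScoreKey {
    ancestors_fee: u64,    // fee of the entry plus all its ancestors (shannons)
    ancestors_weight: u64, // weight of the entry plus all its ancestors
}

impl Ord for AncestorsScoreKey {
    fn cmp(&self, other: &Self) -> Ordering {
        // Compare a/b against c/d without floats: a/b > c/d  <=>  a*d > c*b.
        let lhs = self.ancestors_fee as u128 * other.ancestors_weight as u128;
        let rhs = other.ancestors_fee as u128 * self.ancestors_weight as u128;
        lhs.cmp(&rhs)
            // Tie-break so that equal fee rates still produce a total order.
            .then_with(|| self.ancestors_weight.cmp(&other.ancestors_weight))
            .then_with(|| self.ancestors_fee.cmp(&other.ancestors_fee))
    }
}

impl PartialOrd for AncestorsScoreKey {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

fn main() {
    // A low-fee parent considered alone: 100 / 500 = 0.2 shannons per weight unit.
    let parent_alone = AncestorsScoreKey { ancestors_fee: 100, ancestors_weight: 500 };
    // The same parent packaged with a high-fee child: (100 + 2000) / (500 + 500) = 2.1.
    let parent_with_child = AncestorsScoreKey { ancestors_fee: 2100, ancestors_weight: 1000 };
    assert!(parent_with_child > parent_alone);
}
```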
CKB's transaction pool uses a unique two-step confirmation design, which helps solve propagation efficiency issues. When a transaction is first submitted to the pool, it enters the pending status, then transitions to gap, and finally to the proposed status.
In the initial implementation of the transaction pool, we had three different data structures for storing transactions. Two of these were PendingQueue structures that stored transactions with pending and gap status. The third was the ProposedPool.
A transaction moves through these stages sequentially—it gets pushed into the first queue, then popped from it and pushed into the next one. The image below describes the basic flow of the old design.
The old implementation had several key problems:
Implementing a sorting strategy for eviction was cumbersome, requiring separate sorting of multiple transaction sets before merging them.
The PendingQueue lacked tracking of cell_dep and parent-child relationships, which are essential for Replace-by-Fee functionality.
The three data structures contained redundant code, with functions duplicated across push and pop operations, making the codebase unnecessarily complex.
The solution is to refactor it using a data structure called MultiIndexMap. This Rust package essentially combines HashMap and BTreeMap to store indexes, allowing us to efficiently retrieve specific properties while iterating over them. It is a perfect fit for our needs.
We've refactored the transaction pool using MultiIndexMap. The pool map now stores entries in a MultiIndexMap, which includes properties such as status to indicate the current state of a transaction, along with two sorting keys to manage transaction ordering.
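As a rough illustration of how the multi_index_map crate is used, here is a simplified sketch. The field names and key types below are stand-ins for illustration, not the actual CKB pool entry:

```rust
// Requires the `multi_index_map` crate as a dependency.
use multi_index_map::MultiIndexMap;

/// The two-step confirmation states described earlier.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Status {
    Pending,
    Gap,
    Proposed,
}

/// Simplified stand-in for a pool entry; the real entry also carries the
/// full transaction and its ancestor statistics.
#[derive(MultiIndexMap, Clone, Debug)]
struct PoolEntry {
    /// Unique transaction id, looked up through a hashed index.
    #[multi_index(hashed_unique)]
    id: u32,
    /// Current state of the transaction.
    #[multi_index(ordered_non_unique)]
    status: Status,
    /// Sort key used when packaging (e.g. an ancestor fee-rate score).
    #[multi_index(ordered_non_unique)]
    score: u64,
    /// Sort key used when evicting (e.g. fee rate plus receive time).
    #[multi_index(ordered_non_unique)]
    evict_key: u64,
}

fn main() {
    // The derive macro generates `MultiIndexPoolEntryMap` with typed accessors.
    let mut pool = MultiIndexPoolEntryMap::default();
    pool.insert(PoolEntry { id: 1, status: Status::Pending, score: 50, evict_key: 7 });
    pool.insert(PoolEntry { id: 2, status: Status::Proposed, score: 90, evict_key: 3 });

    // Constant-time lookup by unique id.
    assert!(pool.get_by_id(&1).is_some());

    // Iterate entries ordered by the packaging score.
    for entry in pool.iter_by_score() {
        println!("packaging candidate: {entry:?}");
    }

    // Evict the entry with the smallest eviction key.
    let victim_id = pool.iter_by_evict_key().next().map(|e| e.id);
    if let Some(id) = victim_id {
        pool.remove_by_id(&id);
    }
}
```

With every index maintained inside one structure, moving a transaction between states or re-sorting it becomes an in-place update instead of popping it from one queue and pushing it into another.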
After the refactoring, we now have cleaner code with no performance regression, and we've implemented a more accurate strategy for evicting transactions. At the same time, we've contributed improvements back to the upstream MultiIndexMap crate, including new operations and performance enhancements, so other projects can benefit from our work as well. This is one of the great advantages of open source.
With the refactoring complete, we’re now ready to implement new features, such as Replace-by-Fee, as the transaction removal process is now much more efficient.
Achieve Faster Transaction Packaging with Replace-by-Fee
Why Do We Need Replace-by-Fee (RBF)?
Before RBF, if you wanted an existing transaction to be included in the next block more quickly, your only option was to create a new transaction. The new transaction would consume some outputs from the older transaction, and the child transaction would pay a higher fee. This is because the transaction pool's algorithm tries to prioritize transactions based on their fee rate. If the child transaction pays a higher fee, the parent transaction has a higher likelihood of being included in the next block. This is known as "child pays for parent."
However, this approach doesn't cover all scenarios. We need a strategy that allows us to increase the fee for an existing transaction already in the pool. In fact, we can't modify the fee directly in the current transaction pool. Instead, we need to replace the older transaction with a new one that reflects the updated fee.
The diagram above shows conflicting transactions. In the left part of the diagram, Tx 1 will be replaced because Tx 2 consumes the same input. If we allowed both transactions into the transaction pool, there would be double-spending, so we need to remove Tx 1.
The right part shows a more complex scenario: Tx 1, Tx 2, and Tx 3 will all be removed because each of them conflicts with the new transaction.
This is our workflow for RBF. During the commit stage, we need to identify all transactions that conflict with the new one. We then check each RBF rule in sequence. If all the rules are satisfied, we proceed to remove the old transaction and submit the new one. While this workflow may seem straightforward, it’s not without its complexities, as RBF can introduce potential security risks.
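The sketch below walks through that commit-stage flow with toy types. The names (Pool, find_conflicts, check_rbf_rules) are illustrative rather than CKB's real internals, and only a simplified "replacement must pay a strictly higher fee" rule is shown; the actual rule set is stricter:

```rust
use std::collections::{HashMap, HashSet};

/// Minimal stand-ins for illustration only.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct OutPoint(u64);

#[derive(Clone, Debug)]
struct Tx {
    id: u64,
    inputs: Vec<OutPoint>,
    fee: u64,
}

#[derive(Default)]
struct Pool {
    txs: HashMap<u64, Tx>,
}

#[derive(Debug)]
enum Reject {
    RbfRuleViolated(&'static str),
}

impl Pool {
    /// 1. Find every pooled transaction that spends an input the new one spends.
    fn find_conflicts(&self, new_tx: &Tx) -> Vec<u64> {
        let new_inputs: HashSet<_> = new_tx.inputs.iter().collect();
        self.txs
            .values()
            .filter(|tx| tx.inputs.iter().any(|i| new_inputs.contains(i)))
            .map(|tx| tx.id)
            .collect()
    }

    /// 2. Check the RBF rules in sequence. Only a simplified "must pay a
    ///    strictly higher fee" rule is shown; the real rule set also covers
    ///    descendants, cell deps, and minimum fee-rate increments.
    fn check_rbf_rules(&self, new_tx: &Tx, conflicts: &[u64]) -> Result<(), Reject> {
        for id in conflicts {
            if new_tx.fee <= self.txs[id].fee {
                return Err(Reject::RbfRuleViolated("replacement fee too low"));
            }
        }
        Ok(())
    }

    /// Commit-stage flow: detect conflicts, check the rules, evict the old
    /// transactions, then admit the replacement.
    fn submit(&mut self, new_tx: Tx) -> Result<(), Reject> {
        let conflicts = self.find_conflicts(&new_tx);
        if !conflicts.is_empty() {
            self.check_rbf_rules(&new_tx, &conflicts)?;
            for id in conflicts {
                self.txs.remove(&id); // the real pool also removes descendants
            }
        }
        self.txs.insert(new_tx.id, new_tx);
        Ok(())
    }
}

fn main() {
    let mut pool = Pool::default();
    pool.submit(Tx { id: 1, inputs: vec![OutPoint(42)], fee: 100 }).unwrap();
    // Tx 2 spends the same input with a higher fee, so it replaces Tx 1.
    pool.submit(Tx { id: 2, inputs: vec![OutPoint(42)], fee: 500 }).unwrap();
    assert!(pool.txs.contains_key(&2) && !pool.txs.contains_key(&1));
}
```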
RBF has been released for almost a year and is enabled by default, so all running CKB nodes support this feature. The usage is very simple: if you want a transaction to be packaged more quickly, construct a replacement transaction that spends the same inputs with a higher fee and resubmit it. And here is more detailed documentation on this topic you can check out.
Enhanced Script Verification Flow With CKB-VM Pause
The last topic I want to cover is Script verification. When a new transaction is submitted, it must be verified: CKB-VM is started to run all the Scripts in the transaction. If any Script fails, the transaction is rejected; if all of them pass, the transaction is admitted to the transaction pool.
Large-cycle transactions introduce several challenges. Verification can take a significant amount of time—sometimes 10 or even 20 seconds. However, we don’t want the verification process (in CKB-VM) to block other components of the CKB node by monopolizing the CPU.
For example, when processing a new block, we want to suspend CKB-VM and resume it once block processing is complete. Unfortunately, CKB-VM doesn't natively support this "suspend and resume" scenario. As a workaround, we set a maximum cycle limit when starting CKB-VM, causing it to stop once the specified cycle count is reached. This allows us to dump the state as a snapshot. When we need to resume verification, we load the snapshot and continue from where the process stopped.
The implementation is somewhat buggy, as we have to store the internal state of CKB-VM along with the cycle count, which adds complexity.
The above describes the old design for verifying transactions with large cycles. When a new transaction is submitted, we first attempt to run up to 10 million cycles. If the cycle count exceeds this limit, the transaction is pushed to a Chunk Queue. A background job will then pick up the transaction from the queue and attempt another 10 million cycles. During this process, we check for any suspend signals. If the verification isn't completed within the 10 million cycles, the transaction is pushed back to the Chunk Queue for further processing.
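Roughly, that chunk-and-snapshot loop looked like the sketch below, with a toy stand-in for CKB-VM (none of these names are the real API). It shows why dumping and reloading state between chunks added complexity:

```rust
use std::collections::VecDeque;

const CHUNK_CYCLES: u64 = 10_000_000;

/// Toy stand-in for CKB-VM: it just counts "cycles" toward a total.
struct MockVm {
    consumed: u64,
    required: u64,
}

/// What the old design had to persist between chunks: the dumped VM state
/// plus the cycles consumed so far.
struct Snapshot {
    consumed: u64,
    required: u64,
}

impl MockVm {
    fn from_snapshot(s: &Snapshot) -> Self {
        MockVm { consumed: s.consumed, required: s.required }
    }
    /// Run until verification finishes or the per-chunk budget is exhausted.
    fn run_chunk(&mut self, budget: u64) -> bool {
        let step = budget.min(self.required - self.consumed);
        self.consumed += step;
        self.consumed >= self.required
    }
    fn snapshot(&self) -> Snapshot {
        Snapshot { consumed: self.consumed, required: self.required }
    }
}

fn main() {
    // A "large-cycle" transaction needing 35M cycles to verify.
    let mut chunk_queue: VecDeque<Snapshot> =
        VecDeque::from([Snapshot { consumed: 0, required: 35_000_000 }]);

    // Background job: pop a snapshot, resume from it, run another chunk,
    // and push it back if verification still isn't finished.
    // (The real job also checked for suspend signals between chunks.)
    while let Some(snap) = chunk_queue.pop_front() {
        let mut vm = MockVm::from_snapshot(&snap);
        if vm.run_chunk(CHUNK_CYCLES) {
            println!("verified after {} cycles", vm.consumed);
        } else {
            chunk_queue.push_back(vm.snapshot()); // suspend again
        }
    }
}
```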
The old design had two main issues:
The performance is poor for large-cycle transactions.
The code is hard to maintain.
Thanks to the CKB team's efforts, we designed a new, signal-based strategy for pausing verification. When a pause signal is received, the verification worker sets a pause flag for CKB-VM, which then stops consuming CPU. The verification worker enters a waiting state until the resume signal arrives; the pause flag is then cleared, and CKB-VM resumes processing. You can check out more details of this implementation.
With this new design, CKB-VM remains in memory, eliminating the need to take snapshots or store the cycle count, which simplifies the process and improves performance.
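The sketch below illustrates the idea with a shared atomic flag; it is not the actual ckb-vm API. The VM checks the flag between instructions and simply waits in place when it is set, so its registers and memory stay in RAM and nothing has to be serialized:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Illustrative interpreter loop: between instructions it checks a shared
/// pause flag; when the flag is set it parks instead of burning CPU,
/// keeping all of its state in memory, and continues once the flag clears.
fn run_vm(pause: Arc<AtomicBool>, total_steps: u64) {
    let mut executed = 0u64;
    while executed < total_steps {
        while pause.load(Ordering::Acquire) {
            // Paused: yield the CPU but keep registers/memory as-is.
            thread::park_timeout(Duration::from_millis(10));
        }
        // ... execute one instruction here ...
        executed += 1;
    }
    println!("verification finished after {executed} steps");
}

fn main() {
    let pause = Arc::new(AtomicBool::new(false));
    let vm_flag = Arc::clone(&pause);
    let vm = thread::spawn(move || run_vm(vm_flag, 5_000_000));

    // The node wants to process a new block: pause verification ...
    pause.store(true, Ordering::Release);
    thread::sleep(Duration::from_millis(50)); // block processing happens here
    // ... then resume it; no snapshot was ever written.
    pause.store(false, Ordering::Release);

    vm.join().unwrap();
}
```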
This new design enables us to implement an improved script verification process. We still use a Verify Queue to store pending transactions, but now we have a verify manager responsible for managing multiple workers. Each worker picks up a transaction from the queue, and CKB-VM is started to process the verification.
When suspend or resume signals are triggered, the verify manager handles them by forwarding the signal to the appropriate verification worker.
In practice, we've identified scenarios where too many large-cycle transactions can block the processing of normal transactions. For example, if there are many large-cycle transactions, all verify workers might be busy processing them, causing normal transactions to wait longer—even though they could be handled in a much shorter time. To address this, we’ve allocated one worker exclusively for processing normal transactions, while the remaining workers pick transactions based on their timestamps.
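Here is a toy sketch of that scheduling policy, not CKB's actual verify manager; the queue layout and names are simplified. Worker 0 only ever takes normal transactions, while the other workers take whichever task has waited longest:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Illustrative verify-queue entry; `large` marks a large-cycle transaction
/// and `received_at` stands in for its arrival timestamp.
struct VerifyTask {
    tx_id: u64,
    large: bool,
    received_at: u64,
}

type VerifyQueue = Arc<Mutex<VecDeque<VerifyTask>>>;

/// One verification worker. The dedicated worker only takes normal
/// transactions so they are never starved by large-cycle ones; the other
/// workers take the task that has waited longest.
fn worker(id: usize, queue: VerifyQueue, dedicated_to_normal: bool) {
    loop {
        let task = {
            let mut q = queue.lock().unwrap();
            let pos = if dedicated_to_normal {
                q.iter().position(|t| !t.large)
            } else {
                (0..q.len()).min_by_key(|&i| q[i].received_at)
            };
            pos.and_then(|i| q.remove(i))
        };
        match task {
            // ... here the worker would start CKB-VM and run the Scripts ...
            Some(t) => println!("worker-{id} verified tx {} (large: {})", t.tx_id, t.large),
            None => return, // queue drained; a real worker would wait for more work
        }
    }
}

fn main() {
    let queue: VerifyQueue = Arc::new(Mutex::new(
        (0..8)
            .map(|i| VerifyTask { tx_id: i, large: i % 2 == 0, received_at: i })
            .collect(),
    ));

    // Worker 0 is reserved for normal transactions; workers 1 and 2 take any.
    let handles: Vec<_> = [true, false, false]
        .into_iter()
        .enumerate()
        .map(|(id, dedicated)| {
            let q = Arc::clone(&queue);
            thread::spawn(move || worker(id, q, dedicated))
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
}
```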
We’ve conducted benchmark testing, which demonstrates a performance improvement of at least 5x by using multiple workers.
These are the three upgrades to the CKB transaction pool and the Script verification flow. If you want to explore more details, here are the Pull Requests: