Towards the close of 2022, a notable development emerged in the realm of cryptography as a team challenged the security of RSA using quantum computers. (For details, refer to their paper.) Even though the team managed to crack RSA by factoring integers up to 48 bits, it implies that common signature algorithms may no longer be as secure as we envision in the future.
With the rising convenience and privacy of digital assets, more individuals are converting their assets into cryptocurrencies, anticipating broader adoption of cryptos in the future. At the same time, advances in computer semiconductor technology are rapidly boosting CPU performance, while quantum computers continue to pose greater challenges to future encryption algorithms’ security.
Safeguarding users‘ assets and ensuring the enduring accessibility of their digital holdings have always been the top concern for Nervos CKB. As developers of the CKB VM team, we've developed the quantum-resistant lock script to address quantum computation challenges, designed to withstand potential threats. The following sections will provide insights into the project's development.
Why SPHINCS+ Quantum Resistant Lock
Post-quantum cryptography (PQC) (sometimes referred to as quantum-proof, quantum-safe, or quantum-resistant) is the development of cryptographic algorithms (usually public-key algorithms) that are thought to be secure against a cryptanalytic attack by a quantum computer.
To date, numerous PQC algorithms have emerged. However, we have ultimately opted for SPHINCS+ due to several considerations:
Security: SPHINCS+ is a signature scheme based on hash functions and hash chains, designed to withstand attacks from quantum computing. Its utilization of the structure of hash functions makes it challenging for attackers to expedite the cracking of signatures through quantum algorithms.
Simplicity and Efficiency: In comparison to some complex PQC solutions, SPHINCS+ boasts a relatively straightforward algorithm, ensuring efficient implementation. This simplicity is particularly crucial for on-chain scripts running on the CKB-VM. (This is just speculation based on some information, Its feasibility will be confirmed through subsequent validation processes)
Compact Key/Signature Size: On-chain scripts impose restrictions on the size of signatures, and smaller key/signature sizes are advantageous for transmission and storage.
SPHINCS+ is a stateless hash-based signature scheme, which was submitted to the NIST post-quantum crypto project. The design advances the SPHINCS signature scheme, the latter was presented at EUROCRYPT 2015. It incorporates multiple improvements, specifically aimed at reducing signature size. For a quick overview of the changes from SPHINCS to SPHINCS+ see the blog post by Andreas Hülsing. The submission proposes three different signature schemes:
SPHINCS+——SHAKE256
SPHINCS+——SHA-256
SPHINCS+——Haraka
These signature schemes are obtained by instantiating the SPHINCS+ construction with SHAKE256, SHA-256, and Haraka, respectively.
SPHINCS+’s official documentation and code can be directly utilized for on-chain script development (In this article, “SPHINCS+” refers to the algorithm, while “sphincsplus” refers to the code implementation of SPHINCS+). The sphincsplus source code includes the reference implementation and several optimized assembly versions targeting the x86. Since CKB-VM is based on the RISC-V instruction set, we choose the reference implementation for on-chain script implementation.
On-Chain Script Design and Optimization
CKB Quantum Resistant Lock Script project is structurally divided into two main components: the CKB on-chain script related segment, including the code and tests to interact with CKB, and the SPHINCS+-related segment, covering the verification code, signature code for testing, as well as the testing and optimization for this segment.
Integrate SPHINCS+ in CKB On-Chain Script: Efficient Verification and Testing Considerations
The sphincsplus original code has been retained intact and directly integrated into our project by referencing git submodule. An intermediary layer, ckb-sphincsplus.c, was introduced between SPHINCS+ and the on-chain script, encompassing the entire signature process of key generation, signing, and verification. Only the verification process is involved in the on-chain script; others are hidden by the compiler macro CKB_VM
.
As mentioned earlier, SPHINCS+ supports multiple signature types, and in sphincsplus, macros control different code types during the compilation. The above-mentioned approach is also employed to manage on-chain script compilation and testing. The code involved in this stage doesn't call on-chain script related functions, and can run directly in CKB-VM, facilitating subsequent testing and optimization.
Let me briefly digress here. In the early stage of the development, we considered creating an all-in-one on-chain script, and we implemented it. However, later we encountered two challenges: 1. This approach necessitated modifications to the sphincsplus source code, demanding specific testing for the modified parts, and updating sphincsplus became rather intricate; 2. The on-chain script thus became overly complex. For these reasons, this approach was eventually disposed of.
C language is used for testing the CKB on-chain script related segment. The code includes both basic and fuzzing testing. Basic testing uses scripts to compile and test the entire signatures of various types, aiming to cover most use cases. Fuzz testing is even more extensive. These two types of tests primarily focus on the coverage of the intermediary layer, i.e., ckb-sphincsplus.c, instead of the coverage of sphincsplus source code. The latter contains complete test cases and the code remains unaltered. Therefore, the priority is to ensure the coverage of ckb-sphincsplus.c, meanwhile checking for any anomalies within sphincsplus.
Design of CKB Quantum Resistant Lock Script
The design very much resembles the common CKB Lock (refer to the default lock SECP256K1/blake160), but without supporting multi-signature. Given the already lengthy verification time for single-signature, the time required for multi-signature is almost impractical.
The main components are:
Args: consists of the blake2b hash of the public key, with a fixed length of 32-byte.
Witnesses: comprises public key plus signature data, with varying lengths for different on-chain script types, which will be detailed later.
Message: a data set hashed from several components such as transaction hash and witnesses, with a fixed length of 32-byte. The method to get a message is identical to that of the common CKB Lock.
Regarding the execution of CKB Quantum Resistant Lock Script, a notable difference from the execution of SECP256K1 is that, due to the former’s relatively long verification time, public key and args (public key hash) are firstly verified.
The code for testing full on-chain script functionality is here, where a complete transaction is created and executed using Rust's libraries such as ckb-script. Since various types of SPHINCS+ have already been tested, here we only test the on-chain script of the default type. The script is also provided, facilitating testing other types by adding specific parameters. Since the operation is rather time-consuming, it is excluded from the CI.
The sizes of SPHINCS+ public keys and signatures are as follows:
Explore Optimization Strategies
As the execution efficiency of quantum-resistant algorithms is mostly low, our optimization goal is to use higher signature types while keeping the verify cycles within 70M.
Due to the attributes of SPHINCS+, the verify cycles may vary slightly with different keys and sign data. To ensure the optimization is effective, we created signature information for different types in tests/sphincsplus/test_data. This data results from modifying the test code, but was not submitted because it was only used once.
optimization-sphincsplus.c is an on-chain script only for verification, while the signature data are provided by test_data. After compilation, optimization-sphincsplus.c is executed by ckb-debugger, resulting in the generation of cycles. Since the key, message, and sign are fixed, and Docker image is used for compilation, the cycles remain constant with each execution.
A detail to note is that this value only represents the cycles involved in the verification process. The actual on-chain script execution includes additional data processing, leading to higher cycle counts. The actual on-chain script execution value can be found in tests/sphincsplus_rust/run_example.sh.
The table below illustrates the value of cycles before optimization, executed by ckb-debugger. For better readability, we use MB, 1MB = 1024 * 1024.
128s | 128f | 192s | 192f | 256s | 256f | |
SHAKE-simple | 28.5 | 83.6 | 42.7 | 124.1 | 62.7 | 122.6 |
SHA2-simple | 13.2 | 42.2 | 20.7 | 60.3 | 30.5 | 58.8 |
Haraka-simple | 26.8 | 71.9 | 38.1 | 102.9 | 59.0 | 112.1 |
SHAKE-robust | 58.4 | 167.7 | 83.3 | 250.4 | 124.4 | 255.5 |
SHA2-robust | 28.2 | 80.8 | 42.3 | 122.5 | 77.0 | 166.4 |
Haraka-robust | 46.0 | 121.4 | 71.0 | 184.9 | 103.4 | 195.4 |
Note: For instance, SPHINCS+-SHAKE-simple-128s signifies a SPHINCS+ scheme that employs the SHAKE hash function with a security strength of 128-bit, configured with a simplified version ("s"). The “f” in “128f” denotes the use of a “full” version. The “s” and “f” are related to the height of the hypertree. Details on SPINCS+ hypertree can be found under Section 4.1 of this paper. “Simple” refers to the simplified instantiation, while “robust" emphasizes enhanced security features. Learn more about SPHINCS+ parameter sets under Section 7 in the same paper.
The optimization process begins by identifying hotspots, then proceeds with targeted optimizations.
You can use make ppref
to get a flame graph like below. Since it is only made for users, the dependencies of FlameGraph are not introduced into our project. If needed, you can manually download and set the path.)
The above chart shows the result of SPHINCS+-SHAKE-128f-simple. We can observe that the majority of the time is consumed by the KeccakF1600_StatePermute
, which is the target of the subsequent optimization. The function heavily employs B instructions, as the multiple ROL macro suggests, as well as Ega = BCa ^ ((~BCe) & BCi);
, which can be done using andn
in the B instructions. They fall under RISC-V’s B extension instructions. While the current CKB-VM supports B instructions, the GNU toolchain at that time did not. Manual optimization using .bytes
is the only viable solution, as the following example suggests:
// We do NOT recommend this approach.
// rol t0,t1,t2
__asm__ (".byte 0xb3,0x12,0x73,0x60");
This approach is highly inefficient. Using .byte
inevitably results in the register address being hard-coded in the entire instruction, meaning that each time using rol
requires loading data into the t0 register beforehand, then storing the result back into memory after completion. In practice, this process can be optimized for data to stay in registers for subsequent instructions.
Given this, we started to experiment by rewriting the functions in assembly. Rewriting the entire hotspot functions in assembly might produce some results, but not much. Our final result was marginally faster by less than 10%. Anyways, we do not recommend trying this.
Despite these attempts, CKB-VM's B instruction set is stable, and the team has recently completed upgrading the GNU Toolchain (docker image), allowing for the optimization of B instruction during compilation. Therefore, a new attempt using the updated toolchain was made, with the results below, indicating a significant performance improvement:
128s | 128f | 192s | 192f | 256s | 256f | |
SHAKE-simple | 16.9 | 49.5 | 25.4 | 73.7 | 37.0 | 72.3 |
SHA2-simple | 10.7 | 33.9 | 16.7 | 48.7 | 24.7 | 47.4 |
Haraka-simple | 26.4 | 70.8 | 37.5 | 101.3 | 57.8 | 110.0 |
SHAKE-robust | 34.2 | 98.2 | 49.1 | 147.3 | 73.1 | 150.1 |
SHA2-robust | 22.5 | 64.5 | 34.0 | 98.5 | 60.3 | 130.2 |
Haraka-robust | 44.2 | 116.4 | 68.3 | 177.4 | 99.5 | 187.6 |
Improvement in percentage:
128s | 128f | 192s | 192f | 256s | 256f | |
SHAKE-simple | 40.76 | 40.80 | 40.59 | 40.64 | 41.01 | 41.05 |
SHA2-simple | 19.49 | 19.59 | 19.17 | 19.29 | 19.26 | 19.32 |
Haraka-simple | 1.64 | 1.55 | 1.66 | 1.56 | 1.92 | 1.94 |
SHAKE-robust | 41.38 | 41.42 | 41.13 | 41.18 | 41.22 | 41.26 |
SHA2-robust | 20.12 | 20.15 | 19.56 | 19.62 | 21.66 | 21.77 |
Haraka-robust | 3.84 | 4.13 | 3.81 | 4.05 | 3.80 | 4.03 |
We have the following conclusions:
There are no issues using the new GNU Toolchain to optimize CKB quantum-resistant lock script.
Improvement is around 40% for SHAKE and 20% for SHA2, indicating a successful optimization in their cases.
Improvement for Haraka is small, but no degradation at least.
On-Chain Script Deployment and Lock Script Conversion
Based on the optimized results, SHAKE-128f-simple stands out because it allows the on-chain script to be approximately 50M. It is therefore selected as the default hash function.
ckb-cli is used for on-chain script deployment, which is completed on the testnet. Transaction details for deployment can be found here.
Additionally, CKB Quantum Resistant Lock Script provides a lock script conversion tool, tools/ckb-sphincs-tools. You can use it to construct a transaction and convert it with the default CKB LockScript into one with the Quantum Resistant Lock. The conversion consumes a certain fee, but it is low.
Run the following Cargo command to convert:
cargo run -- cc_to_sphincsplus --tx_hash <tx-hash> --tx_index <index> --key_file key.json --prikey <You can use ckb-cli account export>
Shown below are the transaction details on testnet:
For more, visit CKB Explorer.
To unlock the above transaction, run the following command:
cargo run -- cc_to_secp --tx_hash <tx-hash> --tx_index <index> --key_file key.json --lock_arg <LOCK-ARG> --sp_tx_hash <SPHINCS+ Script in step 2> --sp_tx_index <index> --fee 10000
You can observe that the output cell in the first transaction is the input cell in the second.
For more, visit CKB Explorer.
Final Thoughts
It’s better not to make modifications when referencing third-party projects, as this enables timely synchronization with new versions, especially for cryptographic signature modules.
Testing should be as comprehensive as possible.
Avoid manual assembly coding unless it’s absolutely necessary, as the benefits may be limited and negative optimization could occur.
The benefits of upgrading the toolchain can be significant.
✍🏻 Witten by Zishuang Han
You may also be interested in:
By the same author: