Semantics-Based Compilation

Performance results for LLVM-K are promising, e.g., when compared to Geth, the official implementation of EVM in Go and most adopted Ethereum client. Nevertheless, LLVM-K benefits from a series of additional optimizations enabled by K's semantics-based approach, and not possible in compiler-based approaches.

What we refer to here is compositional symbolic execution, or CSE in short. Our results show that LLVM-K extended with CSE outperforms Geth by 1.58x.

What is CSE?

In brief, CSE is the process of getting a faster, optimized semantics for a given program in a given language. More specifically, CSE transforms a large and complex small-step semantics into a smaller and simpler one, allowing for "medium"-step rules which speed-up the execution.

You can think of CSE as the partial evaluation of an interpreter (which gives you a compiler!) but carried out via the formal semantics. In other words, enabling CSE in K is like going from interpreters to compilers, with an expectation of an order of magnitude speedup.

Types of CSE

There are two types of CSE which are:

CSE[Pgm]: This type of CSE occurs at the program level and assumes that the program is known ahead of time.
CSE[OpCode]: This type of CSE occurs at the language level and it optimizes the general semantics without knowing the programs ahead of time, but relies on side information about the context of the possible programs.

CSE[Pgm]

CSE[Pgm] considers the semantics of a single program in its transformation. It uses symbolic execution to produce a "medium"-step semantics operating on basic blocks in the program.

This means that multiple small execution steps are composed into a single step that always performs the same operations as the underlying small steps would, regardless of the input. The new semantics is, in essence, the program's control flow graph (CFG), where the edges are the medium-step rewrite rules.

CSE[OpCode]

CSE[OpCode] doesn't assume knowledge of the program to be executed ahead of time, but certain characteristics thereof, such as the execution order for certain semantic rules.

CSE[OpCode] merges rules in the original small-step semantics that commonly follow each other sequentially, without branching or looping. E.g., checking if there is enough gas available, performing an operation, and updating the program counter are a series of three consecutive actions appearing recurrently in EVM.

Finally, additional parameters known ahead of time, such as the gas schedule, can also be internalized for further performance gains.

Both CSE techniques are “correct-by-construction”, as the optimized semantics are built by symbolically executing the original one.

Further, not only do they enable faster program execution, but also shorter proof certificates for the execution of the program. K can produce formal mathematical proofs in Metamath showing that these optimizations are correct.

Last updated 1 month ago

Was this helpful?