LLVM-K Backend
In this guide, we explain how the LLVM-K backend works and how it powers fast and correct-by-construction semantics-based program execution.
Language Definitions in K
A formal definition of a programming language in K consists of three main components:
Concrete syntax: A conventional BNF grammar that defines the structure of well-formed programs.
Configurations: A data structure that holds all the code and the semantic information during the runtime of a program.
Operational semantics: A set of rewrite rules that define the transition over configurations.
K uses rewrite rules to specify how a program executes. A rewrite rule specifies how a configuration is matched and then transforms into another one. Program execution is nothing but a sequence of continuous and consecutive transitions over configurations. A tool that takes any configuration and rewrites it into a successor configuration following the set of rewrite rules for the language, is an interpreter of the language.
LLVM-K can automatically generates an interpreter for a programming language based on its formal semantics and rewrite rules.
LLVM-K Overview
LLVM-K is a meta-level tool that automatically generates an interpreter of a programming language based on its formal semantics and rewrite rules. The generated interpreter is represented in LLVM IR and further compiled into efficient native code, bringing us promising performance.
LLVM-K is made up of several components that work together to optimize execution:
KORE AST data structures: Represent KORE syntax in memory.
Decision tree generator: Converts KORE rules into structured decision trees for efficient pattern matching.
Code generator: Translates decision trees into compiled code that applies rewrite rules.
Runtime terms: Data structures that represent the program’s state during execution.
Runtime library: Provides performance-focused utilities implemented in C/C++ and LLVM IR.
Memory management and garbage collector: Handles term allocation and deallocation during execution.
LLVM-K Internals
LLVM-K operates not on raw K definitions, but on an intermediate representation thereof—KORE, which it compiles into native interpreter binaries that execute rewrite rules efficiently.
A rewrite rule consists of a pattern (left-hand side) and a transformation (right-hand side). When a program runs, LLVM-K needs to determine which rule should be applied next. Instead of searching through thousands of possibilities, it builds decision trees to make this process more efficient.
LLVM-K matches the rule’s pattern against the current state of execution and then applies its corresponding transformation to update the program state. This repeats until no more rules apply, meaning execution has reached a final state.
K provides a way to formally define languages, ensuring correctness. But correctness alone isn’t enough—execution must be fast. LLVM-K bridges the gap, making K-based execution not just correct but also performant. This is crucial for our vision of verifiable, efficient computation.
Semantics-Based Compilation
LLVM-K is essentially an interpreter generator. It takes a programming language PL
and generates an interpreter, denoted K[PL]
. This interpreter works as a function that takes a program P
and input in
, and returns output out
. Formally, we can write this as K[PL](P, in) = out
.
Semantics-based compilation (SBC) is the process of partially and symbolically executing P
as much as it can, resulting in a set of new rewrite rules that summarize the effects of each and every basic blocks of P
. These new rewrite rules generated from the SBD process form a new SBC semantics, denoted P/PL
, which operates on the same configurations as the original semantics PL
with the property that
However, the new SBC semantics P/PL
is much more efficient than the original semantics PL
because one step of P/PL
corresponds to an entire basic block of P
in language PL
.
For example, if P
is a simple list of sequential statements that has no loops or external function calls, then its SBC semantics P/PL
will include one and only one rewrite rule that goes from the beginning of P
all the way to the end of P
, summarizing the entire program into a step.
Therefore, SBC is a semantics-level optimization. The SBC semantics of a program is a semantics specialized for the said program.
SBC, or semantics-based compilation, gets its name because a compiler is, mathematically speaking, a partial evaluation of an interpreter. SBC enables K to benefit from a speedup that's similar to that of a compiler compared to an interpreter.
Last updated
Was this helpful?