About salsa
Salsa is a Rust framework for writing incremental, on-demand programs -- these are programs that want to adapt to changes in their inputs, continuously producing a new output that is up-to-date. Salsa is based on the incremental recompilation techniques that we built for rustc, and many (but not all) of its users are building compilers or other similar tooling.
If you'd like to learn more about Salsa, check out:
- The overview, for a brief summary.
- The tutorial, for a detailed look.
- You can also watch some of our videos, though the content there is rather out of date.
If you'd like to chat about Salsa, or you think you might like to contribute, please jump on to our Zulip instance at salsa.zulipchat.com.
Salsa overview
This page contains a brief overview of the pieces of a Salsa program. For a more detailed look, check out the tutorial, which walks through the creation of an entire project end-to-end.
Goal of Salsa
The goal of Salsa is to support efficient incremental recomputation. Salsa is used in rust-analyzer, for example, to help it recompile your program quickly as you type.
The basic idea of a Salsa program is like this:
#![allow(unused)] fn main() { let mut input = ...; loop { let output = your_program(&input); modify(&mut input); } }
You start out with an input that has some value. You invoke your program to get back a result. Some time later, you modify the input and invoke your program again. Our goal is to make this second call faster by re-using some of the results from the first call.
In reality, of course, you can have many inputs and "your program" may be many different methods and functions defined on those inputs. But this picture still conveys a few important concepts:
- Salsa separates out the "incremental computation" (the function
your_program) from some outer loop that is defining the inputs. - Salsa gives you the tools to define
your_program. - Salsa assumes that
your_programis a purely deterministic function of its inputs, or else this whole setup makes no sense. - The mutation of inputs always happens outside of
your_program, as part of this master loop.
Database
Each time you run your program, Salsa remembers the values of each computation in a database. When the inputs change, it consults this database to look for values that can be reused. The database is also used to implement interning (making a canonical version of a value that can be copied around and cheaply compared for equality) and other convenient Salsa features.
Inputs
Every Salsa program begins with an input. Inputs are special structs that define the starting point of your program. Everything else in your program is ultimately a deterministic function of these inputs.
For example, in a compiler, there might be an input defining the contents of a file on disk:
#![allow(unused)] fn main() { #[salsa::input] pub struct ProgramFile { pub path: PathBuf, pub contents: String, } }
You create an input by using the new method.
Because the values of input fields are stored in the database, you also give an &-reference to the database:
#![allow(unused)] fn main() { let file: ProgramFile = ProgramFile::new( &db, PathBuf::from("some_path.txt"), String::from("fn foo() { }"), ); }
Mutable access is not needed since creating a new input cannot affect existing tracked data in the database.
Salsa structs are just integers
The ProgramFile struct generated by the salsa::input macro doesn't actually store any data. It's just a newtyped integer id:
#![allow(unused)] fn main() { // Generated by the `#[salsa::input]` macro: #[derive(Copy, Clone, PartialEq, Eq, Hash)] pub struct ProgramFile(salsa::Id); }
This means that, when you have a ProgramFile, you can easily copy it around and put it wherever you like.
To actually read any of its fields, however, you will need to use the database and a getter method.
Reading fields and returns(mode)
You can access the value of an input's fields by using the getter method.
As this is only reading the field, it just needs a &-reference to the database:
#![allow(unused)] fn main() { let contents: &str = file.contents(&db); }
Field getters return a reference into the database by default. Use #[returns(copy)] for Copy fields or #[returns(clone)] to return an owned clone instead. #[returns(deref)] borrows through Deref, so a String field returns a &str. For optional and fallible values, #[returns(as_ref)] converts an Option<T> or Result<T, E> into references, while #[returns(as_deref)] also borrows through Deref (for example, converting Option<String> into Option<&str>).
#![allow(unused)] fn main() { #[salsa::input] pub struct ProgramFile { #[returns(clone)] pub path: PathBuf, #[returns(deref)] pub contents: String, } }
Now file.path(&db) returns a PathBuf, while file.contents(&db) returns a &str.
Writing input fields
Finally, you can also modify the value of an input field by using the setter method.
Since this is modifying the input, and potentially invalidating data derived from it,
the setter takes an &mut-reference to the database:
#![allow(unused)] fn main() { use salsa::Setter as _; file.set_contents(&mut db).to(String::from("fn foo() { /* add a comment */ }")); }
Note that the setter method set_contents returns a "builder".
This gives the ability to set the durability and other advanced concepts.
Tracked functions
Once you've defined your inputs, the next thing to define are tracked functions:
#![allow(unused)] fn main() { #[salsa::tracked] fn parse_file<'db>(db: &'db dyn crate::Db, file: ProgramFile) -> Ast<'db> { let contents: &str = file.contents(db); ... } }
When you call a tracked function, Salsa will track which inputs it accesses (in this example, file.contents(db)).
It will also memoize the return value (the Ast, in this case).
If you call a tracked function twice, Salsa checks if the inputs have changed; if not, it can return the memoized value.
The algorithm Salsa uses to decide when a tracked function needs to be re-executed is called the red-green algorithm, and it's where the name Salsa comes from.
Tracked functions have to follow a particular structure:
- They must take a
&-reference to the database as their first argument.- Note that because this is an
&-reference, it is not possible to modify inputs during a tracked function!
- Note that because this is an
- They may take no other arguments, one Salsa struct, or multiple arguments that implement
EqandHash. A single Salsa struct can be used directly as the query key, whereas multiple arguments are interned together to create a key.
Tracked functions return a reference to their memoized value by default, so callers of parse_file receive an &Ast<'_>. Use #[salsa::tracked(returns(clone))] to clone the value out of the database instead.
Tracked structs
Tracked structs are intermediate structs created during your computation. Like inputs, their fields are stored inside the database, and the struct itself just wraps an id. Unlike inputs, they can only be created inside a tracked function, and their fields can never change once they are created (until the next revision, at least). Getter methods are provided to read the fields, but there are no setter methods. Example:
#![allow(unused)] fn main() { #[salsa::tracked] struct Ast<'db> { #[tracked] #[returns(deref)] top_level_items: Vec<Item<'db>>, } }
Just as with an input, new values are created by invoking Ast::new. The new function on a tracked struct only requires a &-reference to the database:
#![allow(unused)] fn main() { #[salsa::tracked] fn parse_file<'db>(db: &'db dyn crate::Db, file: ProgramFile) -> Ast<'db> { let contents: &str = file.contents(db); let mut parser = Parser::new(contents); let mut top_level_items = vec![]; while let Some(item) = parser.parse_top_level_item() { top_level_items.push(item); } Ast::new(db, top_level_items) // <-- create an Ast! } }
Identity fields and #[tracked] fields
By default, a field is part of the tracked struct's identity.
Fields annotated with #[tracked] are not part of its identity; their values can be updated when Salsa matches the struct across executions.
If a tracked field's value has not changed, then other tracked functions that only read that field will not be re-executed.
For example, imagine that we had a tracked struct for items in the file:
#![allow(unused)] fn main() { #[salsa::tracked] struct Item<'db> { name: Word<'db>, // we'll define Word in a second! #[tracked] body: Ast<'db>, } }
Here name is part of the item's identity, while body can change without changing the item's identity.
Specify the result of tracked functions for particular structs
Sometimes it is useful to define a tracked function but specify its value for some particular struct specially. For example, maybe the default way to compute the representation for a function is to read the AST, but you also have some built-in functions in your language and you want to hard-code their results. This can also be used to simulate a field that is initialized after the tracked struct is created.
To support this use case, you can use the specify method associated with tracked functions.
To enable this method, you need to add the specify flag to the function to alert users that its value may sometimes be specified externally.
#![allow(unused)] fn main() { #[salsa::tracked(specify)] // <-- specify flag required fn representation<'db>(db: &'db dyn crate::Db, item: Item<'db>) -> Representation { // read the user's input AST by default let ast = ast(db, item); // ... } #[salsa::tracked] fn create_builtin_item<'db>(db: &'db dyn crate::Db) -> Item<'db> { let i = Item::new(db, ...); let r = hardcoded_representation(); representation::specify(db, i, r); // <-- use the method! i } }
Specifying is only possible for tracked functions that take a single tracked struct as an argument (besides the database).
The call to specify must occur in the same tracked-function invocation that created that struct.
Interned structs
The final kind of Salsa struct are interned structs. Interned structs are useful for quick equality comparison. They are commonly used to represent strings or other primitive values.
Most compilers, for example, will define a type to represent a user identifier:
#![allow(unused)] fn main() { #[salsa::interned] struct Word<'db> { #[returns(deref)] pub text: String, } }
As with input and tracked structs, the Word struct itself is just a newtyped integer, and the actual data is stored in the database.
You can create a new interned struct using new, just like with input and tracked structs:
#![allow(unused)] fn main() { let w1 = Word::new(db, "foo".to_string()); let w2 = Word::new(db, "bar".to_string()); let w3 = Word::new(db, "foo".to_string()); }
When you create two interned structs with the same field values, you are guaranteed to get back the same integer id. So here, we know that assert_eq!(w1, w3) is true and assert_ne!(w1, w2).
You can access the fields of an interned struct using a getter, like word.text(db), which returns a &str because of the #[returns(deref)] annotation. The fields of interned structs are immutable.
Accumulators
The final Salsa concept are accumulators. Accumulators are a way to report errors or other "side channel" information that is separate from the main return value of your function.
To create an accumulator, you declare a type as an accumulator:
#![allow(unused)] fn main() { #[salsa::accumulator] pub struct Diagnostics(String); }
Now, during a tracked function's execution, you can accumulate values of this type:
#![allow(unused)] fn main() { use salsa::Accumulator as _; Diagnostics("some_string".to_string()).accumulate(db); }
Then later, from outside the execution, you can ask for the set of diagnostics that were accumulated by some particular tracked function. For example, imagine that we have a type-checker and, during type-checking, it reports some diagnostics:
#![allow(unused)] fn main() { #[salsa::tracked] fn type_check<'db>(db: &'db dyn Db, item: Item<'db>) { // ... Diagnostics("some error message".to_string()).accumulate(db); // ... } }
we can then later invoke the associated accumulated function to get references to all the Diagnostics values that were accumulated:
#![allow(unused)] fn main() { let diagnostics: Vec<&Diagnostics> = type_check::accumulated::<Diagnostics>(db, item); }
Tutorial: calc
This tutorial walks through an end-to-end example of using Salsa. It does not assume you know anything about salsa, but reading the overview first is probably a good idea to get familiar with the basic concepts.
Our goal is define a compiler/interpreter for a simple language called calc.
The calc compiler takes programs like the following and then parses and executes them:
fn area_rectangle(w, h) = w * h
fn area_circle(r) = 3.14 * r * r
print area_rectangle(3, 4)
print area_circle(1)
print 11 * 2
When executed, this program prints 12, 3.14, and 22.
If the program contains errors (e.g., a reference to an undefined function), it prints those out too. And, of course, it will be reactive, so small changes to the input don't require recompiling (or reexecuting, necessarily) the entire thing.
Basic structure
Before we do anything with Salsa, let's talk about the basic structure of the calc compiler. Part of Salsa's design is that you are able to write programs that feel 'pretty close' to what a natural Rust program looks like.
Example program
This is our example calc program:
x = 5
y = 10
z = x + y * 3
print z
Parser
The calc compiler takes as input a program, represented by a string:
#![allow(unused)] fn main() { struct SourceProgram { text: String } }
The first thing it does it to parse that string into a series of statements that look something like the following pseudo-Rust:1
#![allow(unused)] fn main() { enum Statement { /// Defines `fn <name>(<args>) = <body>` Function(Function), /// Defines `print <expr>` Print(Expression), } /// Defines `fn <name>(<args>) = <body>` struct Function { name: FunctionId, args: Vec<VariableId>, body: Expression } }
where an expression is something like this (pseudo-Rust, because the Expression enum is recursive):
#![allow(unused)] fn main() { enum Expression { Op(Expression, Op, Expression), Number(f64), Variable(VariableId), Call(FunctionId, Vec<Expression>), } enum Op { Add, Subtract, Multiply, Divide, } }
Finally, for function/variable names, the FunctionId and VariableId types will be interned strings:
#![allow(unused)] fn main() { type FunctionId = /* interned string */; type VariableId = /* interned string */; }
Because calc is so simple, we don't have to bother separating out the lexer from the parser.
Checker
The "checker" has the job of ensuring that the user only references variables that have been defined.
We're going to write the checker in a "context-less" style,
which is a bit less intuitive but allows for more incremental re-use.
The idea is to compute, for a given expression, which variables it references.
Then there is a function check which ensures that those variables are a subset of those that are already defined.
Interpreter
The interpreter will execute the program and print the result. We don't bother with much incremental re-use here, though it's certainly possible.
Defining the database struct
First, we need to create the database struct. Typically it is only used by the "driver" of your application; the one which starts up the program, supplies the inputs, and relays the outputs.
In calc, the database struct is in the db module, and it looks like this:
#![allow(unused)] fn main() { #[salsa::db] #[derive(Clone)] #[cfg_attr(not(test), derive(Default))] pub struct CalcDatabaseImpl { storage: salsa::Storage<Self>, // The logs are only used for testing and demonstrating reuse: #[cfg(test)] logs: Arc<Mutex<Option<Vec<String>>>>, } #[cfg(test)] impl Default for CalcDatabaseImpl { fn default() -> Self { let logs = <Arc<Mutex<Option<Vec<String>>>>>::default(); Self { storage: salsa::Storage::new(Some(Box::new({ let logs = logs.clone(); move |event| { eprintln!("Event: {event:?}"); // Log interesting events, if logging is enabled if let Some(logs) = &mut *logs.lock().unwrap() { // only log interesting events if let salsa::EventKind::WillExecute { .. } = event.kind { logs.push(format!("Event: {event:?}")); } } } }))), logs, } } } }
The #[salsa::db] attribute marks the struct as a database.
It must have a field named storage whose type is salsa::Storage<Self>, but it can also contain whatever other fields you want.
Implementing the salsa::Database trait
In addition to the struct itself, we must add an impl of salsa::Database:
#![allow(unused)] fn main() { #[salsa::db] impl salsa::Database for CalcDatabaseImpl {} }
Defining the IR
Before we can define the parser, we need to define the intermediate representation (IR) that we will use for calc programs.
In the basic structure, we defined some "pseudo-Rust" structures like Statement and Expression;
now we are going to define them for real.
"Salsa structs"
In addition to regular Rust types, we will make use of various Salsa structs. A Salsa struct is a struct that has been annotated with one of the Salsa annotations:
#[salsa::input], which designates the "base inputs" to your computation;#[salsa::tracked], which designate intermediate values created during your computation;#[salsa::interned], which designate small values that are easy to compare for equality.
All Salsa structs store the actual values of their fields in the Salsa database. This permits us to track when the values of those fields change to figure out what work will need to be re-executed.
When you annotate a struct with one of the above Salsa attributes, Salsa actually generates a bunch of code to link that struct into the database.
Input structs
The first thing we will define is our input. Every Salsa program has some basic inputs that drive the rest of the computation. The rest of the program must be some deterministic function of those base inputs, such that when those inputs change, we can try to efficiently recompute the new result of that function.
Inputs are defined as Rust structs with a #[salsa::input] annotation:
#![allow(unused)] fn main() { #[salsa::input(debug)] pub struct SourceProgram { #[returns(deref)] pub text: String, } }
In our compiler, we have just one simple input, the SourceProgram, which has a text field (the string).
The data lives in the database
Although they are declared like other Rust structs, Salsa structs are implemented quite differently.
The values of their fields are stored in the Salsa database and the structs themselves just reference it.
This means that the struct instances are Copy (no matter what fields they contain).
Creating instances of the struct and accessing fields is done by invoking methods like new as well as getters and setters.
In the case of #[salsa::input], the struct contains a salsa::Id, which is a non-zero integer.
Therefore, the generated SourceProgram struct looks something like this:
#![allow(unused)] fn main() { #[derive(Copy, Clone, PartialEq, Eq, Hash)] pub struct SourceProgram(salsa::Id); }
It will also generate a method new that lets you create a SourceProgram in the database.
For an input, a &db reference is required, along with the values for each field:
#![allow(unused)] fn main() { use salsa::Setter as _; let source = SourceProgram::new(&db, "print 11 + 11".to_string()); }
You can read the value of the field with source.text(&db),
and you can set the value of the field with source.set_text(&mut db).to("print 11 * 2".to_string()).
Database revisions
Whenever a function takes an &mut reference to the database,
that means that it can only be invoked from outside the incrementalized part of your program,
as explained in the overview.
When you change the value of an input field, that increments a 'revision counter' in the database,
indicating that some inputs are different now.
When we talk about a "revision" of the database, we are referring to the state of the database in between changes to the input values.
Representing the parsed program
Next we will define a tracked struct. Whereas inputs represent the start of a computation, tracked structs represent intermediate values created during your computation.
In this case, the parser is going to take in the SourceProgram struct that we saw and return a Program that represents the fully parsed program:
#![allow(unused)] fn main() { #[salsa::tracked(debug)] pub struct Program<'db> { #[tracked] #[returns(deref)] pub statements: Vec<Statement<'db>>, } }
Like with an input, the fields of a tracked struct are also stored in the database.
Unlike an input, those fields have no setters.
Fields marked #[tracked] can be updated when Salsa recreates the struct in a later revision, and Salsa tracks reads of each of those fields independently.
In this case, if parsing the input produced the same Program result
(e.g., because the only change to the input was some trailing whitespace, perhaps),
then subsequent parts of the computation won't need to re-execute.
(We'll revisit the role of tracked structs in reuse more in future parts of the IR.)
Apart from having no setters, the API for working with a tracked struct is quite similar to an input:
- You can create a new value by using
new: e.g.,Program::new(&db, some_statements) - You use a getter to read the value of a field, just like with an input (e.g.,
my_func.statements(db)to read thestatementsfield).- In this case, the field is tagged as
#[returns(deref)], which means that the getter will return a&[Statement], instead of a&Vec<Statement>.
- In this case, the field is tagged as
The 'db lifetime
Unlike inputs, tracked structs carry a 'db lifetime.
This lifetime is tied to the &db used to create them and
ensures that, so long as you are using the struct,
the database remains immutable:
in other words, you cannot change the values of an input struct.
Representing functions
We will also use a tracked struct to represent each function:
The Function struct is going to be created by the parser to represent each of the functions defined by the user:
#![allow(unused)] fn main() { #[salsa::tracked(debug)] pub struct Function<'db> { #[returns(copy)] pub name: FunctionId<'db>, #[returns(copy)] name_span: Span<'db>, #[tracked] #[returns(deref)] pub args: Vec<VariableId<'db>>, #[tracked] pub body: Expression<'db>, } }
If we had created some Function instance f, for example, we might find that the f.body field changes
because the user changed the definition of f.
This would mean that we have to re-execute those parts of the code that depended on f.body
(but not those parts of the code that depended on the body of other functions).
Apart from having no setters, the API for working with a tracked struct is quite similar to an input:
- You can create a new value by using
new: e.g.,Function::new(&db, some_name, some_name_span, some_args, some_body) - You use a getter to read the value of a field, just like with an input (e.g.,
my_func.args(db)to read theargsfield).- The
#[returns(deref)]annotation makes this getter return a&[VariableId].
- The
Identity fields and tracked fields
To get better reuse across revisions, particularly when things are reordered, Salsa treats all fields not annotated with #[tracked] as identity fields.
Normally, identity fields represent the "name" of an entity.
This indicates that, across two revisions R1 and R2, if two functions are created with the same values for their identity fields, they refer to the same entity.
Salsa then compares each #[tracked] field with its value from the previous revision.
Changing one field invalidates queries that read that field, but not queries that only read other fields.
The identity fields determine which structs Salsa can match across revisions, so they affect reuse but not the result.
For more details, see the algorithm page of the reference.
Interned structs
The final kind of Salsa struct are interned structs. As with input and tracked structs, the data for an interned struct is stored in the database. Unlike those structs, if you intern the same data twice, you get back the same integer.
A classic use of interning is for small strings like function names and variables.
It's annoying and inefficient to pass around those names with String values which must be cloned;
it's also inefficient to have to compare them for equality via string comparison.
Therefore, we define two interned structs, FunctionId and VariableId, each with a single field that stores the string:
#![allow(unused)] fn main() { #[salsa::interned(debug)] pub struct VariableId<'db> { #[returns(deref)] pub text: String, } #[salsa::interned(debug)] pub struct FunctionId<'db> { #[returns(deref)] pub text: String, } }
When you invoke e.g. FunctionId::new(&db, "my_string".to_string()), you will get back a FunctionId that is just a newtype'd integer.
But if you invoke the same call to new again, you get back the same integer:
#![allow(unused)] fn main() { let f1 = FunctionId::new(&db, "my_string".to_string()); let f2 = FunctionId::new(&db, "my_string".to_string()); assert_eq!(f1, f2); }
Interned values carry a 'db lifetime
Like tracked structs, interned values carry a 'db lifetime that prevents them from being used across salsa revisions.
Interned values are guaranteed to be consistent within a single revision.
Across revisions, they may be cleared, reallocated, or reassigned -- but you cannot generally observe this,
since the 'db lifetime prevents you from changing inputs (and hence creating a new revision)
while an interned value is in active use.
Expressions and statements
We won't use any special "Salsa structs" for expressions and statements:
#![allow(unused)] fn main() { #[derive(Eq, PartialEq, Debug, Hash, salsa::SalsaValue)] pub struct Statement<'db> { pub span: Span<'db>, pub data: StatementData<'db>, } impl<'db> Statement<'db> { pub fn new(span: Span<'db>, data: StatementData<'db>) -> Self { Statement { span, data } } } #[derive(Eq, PartialEq, Debug, Hash, salsa::SalsaValue)] pub enum StatementData<'db> { /// Defines `fn <name>(<args>) = <body>` Function(Function<'db>), /// Defines `print <expr>` Print(Expression<'db>), } #[derive(Eq, PartialEq, Debug, Hash, salsa::SalsaValue)] pub struct Expression<'db> { pub span: Span<'db>, pub data: ExpressionData<'db>, } impl<'db> Expression<'db> { pub fn new(span: Span<'db>, data: ExpressionData<'db>) -> Self { Expression { span, data } } } #[derive(Eq, PartialEq, Debug, Hash, salsa::SalsaValue)] pub enum ExpressionData<'db> { Op(Box<Expression<'db>>, Op, Box<Expression<'db>>), Number(#[salsa_value(unsafe(prove_safe_to_retain_manually))] OrderedFloat<f64>), Variable(VariableId<'db>), Call(FunctionId<'db>, Vec<Expression<'db>>), } #[derive(Eq, PartialEq, Copy, Clone, Hash, Debug, salsa::SalsaValue)] pub enum Op { Add, Subtract, Multiply, Divide, } }
Since statements and expressions are not tracked, this implies that we are only attempting to get incremental re-use at the granularity of functions -- whenever anything in a function body changes, we consider the entire function body dirty and re-execute anything that depended on it. It usually makes sense to draw some kind of "reasonably coarse" boundary like this.
One downside of the way we have set things up: we inlined the position into each of the structs.
Defining the parser: memoized functions and inputs
The next step in the calc compiler is to define the parser.
The role of the parser will be to take the SourceProgram input,
read the string from the text field,
and create the Statement, Function, and Expression structures that we defined in the ir module.
To minimize dependencies, we are going to write a recursive descent parser. Another option would be to use a Rust parsing framework. We won't cover the parsing itself in this tutorial -- you can read the code if you want to see how it works. We're going to focus only on the Salsa-related aspects.
The parse_statements function
The starting point for the parser is the parse_statements function:
#![allow(unused)] fn main() { #[salsa::tracked(returns(copy))] pub fn parse_statements(db: &dyn crate::Db, source: SourceProgram) -> Program<'_> { // Get the source text from the database let source_text = source.text(db); // Create the parser let mut parser = Parser { db, source_text, position: 0, }; // Read in statements until we reach the end of the input let mut result = vec![]; loop { // Skip over any whitespace parser.skip_whitespace(); // If there are no more tokens, break if parser.peek().is_none() { break; } // Otherwise, there is more input, so parse a statement. if let Some(statement) = parser.parse_statement() { result.push(statement); } else { // If we failed, report an error at whatever position the parser // got stuck. We could recover here by skipping to the end of the line // or something like that. But we leave that as an exercise for the reader! parser.report_error(); break; } } Program::new(db, result) } }
This function is annotated as #[salsa::tracked].
That means that, when it is called, Salsa will track what inputs it reads as well as what value it returns.
The return value is memoized,
which means that if you call this function again without changing the inputs,
Salsa will reuse the result rather than re-execute it.
Tracked functions are the unit of reuse
Tracked functions are the core part of how Salsa enables incremental reuse.
The goal of the framework is to avoid re-executing tracked functions and instead to reuse their result.
Salsa uses the red-green algorithm to decide when to re-execute a function.
The short version is that a tracked function is re-executed if either (a) it directly reads an input, and that input has changed,
or (b) it directly invokes another tracked function and that function's return value has changed.
In the case of parse_statements, it directly reads SourceProgram::text, so if the text changes, then the parser will re-execute.
By choosing which functions to mark as #[tracked], you control how much reuse you get.
In our case, we're opting to mark the outermost parsing function as tracked, but not the inner ones.
This means that if the input changes, we will always re-parse the entire input and re-create the resulting statements and so forth.
We'll see later that this doesn't mean we will always re-run the type checker and other parts of the compiler.
This trade-off makes sense because (a) parsing is very cheap, so the overhead of tracking and enabling finer-grained reuse doesn't pay off and because (b) since strings are just a big blob-o-bytes without any structure, it's rather hard to identify which parts of the IR need to be reparsed. Some systems do choose to do more granular reparsing, often by doing a "first pass" over the string to give it a bit of structure, e.g. to identify the functions, but deferring the parsing of the body of each function until later. Setting up a scheme like this is relatively easy in Salsa and uses the same principles that we will use later to avoid re-executing the type checker.
Parameters to a tracked function
The first parameter to a tracked function is always the database, db: &dyn crate::Db.
Tracked functions may have no other parameters, one Salsa struct parameter, or multiple parameters that implement Eq and Hash.
A single Salsa struct can be used directly as the query key.
When a function has multiple parameters, Salsa interns their tuple to obtain a key, adding an interning step to each call.
Our parse_statements function takes one Salsa struct, the SourceProgram input.
The returns(copy) annotation
You may have noticed that parse_statements is tagged with #[salsa::tracked(returns(copy))].
Tracked functions ordinarily return a reference to their memoized value. The returns(copy)
attribute copies the value out of the database instead. This is a good fit for Program, which is
a small Copy handle to a tracked struct.
For return types that implement Deref, returns(deref) returns a reference to the dereferenced
target. For example, it can return a slice instead of a reference to a vector:
#![allow(unused)] fn main() { #[salsa::tracked(returns(deref))] fn source_lines(db: &dyn crate::Db, source: SourceProgram) -> Vec<String> { source.text(db).lines().map(str::to_owned).collect() } }
Calling source_lines returns an &[String] rather than the default &Vec<String>.
That reference is tied to the database borrow and cannot be held across a new revision.
Use returns(clone) when returning an owned clone is more convenient and the clone is known to be
inexpensive.
(You may recall the returns(deref) annotation from the IR section of the tutorial,
where it was placed on struct fields. Return-mode annotations work the same way for fields and
tracked functions.)
Defining the parser: reporting errors
The last interesting case in the parser is how to handle a parse error.
Because Salsa functions are memoized and may not execute, they should not have side-effects,
so we don't just want to call eprintln!.
If we did so, the error would only be reported the first time the function was called, but not
on subsequent calls in the situation where the simply returns its memoized value.
Salsa defines a mechanism for managing this called an accumulator.
In our case, we define an accumulator struct called Diagnostic in the ir module:
#![allow(unused)] fn main() { #[salsa::accumulator] #[derive(Debug)] #[allow(dead_code)] // Debug impl uses them pub struct Diagnostic { pub start: usize, pub end: usize, pub message: String, } }
Memoized functions can accumulate Diagnostic values.
Later, you can invoke a method to find all the values that were accumulated by the tracked functions
or any functions that they called
(e.g., we could get the set of Diagnostic values produced by the parse_statements function).
The Parser::report_error method contains an example of accumulating a diagnostic:
#![allow(unused)] fn main() { /// Report an error diagnostic at the current position. fn report_error(&self) { let next_position = match self.peek() { Some(ch) => self.position + ch.len_utf8(), None => self.position, }; Diagnostic { start: self.position, end: next_position, message: "unexpected character".to_string(), } .accumulate(self.db); } }
To get the diagnostics produced by parse_statements, or any other tracked function,
we invoke the associated accumulated function:
#![allow(unused)] fn main() { let accumulated: Vec<&Diagnostic> = parse_statements::accumulated::<Diagnostic>(db, source_program); // ----------- // Use turbofish to specify // the diagnostics type. }
accumulated takes the database followed by the tracked function's query arguments and returns a Vec of references.
Defining the parser: debug impls and testing
As the final part of the parser, we need to write some tests. To do so, we will create a database, set the input source text, run the parser, and check the result. Before we can do that, though, we have to address one question: how do we inspect Salsa structs whose fields live in the database?
Generated Debug implementations
The debug option on Salsa struct attributes generates an ordinary Debug implementation.
Tracked functions attach the database automatically; other code can use salsa::Database::attach to include field values:
#![allow(unused)] fn main() { use salsa::Database as _; db.attach(|db| { let function = FunctionId::new(db, "area_circle".to_string()); eprintln!("Function = {function:?}"); }); }
Without an attached database, the generated formatter displays only the Salsa ID.
Debug for ordinary Rust types
Types such as Op that are not Salsa structs can use the ordinary Debug derive:
#![allow(unused)] fn main() { #[derive(Debug)] pub enum Op { Add, Subtract, Multiply, Divide, } }
Writing the unit test
Now that we have our Debug implementations in place, we can write a simple unit test harness.
The parse_string function below creates a database, sets the source text, and then invokes the parser:
#![allow(unused)] fn main() { /// Create a new database with the given source text and parse the result. /// Returns the statements and the diagnostics generated. #[cfg(test)] fn parse_string(source_text: &str) -> String { use salsa::Database; use crate::db::CalcDatabaseImpl; CalcDatabaseImpl::default().attach(|db| { // Create the source program let source_program = SourceProgram::new(db, source_text.to_string()); // Invoke the parser let statements = parse_statements(db, source_program); // Read out any diagnostics let accumulated = parse_statements::accumulated::<Diagnostic>(db, source_program); // Format the result as a string and return it format!("{:#?}", (statements, accumulated)) }) } }
Combined with the expect-test crate, we can then write unit tests like this one:
#![allow(unused)] fn main() { #[test] fn parse_print() { let actual = parse_string("print 1 + 2"); let expected = expect_test::expect![[r#" ( Program { [salsa id]: Id(800), statements: [ Statement { span: Span { [salsa id]: Id(404), start: 0, end: 11, }, data: Print( Expression { span: Span { [salsa id]: Id(403), start: 6, end: 11, }, data: Op( Expression { span: Span { [salsa id]: Id(400), start: 6, end: 7, }, data: Number( 1.0, ), }, Add, Expression { span: Span { [salsa id]: Id(402), start: 10, end: 11, }, data: Number( 2.0, ), }, ), }, ), }, ], }, [], )"#]]; expected.assert_eq(&actual); } }
Defining the checker
Defining the interpreter
Reference
Durability
"Durability" is an optimization that can greatly improve the performance of your salsa programs.
Durability specifies the probability that an input's value will change.
The default is "low durability".
But when you set the value of an input, you can manually specify a higher durability,
typically Durability::HIGH.
Salsa tracks when tracked functions only consume values of high durability
and, if no high durability input has changed, it can skip traversing their
dependencies.
Typically "high durability" values are things like data read from the standard library or other inputs that aren't actively being edited by the end user.
The "red-green" algorithm
This page explains the basic Salsa incremental algorithm. The algorithm is called the "red-green" algorithm, which is where the name Salsa comes from.
Database revisions
The Salsa database always tracks a single revision. Each time you set an input, the revision is incremented. So we start in revision R1, but when a set method is called, we will go to R2, then R3, and so on. For each input, we also track the revision in which it was last changed.
Basic rule: when inputs change, re-execute!
When you invoke a tracked function, in addition to storing the value that was returned, we also track what other tracked functions it depends on, and the revisions when their value last changed. When you invoke the function again, if the database is in a new revision, then we check whether any of the inputs to this function have changed in that new revision. If not, we can just return our cached value. But if the inputs have changed (or may have changed), we will re-execute the function to find the most up-to-date answer.
Here is a simple example, where the parse_module function reads the module's text:
#![allow(unused)] fn main() { #[salsa::input] struct Module { #[returns(deref)] text: String, } #[salsa::tracked] fn parse_module(db: &dyn Db, module: Module) -> Ast { Ast::parse_text(module.text(db)) } }
If we invoke parse_module twice, but change the module text in between, then we will have to re-execute parse_module:
#![allow(unused)] fn main() { use salsa::Setter as _; let module = Module::new(&db, "fn foo() { }".to_string()); parse_module(&db, module); // executes // ...some time later... module .set_text(&mut db) .to("fn foo() { /* add a comment */ }".to_string()); parse_module(&db, module); // executes again! }
Backdating: sometimes we can be smarter
Often, though, tracked functions don't depend directly on the inputs. Instead, they'll depend on some other tracked function. For example, perhaps we have a type_check function that reads the AST:
#![allow(unused)] fn main() { #[salsa::tracked] fn type_check(db: &dyn Db, module: Module) { let ast = parse_module(db, module); ... } }
If the module text is changed, we saw that we have to re-execute parse_module, but there are many changes to the source text that still produce the same AST. For example, suppose we simply add a comment? In that case, if type_check is called again, we will:
- First re-execute
parse_module, since its input changed. - We will then compare the resulting AST. If it's the same as last time, we can backdate the result, meaning that we say that, even though the inputs changed, the output didn't.
Durability: an optimization
As an optimization, Salsa includes the concept of durability, which is the notion of how often some piece of tracked data changes.
For example, when compiling a Rust program, you might mark the inputs from crates.io as high durability inputs, since they are unlikely to change. The current workspace could be marked as low durability, since changes to it are happening all the time.
When you set the value of an input field, you can also set it with a given durability:
#![allow(unused)] fn main() { module .set_text(&mut db) .with_durability(salsa::Durability::HIGH) .to("fn foo() { }".to_string()); }
For each durability, we track the revision in which some input with that durability changed. If a tracked function depends (transitively) only on high durability inputs, and you change a low durability input, then we can very easily determine that the tracked function result is still valid, avoiding the need to traverse the input edges one by one.
Common patterns
This section documents patterns for using Salsa.
On-Demand (Lazy) Inputs
Salsa inputs work best if you can easily provide all of the inputs upfront. However sometimes the set of inputs is not known beforehand.
A typical example is reading files from disk. While it is possible to eagerly scan a particular directory and create an in-memory file tree as salsa input structs, a more straight-forward approach is to read the files lazily. That is, when a query requests the text of a file for the first time:
- Read the file from disk and cache it.
- Setup a file-system watcher for this path.
- Update the cached file when the watcher sends a change notification.
This is possible to achieve in salsa, by caching the inputs in your database structs and adding a method to the database trait to retrieve them out of this cache.
A complete, runnable file-watching example can be found in the lazy-input example.
The setup looks roughly like this:
#[salsa::input]
struct File {
#[returns(deref)]
path: PathBuf,
#[returns(deref)]
contents: String,
}
#[salsa::db]
trait Db: salsa::Database {
fn input(&self, path: PathBuf) -> Result<File>;
}
#[salsa::db]
#[derive(Clone)]
struct LazyInputDatabase {
storage: Storage<Self>,
logs: Arc<Mutex<Vec<String>>>,
files: DashMap<PathBuf, File>,
file_watcher: Arc<Mutex<Debouncer<RecommendedWatcher>>>,
}
impl LazyInputDatabase {
fn new(tx: Sender<DebounceEventResult>) -> Self {
let logs: Arc<Mutex<Vec<String>>> = Default::default();
Self {
storage: Storage::new(Some(Box::new({
let logs = logs.clone();
move |event| {
// don't log boring events
if let salsa::EventKind::WillExecute { .. } = event.kind {
logs.lock().unwrap().push(format!("{event:?}"));
}
}
}))),
logs,
files: DashMap::new(),
file_watcher: Arc::new(Mutex::new(
new_debouncer(Duration::from_secs(1), move |events| {
tx.send(events).unwrap()
})
.unwrap(),
)),
}
}
}
#[salsa::db]
impl salsa::Database for LazyInputDatabase {}
#[salsa::db]
impl Db for LazyInputDatabase {
fn input(&self, path: PathBuf) -> Result<File> {
let path = path
.canonicalize()
.wrap_err_with(|| format!("Failed to read {}", path.display()))?;
Ok(match self.files.entry(path.clone()) {
// If the file already exists in our cache then just return it.
Entry::Occupied(entry) => *entry.get(),
// If we haven't read this file yet set up the watch, read the
// contents, store it in the cache, and return it.
Entry::Vacant(entry) => {
// Set up the watch before reading the contents to try to avoid
// race conditions.
let watcher = &mut *self.file_watcher.lock().unwrap();
watcher
.watcher()
.watch(&path, RecursiveMode::NonRecursive)
.unwrap();
let contents = std::fs::read_to_string(&path)
.wrap_err_with(|| format!("Failed to read {}", path.display()))?;
*entry.insert(File::new(self, path, contents))
}
})
}
}
- We declare a method on the
Dbtrait that gives us aFileinput on-demand (it only requires a&dyn Dbnot a&mut dyn Db). - There should only be one input struct per file, so we implement that method using a cache (
DashMapis like aRwLock<HashMap>).
The driving code that's doing the top-level queries is then in charge of updating the file contents when a file-change notification arrives. It does this by updating the Salsa input in the same way that you would update any other input.
Here we implement a simple driving loop, that recompiles the code whenever a file changes. You can use the logs to check that only the queries that could have changed are re-evaluated.
fn main() -> Result<()> {
// Create the channel to receive file change events.
let (tx, rx) = unbounded();
let mut db = LazyInputDatabase::new(tx);
let initial_file_path = std::env::args_os()
.nth(1)
.ok_or_else(|| eyre!("Usage: ./lazy-input <input-file>"))?;
// Create the initial input using the input method so that changes to it
// will be watched like the other files.
let initial = db.input(initial_file_path.into())?;
loop {
// Compile the code starting at the provided input, this will read other
// needed files using the on-demand mechanism.
let sum = compile(&db, initial);
let diagnostics = compile::accumulated::<Diagnostic>(&db, initial);
if diagnostics.is_empty() {
println!("Sum is: {sum}");
} else {
for diagnostic in diagnostics {
println!("{}", diagnostic.0);
}
}
for log in db.logs.lock().unwrap().drain(..) {
eprintln!("{log}");
}
// Wait for file change events, the output can't change unless the
// inputs change.
for event in rx.recv()?.unwrap() {
let path = event.path.canonicalize().wrap_err_with(|| {
format!("Failed to canonicalize path {}", event.path.display())
})?;
let file = match db.files.get(&path) {
Some(file) => *file,
None => continue,
};
// `path` has changed, so read it and update the contents to match.
// This creates a new revision and causes the incremental algorithm
// to kick in, just like any other update to a salsa input.
let contents = std::fs::read_to_string(path)
.wrap_err_with(|| format!("Failed to read file {}", event.path.display()))?;
file.set_contents(&mut db).to(contents);
}
}
}
Tuning Salsa
Cache Eviction (LRU)
Salsa supports Least Recently Used (LRU) cache eviction for tracked functions. By default, memoized values are never evicted (unbounded cache). You can enable LRU eviction by specifying a capacity at compile time:
#![allow(unused)] fn main() { #[salsa::tracked(lru = 128)] fn parse(db: &dyn Db, input: SourceFile) -> Ast { // ... } }
With lru = 128, Salsa will keep at most 128 memoized values for this function.
When the cache exceeds this capacity, the least recently used values are evicted
at the start of each new revision.
Zero-Cost When Disabled
When no lru capacity is specified (the default), Salsa uses a no-op eviction
policy that is completely optimized away by the compiler. This means there is
zero runtime overhead for functions that don't need cache eviction.
Runtime Capacity Adjustment
For functions with LRU enabled, you can adjust the capacity at runtime:
#![allow(unused)] fn main() { #[salsa::tracked(lru = 128)] fn my_query(db: &dyn Db, input: MyInput) -> Output { // ... } // Later, adjust the capacity: my_query::set_lru_capacity(&mut db, 256); }
Note: The set_lru_capacity method is only generated for functions that have
an lru attribute. Functions without LRU enabled do not have this method.
Memory Management
LRU evicts memoized values, not query keys or dependency metadata. Salsa also reclaims stale tracked outputs and unused low-durability interned values. Input identities remain until the database is dropped.
Intern Queries
Intern queries can make key lookup cheaper, save memory, and
avoid the need for Arc.
Interning is especially useful for queries that involve nested, tree-like data structures.
See:
- The
calcexample, which uses interning.
Cancellation
Queries that are no longer needed due to concurrent writes or changes in dependencies are cancelled by Salsa. Each access of an intermediate query is a potential cancellation point. Cancellation is implemented via panicking, and Salsa internals are intended to be panic-safe.
If you have a query that contains a long loop which does not execute any intermediate queries,
salsa won't be able to cancel it automatically. You may wish to check for cancellation yourself
by invoking db.unwind_if_revision_cancelled().
For more details on cancellation, see the tests for cancellation behavior in the Salsa repo.
Cycle handling
By default, when Salsa detects a cycle in the computation graph, Salsa will panic with a message naming the "cycle head"; this is the query that was called while it was also on the active query stack, creating a cycle.
Salsa also supports recovering from query cycles via fixed-point iteration, using cycle_fn and cycle_initial or explicitly defining a fallback value with cycle_result.
Fixed-Point Iteration
Fixed-point iteration is only usable if the queries which may be involved in a cycle are monotone and operate on a value domain which is a partial order with fixed height. Effectively, this means that the queries' output must always be "larger" than its input, and there must be some "maximum" or "top" value. This ensures that fixed-point iteration will converge to a value. (A typical case would be queries operating on types, which form a partial order with a "top" type.)
In order to support fixed-point iteration for a query, provide the cycle_fn and cycle_initial arguments to salsa::tracked:
#![allow(unused)] fn main() { #[salsa::tracked(cycle_fn=cycle_fn, cycle_initial=cycle_initial)] fn query(db: &dyn salsa::Database) -> u32 { // ... } fn cycle_fn(_db: &dyn salsa::Database, _cycle: &salsa::Cycle, _last_provisional_value: &u32, value: u32) -> u32 { value } fn cycle_initial(_db: &dyn salsa::Database, _id: salsa::Id) -> u32 { 0 } }
The cycle_fn is optional. The default implementation always returns the computed value.
If query becomes the head of a cycle (that is, query is executing and on the active query stack, it calls query2, query2 calls query3, and query3 calls query again -- there could be any number of queries involved in the cycle), the cycle_initial will be called to generate an "initial" value for query in the fixed-point computation. (The initial value should usually be the "bottom" value in the partial order.) All queries in the cycle will compute a provisional result based on this initial value for the cycle head. That is, query3 will compute a provisional result using the initial value for query, query2 will compute a provisional result using this provisional value for query3. When cycle2 returns its provisional result back to cycle, cycle will observe that it has received a provisional result from its own cycle, and will call the cycle_fn (with a salsa::Cycle describing the current cycle, the last provisional value, and the newly computed value). The cycle_fn can return the value parameter to continue iterating with the computed value, or return a different value (a fallback value) to continue iteration with that value instead.
The cycle will iterate until it converges: that is, until the value returned by cycle_fn equals the value from the previous iteration.
If a cycle iterates more than 200 times, Salsa will panic rather than iterate forever.
All potential cycle heads must set cycle_initial
Consider a two-query cycle where query_a calls query_b, and query_b calls query_a. If query_a is called first, then it will become the "cycle head", but if query_b is called first, then query_b will be the cycle head. In order for a cycle to use fixed-point iteration instead of panicking, the cycle head must set cycle_initial. This means that in order to be robust against varying query execution order, both query_a and query_b must set cycle_initial.
Ensuring convergence
Fixed-point iteration is a powerful tool, but is also easy to misuse, potentially resulting in infinite iteration. To avoid this, ensure that all queries participating in fixpoint iteration are deterministic and monotone.
To guarantee convergence, you can leverage the last_provisional_value (3rd parameter) received by cycle_fn.
When the cycle_fn receives a newly computed value, you can implement a strategy that references the last provisional value to "join" values or "widen" it and return a fallback value. This ensures monotonicity of the calculation and suppresses infinite oscillation of values between cycles.
Also, in fixed-point iteration, it is advantageous to be able to identify which cycle head seeded a value. By embedding the salsa::Id passed to cycle_initial in the initial value as a "cycle marker", the recovery function can compare it with Cycle::id() and detect self-originated recursion.
Calling Salsa queries from within cycle_fn or cycle_initial
It is permitted to call other Salsa queries from within the cycle_fn and cycle_initial functions. However, if these functions re-enter the same cycle, this can lead to unpredictable results. Take care which queries are called from within cycle-recovery functions, and avoid triggering further cycles.
Fallback Values
You can use cycle_result to specify a fallback value if Salsa detects a cycle. Queries with cycle_result always run to completion, but the resulting value will be replaced with the fallback value if a cycle is encountered.
#![allow(unused)] fn main() { #[salsa::tracked(cycle_result=cycle_result)] fn query(db: &dyn salsa::Database) -> u32 { // ... } fn cycle_result(_db: &dyn salsa::Database, _id: salsa::Id) -> u32 { 42 } }
Unlike fixpoint iteration, queries attributed with cycle_result also use their fallback value if
they participate in a cycle. This is to ensure the query result doesn't depend on the query execution order (details).
How Salsa works
Video available
To get the most complete introduction to Salsa's inner workings, check out the "How Salsa Works" video. If you'd like a deeper dive, the "Salsa in more depth" video digs into the details of the incremental algorithm.
If you're in China, watch videos on "How Salsa Works", "Salsa In More Depth".
Key idea
The key idea of salsa is that you define your program as a set of
queries. Every query is used like a function K -> V that maps from
some key of type K to a value of type V. Queries come in two basic
varieties:
- Inputs: the base inputs to your system. You can change these whenever you like.
- Functions: pure functions (no side effects) that transform your inputs into other values. The results of queries are memoized to avoid recomputing them a lot. When you make changes to the inputs, we'll figure out (fairly intelligently) when we can re-use these memoized values and when we have to recompute them.
How to use Salsa in three easy steps
Using Salsa is as easy as 1, 2, 3...
- Define the Salsa structs you will need with
#[salsa::input],#[salsa::tracked], or#[salsa::interned]. - Define your memoized query functions with
#[salsa::tracked]. - Define the database, which contains a
salsa::Storage<Self>field and may also contain anything else that your code needs.
To see an example of this in action, check out the calc example.
Digging into the plumbing
Check out the plumbing chapter to see a deeper explanation of the code that Salsa generates and how it connects to the Salsa library.
Videos
There is currently one video available on the newest version of Salsa:
- Salsa Architecture Walkthrough, which covers many aspects of the redesigned architecture.
There are also two videos on the older version Salsa, but they are rather outdated:
- How Salsa Works, which gives a high-level introduction to the key concepts involved and shows how to use Salsa;
- Salsa In More Depth, which digs into the incremental algorithm and explains -- at a high-level -- how Salsa is implemented.
If you're in China, watch videos on How Salsa Works, Salsa In More Depth.
Plumbing
This chapter documents the code that salsa generates and its "inner workings". We refer to this as the "plumbing".
Overview
The plumbing section is broken up into chapters:
- The database and runtime covers the data structures that are used at runtime to coordinate workers, trigger cancellation, track which functions are active and what dependencies they have accrued, and so forth.
- The query operations chapter describes how the major operations on function ingredients work. This text was written for an older version of salsa but the logic is the same:
- The maybe changed after operation determines when a memoized value for a tracked function is out of date.
- The fetch operation computes the most recent value.
- The derived queries flowchart depicts the logic in flowchart form.
- The cycle handling handling chapter describes what happens when cycles occur.
- The terminology section describes various words that appear throughout.
Database and runtime
A salsa database struct is declared by the user with the #[salsa::db] annotation.
It contains all the data that the program needs to execute:
#[salsa::db]
struct MyDatabase {
storage: Storage<Self>,
maybe_other_fields: u32,
}
This data is divided into two categories:
- Salsa-governed storage, contained in the
Storage<Self>field. This data is mandatory. - Other fields (like
maybe_other_fields) defined by the user. This can be anything. This allows for you to give access to special resources or whatever.
Parallel handles
When used across parallel threads, the database type defined by the user must implement Clone.
Each clone can be used by the parallel threads.
The Storage type shares the ingredients, runtime, and memoized values between clones.
Each clone has its own active query stack.
The Storage struct
The salsa Storage struct contains all the data that salsa itself will use and work with.
There are two key parts:
- The shared
Zalsadata, which contains the ingredients, runtime, memoized values, and synchronization information. Some operations, like mutating an input, require an&muthandle to this data. This is obtained by usingArc::get_mut, which is only possible once all clones and parallel threads have ceased executing. - The per-handle
ZalsaLocaldata, which is specific to a particular database instance. It contains the data for a single active thread, including the active query stack.
Incrementing the revision counter
Salsa's general model is that there is a database and, potentially, multiple cloned handles.
Each clone owns another handle on the Arc in Storage that stores the ingredients.
Whenever the user attempts to do an &mut operation, such as modifying an input field, Salsa must
first cancel any parallel handles and wait for those threads to finish.
Once the other handles have completed, Salsa can use Arc::get_mut to get an &mut reference to the shared data.
This allows Salsa to get &mut access without unsafe code and
guarantees that it has successfully cancelled the other worker threads
(or gotten itself into a deadlock).
The key point is that Salsa cancels other workers before proceeding:
#![allow(unused)] fn main() { /// Sets cancellation flag and blocks until all other workers with access /// to this storage have completed. /// /// This could deadlock if there is a single worker with two handles to the /// same database! /// /// Needs to be paired with a call to `reset_cancellation_flag`. fn cancel_others(&mut self) -> &mut Zalsa { debug_assert!( self.zalsa_local .try_with_query_stack(|stack| stack.is_empty()) == Some(true), "attempted to cancel within query computation, this is a deadlock" ); { let _cancellation_flag = CancellationFlagGuard::new(&self.handle.zalsa_impl); self.handle .zalsa_impl .event(&|| Event::new(EventKind::DidSetCancellationFlag)); let mut clones = self.handle.coordinate.clones.lock(); while *clones != 1 { clones = self.handle.coordinate.cvar.wait(clones); } } // The ref count on the `Arc` should now be 1 let zalsa = Arc::get_mut(&mut self.handle.zalsa_impl).unwrap(); // Increment the cancellation count only after cancelled workers have dropped their // handles. Otherwise, a worker unwinding from cancellation could insert a provisional // memo with the new cancellation count. let overflow = zalsa.runtime_mut().bump_cancellation_count(); if overflow { zalsa.new_revision(); } zalsa } }
The Salsa runtime
The salsa runtime offers helper methods that are accessed by the ingredients. It tracks the current revision and information about when values with low or high durability last changed. Its cross-thread dependency graph is used for resolving cycles.
Basically, the ingredient structures store the "data at rest" -- like memoized values -- and things that are "per ingredient".
The per-handle ZalsaLocal stores the "active, in-progress" data, such as which queries are on the stack and the dependencies accessed by the currently active query.
It also contains methods for adding those dependencies, such as report_tracked_read.
The 'db lifetime
Tracked and interned structs are both declared with a 'db lifetime.
This lifetime is linked to the db: &DB reference used to create them.
The 'db lifetime has several implications:
- It ensures that the user does not create a new salsa revision while a tracked/interned struct is in active use. Creating a new salsa revision requires modifying an input which requires an
&mut DBreference, therefore it cannot occur during'db.- The struct may not even exist in the new salsa revision so allowing access would be confusing.
- It permits the structs to be implemented using a pointer rather than a
salsa::Id, which in turn means more efficient field access (no read locks required).
This section discusses the unsafe code used for pointer-based access along with the reasoning behind it. To be concrete, we'll focus on tracked structs -- interned structs are very similar.
A note on UB
When we say in this page "users cannot do X", we mean without Undefined Behavior (e.g., by transmuting integers around etc).
Proof obligations
Here is a typical sequence of operations for a tracked struct along with the user operations that will require us to prove unsafe assertions:
- A tracked function
fexecutes in revision R0 and creates a tracked struct with#[id]fieldsKfor the first time.Kwill be stored in the interning hashmap and mapped to a fresh identifierid.- The identifier
idwill be used as the key in theStructMapand point to a freshly created allocationalloc : Alloc. - A
ts: TS<'db>is created from the raw pointerallocand returned to the user.
- The value of the field
fieldis accessed on the tracked struct instancetsby invoking the methodts.field(db)- Unsafe: This accesses the raw pointer to
alloc.* A new revision R1 begins.
- Unsafe: This accesses the raw pointer to
- The tracked function
fdoes not re-execute in R1. - The value of the field
fieldis accessed on the tracked struct instancetsby invoking the methodts.field(db)- Unsafe: This accesses the raw pointer to
alloc.* A new revision R2 begins.
- Unsafe: This accesses the raw pointer to
- The tracked function
fdoes reexecute in R2 and it again creates a tracked struct with keyKand with (Some) distinct field values.- The fields for
tsare updated.
- The fields for
- The value of the field
fieldis accessed on the tracked struct instancetsby invoking the methodts.field(db)- Unsafe: This accesses the raw pointer to
alloc.
- Unsafe: This accesses the raw pointer to
- A new revision R3 begins.
- When
fexecutes this time it does NOT create a tracked struct with keyK. The tracked structtsis placed in the "to be deleted" list. - A new revision R4 begins:
- The allocation
allocis freed.
- The allocation
As noted in the list, the core "unsafe" operation that users can perform is to access the fields of a tracked struct.
Tracked structs store a raw pointer to the alloc, owned by the ingredient, that contains their field data.
Accessing the fields of a tracked struct returns a &-reference to fields stored in that alloc, which means we must ensure Rust's two core constraints are satisfied for the lifetime of that reference:
- The allocation
allocwill not be freed (i.e., not be dropped) - The contents of the fields will not be mutated
As the sequence above illustrates, we have to show that those two constraints are true in a variety of circumstances:
- newly created tracked structs
- tracked structs that were created in prior revisions and re-validated in this revision
- tracked structs whose fields were updated in this revision
- tracked structs that were not created in this revision
Definitions
For every tracked struct ts we say that it has a defining query f(..).
This refers to a particular invocation of the tracked function f with a particular set of arguments ...
This defining query is unique within a revision, meaning that f executes at most once with that same set of arguments.
We say that a query has executed in a revision R if its function body was executed. When this occurs, all tracked structs defined (created) by that query will be recorded along with the query's result.
We say that a query has been validated in a revision R if the salsa system determined that its inputs did not change and so skipped executing it. This also triggers the tracked structs defined by that query to be considered validated (in particular, we execute a function on them which updates some internal fields, as described below).
When we talk about ts, we mean
Theorem: At the start of a new revision, all references to ts are within salsa's database
After ts is deleted, there may be other memoized values still reference ts, but they must have a red input query.
Is this true even if there are user bugs like non-deterministic functions?
Argument: yes, because of non-forgery, those memoized values could not be accessed.
How did those memoized values obtain the TS<'db> value in the first place?
It must have come from a function argument (XX: what about thread-local state).
Therefore, to access the value, they would have to provide those function arguments again.
But how did they get them?
Potential holes:
- Thread-local APIs that let you thread
'dbvalues down in an "invisible" way, so that you can return them without them showing up in your arguments -- e.g. a tracked function() -> S<'db>that obtains its value from thread-local state.- We might be able to sanity check against this with enough effort by defining some traits that guarantee that every lifetime tagged thing in your result could have come from one of your arguments, but I don't think we can prove it altogether. We either have to tell users "don't do that" or we need to have some kind of dynamic check, e.g. with a kind of versioned pointer. Note that it does require unsafe code at present but only because of the limits of our existing APIs.
- Alternatively we can do a better job cleaning up deleted stuff. This we could do.
- what about weird
Eqimplementations and the like? Do we have to make those unsafe?
Theorem: To access a tracked struct ts in revision R, the defining query f(..) must have either executed or been validated in the revision R.
This is the core bit of reasoning underlying most of what follows.
The idea is that users cannot "forge" a tracked struct instance ts.
They must have gotten it through salsa's internal mechanisms.
This is important because salsa will provide &-references to fields within that remain valid during a revision.
But at the start of a new revision salsa may opt to modify those fields or even free the allocation.
This is safe because users cannot have references to ts at the start of a new revision.
Lemma
We will prove it by proceeding through the revisions in the life cycle above (this can be considered a proof by induction).
Before ts is first created in R0
Users must have originally obtained ts: TS<'db> by invoking TS::new(&db, ...).
This is because creating an instance of TS requires providing a NonNull<salsa::tracked_struct::ValueStruct> pointer
to an unsafe function whose contract requires the pointer's validity.
FIXME: This is not strictly true, I think the constructor is just a private tuple ctor, we should fix that.
During R0
Inductive case: Consider some revision R
We start by showing some circumstances that cannot occur:
- accessing the field of a tracked struct
tsthat was never created - accessing the field of a tracked struct
tsafter it is freed
Lemma (no forgery): Users cannot forge a tracked struct
The first observation is that users cannot "forge" an instance of a tracked struct ts.
They are required to produce a pointer to an Alloc.
This implies that every tracked struct ts originated in the ingredient.
The same is not true for input structs, for example, because they are created from integer identifiers and users could just make those up.
Lemma (within one rev): Users cannot hold a tracked struct ts across revisions
The lifetime 'db of the tracked struct ts: TS<'db> is created from a db: &'db dyn Db handle.
Beginning a new revision requires an &mut reference.
Therefore so long as users are actively using the value ts the database cannot start a new revision.
Check: What if users had two databases and invoked internal methods? Maybe they could then. We may have to add some assertions.
Theorem: In order to get a tracked struct ts in revision R0, the tracked fn f that creates it must either execute or be validated first
The two points above combine to
Creating new values
Each new value is stored in a salsa::alloc::Alloc created by StructMap::insert.
Alloc is a variant of the standard Rust Box that carries no uniqueness implications.
This means that every tracked struct has its own allocation.
This allocation is owned by the tracked struct ingredient
and thus stays live until the tracked struct ingredient is dropped
or until it is removed (see later for safety conditions around removal).
The user type uses a raw pointer
The #[salsa::tracked] macro creates a user-exposed struct that looks roughly like this:
#![allow(unused)] fn main() { // This struct is a wrapper around the actual fields that adds // some revision metadata. You can think of it as a newtype'd // version of the fields of the tracked struct. use salsa::tracked_struct::ValueStruct; struct MyTrackedStruct<'db> { value: *const ValueStruct<..>, phantom: PhantomData<&'db ValueStruct<...>> } }
Key observations:
- The actual pointer to the
ValueStructused at runtime is not a Rust reference but a raw pointer. This is needed for stacked borrows. - A
PhantomDatais used to keep the'dblifetime alive.
The reason we use a raw pointer in the struct is because instances of this struct will outlive the 'db lifetime. Consider this example:
#![allow(unused)] fn main() { let mut db = MyDatabase::default(); let input = MyInput::new(&db, ...); // Revision 1: let result1 = tracked_fn(&db, input); // Revision 2: input.set_field(&mut db).to(...); let result2 = tracked_fn(&db, input); }
Tracked structs created by tracked_fn during Revision 1
may be reused during Revision 2, but the original &db reference
used to create them has expired.
If we stored a true Rust reference, that would be a violation of
the stacked borrows rules.
Instead, we store a raw pointer and, whenever users invoke the accessor methods for particular fields, we create a new reference to the contents:
#![allow(unused)] fn main() { impl<'db> MyTrackedStruct<'db> { fn field(self, db: &'db dyn DB) -> &'db FieldType { ... } } }
This reference is linked to db and remains valid so long as the
The 'db lifetime at rest
Salsa stores tracked and interned fields and memoized query results after the particular &'db DB
borrow that produced them has ended. Internally, Salsa erases that lifetime while the value is in
storage and restores the current database lifetime when the value is accessed again. The unsafe
ingredient Configuration traits guarantee that their Output<'db> or Fields<'db> associated
types can be stored with 'db replaced by 'static and later restored. The generated unsafe
implementations are justified by checking that every lifetime-dependent value implements
SalsaValue. This is the boundary that makes rebranding sound: an older value must remain safe to
retain and use in a later revision, including through safe operations such as PartialEq.
#[derive(salsa::SalsaValue)] checks the guarantee structurally. A field whose type is
unconditionally 'static is accepted directly. A field that borrows for 'db, or is otherwise
not 'static, must implement SalsaValue. A direct database-lifetime reference such as &'db T
does not implement the trait because its referent may be changed or freed in a later revision.
Salsa handles do implement it because access to their data goes back through the current database
state.
The derive supports types with at most one lifetime parameter, as well as type and const
parameters. Generated implementations require generic field types to implement SalsaValue;
types that need different bounds can provide a manual implementation instead.
If a field is known to satisfy the guarantee but its type cannot implement SalsaValue,
#[salsa_value(unsafe(prove_safe_to_retain_manually))] skips the structural check for that field.
The unsafe wrapper makes the proof obligation explicit: the author must ensure the field remains
valid when Salsa retains it across revisions and rebinds its database lifetime. This manual
assertion is supported by derive(SalsaValue) and directly on tracked and interned struct fields.
Updating tracked struct fields across revisions
The XX
Safety lemmas
These lemmas are used to justify the safety of the system.
Using MyTracked<'db> within some revision R always "happens after' a call to MyTracked::new
Whenever a tracked struct instance TS<'db> is created for the first time in revision R1,
the result is a fresh allocation and hence there cannot be any
pre-existing aliases of that struct.
TS<'db> will at that time be stored into the salsa database.
In later revisions, we assert that
&'db T references are never stored in the database
We maintain the invariant that, in any later revision R2,
However in some later revision R2, how
Ways this could go wrong and how we prevent them
Storing an &'db T into a field
Freeing the memory while a tracked struct remains live
Aliases of a tracked struct
Tracked structs
Tracked structs are stored in a special way to reduce their costs.
Tracked structs are created via a new operation.
The tracked struct and tracked field ingredients
For a single tracked struct we create multiple ingredients.
The tracked struct ingredient is the ingredient created first.
It creates new instances of the struct and assigns their ids.
The corresponding ValueStruct data is stored in Salsa's paged table.
For each #[tracked] field, we create a tracked field ingredient that moderates access
to a particular field. All of these ingredients use the same paged table
to access the ValueStruct instance for a given id. The ValueStruct
contains both the field values but also the revisions when they last changed value.
Each tracked struct has an id
This begins by creating a database-local salsa::Id for the tracked struct.
The ID contains a table index and a generation used when slots are reused.
Its identity is derived from a combination of
- the currently executing query;
- a u64 hash of the fields not marked
#[tracked]; - a disambiguator that makes this hash unique within the current query. i.e., when a query starts executing, it creates an empty map, and the first time a tracked struct with a given hash is created, it gets disambiguator 0. The next one will be given 1, etc.
Each tracked struct has a ValueStruct storing its data
The struct and field ingredients use the paged table to find the value struct for a given id:
pub struct Value<C>
where
C: Configuration,
{
/// The revision when this tracked struct was last updated.
/// This field also acts as a kind of "lock" over the `value` field. Once it is equal
/// to `Some(current_revision)`, the fields are locked and
/// cannot change further. This makes it safe to give out `&`-references
/// so long as they do not live longer than the current revision
/// (which is assured by tying their lifetime to the lifetime of an `&`-ref
/// to the database).
///
/// The struct is updated from an older revision `R0` to the current revision `R1`
/// when the struct is first accessed in `R1`, whether that be because the original
/// query re-created the struct (i.e., by user calling `Struct::new`) or because
/// the struct was read from. (Structs may not be recreated in the new revision if
/// the inputs to the query have not changed.)
///
/// When re-creating the struct, the field is temporarily set to `None`.
/// This is signal that there is an active `&mut` modifying the other fields:
/// even reading from those fields in that situation would create UB.
/// This `None` value should never be observable by users unless they have
/// leaked a reference across threads somehow.
updated_at: OptionalAtomicRevision,
/// The durability minimum durability of all inputs consumed
/// by the creator query prior to creating this tracked struct.
/// If any of those inputs changes, then the creator query may
/// create this struct with different values.
durability: Durability,
/// The revision information for each field: when did this field last change.
/// When tracked structs are re-created, this revision may be updated to the
/// current revision if the value is different.
revisions: C::Revisions,
/// Fields of this tracked struct. They can change across revisions,
/// but they do not change within a particular revision.
///
/// TODO: Consider whether we need a more explicit aliasing barrier or whether
/// this should be restructured (e.g., with a nested struct for `fields` + `memos`)
/// to make the aliasing guarantees more obvious. See PR #741 for prior discussion.
fields: C::Fields<'static>,
/// Memo table storing the results of query functions etc.
/*unsafe */
memos: MemoTable,
}
The value struct stores the values of the fields but also the revisions when that field last changed. Each time the struct is recreated in a new revision, the old and new values for its fields are compared and changed field revisions are updated.
The macro generates the tracked struct Configuration
The "configuration" for a tracked struct defines not only the types of the fields, but also various important operations such as extracting the hashable id fields and updating the "revisions" to track when a field last changed:
/// Trait that defines the key properties of a tracked struct.
///
/// Implemented by the `#[salsa::tracked]` macro when applied
/// to a struct.
///
/// # Safety
///
/// For every lifetime `'db`, `Fields<'db>` must be safe for Salsa to retain
/// as `Fields<'static>` and later expose with the lifetime of a database
/// borrow. Both types must have identical layouts and validity invariants.
pub unsafe trait Configuration: Sized + 'static {
const LOCATION: crate::ingredient::Location;
/// The debug name of the tracked struct.
const DEBUG_NAME: &'static str;
/// The debug names of any tracked fields.
const TRACKED_FIELD_NAMES: &'static [&'static str];
/// The relative indices of any tracked fields.
const TRACKED_FIELD_INDICES: &'static [usize];
/// Whether this struct should be persisted with the database.
const PERSIST: bool;
/// A (possibly empty) tuple of the fields for this struct.
type Fields<'db>: Send + Sync;
/// A array of [`AtomicRevision`][] values, one per each of the tracked value fields.
/// When a struct is re-recreated in a new revision, the corresponding
/// entries for each field are updated to the new revision if their
/// values have changed (or if the field is marked as `#[no_eq]`).
#[cfg(feature = "persistence")]
type Revisions: Send
+ Sync
+ Index<usize, Output = AtomicRevision>
+ plumbing::serde::Serialize
+ for<'de> plumbing::serde::Deserialize<'de>;
#[cfg(not(feature = "persistence"))]
type Revisions: Send + Sync + Index<usize, Output = AtomicRevision>;
type Struct<'db>: Copy + FromId + AsId;
fn untracked_fields(fields: &Self::Fields<'_>) -> impl Hash;
/// Create a new value revision array where each element is set to `current_revision`.
fn new_revisions(current_revision: Revision) -> Self::Revisions;
/// Update the field data and, if the value has changed,
/// the appropriate entry in the `revisions` array (tracked fields only).
///
/// Returns `true` if any identity field changed and the struct should be
/// considered re-created.
///
fn update_fields<'db>(
current_revision: Revision,
revisions: &Self::Revisions,
old_fields: &mut Self::Fields<'db>,
new_fields: Self::Fields<'db>,
) -> bool;
/// Returns the size of any heap allocations in the output value, in bytes.
fn heap_size(_value: &Self::Fields<'_>) -> Option<usize> {
None
}
/// Serialize the fields using `serde`.
///
/// Panics if the value is not persistable, i.e. `Configuration::PERSIST` is `false`.
fn serialize<S>(value: &Self::Fields<'_>, serializer: S) -> Result<S::Ok, S::Error>
where
S: plumbing::serde::Serializer;
/// Deserialize the fields using `serde`.
///
/// Panics if the value is not persistable, i.e. `Configuration::PERSIST` is `false`.
fn deserialize<'de, D>(deserializer: D) -> Result<Self::Fields<'static>, D::Error>
where
D: plumbing::serde::Deserializer<'de>;
}
Query operations
The most important basic operations that all queries support are:
- maybe changed after: Returns true if the value of the query (for the given key) may have changed since the given revision.
- Fetch: Returns the up-to-date value for the given K (or an error in the case of an "unrecovered" cycle).
Maybe changed after
The maybe_changed_after operation computes whether a query's value may have changed after the given revision. In other words, Q.maybe_changed_after(R) is true if the value of the query Q may have changed in the revisions (R+1)..R_now, where R_now is the current revision. Note that it doesn't make sense to ask maybe_changed_after(R_now).
Input queries
Input queries are set explicitly by the user. maybe_changed_after can therefore just check when the value was last set and compare.
Interned queries
Derived queries
The logic for derived queries is more complex. We summarize the high-level ideas here, but you may find the flowchart useful to dig deeper. The terminology section may also be useful; in some cases, we link to that section on the first usage of a word.
- If an existing memo is found, then we check if the memo was verified in the current revision. If so, we can compare its changed at revision and return true or false appropriately.
- Otherwise, we must check whether dependencies have been modified:
- Let R be the revision in which the memo was last verified; we wish to know if any of the dependencies have changed since revision R.
- First, we check the durability. For each memo, we track the minimum durability of the memo's dependencies. If the memo has durability D, and there have been no changes to an input with durability D since the last time the memo was verified, then we can consider the memo verified without any further work.
- If the durability check is not sufficient, then we must check the dependencies individually. For this, we iterate over each dependency D and invoke the maybe changed after operation to check whether D has changed since the revision R.
- If no dependency was modified:
- We can mark the memo as verified and use its changed at revision to return true or false.
- Assuming dependencies have been modified:
- Then we execute the user's query function (same as in fetch), which potentially backdates the resulting value.
- Compare the changed at revision in the resulting memo and return true or false.
Fetch
The fetch operation computes the value of a query. It prefers to reuse memoized values when it can.
Input queries
Input queries simply load the result from the table.
Interned queries
Interned queries map the input into a hashmap to find an existing integer. If none is present, a new value is created.
Derived queries
The logic for derived queries is more complex. We summarize the high-level ideas here, but you may find the flowchart useful to dig deeper. The terminology section may also be useful; in some cases, we link to that section on the first usage of a word.
- If an existing memo is found, then we check if the memo was verified in the current revision. If so, we can directly return the memoized value.
- Otherwise, if the memo contains a memoized value, we must check whether dependencies have been modified:
- Let R be the revision in which the memo was last verified; we wish to know if any of the dependencies have changed since revision R.
- First, we check the durability. For each memo, we track the minimum durability of the memo's dependencies. If the memo has durability D, and there have been no changes to an input with durability D since the last time the memo was verified, then we can consider the memo verified without any further work.
- If the durability check is not sufficient, then we must check the dependencies individually. For this, we iterate over each dependency D and invoke the maybe changed after operation to check whether D has changed since the revision R.
- If no dependency was modified:
- We can mark the memo as verified and return its memoized value.
- Assuming dependencies have been modified or the memo does not contain a memoized value:
- Then we execute the user's query function.
- Next, we compute the revision in which the memoized value last changed:
- Backdate: If there was a previous memoized value, and the new value is equal to that old value, then we can backdate the memo, which means to use the 'changed at' revision from before.
- Thanks to backdating, it is possible for a dependency of the query to have changed in some revision R1 but for the output of the query to have changed in some revision R2 where R2 predates R1.
- Otherwise, we use the current revision.
- Backdate: If there was a previous memoized value, and the new value is equal to that old value, then we can backdate the memo, which means to use the 'changed at' revision from before.
- Construct a memo for the new value and return it.
Derived queries flowchart
Derived queries are by far the most complex. This flowchart documents the flow of the maybe changed after and fetch operations. This flowchart can be edited on draw.io:
Cycles
Cross-thread blocking
The interface for blocking across threads now works as follows:
- When one thread
T1wishes to block on a queryQbeing executed by another threadT2, it invokesRuntime::block. This checks for cycles and, assuming no cycle is detected, returnsBlockResult::Running. CallingRunning::block_onthen blocksT1untilT2has completed withQ. At that point,T1reawakens. However, we don't know the result of executingQ, soT1now has to "retry". Typically, this will result in successfully reading the cached value. - While
T1is blocking, its active query stack remains with the database handle.
Cycle detection
When a thread T1 attempts to execute a query Q, it will try to load the value for Q from the memoization tables. If it finds that Q is currently being computed, this indicates a potential cycle. T1 will then try to block on the query Q:
- If
Qis also being computed byT1, then there is a cycle. - Otherwise, if
Qis being computed by some other threadT2, we have to check whetherT2is (transitively) blocked onT1. If so, there is a cycle.
These two cases are handled internally by Runtime::block. Detecting the intra-thread cycle case is easy; to detect cross-thread cycles, the runtime maintains a dependency DAG between threads (identified by std::thread::ThreadId). Before adding an edge T1 -> T2 (i.e., T1 is blocked waiting for T2) into the DAG, it checks whether a path exists from T2 to T1. If so, Salsa invokes the tracked function's configured cycle strategy.
Terminology
Backdate
Backdating is when we mark a value that was computed in revision R as having last changed in some earlier revision. This is done when we have an older memo M and we can compare the two values to see that, while the dependencies to M may have changed, the result of the query function did not.
Changed at
The changed at revision for a memo is the revision in which that memo's value last changed. Typically, this is the same as the revision in which the query function was last executed, but it may be an earlier revision if the memo was backdated.
Dependency
A dependency of a query Q is some other query Q1 that was invoked as part of computing the value for Q (typically, invoking by Q's query function).
Derived query
A derived query is a query whose value is defined by the result of a user-provided query function. That function is executed to get the result of the query. Unlike input queries, the result of a derived queries can always be recomputed whenever needed simply by re-executing the function.
Durability
Durability is an optimization that we use to avoid checking the dependencies of a query individually.
Input query
An input query is a query whose value is explicitly set by the user. When that value is set, a durability can also be provided.
Ingredient
An ingredient is an individual piece of storage used to create a salsa item
LRU
The lru option on #[salsa::tracked] fixes the maximum number of values retained for that function. Functions with lru also generate function_name::set_lru_capacity(&mut db, capacity) to adjust the capacity. Salsa drops values from older memos to conserve memory, but retains their dependency information so that it can still compute whether values may have changed. See cache eviction for examples.
Memo
A memo stores information about the last time that a query function for some query Q was executed:
- Typically, it contains the value that was returned from that function, so that we don't have to execute it again.
- However, this is not always true: some queries don't cache their result values, and values can also be dropped as a result of LRU collection. In those cases, the memo just stores dependency information, which can still be useful to determine if other queries that have Q as a dependency may have changed.
- The revision in which the memo last verified.
- The changed at revision in which the memo's value last changed. (Note that it may be backdated.)
- The minimum durability of the memo's dependencies.
- The complete set of dependencies, if available, or a marker that the memo has an untracked dependency.
Query
Query function
The query function is the user-provided function that we execute to compute the value of a derived query. Salsa assumed that all query functions are a 'pure' function of their dependencies unless the user reports an untracked read. Salsa always assumes that functions have no important side-effects (i.e., that they don't send messages over the network whose results you wish to observe) and thus that it doesn't have to re-execute functions unless it needs their return value.
Revision
A revision is a monotonically increasing integer that we use to track the "version" of the database. Each time the value of an input query is modified, we create a new revision.
Salsa item
A salsa item is something that is decorated with a #[salsa::foo] macro, like a tracked function or struct.
Salsa struct
A salsa struct is a struct decorated with one of the salsa macros:
#[salsa::tracked]#[salsa::input]#[salsa::interned]
See the salsa overview for more details.
Untracked dependency
An untracked dependency is an indication that the result of a derived query depends on something not visible to the salsa database. Untracked dependencies are created by invoking Database::report_untracked_read. When an untracked dependency is present, derived queries are always re-executed in a later revision (see the description of the fetch operation for more details).
Verified
A memo is verified in a revision R if we have checked that its value is still up-to-date (i.e., if we were to reexecute the query function, we are guaranteed to get the same result). Each memo tracks the revision in which it was last verified to avoid repeatedly checking whether dependencies have changed during the fetch and maybe changed after operations.
Meta: about the book itself
Linking policy
We try to avoid links that easily become fragile.
Do:
- Link to
docs.rstypes to document the public API, but modify the link to uselatestas the version. - Link to modules in the source code.
- Create "named anchors" and embed source code directly.
Don't:
- Link to direct lines on github, even within a specific commit, unless you are trying to reference a historical piece of code ("how things were at the time").