
## Rules

Plugin logic is defined in _rules_: [pure functions](🔗) that map a set of statically-declared input types to a statically-declared output type.

Each rule is an `async` Python function decorated with `@rule`. It takes any number of parameters (including zero) and returns a value of one specific type. The parameters and the return type must be annotated with [type hints](🔗).

For example, this rule maps `(int) -> str`.
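A minimal sketch of such a rule follows. The `try`/`except` fallback is not part of Pants; it is a stand-in added here only so the snippet can be run without Pants installed, and it does none of the work the real decorator does.

```python
try:
    from pants.engine.rules import rule
except ImportError:
    # Stand-in so this snippet runs without Pants installed. It is NOT the
    # real decorator: it registers nothing and simply returns the function.
    def rule(func):
        return func


@rule
async def int_to_str(i: int) -> str:
    # One `int` parameter in, one `str` out: the signature is (int) -> str.
    return str(i)
```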

Although any Python type, including builtin types like `int`, can be a parameter or return type of a rule, in almost all cases rules will deal with values of custom Python classes.

Generally, rules correspond to a step in your build process. For example, when adding a new linter, you may have a rule that maps `(Target, Shellcheck) -> LintResult`:

You do not call a rule like you would a normal function. In the above examples, you would not say `int_to_str(26)` or `run_shellcheck(tgt, shellcheck)`. Instead, the Pants engine determines when rules are used and calls the rules for you.

Each rule should be pure; you should not use side-effects like `subprocess.run()`, `print()`, or the `requests` library. Instead, the Rules API has its own alternatives that are understood by the Pants engine and which work properly with its caching and parallelism.

## The rule graph

All of the registered rules create a rule graph, with each type as a node and the edges being dependencies used to compute those types.

For example, the `list` goal uses this rule definition and results in the below graph:


At the top of the graph will always be the goals that Pants runs, such as `list` and `test`. These goals are the entry points into the graph. When a user runs `./pants list`, the engine looks for a special type of rule, called a `@goal_rule`, that implements that goal. From there, the `@goal_rule` might request certain types like `Console` and `Addresses`, which will cause other helper `@rule`s to be used. To view the graph for a goal, see: [Visualize the rule graph](🔗).

The graph also has several "roots", such as `Console`, `AddressSpecs`, `FilesystemSpecs`, and `OptionsBootstrapper` in this example. Those roots are injected into the graph as the initial input, whereas all other types are derived from those roots.

The engine will find a path through the rules to satisfy the types that you are requesting. In this example, we do not need to explicitly specify `Specs`; we only specify `Addresses` in our rule's parameters, and the engine finds a path from `Specs` to `Addresses` for us. This is similar to [Dependency Injection](🔗), but with a typed and validated graph.

If the engine cannot find a path, or if there is ambiguity due to multiple possible paths, the rule graph will fail to compile. This ensures that the rule graph is always unambiguous.
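The idea of deriving a requested type from the roots can be illustrated with a toy resolver in plain Python. This is only a sketch of the concept and not how the Pants engine is implemented: the classes are stand-ins that borrow names from the text, `ListOutput`, `build_rule_table`, and `compute` are hypothetical, and the toy resolves lazily and rejects any duplicate output type, whereas the real graph is checked when it is compiled.

```python
from dataclasses import dataclass
from typing import Tuple, get_type_hints


@dataclass(frozen=True)
class Specs:  # stand-in for a root type supplied by the caller
    raw: Tuple[str, ...]


@dataclass(frozen=True)
class Addresses:  # stand-in for a type derived from the root
    values: Tuple[str, ...]


@dataclass(frozen=True)
class ListOutput:  # hypothetical result type for a `list`-like goal
    text: str


def resolve_addresses(specs: Specs) -> Addresses:
    return Addresses(tuple(sorted(specs.raw)))


def list_targets(addresses: Addresses) -> ListOutput:
    # Asks only for Addresses; how they come from Specs is not its concern.
    return ListOutput("\n".join(addresses.values))


def build_rule_table(*rules):
    """Index rules by output type, rejecting two rules for the same output."""
    table = {}
    for fn in rules:
        hints = get_type_hints(fn)
        output_type = hints.pop("return")
        if output_type in table:
            raise ValueError(f"ambiguous: two rules produce {output_type.__name__}")
        table[output_type] = (fn, tuple(hints.values()))
    return table


def compute(output_type, table, roots):
    """Produce a value of `output_type` from the roots via the rule table."""
    if output_type in roots:
        return roots[output_type]
    if output_type not in table:
        raise LookupError(f"no rule or root provides {output_type.__name__}")
    fn, param_types = table[output_type]
    return fn(*(compute(t, table, roots) for t in param_types))


table = build_rule_table(resolve_addresses, list_targets)
roots = {Specs: Specs(("b:lib", "a:app"))}
result = compute(ListOutput, table, roots)
```

Here `list_targets` never mentions `Specs`, yet `compute(ListOutput, ...)` reaches it through `Addresses`. Removing the root or registering two rules for one output type makes the toy fail, loosely mirroring the "no path" and "ambiguous path" errors.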

### Rule graph errors can be confusing

We know that rule graph errors can be intimidating and confusing to understand. We are planning to improve them. In the meantime, please do not hesitate to ask for help in the #plugins channel on [Slack](🔗).

Also see [Tips and debugging](🔗) for some tips for how to approach these errors.

## `await Get` - awaiting results in a rule body

In addition to requesting types in your rule's parameters, you can request types in the body of your rule.

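To do so, write `await Get(OutputType, InputType, input)`, where `OutputType` is the type you are requesting and `input` is the `InputType` value that the engine needs in order to compute it. For example, `await Get(ProcessResult, Process, process)` asks the engine to turn a `Process` instance named `process` into a `ProcessResult`.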

Pants runs your rule like normal Python code until it reaches the `await`, which yields execution to the engine. The engine looks in the pre-compiled rule graph to determine how to go from `Process -> ProcessResult`. Once the engine has the resulting `ProcessResult` object, control returns to your Python code.

In this example, we could not have requested the type `ProcessResult` as a parameter to our rule because we needed to dynamically create a `Process` object.

Thanks to `await Get`, we can write a recursive rule to compute a [Fibonacci number](🔗):

Another rule could then "call" our Fibonacci rule by using its own `Get`:
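The recursive rule and a second rule that "calls" it can be sketched in plain Python with a toy driver standing in for the engine. Everything here is an illustrative assumption rather than Pants code: `Get` is a minimal stand-in, `RULES` and `drive` are hypothetical, and the names `Fibonacci`, `compute_fibonacci`, and `describe_tenth` are chosen for the example. The sketch shows only the hand-off: each `await` suspends the rule, the driver finds a rule for the requested types, and the result is sent back in.

```python
from dataclasses import dataclass


class Get:
    """Toy stand-in for the engine's Get; it only records the request."""

    def __init__(self, output_type, input_type, input_value):
        self.key = (output_type, input_type)
        self.input_value = input_value

    def __await__(self):
        return (yield self)  # suspend until the driver sends back a result


@dataclass(frozen=True)
class Fibonacci:
    val: int


async def compute_fibonacci(n: int) -> Fibonacci:
    if n < 2:
        return Fibonacci(n)
    x = await Get(Fibonacci, int, n - 2)
    y = await Get(Fibonacci, int, n - 1)
    return Fibonacci(x.val + y.val)


async def describe_tenth() -> str:
    # A different rule "calls" the Fibonacci rule through its own Get.
    fib = await Get(Fibonacci, int, 10)
    return f"fib(10) = {fib.val}"


# Toy "rule graph": (output type, input type) -> rule.
RULES = {(Fibonacci, int): compute_fibonacci}


def drive(coroutine):
    """Toy driver: resolve each yielded Get by running the matching rule."""
    answer = None
    while True:
        try:
            get = coroutine.send(answer)
        except StopIteration as finished:
            return finished.value
        answer = drive(RULES[get.key](get.input_value))


message = drive(describe_tenth())
```

Unlike this toy, the real engine also caches results, so a repeated request for the same input does not need to be recomputed.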

### `Get` constructor shorthand

The verbose constructor for a `Get` object takes three parameters: `Get(OutputType, InputType, input)`, where `OutputType` and `InputType` are both types, and `input` is an instance of `InputType`.

Instead, you can use `Get(OutputType, InputType(constructor arguments))`. These two are equivalent:

  • `Get(ProcessResult, Process, Process(["/bin/echo"]))`

  • `Get(ProcessResult, Process(["/bin/echo"]))`

However, passing a previously constructed variable in the two-argument form, such as `Get(ProcessResult, process)`, is invalid because Pants's AST parser will not be able to see what the `InputType` is.

### Why only one input?

Currently, you can only give a single input. It is not possible to do something like `Get(OutputType, InputType1(...), InputType2(...))`.

Instead, it's common for rules to create a "Request" data class, such as `PexRequest` or `SourceFilesRequest`. This request centralizes all of the data it needs to operate into one data structure, which allows for call sites to say `await Get(SourceFiles, SourceFilesRequest, my_request)`, for example.

See https://github.com/pantsbuild/pants/issues/7490 for the tracking issue.

### `MultiGet` for concurrency

Every time your rule has the `await` keyword, the engine will pause execution until the result is returned. This means that if you have two `await Get`s, the engine will evaluate them sequentially, rather than concurrently.

You can use `await MultiGet` to instead get multiple results in parallel.

The result of `MultiGet` is a tuple with each individual result, in the same order as the requests.
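The difference between sequential and concurrent awaiting can be seen with the standard library alone. The sketch below is an analogy built on `asyncio.gather`, not the Pants API, and `fetch` is a hypothetical stand-in for work the engine would do. As described for `MultiGet`, the gathered results come back in request order, regardless of which request finishes first.

```python
import asyncio


async def fetch(label: str, delay: float) -> str:
    # Stand-in for the work needed to satisfy one request.
    await asyncio.sleep(delay)
    return label


async def sequential():
    # Each await finishes before the next one starts.
    first = await fetch("a", 0.02)
    second = await fetch("b", 0.01)
    return (first, second)


async def concurrent():
    # Both requests are in flight at once; results keep the request order
    # even though "b" finishes first.
    first, second = await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))
    return (first, second)


sequential_results = asyncio.run(sequential())
concurrent_results = asyncio.run(concurrent())
```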

You should rarely use a `for` loop with `await Get`, because each iteration waits for the previous one to finish. Collect the `Get`s and pass them to a single `await MultiGet` instead.

`MultiGet` can either take a single iterable of `Get` objects or take multiple individual arguments of `Get` objects. Thanks to this, we can rewrite our Fibonacci rule to parallelize the two recursive calls:

## Valid types

Types used as inputs to `Get`s or `Query`s must be hashable, and therefore should be immutable. Specifically, the type must implement `__hash__()` and `__eq__()`. The engine does not validate that your type is immutable, so you must ensure this yourself for the cache to work properly.

Because you should use immutable types, use these collection types:

  • `tuple` instead of `list`.

  • `pants.util.frozendict.FrozenDict` instead of the built-in `dict`.

  • `pants.util.ordered_set.FrozenOrderedSet` instead of the built-in `set`. This will also preserve the insertion order, which is important for determinism.
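A short standard-library sketch of why the `tuple` advice matters is below. `SourceArgs` is a hypothetical type and the `cache` dict is only a stand-in for a value-keyed cache. A frozen dataclass hashes its fields, so a `tuple` field works as a key while a `list` field does not.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SourceArgs:  # hypothetical request type
    paths: Tuple[str, ...]


good = SourceArgs(("a.py", "b.py"))
same = SourceArgs(("a.py", "b.py"))

# Equal values hash equally, so a cache keyed on the value can find it again.
cache = {good: "cached result"}
hit = cache[same]

# A list field makes an instance unhashable, so it cannot be a cache key.
bad = SourceArgs(["a.py", "b.py"])  # a type checker would flag this call
try:
    hash(bad)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
```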

Unlike Python in general, the engine uses exact type matches, rather than considering inheritance; even if `Truck` subclasses `Vehicle`, the engine will view these types as completely separate when deciding which rules to use.
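The contrast with ordinary Python can be sketched with a lookup table keyed by class. This is only an illustration of exact-type matching, not the engine's implementation, and `rule_for_type` is a hypothetical name.

```python
class Vehicle:
    pass


class Truck(Vehicle):
    pass


# Toy lookup table keyed by the exact class.
rule_for_type = {Vehicle: "rule that consumes Vehicle"}

truck = Truck()

# Ordinary Python treats a Truck as a Vehicle...
python_view = isinstance(truck, Vehicle)

# ...but an exact-type lookup does not find the Vehicle entry for a Truck.
exact_match = rule_for_type.get(type(truck))
```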

You cannot use generic Python type hints in a rule's parameters or in a `Get()`. For example, a rule cannot return `Optional[Foo]`, or take as a parameter `Tuple[Foo, ...]`. To express generic type hints, you should instead create a class that stores that value.

To disambiguate between different uses of the same type, you will usually want to "newtype" the types that you use. Rather than using the builtin `str` or `int`, for example, you should define a new, declarative class like `Name` or `Age`.
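Both ideas, wrapping a generic type hint in a class and "newtyping" builtins, can be sketched with frozen dataclasses. `Name` and `Age` come from the text; `MaybeName` and `Names` are hypothetical wrappers invented for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Name:  # newtype over str
    value: str


@dataclass(frozen=True)
class Age:  # newtype over int
    value: int


@dataclass(frozen=True)
class MaybeName:  # wraps Optional[Name] so a rule can return one concrete type
    value: Optional[Name]


@dataclass(frozen=True)
class Names:  # wraps Tuple[Name, ...] for the same reason
    values: Tuple[Name, ...]


found = MaybeName(Name("alice"))
missing = MaybeName(None)
everyone = Names((Name("alice"), Name("bob")))
```

A rule can then declare `MaybeName` or `Names` as its return type, and `Name` and `Age` remain distinct types even when they wrap similar values.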

### Dataclasses

Python 3's [dataclasses](🔗) work well with the engine because:

  1. If `frozen=True` is set, they are immutable and hashable.

  2. Dataclasses use type hints.

  3. Dataclasses are declarative and ergonomic.

You do not need to use dataclasses. You can use alternatives like `attrs` or normal Python classes. However, dataclasses are a nice default.

You should set `@dataclass(frozen=True)` for Python to autogenerate `__hash__()` and to ensure that the type is immutable.
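The effect of `frozen=True` can be checked with the standard library. `LintRequest` is a hypothetical type used only for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class LintRequest:  # hypothetical type
    path: str
    strict: bool = False


a = LintRequest("src/app.sh")
b = LintRequest("src/app.sh")

# __eq__ and __hash__ are generated from the fields.
equal = a == b
same_hash = hash(a) == hash(b)

# Assignment after construction is rejected.
try:
    a.path = "other.sh"
    mutated = True
except FrozenInstanceError:
    mutated = False
```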

### Don't use `NamedTuple`

`NamedTuple` behaves similarly to dataclasses, but it should not be used because the `__eq__()` implementation uses structural equality, rather than the nominal equality used by the engine.
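The difference is easy to observe with the standard library. All four classes below are hypothetical and exist only to contrast the two behaviors.

```python
from dataclasses import dataclass
from typing import NamedTuple


class NameTuple(NamedTuple):
    value: str


class PathTuple(NamedTuple):
    value: str


@dataclass(frozen=True)
class Name:
    value: str


@dataclass(frozen=True)
class Path:
    value: str


# NamedTuples compare as plain tuples, so unrelated types collide.
tuples_equal = NameTuple("a") == PathTuple("a")
equal_to_plain_tuple = NameTuple("a") == ("a",)
tuple_hashes_equal = hash(NameTuple("a")) == hash(PathTuple("a"))

# Dataclasses only compare equal to instances of the same class.
dataclasses_equal = Name("a") == Path("a")
```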

### Custom dataclass `__init__()`

Sometimes, you may want to have a custom `__init__()` constructor. For example, you may want your dataclass to store a `tuple[str, ...]`, but for your constructor to take the more flexible `Iterable[str]` which you then convert to an immutable tuple sequence.

Normally, a custom `__init__()` does not work with `@dataclass(frozen=True)`, because assigning `self.x = ...` inside it raises `FrozenInstanceError`. But, if you do not set `frozen=True`, then your dataclass would be mutable, which is dangerous with the engine.

Instead, we added a decorator called `@frozen_after_init`, which can be combined with `@dataclass(unsafe_hash=True)`.
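The problem that motivates this decorator can be reproduced with the standard library alone. The sketch below shows only the failure, not the `@frozen_after_init` solution, and `Args` is a hypothetical type.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Args:  # hypothetical type
    values: Tuple[str, ...]

    def __init__(self, values: Iterable[str]) -> None:
        # frozen=True blocks attribute assignment, even inside __init__.
        self.values = tuple(values)


try:
    Args(["--verbose"])
    constructed = True
except FrozenInstanceError:
    constructed = False
```

Here `constructed` ends up `False`: the custom constructor never gets to store the converted tuple.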

### `Collection`: a newtype for `tuple`

If you want a rule to use a homogeneous sequence, you can use `pants.engine.collection.Collection` to "newtype" a tuple. This will behave the same as a tuple, but will have a distinct type.

### `DeduplicatedCollection`: a newtype for `FrozenOrderedSet`

If you want a rule to use a homogeneous set, you can use `pants.engine.collection.DeduplicatedCollection` to "newtype" a `FrozenOrderedSet`. This will behave the same as a `FrozenOrderedSet`, but will have a distinct type.

You can optionally set the class property `sort_input`, which will often result in more cache hits with the Pantsd daemon.

## Registering rules in `register.py`

To register a new rule, use the `rules()` hook in your [`register.py` file](🔗). This function should return a list of functions annotated with `@rule`.

Conventionally, each file will have a function called `rules()` and then `register.py` will re-export them. This is meant to make imports more organized. Within each file, you can use `collect_rules()` to automatically find the rules in the file.