These docs are for version 2.10, which is no longer officially supported. The latest version is 2.18.

It is not safe to use functions like `open` or the non-pure operations of `pathlib.Path` like you normally might: this will break caching because they do not hook up to Pants's file watcher.

Instead, Pants has several mechanisms to work with the file system in a safe and concurrent way.

Missing certain file operations?

If it would help you to have a certain file operation, please let us know by either opening a new [GitHub issue](🔗) or by messaging us on [Slack](🔗) in the #plugins room.

## Core abstractions: `Digest` and `Snapshot`

The core building block is a `Digest`, which is a lightweight reference to a set of files known about by the engine.

  • The `Digest` is only a reference; the files are stored in the engine's persistent [content-addressable storage (CAS)](🔗).

  • The files do not need to actually exist on disk.

  • Every file uses a relative path. This allows the `Digest` to be passed around in different environments safely, such as running in a temporary directory locally or running through remote execution.

  • The files may be binary files and/or text files.

  • The `Digest` may refer to zero or more files. If it's empty, the digest will be equal to `pants.engine.fs.EMPTY_DIGEST`.

  • You will never create a `Digest` directly in rules, only in tests. Instead, you get a `Digest` by using `CreateDigest` or `PathGlobs`, or by using the `output_digest` from a `Process` that you've run.
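To make the "reference, not files" idea concrete, here is a toy model of a content-addressable store in plain Python. It is only a sketch of the concept and not how the Pants engine is implemented: `ToyDigest`, `ToyStore`, and the serialization scheme are all hypothetical names and choices made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDigest:
    """Hypothetical stand-in for a digest: a small, hashable reference to a set of files."""

    fingerprint: str
    total_bytes: int


class ToyStore:
    """A tiny in-memory content-addressable store (an assumption, not the Pants store)."""

    def __init__(self) -> None:
        self._files_by_fingerprint: dict[str, dict[str, bytes]] = {}

    def put(self, files: dict[str, bytes]) -> ToyDigest:
        hasher = hashlib.sha256()
        total = 0
        # Sort by path so the same set of files always hashes to the same fingerprint.
        for path, content in sorted(files.items()):
            if path.startswith("/"):
                raise ValueError(f"Paths must be relative, got {path!r}")
            record = path.encode() + b"\0" + content
            # Length-prefix each record so record boundaries are unambiguous.
            hasher.update(len(record).to_bytes(8, "big") + record)
            total += len(content)
        digest = ToyDigest(hasher.hexdigest(), total)
        self._files_by_fingerprint[digest.fingerprint] = dict(files)
        return digest

    def get(self, digest: ToyDigest) -> dict[str, bytes]:
        # Look the files up by fingerprint; nothing is read from disk.
        return dict(self._files_by_fingerprint[digest.fingerprint])


store = ToyStore()
digest = store.put({"src/app.py": b"print('hi')\n"})
empty_digest = store.put({})  # plays the role of an "empty digest" in this toy model
```

Because the reference is derived from the paths and contents alone, storing the same files twice yields an equal `ToyDigest`, which is what makes such references safe to pass between environments.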

Most of Pants's operations with the file system either accept a `Digest` as input or return a `Digest`. For example, when running a `Process`, you may provide a `Digest` as input.

A `Snapshot` composes a `Digest` and adds the useful properties `files: tuple[str, ...]` and `dirs: tuple[str, ...]`, which store the sorted file names and directory names, respectively.

A `Snapshot` is useful when you want to know which files a `Digest` refers to. For example, when running a tool, you might set `argv=snapshot.files`, and then pass `snapshot.digest` to the `Process` so that it has access to those files.
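As a rough illustration of what the `files` and `dirs` properties hold, the sketch below derives both tuples from a list of relative paths. `files_and_dirs` is a hypothetical helper, and it assumes that `dirs` lists every ancestor directory of every file; it does not call the engine.

```python
from pathlib import PurePosixPath


def files_and_dirs(paths: list[str]) -> tuple[tuple[str, ...], tuple[str, ...]]:
    """Return sorted file names and sorted directory names for relative paths."""
    dirs: set[str] = set()
    for path in paths:
        # `parents` yields each ancestor, ending with "." for relative paths.
        for parent in PurePosixPath(path).parents:
            if str(parent) != ".":
                dirs.add(str(parent))
    return tuple(sorted(paths)), tuple(sorted(dirs))


files, dirs = files_and_dirs(["src/python/app.py", "BUILD", "src/go/main.go"])
# Mirrors the `argv=snapshot.files` pattern described above; "mytool" is a placeholder.
argv = ["mytool", *files]
```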

Given a `Digest`, you may use the engine to enrich it into a `Snapshot` with `await Get(Snapshot, Digest, my_digest)`, where `my_digest` is the `Digest` you already have.

## `CreateDigest`

`CreateDigest` allows you to create a new digest with whichever files you would like, even if they do not exist on disk.

The `CreateDigest` constructor expects an iterable of `FileContent` objects, each of which takes a `path: str` parameter, a `content: bytes` parameter, and an optional `is_executable: bool` parameter that defaults to `False`.

This does _not_ write the `Digest` to the build root. Use `Workspace.write_digest()` for that.

## `PathGlobs`

`PathGlobs` allows you to read from the local file system using globbing, that is, patterns with wildcard characters that match sets of filenames.

  • All globs must be relative paths, relative to the build root.

  • `PathGlobs` uses the same syntax as the `sources` field, which is roughly Git's syntax. Use `*` for globs over just the current working directory, `**` for recursive globs over everything below the current working directory (at any level), and prefix with `!` for ignores.

  • `PathGlobs` will ignore all values from the global option `pants_ignore`.
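The difference between `*`, `**`, and `!` can be approximated in plain Python. The sketch below is a simplified model and not Pants's actual matcher: `glob_to_regex` and `expand` are hypothetical helpers, `?` and character classes are not handled, and it assumes that `**/` may also match zero directories.

```python
import re


def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a simplified glob into a regex over relative, `/`-separated paths."""
    parts = []
    i = 0
    while i < len(glob):
        if glob.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")  # zero or more whole directories
            i += 3
        elif glob.startswith("**", i):
            parts.append(r".*")  # anything, including `/`
            i += 2
        elif glob[i] == "*":
            parts.append(r"[^/]*")  # anything within a single path component
            i += 1
        else:
            parts.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(parts))


def expand(globs: list[str], files: list[str]) -> list[str]:
    """Return the files matched by the include globs and not by any `!` ignore."""
    includes = [glob_to_regex(g) for g in globs if not g.startswith("!")]
    ignores = [glob_to_regex(g[1:]) for g in globs if g.startswith("!")]
    return sorted(
        f
        for f in files
        if any(p.fullmatch(f) for p in includes)
        and not any(p.fullmatch(f) for p in ignores)
    )


tree = ["a.txt", "docs/b.txt", "docs/deep/c.txt", "docs/deep/skip.txt", "main.py"]
top_level_only = expand(["*.txt"], tree)
recursive_with_ignore = expand(["**/*.txt", "!**/skip.txt"], tree)
```

In this model, `top_level_only` contains just `a.txt`, while `recursive_with_ignore` contains every `.txt` file at any depth except `docs/deep/skip.txt`.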

By default, the engine will no-op for any globs that are unmatched. If you want to instead warn or error, set `glob_match_error_behavior=GlobMatchErrorBehavior.warn` or `GlobMatchErrorBehavior.error`. This will require that you also set `description_of_origin`, which is a human-friendly description of where the `PathGlobs` is coming from so that the error message is helpful.

If you set `glob_match_error_behavior`, you may also want to set `conjunction`. By default, only one glob must match. If you set `conjunction=GlobExpansionConjunction.all_match`, then all globs must match or the engine will warn or error. For example, a `PathGlobs` that lists both a config file and a glob that matches nothing would fail, even if the config file existed.
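The interaction between unmatched globs and `conjunction` can be modeled in a few lines. This is a toy sketch of the rule described above, not the engine's code: `check_globs` is a hypothetical helper, the strings `"any_match"` and `"all_match"` stand in for the `GlobExpansionConjunction` values, and `fnmatchcase` is used for brevity even though its `*` also crosses `/`.

```python
from fnmatch import fnmatchcase


def check_globs(
    globs: list[str],
    files: list[str],
    description_of_origin: str,
    conjunction: str = "any_match",
) -> list[str]:
    """Return the unmatched globs, raising if the conjunction rule is violated."""
    unmatched = [g for g in globs if not any(fnmatchcase(f, g) for f in files)]
    if conjunction == "all_match":
        failed = bool(unmatched)  # every glob must match something
    else:
        # At least one glob must match; an empty glob list never fails.
        failed = bool(globs) and len(unmatched) == len(globs)
    if failed:
        raise ValueError(f"Unmatched globs from {description_of_origin}: {unmatched}")
    return unmatched


existing = ["tool.cfg"]
requested = ["tool.cfg", "does_not_exist.txt"]
# Passes: one of the two globs matches, which is enough for "any_match".
leftover = check_globs(requested, existing, "the option `--my-config`")
```

Calling `check_globs(requested, existing, "the option `--my-config`", conjunction="all_match")` raises instead, because `does_not_exist.txt` matches nothing even though `tool.cfg` exists. The option name in the description is a placeholder.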

If you only need to resolve the file names—and don't actually need to use the file content—you can use `await Get(Paths, PathGlobs)` instead of `await Get(Digest, PathGlobs)` or `await Get(Snapshot, PathGlobs)`. This will avoid "digesting" the files to the LMDB Store cache as a performance optimization. `Paths` has two properties: `files: tuple[str, ...]` and `dirs: tuple[str, ...]`.

## `DigestContents`

`DigestContents` allows you to get the file contents from a `Digest`.

The result will be a sequence of `FileContent` objects, which each have a property `path: str` and a property `content: bytes`. You may want to call `content.decode()` to convert to `str`.

You may not need `DigestContents`

Only use `DigestContents` if you need to read and operate on the content of files directly in your rule.

  • If you are running a `Process`, you only need to pass the `Digest` as input and that process will be able to read all the files in its environment.

  • If you only need a list of files included in the digest, use `Get(Snapshot, Digest)`.

  • If you just need to manipulate the directory structure of a `Digest`, such as renaming files, use `DigestEntries` with `CreateDigest`, or use `AddPrefix` and `RemovePrefix`. These avoid reading the file content into memory.

Does not handle empty directories in a `Digest`

`DigestContents` does not have a way to represent empty directories in a `Digest` since it is only a sequence of `FileContent` objects. That is, passing the `FileContent` objects to `CreateDigest` will not result in the original `Digest` if there were empty directories in that original `Digest`. Use `DigestEntries` instead if your rule needs to handle empty directories in a `Digest`.

## `DigestEntries`

`DigestEntries` allows a rule to obtain the filenames (with content digests) and empty directories from a `Digest`. The value of a `DigestEntries` is a sequence of `FileEntry` and `Directory` objects representing files and empty directories in the `Digest`, respectively. That sequence can be passed to `CreateDigest` to recreate the original `Digest`.

This is useful if you need to manipulate the directory structure of a `Digest` without bringing the file contents into memory (which is what happens if you use `DigestContents`).
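The sketch below models why this works: each entry carries only a path and a reference to the content, so moving files is a matter of rewriting paths. `ToyFileEntry`, `ToyDirectory`, and `move_under` are hypothetical stand-ins written for illustration, not the Pants types.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyFileEntry:
    path: str
    fingerprint: str  # reference to the content; the bytes are never loaded


@dataclass(frozen=True)
class ToyDirectory:
    path: str  # an empty directory


def move_under(entries: list, old_dir: str, new_dir: str) -> list:
    """Re-root every entry under `old_dir` to `new_dir`, leaving the rest untouched."""
    moved = []
    for entry in entries:
        if entry.path == old_dir or entry.path.startswith(old_dir + "/"):
            # Only the path changes; the content reference is carried over as-is.
            entry = replace(entry, path=new_dir + entry.path[len(old_dir):])
        moved.append(entry)
    return moved


entries = [
    ToyFileEntry("old/a.txt", "abc123"),
    ToyDirectory("old/empty"),
    ToyFileEntry("keep/b.txt", "def456"),
]
renamed = move_under(entries, "old", "new")
```

Note that the empty directory survives the round trip, which is the case a plain sequence of file contents cannot represent.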

## `MergeDigests`

Often, you will need to provide a single `Digest` somewhere in your plugin—such as the `input_digest` for a `Process`—but you may have multiple `Digest`s that you want to use. Use `MergeDigests` to combine them all into a single `Digest`.

  • It is okay if multiple digests include the same file, so long as they have identical content.

  • If any digests have different content for the same file, the engine will error. Unlike Git, the engine does not attempt to resolve merge conflicts.

  • It is okay if some of the digests are empty, i.e. `EMPTY_DIGEST`.
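The three merge rules above can be summarized with a toy model in which a set of files is a plain `dict` from relative path to content. `merge_files` is a hypothetical helper that mimics the described behavior; it is not the engine's implementation.

```python
def merge_files(*file_sets: dict[str, bytes]) -> dict[str, bytes]:
    """Merge path -> content mappings, erroring on conflicting content."""
    merged: dict[str, bytes] = {}
    for files in file_sets:
        for path, content in files.items():
            if path in merged and merged[path] != content:
                # No conflict resolution is attempted: differing content is an error.
                raise ValueError(f"Conflicting content for {path!r}")
            merged[path] = content
    return merged


combined = merge_files(
    {"a.txt": b"1"},
    {"a.txt": b"1", "b.txt": b"2"},  # same file, identical content: fine
    {},  # an empty set of files: fine
)
```

Merging `{"a.txt": b"1"}` with `{"a.txt": b"2"}` raises `ValueError` in this model, mirroring the engine's refusal to resolve merge conflicts.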

## `DigestSubset`

To get certain files out of a `Digest`, use `DigestSubset`.

`DigestSubset` takes the original `Digest` and a `PathGlobs` that selects the files to keep. See the `PathGlobs` section above for more details on how globs work.

## `AddPrefix` and `RemovePrefix`

Use `AddPrefix` and `RemovePrefix` to change the paths of every file in the digest, while keeping the file contents the same.

`RemovePrefix` will error if it encounters any files that do not have the requested prefix.
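As a toy model of these two operations, the sketch below rewrites the keys of a path-to-content `dict` and leaves the values alone. `add_prefix` and `remove_prefix` are hypothetical helpers that imitate the described behavior, including the error when a file lacks the prefix.

```python
def add_prefix(files: dict[str, bytes], prefix: str) -> dict[str, bytes]:
    """Prepend `prefix` to every path; contents are untouched."""
    prefix = prefix.rstrip("/")
    return {f"{prefix}/{path}": content for path, content in files.items()}


def remove_prefix(files: dict[str, bytes], prefix: str) -> dict[str, bytes]:
    """Strip `prefix` from every path, erroring if any path does not have it."""
    prefix = prefix.rstrip("/") + "/"
    stripped: dict[str, bytes] = {}
    for path, content in files.items():
        if not path.startswith(prefix):
            raise ValueError(f"{path!r} does not start with the prefix {prefix!r}")
        stripped[path[len(prefix):]] = content
    return stripped


original = {"app.py": b"print('hi')\n", "lib/util.py": b""}
nested = add_prefix(original, "src/python")
restored = remove_prefix(nested, "src/python")
```

Here `restored` equals `original`, so the two operations are inverses when every file shares the prefix.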

## `Workspace.write_digest()`

To write a digest to disk in the build root, request the type `Workspace`, then use its method `.write_digest()`.

  • The digest will always be written to the build root; you cannot write to arbitrary locations on your machine.

  • You may set the optional parameter `path_prefix: str` with a relative path.

`Workspace` is a special type that can only be requested in `@goal_rule`s because it is only safe to write to disk in a `@goal_rule`. So, a common pattern is for "downstream" rules to return a `Digest` with the contents they want to write to disk, and for the `@goal_rule` to aggregate all the results and write them to disk. For example, for the `fmt` goal, each `FmtResult` includes a `digest` field.

For better performance, avoid calling `workspace.write_digest` multiple times, such as in a `for` loop. Instead, first, merge all the digests, then write them in a single call.



## `DownloadFile`

`DownloadFile` allows you to download an asset using a `GET` request.

`DownloadFile` expects a `url: str` parameter pointing to a stable URL for the asset, along with an `expected_digest: FileDigest` parameter. A `FileDigest` is like a normal `Digest`, but represents a single file, rather than a set of files/directories. To determine the `expected_digest`, manually download the file, then run `shasum -a 256` to compute the fingerprint and `wc -c` to compute the expected length of the downloaded file in bytes.
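If `shasum` and `wc` are not available, the same two values can be computed with Python's standard library. The helper below, `fingerprint_and_length`, is a hypothetical name; it is meant for a one-off script run outside of Pants, not for use inside a rule, where direct file access is unsafe as described at the top of this page.

```python
import hashlib
import io
from typing import BinaryIO


def fingerprint_and_length(stream: BinaryIO, chunk_size: int = 64 * 1024) -> tuple[str, int]:
    """Return the SHA-256 hex digest and the byte length of a binary stream."""
    hasher = hashlib.sha256()
    length = 0
    # Read in chunks so large downloads do not need to fit in memory.
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)
        length += len(chunk)
    return hasher.hexdigest(), length


# In practice you would pass a manually downloaded file opened in binary mode;
# an in-memory stream is used here so the example is self-contained.
fingerprint, length = fingerprint_and_length(io.BytesIO(b"abc"))
```

The resulting pair corresponds to the output of `shasum -a 256` and `wc -c`, which are the two pieces of information the `expected_digest` needs.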

Often, you will want to download a pre-compiled binary for a tool. When doing this, use `ExternalTool` instead for help with extracting the binary from the download. See [Installing tools](🔗).

HTTP requests without digests are unsafe

It is not safe to use `DownloadFile` for mutable HTTP requests, as it will never ping the server for updates once it is cached. It is also not safe to use the `requests` library or similar because it will not be cached safely.

You can use a `Process` with uniquely identifying information in its arguments to run `/usr/bin/curl`.