It is not safe to use pathlib.Path like you normally would, because this can break caching. Instead, Pants has several mechanisms to work with the file system in a safe and concurrent way.
Missing certain file operations?
The core building block is a Digest, which is a lightweight reference to a set of files known about by the engine.
- The Digest is only a reference; the files are stored in the engine's persistent content-addressable storage (CAS).
- The files do not need to actually exist on disk.
- Every file uses a relative path. This allows the Digest to be passed around in different environments safely, such as running in a temporary directory locally or running through remote execution.
- The files may be binary files and/or text files.
- A Digest may refer to 0 to n files. If it's empty, the digest will be equal to
- You will rarely create a Digest using its constructor. Almost always, you will get a Digest from CreateDigest, PathGlobs, or the output of a Process that you've run.
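To make the "reference plus content-addressable storage" idea concrete, here is a toy model in plain Python. It is not Pants code: ToyStore, ToyDigest, and make_digest are hypothetical names, and the fingerprinting scheme is an assumption that does not reproduce how the engine actually computes a Digest. It only illustrates the properties listed above: contents live in a store keyed by their SHA-256 hash, the digest itself holds no file contents, paths must be relative, and the same set of files always yields the same fingerprint.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy content-addressable storage: each blob is keyed by the SHA-256 of its bytes."""

    blobs: dict[str, bytes] = field(default_factory=dict)

    def put(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        self.blobs[key] = content  # Storing identical content twice is a no-op.
        return key


@dataclass(frozen=True)
class ToyDigest:
    """A lightweight reference: a fingerprint and a file count, but no file contents."""

    fingerprint: str
    file_count: int


def make_digest(store: ToyStore, files: dict[str, bytes]) -> ToyDigest:
    """Store each file's content, then fingerprint the sorted (path, content-hash) pairs."""
    entries = []
    for path, content in sorted(files.items()):
        if path.startswith("/"):  # The toy only accepts relative paths.
            raise ValueError(f"paths must be relative, got {path!r}")
        entries.append(f"{path}\0{store.put(content)}")
    fingerprint = hashlib.sha256("\n".join(entries).encode()).hexdigest()
    return ToyDigest(fingerprint, len(entries))
```

Because the toy fingerprint depends only on paths and contents, producing the same files twice, in any order, gives equal digests, which is the property that makes content-addressed caching possible.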
Most of Pants's operations with the file system either accept a Digest as input or return a Digest. For example, when running a Process, you may provide a Digest as input.
Snapshot composes a Digest and adds the useful properties .files and .dirs, which store the sorted file names and directory names, respectively. For example:
```python
Snapshot(
    digest=Digest(
        fingerprint="21bcd9fcf01cc67e9547b7d931050c1c44d668e7c0eda3b5856aa74ad640098b",
        serialized_bytes_length=162,
    ),
    files=("f.txt", "grandparent/parent/c.txt"),
    dirs=("grandparent", "grandparent/parent"),
)
```
Snapshot is useful when you want to know which files a Digest refers to. For example, when running a tool, you might set argv=snapshot.files, and then pass snapshot.digest to the Process so that it has access to those files.
If you have a Digest, you may use the engine to enrich it into a Snapshot:
```python
from pants.engine.fs import Digest, Snapshot
from pants.engine.rules import Get, rule


@rule
async def demo(...) -> Foo:
    ...
    snapshot = await Get(Snapshot, Digest, my_digest)
```
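The relationship between .files and .dirs can be sketched without the engine. The hypothetical helper snapshot_paths below derives both tuples from a list of relative file paths, assuming every directory of interest is an ancestor of some file (it ignores empty directories). It uses PurePosixPath, which only manipulates strings and never touches the disk.

```python
from pathlib import PurePosixPath


def snapshot_paths(file_paths):
    """Return sorted (files, dirs) tuples; dirs holds every ancestor directory of each file."""
    files = tuple(sorted(file_paths))
    dirs = set()
    for file_path in files:
        # For "grandparent/parent/c.txt", .parents yields "grandparent/parent", "grandparent", ".".
        for parent in PurePosixPath(file_path).parents:
            if parent != PurePosixPath("."):
                dirs.add(str(parent))
    return files, tuple(sorted(dirs))
```

Feeding it the two files from the Snapshot example reproduces that example's files and dirs tuples.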
CreateDigest allows you to create a new digest with whichever files you would like, even if they do not exist on disk.
```python
from pants.engine.fs import CreateDigest, Digest, FileContent
from pants.engine.rules import Get, rule


@rule
async def demo(...) -> Foo:
    ...
    digest = await Get(Digest, CreateDigest([FileContent("f1.txt", b"hello world")]))
```
The CreateDigest constructor expects an iterable of FileContent objects, which take a path: str parameter, a content: bytes parameter, and an optional is_executable: bool parameter with a default of False.
This does not write the Digest to the build root. Use Workspace.write_digest() for that.
PathGlobs allows you to read from the local file system.
```python
from pants.engine.fs import Digest, PathGlobs
from pants.engine.rules import Get, rule


@rule
async def demo(...) -> Foo:
    ...
    digest = await Get(Digest, PathGlobs(["**/*.txt", "!ignore_me.txt"]))
```
- All globs must be relative paths, relative to the build root.
- PathGlobs uses the same syntax as the sources field, which is roughly Git's syntax. Use ** for recursive globs, and prefix with ! for ignores.
- PathGlobs will ignore all values from the global option
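As a rough illustration of the include/ignore semantics, the sketch below translates a small subset of glob syntax (**/, *, ?) into regular expressions and filters a list of paths. glob_to_regex and match_globs are hypothetical helpers, not Pants's matcher; real Git-style globs have more rules (for example, character classes, and patterns without a slash, which Git matches at any depth) that this toy does not model.

```python
import re


def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a simplified glob: `**/` spans directories, `*` and `?` stay within one segment."""
    i, out = 0, []
    while i < len(glob):
        if glob.startswith("**/", i):
            out.append("(?:.*/)?")  # Zero or more leading directories.
            i += 3
        elif glob.startswith("**", i):
            out.append(".*")  # A trailing `**` matches everything below.
            i += 2
        elif glob[i] == "*":
            out.append("[^/]*")
            i += 1
        elif glob[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def match_globs(globs, paths):
    """Keep paths matching at least one include glob and no `!`-prefixed ignore glob."""
    includes = [glob_to_regex(g) for g in globs if not g.startswith("!")]
    ignores = [glob_to_regex(g[1:]) for g in globs if g.startswith("!")]
    return [
        p
        for p in paths
        if any(r.match(p) for r in includes) and not any(r.match(p) for r in ignores)
    ]
```

With the globs from the example above, ["**/*.txt", "!ignore_me.txt"], every .txt file at any depth is kept except ignore_me.txt in the build root.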
By default, the engine will no-op for any globs that are unmatched. If you want to instead warn or error, set glob_match_error_behavior=GlobMatchErrorBehavior.warn or GlobMatchErrorBehavior.error. This will require that you also set description_of_origin, which is a human-friendly description of where the PathGlobs is coming from so that the error message is helpful. For example:
```python
from pants.engine.fs import GlobMatchErrorBehavior, PathGlobs

PathGlobs(
    globs=[shellcheck.options.config],
    glob_match_error_behavior=GlobMatchErrorBehavior.error,
    description_of_origin="the option `--shellcheck-config`",
)
```
If you set glob_match_error_behavior, you may also want to set conjunction. By default, only one glob must match. If you set conjunction=GlobExpansionConjunction.all_match, then all globs must match or the engine will warn or error. For example, this would fail, even if the config file existed:
```python
from pants.engine.fs import GlobExpansionConjunction, GlobMatchErrorBehavior, PathGlobs

PathGlobs(
    globs=[shellcheck.options.config, "does_not_exist.txt"],
    glob_match_error_behavior=GlobMatchErrorBehavior.error,
    conjunction=GlobExpansionConjunction.all_match,
    description_of_origin="the option `--shellcheck-config`",
)
```
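The interaction between the conjunction and the error behavior can be modeled in a few lines. The two enums and check_globs below are hypothetical stand-ins loosely mirroring the option names above (the ignore member stands for the default no-op behavior); they are not the engine's implementation. They use fnmatch, whose * also matches /, so the toy is only suitable for literal file names like the ones in the example.

```python
import fnmatch
import warnings
from enum import Enum


class ErrorBehavior(Enum):  # Hypothetical stand-in for GlobMatchErrorBehavior.
    ignore = "ignore"
    warn = "warn"
    error = "error"


class Conjunction(Enum):  # Hypothetical stand-in for GlobExpansionConjunction.
    any_match = "any_match"
    all_match = "all_match"


def check_globs(globs, paths, behavior, conjunction, description_of_origin):
    """Warn or raise if the globs don't match according to the conjunction."""
    unmatched = [g for g in globs if not fnmatch.filter(paths, g)]
    if conjunction is Conjunction.all_match:
        failed = bool(unmatched)  # Every glob must match something.
    else:
        failed = bool(globs) and len(unmatched) == len(globs)  # One matching glob is enough.
    if not failed or behavior is ErrorBehavior.ignore:
        return None
    message = f"Unmatched globs from {description_of_origin}: {unmatched}"
    if behavior is ErrorBehavior.error:
        raise ValueError(message)
    warnings.warn(message)
    return None
```

With an existing config file plus "does_not_exist.txt", the any_match mode passes, while all_match fails, matching the behavior described above.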
If you only need to resolve the file names, and don't actually need to use the file content, you can use await Get(Paths, PathGlobs) instead of await Get(Digest, PathGlobs) or await Get(Snapshot, PathGlobs). As a performance optimization, this avoids "digesting" the files into the LMDB Store cache.
Paths has two properties: files: Tuple[str, ...] and dirs: Tuple[str, ...].
```python
import logging

from pants.engine.fs import Paths, PathGlobs
from pants.engine.rules import Get, rule

logger = logging.getLogger(__name__)


@rule
async def demo(...) -> Foo:
    ...
    paths = await Get(Paths, PathGlobs(["**/*.txt", "!ignore_me.txt"]))
    logger.info(paths.files)
```
DigestContents allows you to get the file contents from a Digest.
```python
import logging

from pants.engine.fs import Digest, DigestContents
from pants.engine.rules import Get, rule

logger = logging.getLogger(__name__)


@rule
async def demo(...) -> Foo:
    ...
    digest_contents = await Get(DigestContents, Digest, my_digest)
    for file_content in digest_contents:
        logger.info(file_content.path)
        logger.info(file_content.content)  # This will be `bytes`.
```
The result will be a sequence of FileContent objects, each of which has a property path: str and a property content: bytes. You may want to call content.decode() to convert to str.
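Because a Digest can contain binary files, calling content.decode() (which defaults to UTF-8 in Python 3) can raise UnicodeDecodeError. Below is a defensive sketch, assuming you want to skip non-text files rather than fail. FileContentLike is a hypothetical stand-in with the same path and content properties as FileContent, and decode_text is a hypothetical helper.

```python
from typing import NamedTuple, Optional


class FileContentLike(NamedTuple):  # Hypothetical stand-in for FileContent.
    path: str
    content: bytes


def decode_text(file_content: FileContentLike) -> Optional[str]:
    """Decode UTF-8 text; return None for content that is not valid UTF-8 (e.g. binary files)."""
    try:
        return file_content.content.decode("utf-8")
    except UnicodeDecodeError:
        return None
```

Whether to skip, log, or fail on undecodable files is a design choice for your rule; this sketch simply returns None so the caller decides.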
You may not need DigestContents
Only use DigestContents if you need to read and operate on the content of files directly in your rule.
- If you are running a Process, you only need to pass the Digest as input, and that process will be able to read all the files in its environment.
- If you only need a list of files included in the digest, use a Snapshot instead.
Often, you will need to provide a single Digest somewhere in your plugin, such as the input_digest for a Process, but you may have multiple Digests that you want to use. Use MergeDigests to combine them all into a single Digest.
```python
from pants.engine.fs import Digest, MergeDigests
from pants.engine.rules import Get, rule


@rule
async def demo(...) -> Foo:
    ...
    digest = await Get(
        Digest,
        MergeDigests([downloaded_tool_digest, config_file_digest, source_files_snapshot.digest]),
    )
```
- It is okay if multiple digests include the same file, so long as they have identical content.
- If any digests have different content for the same file, the engine will error. Unlike Git, the engine does not attempt to resolve merge conflicts.
- It is okay if some of the digests are empty, i.e.
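The merge rules above can be expressed as a plain-Python sketch over path-to-content dictionaries. merge_file_maps is a hypothetical helper that models the semantics only; the real MergeDigests operates on digests inside the engine.

```python
def merge_file_maps(*file_maps: dict[str, bytes]) -> dict[str, bytes]:
    """Merge path -> content maps: identical duplicates are fine, conflicting content is an error."""
    merged: dict[str, bytes] = {}
    for file_map in file_maps:  # Empty maps simply contribute nothing.
        for path, content in file_map.items():
            if path in merged and merged[path] != content:
                # Like the engine, the toy does not try to resolve the conflict.
                raise ValueError(f"conflicting content for {path!r}")
            merged[path] = content
    return merged
```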
To get certain files out of a Digest, use DigestSubset:
```python
from pants.engine.fs import Digest, DigestSubset, PathGlobs
from pants.engine.rules import Get, rule


@rule
async def demo(...) -> Foo:
    ...
    new_digest = await Get(
        Digest, DigestSubset(original_digest, PathGlobs(["file1.txt"]))
    )
```
See the PathGlobs section above for more details on how the type works.
Use AddPrefix and RemovePrefix to change the paths of every file in the digest, while keeping the file contents the same.
```python
from pants.engine.fs import AddPrefix, Digest, RemovePrefix
from pants.engine.rules import Get, rule


@rule
async def demo(...) -> Foo:
    ...
    added_prefix = await Get(Digest, AddPrefix(original_digest, "new_prefix/subdir"))
    # Strip the prefix we just added, which round-trips back to the original digest.
    removed_prefix = await Get(Digest, RemovePrefix(added_prefix, "new_prefix/subdir"))
    assert removed_prefix == original_digest
```
RemovePrefix will error if it encounters any files that do not have the requested prefix.
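Modeling a digest as a path-to-content dictionary, the round trip and the error case look like the sketch below. add_prefix and remove_prefix are hypothetical helpers, not the engine's implementation, and they assume the prefix has no trailing slash.

```python
def add_prefix(files: dict[str, bytes], prefix: str) -> dict[str, bytes]:
    """Prepend `prefix/` to every path; contents are unchanged."""
    return {f"{prefix}/{path}": content for path, content in files.items()}


def remove_prefix(files: dict[str, bytes], prefix: str) -> dict[str, bytes]:
    """Strip `prefix/` from every path, failing if any path lacks it."""
    result = {}
    for path, content in files.items():
        if not path.startswith(prefix + "/"):
            raise ValueError(f"{path!r} does not have the requested prefix {prefix!r}")
        result[path[len(prefix) + 1:]] = content
    return result
```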
To write a digest to disk in the build root, request the type Workspace, then use its method .write_digest():
```python
from pants.engine.fs import Workspace
from pants.engine.rules import goal_rule


@goal_rule
async def run_my_goal(..., workspace: Workspace) -> MyGoal:
    ...
    # Note that this is a normal method; we do not use `await Get`.
    workspace.write_digest(digest)
```
- The digest will always be written to the build root; you cannot write to arbitrary locations on your machine.
- You may set the optional parameter path_prefix: str with a relative path.
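The restriction to the build root can be illustrated with a path check. resolve_output_path is hypothetical and not part of Pants; it is a sketch, assuming the path_prefix and each file path are joined with PurePosixPath, of how one might reject absolute paths or .. segments that would escape the build root.

```python
from pathlib import PurePosixPath


def resolve_output_path(path_prefix: str, file_path: str) -> PurePosixPath:
    """Join a relative prefix and file path, refusing anything that would leave the build root."""
    joined = PurePosixPath(path_prefix) / file_path
    # An absolute file_path replaces the prefix entirely when joined, so check the result.
    if joined.is_absolute() or ".." in joined.parts:
        raise ValueError(f"{joined} is not a relative path inside the build root")
    return joined
```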
Workspace is a special type that can only be requested in @goal_rules, because it is only safe to write to disk in a @goal_rule. So, a common pattern is for "downstream" rules to return a Digest with the contents they want to write to disk, and for the @goal_rule to aggregate all the results and write them to disk. For example, for the fmt goal, each FmtResult includes a
For better performance, avoid calling workspace.write_digest multiple times, such as in a for loop. Instead, first merge all the digests, then write them in a single call.
Avoid:

```python
for digest in all_digests:
    workspace.write_digest(digest)
```

Instead:

```python
merged_digest = await Get(Digest, MergeDigests(all_digests))
workspace.write_digest(merged_digest)
```
DownloadFile allows you to download an asset using a GET request.
```python
from pants.engine.fs import Digest, DownloadFile, FileDigest
from pants.engine.rules import Get, rule


@rule
async def demo(...) -> Foo:
    ...
    url = "https://github.com/pantsbuild/pex/releases/download/v2.1.14/pex"
    file_digest = FileDigest(
        "12937da9ad5ad2c60564aa35cb4b3992ba3cc5ef7efedd44159332873da6fe46",
        2637138,
    )
    downloaded = await Get(Digest, DownloadFile(url, file_digest))
    assert downloaded == expected_digest
```
DownloadFile expects a url: str parameter pointing to a stable URL for the asset, along with an expected_digest: FileDigest parameter. A FileDigest is like a normal Digest, but represents a single file, rather than a set of files/directories. To determine the expected_digest, manually download the file, then run shasum -a 256 to compute the fingerprint and wc -c to compute the expected length of the downloaded file in bytes.
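If you prefer to compute the same two values in Python, the function below is a rough equivalent of shasum -a 256 plus wc -c. It is meant to be run by hand, outside of Pants, on the file you downloaded, so the caveat about pathlib inside rules does not apply; file_digest_args is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def file_digest_args(path: Path) -> tuple[str, int]:
    """Return (SHA-256 hex fingerprint, size in bytes), like `shasum -a 256` plus `wc -c`."""
    sha = hashlib.sha256()
    size = 0
    with path.open("rb") as f:
        # Read in 64 KiB chunks so large binaries are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            sha.update(chunk)
            size += len(chunk)
    return sha.hexdigest(), size
```

Pass the two results as the fingerprint and length arguments to FileDigest, in the same order as the DownloadFile example.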
Often, you will want to download a pre-compiled binary for a tool. When doing this, use ExternalTool instead for help with extracting the binary from the download. See Installing tools.
Want support for HTTP requests? Let us know.
It is not safe to use DownloadFile for most HTTP requests, as it will never ping the server for updates once it is cached. It is also not safe to use the requests library or similar, because it will not be cached safely.
In the meantime, you can use a Process with uniquely identifying information in its arguments to run
If you would like proper support for HTTP requests to use in your plugin, please let us know by commenting on the tracking GitHub issue so that we know to prioritize this: https://github.com/pantsbuild/pants/issues/8347.