Using Pants in CI
Suggestions for how to use Pants to speed up your CI (continuous integration).
See the example-python repository for an example GitHub Actions workflow.
Directories to cache
GitHub Actions: If you're using GitHub Actions to run your CI workflows, then you can use our standard init-pants action to set up and cache the Pants bootstrap state. Otherwise, read on to learn how to configure this manually.
In your CI's config file, we recommend caching these directories:
$HOME/.cache/nce (Linux) or $HOME/Library/Caches/nce (macOS)

This is the cache directory used by the Pants launcher binary to cache the assets, interpreters, and venvs required to run Pants itself. Cache this against the Pants version, as specified in pants.toml. See the pantsbuild/example-python repo for an example of how to generate an effective cache key for this directory in GitHub Actions.

$HOME/.cache/pants/named_caches

Caches used by some underlying tools. Cache this against the inputs to those tools. For the pants.backend.python backend, named caches are used by PEX, and therefore its inputs are your lockfiles. Again, see pantsbuild/example-python for an example.
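A cache key for the launcher directory can be derived from the pinned Pants version. A minimal shell sketch (the helper name, key format, and sample version are illustrative, not the exact scheme used in example-python):

```shell
# Derive a CI cache key from the pants_version pinned in pants.toml.
# The "pants-setup-" prefix is an illustrative convention, not a Pants API.
pants_cache_key() {
  local version
  version=$(sed -n 's/^pants_version *= *"\([^"]*\)".*/\1/p' "$1")
  echo "pants-setup-${version}"
}

# Demo against a sample config (hypothetical version number).
printf '[GLOBAL]\npants_version = "2.21.0"\n' > /tmp/pants.toml
pants_cache_key /tmp/pants.toml   # prints: pants-setup-2.21.0
```

Any change to the pinned version then produces a new key, so CI re-bootstraps Pants only when the version actually changes.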
If you're not using a fine-grained remote caching service, then you may also want to preserve the local Pants cache at $HOME/.cache/pants/lmdb_store. This has to be invalidated on any change to any file that can affect any process, e.g., hashFiles('**/*') on GitHub Actions.
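Outside GitHub Actions, an equivalent coarse key can be sketched in shell (the tool choices and the .git exclusion are illustrative):

```shell
# Compute a coarse cache key over all files under a directory, analogous to
# hashFiles('**/*') on GitHub Actions: hash every file, then hash the
# sorted list of per-file hashes so any change anywhere changes the key.
coarse_key() {
  find "$1" -type f -not -path '*/.git/*' -print0 \
    | sort -z \
    | xargs -0 sha256sum \
    | sha256sum \
    | cut -d' ' -f1
}

coarse_key .   # any file change in the repo changes this key
```

If you are on GitHub Actions, the built-in hashFiles() expression is the simpler choice.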
Computing such a coarse hash, and saving and restoring large directories, can be unwieldy. So this may be impractical and slow on medium and large repos.
A remote cache service integrates with Pants's fine-grained invalidation and avoids these problems, and is recommended for the best CI performance.
See Troubleshooting for how to change these cache locations.
In CI, the cache must be uploaded and downloaded on every run. This takes time, so there is a tradeoff: too large a cache will slow down your CI.
You can use this script to nuke the cache when it gets too big:
function nuke_if_too_big() {
  path=$1
  limit_mb=$2
  # Measure the directory's size in megabytes.
  size_mb=$(du -m -d0 "${path}" | cut -f 1)
  if (( size_mb > limit_mb )); then
    echo "${path} is too large (${size_mb}mb), nuking it."
    rm -rf "${path}"
  fi
}

nuke_if_too_big ~/.cache/nce 512
nuke_if_too_big ~/.cache/pants/named_caches 1024
[stats].log
Set the option [stats].log = true in pants.ci.toml for Pants to print metrics of your cache's performance at the end of the run, including the number of cache hits and the total time saved thanks to caching, e.g.:
local_cache_requests: 204
local_cache_requests_cached: 182
local_cache_requests_uncached: 22
local_cache_total_time_saved_ms: 307200
You can also add plugins = ["hdrhistogram"] to the [GLOBAL] section of pants.ci.toml for Pants to print histograms of cache performance, e.g. the size of blobs cached.
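Putting both options together, a pants.ci.toml might contain (a sketch; merge these with any CI options you already set):

```toml
[GLOBAL]
plugins = ["hdrhistogram"]

[stats]
log = true
```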
Remote caching
Rather than storing your cache with your CI provider, remote caching stores the cache in the cloud, using gRPC and the open-source Remote Execution API for low-latency and fine-grained caching.
This brings several benefits over local caching:
- All machines and CI jobs share the same cache.
- Remote caching downloads precisely what is needed by your run, when it's needed, rather than pessimistically downloading the entire cache at the start of the run.
- No download and upload stage for your cache.
- No need to "nuke" your cache when it gets too big.
See Remote Caching and Execution for more information.
Recommended commands
With both approaches, you may want to shard the input targets into multiple CI jobs, for increased parallelism. See Advanced Target Selection. (This is typically less necessary when using remote caching.)
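With sharding, each CI job runs one slice of the tests. Assuming the test goal's --shard option (see Advanced Target Selection), a pipeline could generate one invocation per shard; this sketch uses an illustrative three-way split:

```shell
# Build one `pants test` command per CI shard. The shard count is
# illustrative; adjust it to match your CI job matrix.
total=3
cmds=""
for i in $(seq 0 $((total - 1))); do
  cmds="${cmds}pants test --shard=${i}/${total} ::"$'\n'
done
printf '%s' "$cmds"
```

In a real pipeline, each CI job would run only its own shard's command (e.g. job 0 runs pants test --shard=0/3 ::) rather than the whole list.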
Approach #1: only run over changed files
Because Pants understands the dependencies of your code, you can use Pants to speed up your CI by only running tests and linters over files that have actually changed.
We recommend running these commands in CI: