
Optimizing Python + Docker deploys using Pants

Joshua Cannon · 8 min read

The Python and Docker logos, with a plus sign between them

Pants can build a PEX file, an executable zip file containing your Python code and all its transitive dependencies. Deploying your application is as simple as copying the file. This post elaborates on how to get the best performance out of the powerful combination of Pants + PEX + Docker.


The Pantsbuild system ships with support for building an all-in-one distributable Python file called a PEX. A PEX file is an executable zip file containing your Python code and all its transitive dependencies. Deploying your Python application is as simple as copying a PEX file to a system or image with a suitable Python interpreter.

Pants also supports building Docker images and embedding the code it packages into those images. With the combination of PEX + Docker, Pants allows you to easily containerize your Python application with minimal boilerplate.

This post builds off of our previous post about Pants + PEX + Docker, and elaborates on how to squeeze the best build-time performance out of this powerful combination.

A very simple example

BUILD
python_sources()

pex_binary(
    name = "binary",
    entry_point = "app.py",
    # Optimal settings for Docker builds
    layout = "packed",
    execution_mode = "venv",
)

docker_image(
    name = "img",
    repository = "app",
    instructions = [
        "FROM python:3.10-slim",
        'ENTRYPOINT ["/usr/local/bin/python3.10", "/bin/app"]',
        "COPY path.to.here/binary.pex /bin/app",
    ]
)

The example BUILD file above demonstrates how simple Pants makes it to build a Docker image containing a Python application.

In production environments, however, simplicity at build-time comes with trade-offs. Since a PEX is meant to be an all-in-one distributable, it has a few thorns when used as a container's entrypoint.

  • It must extract itself before it can run, increasing application startup times
  • After extraction, your running container has the PEX and the extracted contents on disk, increasing space required to run
  • Changing a first-party source file requires a full rebuild of the PEX and the container, which doesn't leverage Pants or Docker's caches

Metrics

Here are some metrics from a real-world use case (a PEX with 56 third-party requirements, a few large assets, and about a hundred source files), built with DOCKER_BUILDKIT set to 1:

PEX: Build time                  10s
PEX: Size                        ~3.3GB
Docker: Total build time         30s
Docker: Context transfer time    13.1s (~3.3GB context)
Docker: Export time              11s
Docker: Startup time             18s
Docker: Image size               3.63GB

Touching a first-party source results in similar metrics. In other words, we get no incrementality.

Simple Multi-Stage Build

We can leverage Docker multi-stage builds and interesting PEX features to solve some of these challenges. Using this recipe in PEX's documentation, we can:

  1. Create a Python virtual environment containing only our third-party dependencies in one stage
  2. Create an identical virtual environment containing only our first-party sources in another stage
  3. COPY them both onto our final "production image" stage

Our BUILD file becomes:

BUILD
python_sources()

pex_binary(
    name = "binary",
    entry_point = "app.py",
    layout = "packed",
    execution_mode = "venv",
    include_tools = True,
)

docker_image(
    name = "img",
    repository = "app",
    instructions = [
        "FROM python:3.10-slim as deps",
        "COPY path.to.here/binary.pex /binary.pex",
        "RUN PEX_TOOLS=1 /usr/local/bin/python3.10 /binary.pex venv --scope=deps --compile /bin/app",

        "FROM python:3.10-slim as srcs",
        "COPY path.to.here/binary.pex /binary.pex",
        "RUN PEX_TOOLS=1 /usr/local/bin/python3.10 /binary.pex venv --scope=srcs --compile /bin/app",

        "FROM python:3.10-slim",
        'ENTRYPOINT ["/bin/app/pex"]',
        "COPY --from=deps /bin/app /bin/app",
        "COPY --from=srcs /bin/app /bin/app",
    ]
)

This approach has several benefits:

  • It moves the extraction of the PEX to "build time" and also compiles each Python file, so that the application has the lowest-possible startup latency
  • Running the final image doesn't require any additional space
  • The COPY --from=deps instruction in the final image can be cached between runs when only touching first-party code

It also has some drawbacks:

  • Even though the final layer of the deps stage is re-used when the dependencies don't change, the input PEX has changed, so Docker must still re-RUN the extraction
  • Having Docker pre-extract the PEX incurs extra build time

Metrics:

PEX: Build time                  10s (unchanged)
PEX: Size                        ~3.3GB (unchanged)
Docker: Total build time         75.7s (+45.7s)
Docker: Context transfer time    13.1s (unchanged)
Docker: Extract deps time        35.4s
Docker: Extract srcs time        22.5s
Docker: Export time              9.3s (unchanged)
Docker: Startup time             <1s (-17s)
Docker: Image size               5.01GB (+1.38GB)

If we touch a first-party source (leveraging Docker's layer caches) here's what changes:

Docker: Total build time         59.7s (-16.9s)
Docker: Context transfer time    ~0s (-13.1s)

Multi-stage build leveraging 2 PEXs

In order to fully leverage Pants and Docker caches, we can split our all-in-one PEX into two: one for transitive third-party dependencies and one for first-party code.

BUILD
python_sources()

pex_binary(
    name="binary-deps",
    entry_point="app.py",
    layout="packed",
    include_sources=False,
    include_tools=True,
)

pex_binary(
    name="binary-srcs",
    entry_point="app.py",
    layout="packed",
    include_requirements=False,
    include_tools=True,
)

docker_image(
    name = "img",
    repository = "app",
    instructions = [
        "FROM python:3.10-slim as deps",
        "COPY path.to.here/binary-deps.pex /binary-deps.pex",
        "RUN PEX_TOOLS=1 /usr/local/bin/python3.10 /binary-deps.pex venv --scope=deps --compile /bin/app",

        "FROM python:3.10-slim as srcs",
        "COPY path.to.here/binary-srcs.pex /binary-srcs.pex",
        "RUN PEX_TOOLS=1 /usr/local/bin/python3.10 /binary-srcs.pex venv --scope=srcs --compile /bin/app",

        "FROM python:3.10-slim",
        'ENTRYPOINT ["/bin/app/pex"]',
        "COPY --from=deps /bin/app /bin/app",
        "COPY --from=srcs /bin/app /bin/app",
    ]
)

Metrics

Deps PEX: Build time             5s
Deps PEX: Size                   936MB
Srcs PEX: Build time             5s
Srcs PEX: Size                   2.4GB
Docker: Total build time         68.3s (roughly unchanged)
Docker: Context transfer time    13s (unchanged)
Docker: Extract deps time        30.2s (-5.2s)
Docker: Extract srcs time        7.9s (-14.6s)
Docker: Export time              9.3s (unchanged)
Docker: Startup time             <1s
Docker: Image size               5.01GB (unchanged)

If we touch a first-party source (leveraging Pants and Docker's layer caches) here's what changes:

Deps PEX: Build time             0s (-5s)
Docker: Total build time         23s (-45.3s)
Docker: Context transfer time    ~0s (-13s)
Docker: Extract deps time        0s (-30.2s)
Docker: Extract srcs time        4.7s (-3.2s)

Multiple Images and tagging

This approach leads to a significant speedup in both cold and warm build times, but it can be improved even further:

  • The important layers that allow for a faster incremental build aren't in an image, so they will be cleaned up when Docker is pruned
  • If we tell Pants to tag our image, Docker builds the intermediate stages before the final one, leaving two <none>-tagged images in addition to the final, tagged image

We can fix this by using several docker_image targets in tandem:

BUILD
...

docker_image(
    name = "img-deps",
    repository="app",
    registry=["companyname"],
    image_tags=["deps"],
    skip_push=True,
    instructions = [
        "FROM python:3.10-slim",
        "COPY path.to.here/binary-deps.pex /",
        "RUN PEX_TOOLS=1 /usr/local/bin/python3.10 /binary-deps.pex venv --scope=deps --compile /bin/app",
    ]
)

docker_image(
    name = "img-srcs",
    repository="app",
    registry=["companyname"],
    image_tags=["srcs"],
    skip_push=True,
    instructions = [
        "FROM python:3.10-slim",
        "COPY path.to.here/binary-srcs.pex /",
        "RUN PEX_TOOLS=1 /usr/local/bin/python3.10 /binary-srcs.pex venv --scope=srcs --compile /bin/app",
    ]
)

docker_image(
    name = "img",
    dependencies=[":img-srcs", ":img-deps"],
    repository="app",
    instructions = [
        "FROM python:3.10-slim",
        'ENTRYPOINT ["/bin/app/pex"]',
        "COPY --from=companyname/app:deps /bin/app /bin/app",
        "COPY --from=companyname/app:srcs /bin/app /bin/app",
    ]
)

Pants will build the dependent images before building the final one. (Note that the registry value "companyname" can be any string, or the registry can even be set to an empty list; we just need to hardcode something that we can reference in the final image's COPY instructions.)

Now after building the image, we are free to prune untagged images and layers.

Further Optimizations

There are avenues for squeezing even more performance out of this approach, such as declaring our large assets as files targets and using a dedicated stage to COPY them into the virtual environment.
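
As a rough sketch of that first idea (the files target name, the data/ glob, and the asset paths below are illustrative assumptions, not taken from the real project), the large assets could live outside both PEXes and be copied in through their own cached stage:

BUILD
files(
    name="assets",
    sources=["data/**/*"],
)

docker_image(
    name="img-assets",
    repository="app",
    registry=["companyname"],
    image_tags=["assets"],
    skip_push=True,
    # Depend on the loose files so Pants places them in the Docker build context.
    dependencies=[":assets"],
    instructions=[
        "FROM python:3.10-slim",
        # Loose files appear in the context under their repo-relative path (shown here as a placeholder).
        "COPY path/to/here/data /bin/app/data",
    ],
)

The final image would then gain a COPY --from=companyname/app:assets /bin/app/data /bin/app/data instruction alongside the deps and srcs copies, so the large, rarely-changing assets get their own layer that source changes never invalidate.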

Additionally, the targets above can all be wrapped into a handy Pants macro for simplicity and reusability.
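
As a sketch of what such a macro could look like (the macro name, its parameters, and the pants-plugins/macros.py file are assumptions about your repo, not something shipped by Pants), a macro is just a Python function in a prelude file, registered via the [GLOBAL].build_file_prelude_globs option in pants.toml, that emits the targets for you:

# pants-plugins/macros.py (hypothetical), registered in pants.toml with:
#   [GLOBAL]
#   build_file_prelude_globs = ["pants-plugins/macros.py"]

def python_docker_app(name, entry_point, context_path):
    """Emit deps/srcs PEXes plus the three docker_image targets from the example above.

    `context_path` is the dotted path the built PEXes take in the Docker build
    context -- the "path.to.here" placeholder used throughout this post.
    """
    for scope, extra in (("deps", dict(include_sources=False)),
                         ("srcs", dict(include_requirements=False))):
        # One PEX and one cached "extraction" image per scope.
        pex_binary(
            name=f"{name}-{scope}",
            entry_point=entry_point,
            layout="packed",
            include_tools=True,
            **extra,
        )
        docker_image(
            name=f"img-{scope}",
            repository=name,
            registry=["companyname"],
            image_tags=[scope],
            skip_push=True,
            instructions=[
                "FROM python:3.10-slim",
                f"COPY {context_path}/{name}-{scope}.pex /",
                f"RUN PEX_TOOLS=1 /usr/local/bin/python3.10 /{name}-{scope}.pex venv --scope={scope} --compile /bin/app",
            ],
        )
    # The final, tagged image that copies the venv pieces from the two stage images.
    docker_image(
        name="img",
        dependencies=[":img-srcs", ":img-deps"],
        repository=name,
        instructions=[
            "FROM python:3.10-slim",
            'ENTRYPOINT ["/bin/app/pex"]',
            f"COPY --from=companyname/{name}:deps /bin/app /bin/app",
            f"COPY --from=companyname/{name}:srcs /bin/app /bin/app",
        ],
    )

A BUILD file could then declare the entire pipeline with a single call such as python_docker_app(name="app", entry_point="app.py", context_path="path.to.here").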

In Conclusion

Pants' support for Python and Docker is well-equipped to cater to varying business needs and interesting use cases. Although everything listed here is primed and ready as of the upcoming Pants 2.13 release, the Pants community hopes that future versions will make even the most complex use case (such as the final example) as simple to declare as the straightforward one (such as the first example).

If you want to learn more about Pants, PEX and how they can help you deploy Python applications efficiently, come and say hi on Slack!