JVM 3rdparty Pattern
In general, we recommend the 3rdparty idiom for organizing dependencies on code from outside the source tree. This document describes how to make this work for JVM (Java or Scala) code.
If you have a small to medium number of third-party dependencies, you can define
them all in a single
3rdparty/jvm/BUILD file. If you have a large number, it
may make sense to organize them in multiple subdirectories, say by category or by publisher.
    jar_library(
      name='junit',
      jars=[
        jar('junit', 'junit', '4.12'),
      ],
      # junit is frequently used only for its annotations.
      scope='forced',
    )
Your Code's BUILD File
To set up your code to import the external jar, add a dependency to
the appropriate Java target[s] in your BUILD file and add import
statements in your Java code.
For example, your
BUILD file might have
And your Java code might have:
"Round Trip" Dependencies
It is possible for your code to exist as source in the repo but also be published as a binary to an external repository. If you pull in any third-party artifacts, they may express a dependency on the published version of that artifact. This means that the classpath will contain both the version in the repo compiled from source and an older version that was previously published. In this case, you want to be sure that Pants always prefers the version built from source.
Fortunately, the remedy for this is simple. If you add a provides parameter
that matches the one used to publish the artifact, Pants will always prefer the
local target definition to the published jar if it is in the context:
    java_library(
      name='api',
      sources=globs('*.java'),
      provides=artifact(org='org.archie', name='api', repo=myrepo),
    )

    jar_library(
      name='bin-dep',
      jars=[
        jar(org='org.archie', name='consumer', rev='1.2.3'),
      ],
      dependencies=[
        # Include the local, source copy of the API to cause it to be used rather than
        # any versioned binary copy that the `consumer` lib might depend on transitively.
        ':api',
      ],
    )
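As a rough illustration of the preference described above, the following Python sketch models the choice between a published jar and a local target. It is a toy model and not Pants' implementation: the function name `pick_artifact`, the address `src/java/org/archie:api`, and the revision `0.9.0` are all hypothetical.

```python
# Toy model of "prefer the local source target over the published jar".
# Not Pants code: every name, address, and revision below is hypothetical.

def pick_artifact(org, name, rev, targets_in_context):
    """Return the local target that provides org/name, else the binary jar.

    targets_in_context maps a target address to the (org, name) pair taken
    from that target's provides clause.
    """
    for address, provided in targets_in_context.items():
        if provided == (org, name):
            return ("source", address)
    return ("jar", "{}:{}:{}".format(org, name, rev))


context = {"src/java/org/archie:api": ("org.archie", "api")}

# The consumer jar transitively asks for a published org.archie:api.
print(pick_artifact("org.archie", "api", "0.9.0", context))
# ('source', 'src/java/org/archie:api')

# With no matching local target in context, the published jar is used.
print(pick_artifact("org.archie", "api", "0.9.0", {}))
# ('jar', 'org.archie:api:0.9.0')
```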
Controlling JAR Dependency Versions
If you notice that one "foreign" dependency pulls in mostly wrong
things, tell Pants not to pull in its dependencies. In your
3rdparty/.../BUILD file, use the jar's
intransitive argument; then
carefully add hand-picked versions:
    jar_library(
      name="retro-naming-factory",
      jars=[
        jar(org='retro', name='retro-factory', rev='5.0.18', intransitive=True),
      ],
      dependencies=[
        # Don't use retro's expected (old, incompatible) common-logging
        # version, yipe; use the same version we use everywhere else:
        '3rdparty/jvm/common-logging',
      ],
    )
If you only need to exclude a small number of transitive dependencies,
rather than mark the jar intransitive, you can exclude specific
transitive dependencies from JVM targets:
    java_library(
      name='loadtest',
      dependencies=[
        '3rdparty/storm:storm',
      ],
      sources=globs('*.java'),
      excludes=[
        exclude('org.sonatype.sisu.inject', 'cglib'),
      ],
    )
Managing Transitive Dependencies
If you have jars that pull in many transitive dependencies, you probably want to constrain which versions of those transitive dependencies you pull in. This is valuable for:
- Security concerns (you may want to avoid artifacts with known vulnerabilities, or you may only want to use particular jars which you trust).
- Predictable and consistent behavior across all projects in your repository (described below).
- Caching concerns/build times (described below).
Otherwise, you may have some targets that end up being built with
1.2.3 of a transitive dependency, and others that get built with
4.5.6 of that dependency. Worse, the same target might be built with
different versions of a transitive dependency depending on what other
targets happen to be part of the same pants invocation. To illustrate
this, consider the diagram below:
Imagine that foo and bar are binary targets. If you build a binary of foo using ./pants binary foo, foo will be packaged with the 3rdparty:example jar in addition to its transitive dependencies, which will be resolved as the common jar, version 1.2.3.
Likewise, if you run ./pants binary bar, it will be packaged with demo and the transitive dependencies of demo, which here is simply the common jar, version 4.5.6.
However, if you run ./pants binary foo bar, ivy will resolve only one version of common, either common-4.5.6 or common-1.2.3, which most likely means that both foo and bar will be packaged with 4.5.6 (because it is the more recent version).
This is a problem, because it may be that common-4.5.6 is not compatible with 3rdparty:example, which will break the foo binary at runtime.
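The effect can be sketched with a small Python model. This is a toy "newest version wins" resolver and not Ivy's actual algorithm; it assumes, hypothetically, that foo's jars pull in common 1.2.3 and bar's pull in common 4.5.6, and the names `TRANSITIVE_JARS`, `version_key`, and `resolve` are made up for the illustration.

```python
# Toy model only: this is not Ivy's real algorithm, just "newest version wins".
# Hypothetical graph: foo -> 3rdparty:example -> common 1.2.3
#                     bar -> demo             -> common 4.5.6
TRANSITIVE_JARS = {
    "foo": [("common", "1.2.3")],
    "bar": [("common", "4.5.6")],
}


def version_key(version):
    """Turn '4.5.6' into (4, 5, 6) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(targets):
    """Pick one version per jar across every target in the invocation."""
    chosen = {}
    for target in targets:
        for name, version in TRANSITIVE_JARS[target]:
            current = chosen.get(name)
            if current is None or version_key(version) > version_key(current):
                chosen[name] = version
    return chosen


print(resolve(["foo"]))         # {'common': '1.2.3'}
print(resolve(["bar"]))         # {'common': '4.5.6'}
print(resolve(["foo", "bar"]))  # {'common': '4.5.6'}
```

The version foo ends up with depends on whether bar is part of the same invocation, which is exactly the instability described above.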
More subtly, if you have many intermediate
java_library targets between your
jvm_binaries and your
jar_library targets (which is normally the
case), simply changing which combination of
java_library targets are part of a
./pants invocation may invalidate the cache and force Pants to
recompile them, even if their sources are unchanged. This is because they
may resolve different versions of their transitive jar dependencies than
the previous time they were compiled, which means their classpaths will be
different. Getting a different classpath causes a cache miss, forcing a
recompile. In general, recompiling when the classpath changes is the
correct thing to do; however, this means that unstable transitive
dependencies will cause a lot of cache thrashing. If you have a large
repository with a large amount of code, recompiles get expensive.
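To see why a changed classpath forces a recompile, consider this minimal Python sketch of a cache key. It is not Pants' actual fingerprinting scheme; the function `cache_key`, the jar file names, and the source digest string are hypothetical. The only point is that the key depends on the classpath as well as the sources.

```python
import hashlib


def cache_key(source_digest, classpath):
    """Combine a source digest with the resolved classpath entries."""
    hasher = hashlib.sha1()
    hasher.update(source_digest.encode("utf-8"))
    for entry in sorted(classpath):
        hasher.update(b"\0")  # separator so adjacent entries cannot run together
        hasher.update(entry.encode("utf-8"))
    return hasher.hexdigest()


sources = "unchanged-sources"
key_alone = cache_key(sources, ["example.jar", "common-1.2.3.jar"])
key_with_bar = cache_key(sources, ["example.jar", "common-4.5.6.jar"])

# Same sources, different resolved classpath: the keys differ, so the
# previously cached compile output cannot be reused.
print(key_alone == key_with_bar)  # False
```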
There are a few ways to avoid or work around these problems. A simple method is to use the strict ivy conflict manager, which will cause the jar resolution to fail with an error if it detects two artifacts with conflicting versions. This has the advantage of forcing a dev to be aware of (and make a decision about) conflicting versions.
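The strict behavior can be pictured with a short Python sketch. It only models the idea of failing on any version disagreement and is not how Ivy implements its conflict manager; `VersionConflict` and `resolve_strict` are hypothetical names.

```python
class VersionConflict(Exception):
    """Raised when two different versions of the same jar are requested."""


def resolve_strict(requested):
    """Accept (name, version) pairs, failing on any version disagreement."""
    chosen = {}
    for name, version in requested:
        if name in chosen and chosen[name] != version:
            raise VersionConflict(
                "{}: {} conflicts with {}".format(name, chosen[name], version)
            )
        chosen[name] = version
    return chosen


try:
    resolve_strict([("common", "1.2.3"), ("common", "4.5.6")])
except VersionConflict as error:
    print(error)  # common: 1.2.3 conflicts with 4.5.6
```

Compared with silently picking the newest version, the failure makes the conflict visible at resolve time instead of at runtime.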
You could also disable transitive jar resolution altogether, and explicitly declare every dependency you need. This ensures that you have total control over your external dependencies, but can be difficult to maintain.
The third option is to use
managed_jar_dependencies to pin the versions of
the subset of your transitive dependencies that you care about.
Managed Jar Dependencies
Maven handles this problem with its dependencyManagement section,
and Pants has similar functionality via the managed_jar_dependencies target.
You can set up your
3rdparty/BUILD file like so:
    managed_jar_dependencies(
      name='management',
      artifacts=[
        ':commons-io',
        ':jersey-core',
      ],
    )

    jar_library(
      name='commons-io',
      jars=[
        jar('commons-io', 'commons-io', '2.5'),
      ],
    )

    jar_library(
      name='jersey-core',
      jars=[
        jar('com.sun.jersey', 'jersey-core', '1.19.1'),
      ],
    )
Then set the default target in pants.ini:
    [jar-dependency-management]
    default_target: 3rdparty:management
This will force all
jar_library targets in your repository to use the
versions of commons-io and jersey-core referenced by the management
target. When resolving transitive dependencies, Pants will always choose
the versions "pinned" by the managed dependencies target.
If a jar_library omits the version for one of its
jar()s, it will use
the version defined in
managed_jar_dependencies. If a
jar() defines a
version that conflicts with the version set in
managed_jar_dependencies,
an error will be raised and the build will fail (though this behavior can
be configured).
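The pinning rules just described can be summarized in a small Python sketch. It models only the default behavior stated above (fill in a missing version, fail on a conflicting one) and is not Pants' implementation; `PINNED`, `apply_pins`, and `ManagedVersionConflict` are hypothetical names, and the pinned versions are copied from the example BUILD file.

```python
class ManagedVersionConflict(Exception):
    """Raised when a jar's explicit version disagrees with the pinned one."""


# Versions pinned by the hypothetical 'management' target.
PINNED = {
    ("commons-io", "commons-io"): "2.5",
    ("com.sun.jersey", "jersey-core"): "1.19.1",
}


def apply_pins(org, name, rev=None, pinned=PINNED):
    """Return the version to resolve for a jar, honoring pinned versions."""
    managed = pinned.get((org, name))
    if managed is None:
        return rev  # not a managed artifact: keep whatever was declared
    if rev is None or rev == managed:
        return managed  # missing version is filled in; matching version is fine
    raise ManagedVersionConflict(
        "{}:{} declares {} but {} is pinned".format(org, name, rev, managed)
    )


print(apply_pins("commons-io", "commons-io"))  # 2.5
print(apply_pins("junit", "junit", "4.12"))    # 4.12
```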
This is a bit verbose, and entails a bit of duplicate code (you have to
type jersey-core 3 times in the above example). You can use the
managed_jar_libraries target factory instead to make your
definitions more concise.
This example is equivalent to the one earlier, but uses managed_jar_libraries:
    managed_jar_libraries(
      name='management',
      artifacts=[
        jar('commons-io', 'commons-io', '2.5'),
        jar('com.sun.jersey', 'jersey-core', '1.19.1'),
      ],
    )
This automatically generates
jar_library targets for you, and makes a
managed_jar_dependencies target to reference them. (Note that you still
need to make the same change to pants.ini).
The generated library targets follow the naming convention
ext are omitted if they
are the default values.
So the above example will generate two
jar_library targets, called
The artifacts list of
managed_jar_libraries can also accept target
addresses to already-existing
jar_libraries, just like a normal
managed_jar_dependencies target. In this case,
managed_jar_libraries
will just use the referenced target, rather than generating a new one.
With Pants you can create multiple
managed_jar_dependencies targets.
If you have more than one, for any particular
jar_library, you can define which
managed_jar_dependencies it uses explicitly (rather than using the
default defined in pants.ini):
    jar_library(
      name='org.apache.hadoop-alternate',
      jars=[
        jar('org.apache.hadoop', 'hadoop-common', '2.7.0'),
      ],
      managed_dependencies='3rdparty:management-alternate',
    )
Using a SNAPSHOT JVM Dependency
Sometimes your code depends on a buggy external JVM dependency. You
think you've fixed the external code, but want to test locally before
uploading it to make sure. To do this, in the
jar dependency for the
artifact, specify the
url attribute to point to the local file and change the
rev. If you are actively making changes to the dependency,
you can also use the
mutable jar attribute to re-import the file each
time Pants is run (otherwise, Pants will cache it):
    jar_library(
      name='checkstyle',
      jars=[
        jar(org='com.puppycrawl.tools', name='checkstyle', rev='5.5-SNAPSHOT',
            url='file:///Users/pantsdev/Src/checkstyle/checkstyle.jar',
            mutable=True),
      ],
    )