Task Developer's Guide

In Pants, code that does "real build work" (e.g., downloads prebuilt artifacts, compiles Java code, runs tests) lives in Tasks. To add a feature to Pants so that it can, e.g., compile a new language, you want to write a new Task.

This page documents how to develop a Pants Task, enabling you to teach Pants how to do things it does not already know how to do. To see the Tasks built into Pants, look over src/python/pants/backend/*/tasks/*.py. The code makes more sense if you know the concepts from internals. The rest of this page introduces some concepts that are especially useful when defining a Task.

Hello Task

To implement a Task, you define a subclass of pants.task.task.Task and define an execute method for that class. The execute method does the work.

The Task can see (and affect) the state of the build via its .context member, a pants.goal.context.Context.

Which targets to act on? A typical Task wants to act on all "in play" targets that match some predicate. Here, "in play" targets means the targets the user specified on the command line, plus the targets needed to build those targets, and so on transitively. Call self.context.targets() to get these. This method takes an optional parameter, a predicate function; this is useful for filtering just those targets that match some criteria.
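Put together, a minimal Task follows the shape sketched below. This is a toy stand-in, not real Pants code: the real base class is pants.task.task.Task and the real Context is pants.goal.context.Context; the tiny Target/Context classes here exist only to make the execute() and targets(predicate) pattern visible.

```python
# Toy stand-ins for the real pants.goal.context.Context and target types,
# just to illustrate the shape of a Task's execute() method.

class Target:
    def __init__(self, name, is_java=False):
        self.name = name
        self.is_java = is_java

class Context:
    def __init__(self, targets):
        self._targets = targets

    def targets(self, predicate=None):
        # Mirrors self.context.targets(): all "in play" targets,
        # optionally filtered by a predicate function.
        return [t for t in self._targets if predicate is None or predicate(t)]

class HelloTask:
    """Sketch of a Task subclass: execute() does the work."""
    def __init__(self, context):
        self.context = context

    def execute(self):
        # Act only on the "in play" targets matching our predicate.
        java_targets = self.context.targets(lambda t: t.is_java)
        return [t.name for t in java_targets]

ctx = Context([Target('a', is_java=True), Target('b'), Target('c', is_java=True)])
print(HelloTask(ctx).execute())  # ['a', 'c']
```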

Task Installation: Associate Task with Goal[s]

Defining a Task is nice, but doesn't hook it up so users can get to it. Install a task to make it available to users. To do this, you register it with Pants, associating it with a goal. A plugin's register.py registers goals in its register_goals function. Here's an excerpt from Pants' own Python backend:

def register_goals():
  task(name='interpreter', action=SelectInterpreter).install('pyprep')
  task(name='requirements', action=ResolveRequirements).install('pyprep')
  task(name='sources', action=GatherSources).install('pyprep')
  task(name='py', action=PythonRun).install('run')
  task(name='pytest-prep', action=PytestPrep).install('test')
  task(name='pytest', action=PytestRun).install('test')
  task(name='py', action=PythonRepl).install('repl')
  task(name='setup-py', action=SetupPy).install()
  task(name='py', action=PythonBinaryCreate).install('binary')
  task(name='isort', action=IsortPythonTask).install('fmt')

Here, task(...) is an alias for pants.goal.task_registrar.TaskRegistrar. Calling its install method with a goal name installs the task in that goal; calling install() with no arguments (as setup-py does above) installs the task in a goal with the same name as the task. Alternatively, to install a task in goal foo, use Goal.by_name('foo').install. You can install more than one task in a goal; e.g., the separate tasks that run Java tests and Python tests are both installed in the test goal.

Generally you'll be installing your task into an existing goal like test, fmt or compile. You can find most of these goals and their purposes by running the ./pants goals command; however, some goals of a general nature are installed by Pants without tasks and are thus hidden from ./pants goals output. The buildgen goal is an example of this, reserving a slot for tasks that can auto-generate BUILD files for various languages, none of which are installed by default. You can hunt for these by searching for Goal.register calls in src/python/pants/core_tasks/register.py.

Products: How one Task consumes the output of another

One task might need to consume the "products" (outputs) of another. E.g., the Java test runner task uses Java .class files that the Java compile task produces. Pants tasks keep track of this in the pants.goal.products.ProductMapping that is provided in the task's context at self.context.products.

The ProductMapping is basically a dict. Calling self.context.products.get('jar_dependencies') looks up jar_dependencies in that dict. Tasks can set/change the value stored at that key; later tasks can read (and perhaps further change) that value. That value might be, say, a dictionary that maps target specs to file paths.
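Since the ProductMapping is dict-like, the flow between tasks can be modeled with a few lines of plain Python. This is a simplified stand-in for pants.goal.products.ProductMapping, only meant to show an "early" task setting a value under a key and a "late" task reading it back:

```python
# A simplified model of how tasks share products via self.context.products.
# The real class is pants.goal.products.ProductMapping; this stand-in just
# shows the dict-like get/set flow between an early and a late task.

class ProductMapping:
    def __init__(self):
        self._products = {}

    def get(self, key):
        # Returns the (mutable) value stored under key, creating it on first use.
        return self._products.setdefault(key, {})

products = ProductMapping()

# An early task records where it put each target's jars.
products.get('jar_dependencies')['src/java:lib'] = ['/cache/guava.jar']

# A later task reads (and could further change) the same mapping.
jars = products.get('jar_dependencies')['src/java:lib']
print(jars)  # ['/cache/guava.jar']
```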

product_types and require_data: Why "test" comes after "compile"

It might only make sense to run your Task after some other Task has finished. E.g., Pants has separate tasks to compile Java code and run Java tests; it only makes sense to run those tests after compiling. To tell Pants about these inter-task dependencies...

The "early" task class defines a product_types class method that returns a list of strings:

  @classmethod
  def product_types(cls):
    return ['runtime_classpath']

The "late" task defines a prepare method that calls round_manager.require_data to "require" one of those same strings:

  @classmethod
  def prepare(cls, options, round_manager):
    super(DuplicateDetector, cls).prepare(options, round_manager)
    round_manager.require_data('runtime_classpath')

Pants uses this information to determine which tasks must run first to prepare data required by other tasks. (If one task requires data that no task provides, Pants errors out.)
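The scheduling check Pants performs can be illustrated with a toy model. The stub classes below are hypothetical (real tasks call round_manager.require_data in prepare; here the requirements are flattened into a plain list), but the producer/consumer matching is the same idea:

```python
# A toy model of how product_types / require_data induce task ordering.
# Pants checks that every required product has a producer and schedules
# producers first; this sketch performs the same check for two stub tasks.

class EarlyTask:
    @classmethod
    def product_types(cls):
        return ['runtime_classpath']

class LateTask:
    # Stand-in for what prepare() would declare via round_manager.require_data.
    required_data = ['runtime_classpath']

producers = set(EarlyTask.product_types())
missing = [d for d in LateTask.required_data if d not in producers]

# Pants would error out here if a requirement had no producing task.
if not missing:
    print('EarlyTask must run before LateTask')
```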

A task can have more than one product type. You might want to know which type[s] were required by other tasks. If one product is especially "expensive" to make, perhaps your task should only produce it if another task will use it. Use self.context.products.isrequired to find out whether a task required a product type:

    catalog = self.context.products.isrequired(self.jvmdoc().product_type)
    if catalog and self.combined:
      raise TaskError(
          'Cannot provide {} target mappings for combined output'.format(self.jvmdoc().product_type))

Task Implementation Versioning

Tasks may optionally specify an implementation version. This is useful for ensuring that cached objects produced by older versions of the task are not reused. If you change a task class in a way that will impact its outputs, you should update the version. Implementation versions are set with the implementation_version class method.

class FooTask(Task):
  @classmethod
  def implementation_version(cls):
    return super(FooTask, cls).implementation_version() + [('FooTask', 1)]

We store both a version number and the name of the class in order to disambiguate changes in different classes that have the same implementation version set.

Task Configuration

Tasks may be configured via the options system.

To define an option, implement your Task's register_options class method and call the passed-in register function:

  @classmethod
  def register_options(cls, register):
    super(Checkstyle, cls).register_options(register)
    register('--skip', type=bool, fingerprint=True,
             help='Skip checkstyle.')
    register('--configuration', advanced=True, type=file_option, fingerprint=True,
             help='Path to the checkstyle configuration file.')
    register('--properties', advanced=True, type=dict_with_files_option, default={},
             fingerprint=True,
             help='Dictionary of property mappings to use for checkstyle.properties.')

Option values are available via self.get_options():

# Did user pass in the --my-option CLI flag (or set it in pants.ini)?
if self.get_options().my_option:

Scopes

Every task has an options scope: If the task is registered as my-task in goal my-goal, then its scope is my-goal.my-task, unless goal and task are the same string, in which case the scope is simply that string. For example, the ZincCompile task has scope compile.zinc, and the filemap task has the scope filemap.

The scope is used to set options values. E.g., the value of self.get_options().my_option for a task with scope scope is set by, in this order:

  • The value of the cmd-line flag --scope-my-option.
  • The value of the environment variable PANTS_SCOPE_MY_OPTION.
  • The value of the pants.ini var my_option in section scope.

Note that if the task being run is specified explicitly on the command line, you can omit the scope from the cmd-line flag name. For example, instead of ./pants compile --compile-java-foo-bar you can do ./pants compile.java --foo-bar. See Invoking Pants for more information.
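The precedence order above can be sketched as a small resolver. This is a toy model, not Pants's actual options code: the flag and environment-variable name mangling shown (dots to dashes/underscores) is an illustrative assumption, and real Pants option resolution handles many more cases.

```python
# A toy model of option-value precedence for a task with a given scope:
# command-line flag first, then environment variable, then pants.ini.

def resolve_option(scope, name, cli_flags, env, ini):
    # e.g. scope 'compile.zinc', name 'my_option' -> '--compile-zinc-my-option'
    flag = '--{}-{}'.format(scope.replace('.', '-'), name.replace('_', '-'))
    # -> 'PANTS_COMPILE_ZINC_MY_OPTION'
    env_var = 'PANTS_{}_{}'.format(scope.upper().replace('.', '_'), name.upper())
    if flag in cli_flags:
        return cli_flags[flag]
    if env_var in env:
        return env[env_var]
    return ini.get(scope, {}).get(name)

# The env var wins here because no CLI flag was passed.
value = resolve_option('compile.zinc', 'my_option',
                       cli_flags={},
                       env={'PANTS_COMPILE_ZINC_MY_OPTION': 'from-env'},
                       ini={'compile.zinc': {'my_option': 'from-ini'}})
print(value)  # from-env
```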

Fine-tuning Options

When calling the register function, passing a few additional arguments will affect the behaviour of the registered option. The most common parameters are:

  • type: Constrains the type of the option. Takes a Python type constructor (one of bool, str, int, float, list, dict), or a special option type constructor like target_option from pants.option.custom_types. If not specified, the option will be a string.
  • default: Sets a default value that will be used if the option is not specified by the user.
  • advanced: Indicates that an option is intended either for use by power users, or for use only in pants.ini, rather than from the command-line. By default, advanced options are not displayed in ./pants help.
  • fingerprint: Indicates that the value of the registered option affects the products of the task, such that changing the option would result in different products. When True, changing the option will cause targets built by the task to be invalidated and rebuilt.

GroupTask

Tasks may be grouped together under a parent GroupTask. Specifically, the JVM compile tasks:

jvm_compile = GroupTask.named(
  'jvm-compilers',
  product_type=['compile_classpath', 'classes_by_source'],
  flag_namespace=['compile'])

jvm_compile.add_member(AptCompile)
jvm_compile.add_member(ZincCompile)

A GroupTask allows its constituent tasks to 'claim' targets for processing, and can iterate between those tasks until all work is done. This allows, e.g., Java code to depend on Scala code which itself depends on some other Java code.

JVM Tool Bootstrapping

If you want to integrate an existing JVM-based tool with a pants task, Pants must be able to bootstrap it. That is, a running Pants will need to fetch the tool and create a classpath with which to run it.

Your job as a task developer is to set up the arguments passed to your tool (e.g.: source file names to compile) and do something useful after the tool has run. For example, a code generation tool would identify targets that own IDL sources, pass those sources as arguments to the code generator, create targets of the correct type to own generated sources, and mutate the targets graph rewriting dependencies on targets owning IDL sources to point at targets that own the generated code.

The Scalastyle tool enforces style policies for scala code. The Pants Scalastyle task shows some useful idioms for JVM tasks.

  • Inherit NailgunTask to avoid starting up a new JVM.
  • Specify the tool executable as a Pants jar; Pants knows how to download and run those.
  • Let organizations/users override the jar in pants.ini; it makes it easy to use/test a new version.

Enabling Artifact Caching For Tasks

  • Artifacts are the output created when a task processes a target.

    • For example, the artifacts of a JavaCompile task on a java_library target would be the .class files produced by compiling the library.
  • VersionedTarget (VT) is a wrapper around a Pants target that is used to determine whether a task should do work for that target.

    • If a task determines the VT has already been processed, that VT is considered "valid".

Automatic caching

Pants offers automatic caching of a target's output directory. Automatic caching works by assigning a results directory to each VT of the InvalidationCheck yielded by Task.invalidated.

A task operating on a given VT should place the resulting artifacts in the VT's results_dir. After exiting the invalidated context block, these artifacts will be automatically uploaded to the artifact cache.

This interface for caching is disabled by default. To enable it, override Task.cache_target_dirs to return True.
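The contract can be illustrated with a toy model. This is not real Pants code (the real machinery lives in the invalidated context manager); the stand-in below only shows the division of labor: the task writes artifacts into each invalid VT's results_dir, and the framework uploads each results_dir when the block exits.

```python
# A toy model of automatic caching with cache_target_dirs enabled.
import os
import tempfile

class VersionedTarget:
    def __init__(self, name):
        self.name = name
        # The framework assigns each VT a results directory.
        self.results_dir = tempfile.mkdtemp(prefix=name + '.')

def invalidated(invalid_vts, do_work, cache_upload):
    # Stand-in for exiting the invalidated() context block: run the
    # task's work, then upload each results_dir to the artifact cache.
    for vt in invalid_vts:
        do_work(vt)
        cache_upload(vt)

cached = []

def compile_target(vt):
    # The task's only obligation: place artifacts in vt.results_dir.
    with open(os.path.join(vt.results_dir, 'out.class'), 'w') as f:
        f.write('bytecode')

invalidated([VersionedTarget('lib')], compile_target,
            lambda vt: cached.append(vt.results_dir))
print(len(cached))  # 1
```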

VersionedTargetSets

By default, Pants caching operates on (VT, artifacts) pairs. But certain tasks may need to write multiple targets and their artifacts together under a single cache key. One example is Ivy resolution, where the set of resolved 3rd-party dependencies is a property of all targets taken together, not of each target individually.

To implement caching for groupings of targets, you can override the check_artifact_cache_for method in your task to check for the collected VersionedTargets as a VersionedTargetSet:

def check_artifact_cache_for(self, invalidation_check):
  return [VersionedTargetSet.from_versioned_targets(invalidation_check.all_vts)]

Manual caching

Pants also allows finer-grained cache management, although it then becomes the task developer's responsibility to manually upload VT / artifact pairs to the cache. Here is a template for how manual caching might be implemented:

def execute(self):
  targets = self.context.targets(lambda t: isinstance(t, YourTarget))
  with self.invalidated(targets) as invalidation_check:

    # Run your task over the invalid vts and cache the output.
    for vt in invalidation_check.invalid_vts:
      output_location = do_some_work()
      if self.artifact_cache_writes_enabled():
        self.update_artifact_cache((vt, [output_location]))

Recording Target Specific Data

If you would like to track target information such as the targets being run, their run times, or some other target-specific piece of data, run_tracker provides this ability via the report_target_info method. The data reported will be stored in the run_info JSON blob along with timestamp, run id, etc.

There are various reasons you might want to collect target information. The information could be used for things like tracking developer behavior (for example, inferring what code developers are constantly changing by observing which targets are run most often) or target health (for example, a historical look at targets and their flakiness).
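The shape of the recorded data can be sketched as below. This is a toy model: the real method lives on the run tracker and its exact signature varies by Pants version, so the report_target_info helper and field names here are illustrative assumptions, not the real API.

```python
# A toy model of recording target-specific data into the run_info blob.
# The real API is the run tracker's report_target_info (exact signature
# varies by Pants version); this stand-in only shows the recorded shape.
import json

run_info = {'run_id': 'pants_run_example', 'target_data': {}}

def report_target_info(target_spec, key, val):
    # Record one piece of per-target data alongside the run metadata.
    run_info['target_data'].setdefault(target_spec, {})[key] = val

report_target_info('src/java:lib', 'run_time_ms', 1234)
report_target_info('src/java:lib', 'flaky', False)
print(json.dumps(run_info['target_data'], sort_keys=True))
```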

Generated by publish_docs from dist/markdown/html/src/docs/dev_tasks.html 2017-10-20T21:24:31.790484