sbt plugin

The Hydra sbt plugin integrates Hydra smoothly into sbt. The plugin overrides the sbt compile task so that your Scala sources are compiled with Hydra instead of the vanilla Scala compiler. From your perspective as a user, the only noticeable difference when using the Hydra sbt plugin is that your project's Scala sources compile faster.

This section covers the Hydra sbt plugin's functionality. If you are looking for instructions on how to get started, see the getting-started documentation instead.

How does it work?

Once you add the sbt-hydra plugin to your build, the HydraPlugin auto plugin is automatically enabled in every Scala project defined in build.sbt. This means that no changes to your build file are required to work with Hydra.

As mentioned before, Hydra compiles in parallel. To do so, it spawns a number of workers equal to the number of available physical cores and, by default, reaches optimal speedup by automatically tuning each worker's workload. In addition to parallelizing compilation, Hydra also collects compile-time metrics and pushes them to a web-based dashboard.

Next, let's explore the configuration keys offered by HydraPlugin.

Configuration

The HydraPlugin provides a number of configuration keys that can be used for fine-tuning how Hydra works.

Settings and tasks:

  • hydraWorkers: Number of workers to use for compiling a project's Scala sources (defaults to the machine's number of physical cores).
  • hydraSourcePartitioner: The source partitioner to use on a project: "auto", "explicit", "package", "plain" ("auto" is the default).
  • hydraPartitionFile: The file used by the explicit partitioner to split sources to workers.
  • hydraTimingsFile: The CSV file where Hydra logs compilation times (deprecated; use the dashboard to inspect compilation times).
  • hydraScalaVersion: The version of Hydra Scala used to compile Scala sources.
  • hydraIsEnabled: Flag controlling whether Hydra is used to compile a project.
  • hydraMetricsServiceStart: Starts the metrics service that pushes compilation metrics to the dashboard (note that this task is automatically triggered when entering the interactive shell).
  • hydraMetricsServiceJvmOptions: JVM options used to initialize the metrics service (defaults to Seq("-Xmx256M")).
  • hydraInvalidateCaches: A task that deletes all information gathered from previous builds, so the next compilation starts fresh.
  • hydraBaseDirectory: The directory where Hydra writes its log, caches and other book-keeping files. By default, it is .hydra/sbt inside the base directory.
  • hydraCheckForUpdatesEnabled: A flag controlling whether Hydra checks for available updates.
  • hydraDisplayInfo: A flag that controls whether Hydra shows compilation statistics on startup and when certain milestones are hit.

hydraMetricsServiceJvmOptions is a build-level key. All other keys are scoped to the project, meaning you can provide a different value for each of them in each project. The keys most relevant for optimizing Hydra's performance are hydraWorkers and hydraSourcePartitioner.
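To illustrate the difference in scoping, here is a minimal build.sbt sketch. It assumes hydraSourcePartitioner holds a String (matching the quoted values listed under Sources partitioning); the project names and the -Xmx512M value are only examples.

```scala
// build.sbt (sketch): the build-level key is set once, in ThisBuild...
hydraMetricsServiceJvmOptions in ThisBuild := Seq("-Xmx512M")

// ...while project-level keys can take a different value in each project.
lazy val core = (project in file("core"))
  .settings(hydraSourcePartitioner := "package")

lazy val app = (project in file("app"))
  .dependsOn(core)
  .settings(hydraSourcePartitioner := "plain")
```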

.scala build files

To import these settings in a .scala build file, add this import:

import com.triplequote.sbt.hydra.HydraPlugin.autoImport._
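For example, a reusable bundle of Hydra settings could live in a file under project/. The file name HydraSettings.scala, the object HydraSettings and the value tuned are hypothetical; only the import comes from the plugin.

```scala
// project/HydraSettings.scala (hypothetical file and object names)
import sbt._
import com.triplequote.sbt.hydra.HydraPlugin.autoImport._

object HydraSettings {
  // Hydra keys bundled here can be reused by any project in the build.
  val tuned: Seq[Def.Setting[_]] = Seq(
    hydraWorkers := 4
  )
}
```

A project in build.sbt could then pick these up with .settings(HydraSettings.tuned).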

Commands

Hydra provides a few commands for your convenience. Note that sbt commands live outside the scoping system, so they are not tied to a particular project or configuration:

  • hydraBenchmark: Compile all projects with Hydra and vanilla Scala and report speedup values.
  • hydraCompilationStats: Show various statistics related to compilation time, including saved time. It picks up the specific speedup of each project from the most recent hydraBenchmark run (if there is none, it assumes an estimated 2x speedup, i.e. about 50% of compile time saved).
  • hydraActivateLicense and hydraDeactivateLicense: Manage your Hydra license. See License for more details.
  • hydraCheckForUpdates: Check whether there is a new version of Hydra. Usually performed on startup.
  • hydraStartLocalDashboard: Start a local Dashboard server. Requires Docker.

Degree of parallelism

The degree of parallelism is controlled via the key hydraWorkers. By default, hydraWorkers equals the number of physical cores on your system (half the number of cores reported by Java, to account for hyper-threading). You can change it like any other sbt setting, for example from the sbt shell:

set every hydraWorkers := 8

Warning

Note that all Hydra settings except hydraMetricsServiceJvmOptions are project-level settings, so assigning a new value using in Global won't work. If you want to set a new value for all projects, use set every in the sbt shell.

Depending on your hardware architecture, you may get faster compile times with a different value of hydraWorkers. However, you should never set it to a value smaller than 2 or larger than the number of available CPUs.
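A build.sbt counterpart to set every is to define the value once and add it to each project. The sketch below also keeps the value within the bounds above; the target of 6 workers and the names hydraCommonSettings, core and app are arbitrary examples, not Hydra defaults.

```scala
// build.sbt (sketch): one shared definition of hydraWorkers, added to every project,
// since `hydraWorkers in Global := ...` has no effect. Names are hypothetical.
lazy val hydraCommonSettings: Seq[Def.Setting[_]] = Seq(
  hydraWorkers := {
    // Number of (logical) CPUs reported by the JVM.
    val cpus = java.lang.Runtime.getRuntime.availableProcessors()
    // Aim for 6 workers, capped at the reported CPUs, and never fewer than 2.
    math.max(2, math.min(6, cpus))
  }
)

lazy val core = (project in file("core")).settings(hydraCommonSettings)
lazy val app  = (project in file("app")).dependsOn(core).settings(hydraCommonSettings)
```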

Sources partitioning

hydraSourcePartitioner controls how sources are partitioned and assigned to workers. Four strategies are available:

  • "auto": Automatically balances workers based on compilation times of individual sources. This is the default strategy.
  • "explicit": Partition sources according to an explicit partition file.
  • "package": Partition sources respecting package boundaries. This may not balance perfectly between workers, but it may lead to less "cross-talk" between workers.
  • "plain": Tries hard to assign an equal number of sources to each worker. This works well when each of your sources takes similar time to compile.

The default "auto" partition strategy usually delivers optimal results. Read the Tuning section for a more in-depth discussion of partition strategies.

Partition file

If you are using the "explicit" partition strategy, use hydraPartitionFile to tell Hydra where to read the partition file from. This setting is scoped per project and configuration, so you can use a different file for each sub-project and each configuration. For more details, please read the Tuning section.

hydraPartitionFile must be scoped to a configuration. If you just write hydraPartitionFile := <path>, the setting is ignored. Always write hydraPartitionFile in Compile := <path> or hydraPartitionFile in Test := <path> when modifying it.
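A minimal sketch of a project using the explicit partitioner with one partition file per configuration. It assumes hydraPartitionFile holds a java.io.File (check the key's type in your sbt-hydra version), and the partition file names are hypothetical.

```scala
// build.sbt (sketch): partition file names are hypothetical;
// hydraPartitionFile is assumed to hold a java.io.File.
lazy val core = (project in file("core"))
  .settings(
    hydraSourcePartitioner := "explicit",
    // Each assignment is scoped to a configuration; an unscoped one would be ignored.
    hydraPartitionFile in Compile := baseDirectory.value / "hydra-compile.partitions",
    hydraPartitionFile in Test := baseDirectory.value / "hydra-test.partitions"
  )
```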

Invalidate caches

Hydra "learns" about your project at each compilation and uses this information to compile faster in the future. For example, Hydra measures how long each file takes to compile and uses this information to automatically balance the workload of each worker. If you need to start "fresh", run hydraInvalidateCaches to remove all Hydra data:

> hydraInvalidateCaches
[info] Deleting /Users/dragos/sandbox/unused/.hydra/sbt/core/compile
[info] Deleting /Users/dragos/sandbox/unused/.hydra/sbt/frontend/compile

This task only removes Hydra-specific data. No class files are removed, so running compile right afterwards does not cause a full build.

Warning

This task is scoped to the project and configuration. To remove the caches for tests, run test:hydraInvalidateCaches.
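If you often clear both configurations, an sbt command alias can run the two tasks in one go. The alias name hydraInvalidateAll is hypothetical; addCommandAlias is a standard sbt method.

```scala
// build.sbt (sketch): "hydraInvalidateAll" is a hypothetical alias name.
// It runs hydraInvalidateCaches for the Compile and then the Test configuration
// of the current project (and of any projects it aggregates).
addCommandAlias("hydraInvalidateAll", ";hydraInvalidateCaches;test:hydraInvalidateCaches")
```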

The .hydra directory

You'll notice that Hydra creates a .hydra subdirectory in your project root. This directory contains information about each compiled project, metrics files and hydra.log. You can control where this directory is placed by setting hydraBaseDirectory in ThisBuild.

Note

You should persist this directory between CI builds in order to get the best performance.
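One way to make the directory easy to cache on CI is to point hydraBaseDirectory at a location your CI system already persists. In this sketch, CI_CACHE_DIR is a hypothetical environment variable, and hydraBaseDirectory is assumed to hold a java.io.File.

```scala
// build.sbt (sketch): CI_CACHE_DIR is a hypothetical variable exported by your CI;
// hydraBaseDirectory is assumed to hold a java.io.File.
hydraBaseDirectory in ThisBuild := {
  val default = (baseDirectory in ThisBuild).value / ".hydra" / "sbt"
  // Use the CI cache directory when it is set, the usual location otherwise.
  sys.env.get("CI_CACHE_DIR").map(dir => file(dir) / "hydra" / "sbt").getOrElse(default)
}
```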

Metrics service

The metrics service pushes compilation metrics to the dashboard after each successful full compile cycle. It is started automatically when you enter the interactive shell (during onLoad) and runs as an external process. As a consequence, the metrics service process keeps running even after you quit the sbt interactive shell.

The hydraMetricsServiceStart task allows you to start the metrics service explicitly, but you will rarely need it unless the metrics service was stopped manually.

Timings file (deprecated)

This feature is deprecated; use the dashboard to analyze compilation times instead.

Hydra can append a line to a CSV file each time it builds, making it easier to see how much time is actually spent compiling over a period of time. The file looks like the following:

Time, Tag, Workers, Files, Duration (ms)
2017/05/12 11:32:01,       specs2-core/compile,   4,    99,         11810
2017/05/12 11:32:13,          specs2-core/test,   4,   103,         12712

The format should be directly importable in any spreadsheet software.
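Because the file is plain CSV, it is also easy to post-process without a spreadsheet. The sketch below totals the Duration (ms) column per Tag. It assumes exactly the five-column layout shown above; the object name TimingsSummary and the default path .hydra/sbt/timings.csv are illustrative assumptions, not part of Hydra.

```scala
import scala.io.Source

object TimingsSummary {
  // Sums the "Duration (ms)" column per "Tag" (e.g. "specs2-core/compile").
  def totalsByTag(lines: Seq[String]): Map[String, Long] =
    lines
      .drop(1)                              // skip the header row
      .map(_.split(",").map(_.trim))        // columns are padded with spaces
      .filter(_.length == 5)                // ignore blank or malformed lines
      .groupBy(cols => cols(1))             // group rows by tag
      .map { case (tag, rows) => tag -> rows.map(cols => cols(4).toLong).sum }

  def main(args: Array[String]): Unit = {
    // Hypothetical default location; pass another path as the first argument.
    val path = args.headOption.getOrElse(".hydra/sbt/timings.csv")
    val source = Source.fromFile(path)
    try {
      totalsByTag(source.getLines().toList).toSeq.sortBy(_._1).foreach {
        case (tag, ms) => println(s"$tag: $ms ms")
      }
    } finally source.close()
  }
}
```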

By default, Hydra writes all measurements to .hydra/<build-tool>/timings.csv in the base directory of your build, regardless of which sub-project it compiles. To use a different file name while still sharing a single file across all projects in your build, you could do something like the following:

// global setting, not a project setting
hydraTimingsFile := Some((baseDirectory in ThisBuild).value / "measurements.csv")

You can take advantage of sbt scoping rules to set up different CSV files per project. For example, you could set hydraTimingsFile at the project level:

lazy val myProject = (project in file("."))
  .settings(
    hydraTimingsFile := Some(baseDirectory.value / "measurements.csv")
  )

Note that baseDirectory is used without a scope so it will pick up the project-level value.

To disable this feature set the value to None:

hydraTimingsFile := None

Hydra Scala version

The key hydraScalaVersion controls which Hydra Scala version is used. By default, its value is determined automatically from the version of sbt-hydra you are using. We recommend leaving this key untouched and instead upgrading sbt-hydra to get the latest Hydra Scala.

Disabling Hydra

The hydraIsEnabled key lets you disable Hydra. sbt already provides an API for disabling a plugin, and we recommend using disablePlugins(HydraPlugin) if you decide to disable Hydra on a project.

So why have an additional key for the same purpose? Because it allows us to disable Hydra on projects that use a major version of Scala we don't support. For instance, if you have a multi-project build and some of your subprojects use Scala 2.10, the Hydra sbt plugin compiles those Scala 2.10 subprojects with the vanilla Scala compiler, without you having to explicitly disable Hydra on them.
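Both options look like this in a build.sbt sketch. The project names are hypothetical, and the second form assumes hydraIsEnabled is a Boolean setting; disablePlugins remains the recommended approach.

```scala
// build.sbt (sketch): project names are hypothetical.
// Recommended: turn the plugin off entirely for one project.
lazy val legacy = (project in file("legacy"))
  .disablePlugins(HydraPlugin)

// Alternative: keep the plugin loaded but switch Hydra off via the flag
// (assumes hydraIsEnabled is a Boolean setting).
lazy val experiment = (project in file("experiment"))
  .settings(hydraIsEnabled := false)
```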

Disabling Hydra statistics

By default, Hydra will greet you with a message showing some statistics about compilation:

Hydra wishes you a productive day!

Your average compilation time is  17s, you build on average 39.50 times per day and the average number of files per build is 30.82.

All time:
    Saved:  45 min 58s out of  1 hour 31 min 57s

Type `hydraCompilationStats` to see more statistics about your compilation habits!

You can disable the automatic startup message, as well as the messages shown when certain milestones are reached (e.g. "30 min saved today"), by setting hydraDisplayInfo to false.

The estimated time savings are based on the most recent run of hydraBenchmark, with each project and configuration using its own speedup number. If hydraBenchmark hasn't been run yet, an estimated speedup of 2.0x is used. These numbers are recalculated from the actual speedup whenever hydraBenchmark is run.
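One way to read these figures, which matches the sample output above but is our interpretation rather than a documented formula: with a speedup s, the vanilla compile time is estimated as s times the time spent compiling with Hydra, and the saving is the difference.

```latex
T_{\text{vanilla}} = s \, T_{\text{Hydra}}, \qquad
T_{\text{saved}} = T_{\text{vanilla}} - T_{\text{Hydra}} = \left(1 - \tfrac{1}{s}\right) T_{\text{vanilla}}
```

With the default s = 2 this gives T_saved = T_vanilla / 2; in the sample above, 1 hour 31 min 57s is 5517 s, and half of that is about 2758 s, i.e. the 45 min 58s reported as saved.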

IntegrationTest + Hydra

The sbt IntegrationTest configuration is used to define a project containing integration tests. To set up Hydra for your integration tests, append inConfig(IntegrationTest)(HydraPlugin.hydraConfigSettings) to the project settings:

lazy val myIntegrationTestProject = (project in file("."))
  .configs(IntegrationTest)
  .settings(
    Defaults.itSettings,
    // other settings here
  )
  .settings(inConfig(IntegrationTest)(HydraPlugin.hydraConfigSettings)) // always the last set of settings

To access HydraPlugin in a .scala build file you will need to import com.triplequote.sbt.hydra.HydraPlugin.

Logging

Hydra writes a log file, hydra.log, to the .hydra/<build-tool> folder in the project's root directory. By default, the log level is INFO, but you can change it via the hydra.logLevel JVM system property. For example, to set the log level to DEBUG:

$ sbt -Dhydra.logLevel=DEBUG

You can also change the log filename via hydra.logFile:

$ sbt -Dhydra.logLevel=DEBUG -Dhydra.logFile=myfile.log

Note that if you put the log file under target/, you will lose it after a clean, and Hydra will not log anything until you restart sbt.

Concurrent Restrictions

sbt allows you to restrict task concurrency via the concurrentRestrictions setting (see the sbt documentation on parallel execution). By default, the sbt-hydra plugin adds Tags.limit(HydraTag, EvaluateTask.SystemProcessors) to the default concurrent restrictions provided by sbt. This caps the number of modules that are compiled in parallel with Hydra. The goals are to maximize locality and to prevent high memory consumption, which can lead to long GC pauses; in our experience, this usually delivers the best compile times. If you have modified the value assigned to concurrentRestrictions in your project, make sure that it still contains an entry limiting compilation with Hydra. To check this, type show concurrentRestrictions in the sbt shell:

$ sbt
...
> show concurrentRestrictions
[info] * Limit all to 8
[info] * Limit forked-test-group to 1
[info] * Limit hydra to 8

Notice that the values shown depend on the number of physical and logical CPUs of your machine.

If "Limit hydra to 8" is part of the output, you are good and there is no need for you to read further. Otherwise, it's possible that you are overriding the value set for concurrentRestrictions in your build. You can check if this is the case by grepping for concurrentRestrictions := in your project's build files.

1) If you find a hit and your intention was to add a custom restriction to the default concurrentRestrictions (without overwriting the defaults), replace concurrentRestrictions := with concurrentRestrictions ++=. After you reload your build, concurrentRestrictions should include the expected limit for Hydra.

2) If you find a hit, and your intention was indeed to overwrite the default concurrentRestrictions, then add Tags.limit(HydraTag, 4) to the specified restrictions. For instance:

concurrentRestrictions in Global := Seq(
  ... // your custom restrictions
  Tags.limit(HydraTag, 4)
)

3) If you don't find a hit, it's possible that one of the sbt plugins you are using overwrites concurrentRestrictions. In this case, you need to overwrite concurrentRestrictions in turn and explicitly provide the restriction for the HydraTag tag. Here is how you can restore the default sbt concurrentRestrictions while also limiting compilation with Hydra:

concurrentRestrictions in Global := Tags.limit(HydraTag, EvaluateTask.SystemProcessors) +: Defaults.defaultRestrictions.value

Parallel execution

If parallelExecution is enabled (the sbt default) and your build has many subprojects that can be compiled independently, finding the optimal limit for the HydraTag tag may require some experimentation. As a rule of thumb, we recommend that it never exceed the number of cores available on your machine (typically 12 on modern laptops). For instance, if you'd like to force your projects to be compiled sequentially (as this might improve memory locality), add the following setting to your build:

concurrentRestrictions in Global := Tags.limit(HydraTag, hydraDefaultCpus) +: Defaults.defaultRestrictions.value

If you have many sub-projects that can be compiled in parallel, you may get the best results by combining the two approaches. For instance, you may decide to use 4 workers for projects at the bottom of the dependency tree and 2 workers for the leaves, while restricting overall parallelism to 8. This allows sbt to compile up to 4 leaf projects in parallel (4 × 2 = 8), while Hydra parallelizes each of them on two cores.
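The scenario above could be sketched in build.sbt as follows. It assumes, as the paragraph implies, that each Hydra compilation counts toward the HydraTag limit with its number of workers; the project names and directories are hypothetical.

```scala
// build.sbt (sketch): project names and directories are hypothetical.
// Cap overall Hydra parallelism at 8 while keeping sbt's default restrictions.
concurrentRestrictions in Global := Tags.limit(HydraTag, 8) +: Defaults.defaultRestrictions.value

// Bottom of the dependency tree: 4 Hydra workers.
lazy val core = (project in file("core"))
  .settings(hydraWorkers := 4)

// Leaves: 2 workers each, so up to 4 leaves fit under the cap of 8 at once.
lazy val leafA = (project in file("leaf-a")).dependsOn(core).settings(hydraWorkers := 2)
lazy val leafB = (project in file("leaf-b")).dependsOn(core).settings(hydraWorkers := 2)
```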