The Hydra sbt plugin integrates Hydra into sbt. The plugin overrides the sbt compile task to compile your Scala sources with Hydra instead of the vanilla Scala compiler. From your perspective as the user, the only noticeable difference when using the Hydra sbt plugin is that your project's Scala sources compile faster.
This section covers the features of the Hydra sbt plugin. Read here if you are looking for instructions on how to get started.
How does it work?¶
When you add the sbt-hydra plugin to your build, the auto plugin HydraPlugin is automatically enabled in all Scala projects defined in build.sbt. This means that no changes to your build file are required to work with Hydra.
As mentioned before, compilation with Hydra happens in parallel. To do so, Hydra spawns a number of workers equal to the number of available physical cores and, by default, automatically tunes each worker's workload to get the best speedup. In addition to parallelizing compilation, Hydra also collects compile-time metrics and pushes them to a web-based dashboard.
Next, let's explore the configuration keys that the HydraPlugin provides for fine-tuning how Hydra works.
hydraWorkers: Number of workers to use for compiling a project's Scala sources (defaults to the machine's number of physical cores).
hydraSourcePartitioner: The source partitioner to use on a project ("auto" is the default).
hydraPartitionFile: The file used by the explicit partitioner to split sources to workers.
hydraTimingsFile: The CSV file where Hydra will log compilation times. (Deprecated. Use the dashboard to inspect compilation times)
hydraScalaVersion: The version of Scala Hydra used to compile Scala sources.
hydraIsEnabled: Flag controlling if Hydra is used to compile a project.
hydraMetricsServiceStart: Starts the metrics service that pushes compilation metrics to the dashboard (note that this task is automatically triggered when entering the interactive shell).
hydraDashboardServerURL: The URL used by the metrics service to push compilation metrics (defaults to
hydraMetricsServiceJvmOptions: JVM options used to initialize the metrics service (defaults to
hydraInvalidateCaches: A task that deletes all information gathered from previous builds. Next compilation will start fresh.
hydraBaseDirectory: The directory where Hydra can write its log, caches and other book-keeping files. By default it is
.hydra/sbt, inside the base directory
hydraMetricsServiceJvmOptions are build-level keys, while all other keys are scoped to the project, so you can provide a different value for each of them in each project. The ones that are relevant for optimizing the execution of Hydra are
Degree of parallelism¶
The degree of parallelism can be controlled via the key hydraWorkers. By default, hydraWorkers is equal to the number of physical cores on your system (half the number of cores reported by Java, to account for hyper-threading). You can change it like any other sbt setting:
set every hydraWorkers := 8
Note that all Hydra settings are project-level settings, so assigning a new value using in Global won't work. If you want to set a new value for all projects, use set every in the sbt shell.
Depending on your hardware architecture, you may obtain faster compile times by assigning a different value to hydraWorkers. However, you should never assign a value smaller than 2 or larger than the number of available CPUs.
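As a rough illustration of per-project tuning, the build.sbt fragment below assigns a different hydraWorkers value to two sub-projects. The project names core and frontend and the worker counts are placeholders chosen for this sketch, not recommendations; suitable values depend on your hardware.

```scala
// build.sbt: a sketch only. Project names and worker counts are hypothetical.

// A larger module that may benefit from more workers
lazy val core = (project in file("core"))
  .settings(
    hydraWorkers := 4
  )

// A smaller module; 2 is the lowest value the text above recommends
lazy val frontend = (project in file("frontend"))
  .dependsOn(core)
  .settings(
    hydraWorkers := 2
  )
```

Because the value is set on each project, it survives across sbt sessions, unlike a value assigned with set every in the shell.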
hydraSourcePartitioner controls how sources are partitioned and assigned to workers. Four strategies are available:
"auto": Automatically balances workers based on compilation times of individual sources. This is the default strategy.
"explicit": Partitions sources according to an explicit partition file.
"package": Partitions sources respecting package boundaries. This may not balance the load perfectly between workers, but it may lead to less "cross-talk" between workers.
"plain": Tries hard to assign an equal number of sources to each worker. This works well when each of your sources takes a similar time to compile.
The "auto" partition strategy usually delivers the best results. Read the Tuning section for a more in-depth discussion of partition strategies.
If you are using the "explicit" partition strategy, you can use hydraPartitionFile to tell Hydra where to read the partition file from. This setting is scoped per project and configuration, so you can use a different file for each sub-project and each configuration. For more details, please read the Tuning section.
hydraPartitionFile must be scoped to a configuration. If you just use hydraPartitionFile := <path>, the setting is ignored. Make sure to always write hydraPartitionFile in Compile := <path> or hydraPartitionFile in Test := <path> when modifying it.
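The following build.sbt fragment sketches what a configuration-scoped setup might look like. It rests on two assumptions that this page does not confirm: that hydraSourcePartitioner accepts the strategy name as a string (as the quoted names in the list above suggest) and that hydraPartitionFile is a File-typed key. The project name and file names are hypothetical.

```scala
// build.sbt: a sketch only.
// Assumptions: hydraSourcePartitioner takes a String, hydraPartitionFile takes a File.
// The project name and the partition file names are hypothetical.
lazy val core = (project in file("core"))
  .settings(
    hydraSourcePartitioner := "explicit",
    // Scoped to a configuration; an unscoped assignment would be ignored
    hydraPartitionFile in Compile := baseDirectory.value / "compile.partition",
    hydraPartitionFile in Test := baseDirectory.value / "test.partition"
  )
```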
Hydra "learns" about your project at each compilation event and uses this information to compile faster in the future. For example, Hydra measures how long each file takes to compile, and uses this information to automatically balance the workload of each worker. In case you need to start "fresh", you can run the hydraInvalidateCaches task to remove all Hydra data.
> hydraInvalidateCaches
[info] Deleting /Users/dragos/sandbox/unused/.hydra/sbt/core/compile
[info] Deleting /Users/dragos/sandbox/unused/.hydra/sbt/frontend/compile
This task only removes Hydra-specific data. No classfiles are removed, so running
compile right after would not cause a full build.
This task is scoped to the project and configuration. If you want to remove the caches for tests you'd need to run
The .hydra directory¶
You'll notice that Hydra creates a .hydra subdirectory in your project root. This directory contains information about each compiled project, metrics files and
hydra.log. You can control where this directory is placed by setting
hydraBaseDirectory in ThisBuild.
You should persist this directory between CI builds in order to get the best performance.
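On a CI server, one option is to point Hydra at a directory that the CI system already caches between builds. The fragment below is a sketch under the assumption that hydraBaseDirectory is a File-typed key; the path is hypothetical and should be replaced with whatever location your CI setup preserves.

```scala
// build.sbt: a sketch only.
// Assumption: hydraBaseDirectory is a File-typed key. The path is hypothetical.
hydraBaseDirectory in ThisBuild := file("/var/cache/ci/my-project/hydra")
```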
The metrics service pushes compilation metrics to the dashboard after each successful full compile cycle. The metrics service is automatically started when entering the interactive shell (during onLoad) and runs as an external process. In particular, the metrics service process will survive even if you quit the sbt interactive shell.
The hydraMetricsServiceStart task allows you to explicitly start the metrics service, but you will rarely need it unless the metrics service was manually stopped.
Timings file (deprecated)¶
This feature is deprecated and you should use the dashboard for analyzing compilation time.
Hydra can append a line to a CSV file each time it builds, making it easier to see how much time is actually spent compiling over a period of time. The file looks like the following:
Time, Tag, Workers, Files, Duration (ms)
2017/05/12 11:32:01, specs2-core/compile, 4, 99, 11810
2017/05/12 11:32:13, specs2-core/test, 4, 103, 12712
The format should be directly importable in any spreadsheet software.
By default, Hydra writes all measurements to
.hydra/<build-tool>/timings.csv in the base directory of your build, regardless of what sub-project it compiles. If you want to change the file to a different name but still use the same file for all projects in your build you could do something like the following:
// global setting, not a project setting
hydraTimingsFile := Some((baseDirectory in ThisBuild).value / "measurements.csv")
You can take advantage of Sbt scoping rules to set up different CSV files per project. For example, you could set
hydraTimingsFile at the project level:
lazy val myProject = (project in file("."))
  .settings(
    hydraTimingsFile := Some(baseDirectory.value / "measurements.csv")
  )
baseDirectory is used without a scope, so it picks up the project-level value.
To disable this feature, set the value to None:
hydraTimingsFile := None
Hydra Scala version¶
hydraScalaVersion controls the Hydra Scala version to use. By default, its value is automatically determined from the version of
sbt-hydra you are using. Our recommendation is to not touch this key, but rather upgrade the version of
sbt-hydra to use the latest and greatest Hydra Scala.
The hydraIsEnabled key allows you to disable Hydra. As you might know, sbt already provides an API for disabling a plugin, and we recommend that you use disablePlugins(HydraPlugin) if you decide to disable Hydra on a project.
So why have an additional key for the same purpose? Because it allows us to disable Hydra on projects that use a major version of Scala we don't support. For instance, if you have a build with multiple subprojects, and some of them use Scala 2.10, the Hydra sbt plugin will compile all these Scala 2.10 subprojects using the vanilla Scala compiler, without you having to explicitly disable Hydra on them.
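For reference, disabling the plugin on a single project with the standard sbt API could look like the sketch below. The project name legacy is hypothetical.

```scala
// build.sbt: a sketch only. The project name `legacy` is hypothetical.
// This project is compiled with the vanilla Scala compiler.
lazy val legacy = (project in file("legacy"))
  .disablePlugins(HydraPlugin)
```

Other projects in the same build keep using Hydra, since disablePlugins applies only to the project it is called on.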
IntegrationTest + Hydra¶
The IntegrationTest configuration is used to define a project containing integration tests. To set up Hydra on your integration test project, simply append inConfig(IntegrationTest)(HydraPlugin.hydraConfigSettings) to the project settings:
lazy val myIntegrationTestProject = (project in file("."))
  .configs(IntegrationTest)
  .settings(
    Defaults.itSettings,
    // other settings here
  )
  .settings(inConfig(IntegrationTest)(HydraPlugin.hydraConfigSettings)) // always the last set of settings
HydraPlugin in a .scala build file you will need to
Hydra writes a log file, hydra.log, inside the .hydra/<build-tool> folder located in the project's root directory. By default, the log level is set to INFO, but you can change it via the hydra.logLevel system property. The following example sets the log level to DEBUG:
$ sbt -Dhydra.logLevel=DEBUG
You can also change the log filename via the hydra.logFile system property:
$ sbt -Dhydra.logLevel=DEBUG -Dhydra.logFile=myfile.log
Note that if you try to put the log file under target/ you will lose it after a clean and there will be no logging from Hydra until you restart
Sbt allows you to restrict task concurrency via the
concurrentRestrictions setting (read here the related sbt documentation). By default, the sbt-hydra plugin adds
Tags.limit(HydraTag, <defaultWorkers>) to the default concurrent restrictions provided by sbt. This is done so that projects are compiled with Hydra one after the other, as this maximizes locality, and in our experience it usually delivers the best compile times. If you have modified the value assigned to
concurrentRestrictions, make sure that it still contains an entry to limit compilation with Hydra. To check this, just type
show concurrentRestrictions in the sbt shell:
$ sbt
...
> show concurrentRestrictions
[info] * Limit all to 8
[info] * Limit forked-test-group to 1
[info] * Limit hydra to 4
If "Limit hydra to 4" is part of the output, you are good and there is no need for you to read further. Otherwise, it's possible that you are overriding the value set for
concurrentRestrictions in your build. You can check if this is the case by grepping for
concurrentRestrictions := in your project's build files.
1) If you find a hit, and your intention was to add a custom restriction to the default
concurrentRestrictions (but without overwriting the defaults), replace
concurrentRestrictions := with
reload your build and
concurrentRestrictions should now include the expected limit for Hydra.
2) If you find a hit, and your intention was indeed to overwrite the default
concurrentRestrictions, then add
Tags.limit(HydraTag, hydraDefaultCpus) to the specified restrictions. For instance:
concurrentRestrictions in Global := Seq(
  ... // your custom restrictions
  Tags.limit(HydraTag, hydraDefaultCpus)
)
The default limit is set to the number of physical cores (ignoring hyper-threading), which is the same as the default value of hydraWorkers.
3) If you don't find a hit, then it's possible that one of the sbt plugins you are using is overwriting
concurrentRestrictions. In this case you will need to overwrite
concurrentRestrictions in turn, and explicitly provide the restriction for the
HydraTag tag. Here is how you can restore the default sbt
concurrentRestrictions and at the same time limit compilation with
concurrentRestrictions in Global := Tags.limit(HydraTag, hydraDefaultCpus) +: Defaults.defaultRestrictions.value
If parallelExecution is enabled (which is the default in sbt) and your build has many subprojects that can be compiled independently, finding the optimal limit for the HydraTag tag may require some experimentation. As a rule of thumb, we recommend that it never exceeds the number of logical CPU cores available on your machine (typically 8 on modern laptops). In the output shown above, the default limit on that machine is 4. To override the default and set the HydraTag limit to 8 (hence allowing two Hydra compilations with 4 workers each to run concurrently), simply add the following to your build:
concurrentRestrictions in Global := Tags.limit(HydraTag, 8) +: Defaults.defaultRestrictions.value
In this setting sbt may decide to compile projects in parallel, but the total number of Hydra workers across a full compilation will not exceed 8.
If you have many sub-projects that can be compiled in parallel you may find an optimal result if you combine the two approaches. For instance, you may decide to use 4 workers for projects that are at the bottom of the dependency tree and two workers for the leaves, while at the same time restricting overall parallelism to 8. This will allow up to 4 leaf projects to be compiled in parallel by sbt, while each one in turn is parallelized by Hydra on two cores.
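The combined approach described above could be sketched as follows. The project names and directory layout are hypothetical; the worker counts and the limit of 8 mirror the numbers used in the paragraph and would need adjusting to your own machine and dependency graph.

```scala
// build.sbt: a sketch only. Project names and paths are hypothetical.

// Project at the bottom of the dependency tree: nothing else can compile
// until it is done, so it gets more workers.
lazy val base = (project in file("base"))
  .settings(hydraWorkers := 4)

// Leaf projects: independent of each other, so sbt may compile them in parallel.
lazy val leafA = (project in file("leaf-a"))
  .dependsOn(base)
  .settings(hydraWorkers := 2)

lazy val leafB = (project in file("leaf-b"))
  .dependsOn(base)
  .settings(hydraWorkers := 2)

// Restrict overall Hydra parallelism to 8, keeping sbt's default restrictions.
concurrentRestrictions in Global := Tags.limit(HydraTag, 8) +: Defaults.defaultRestrictions.value
```

After reloading the build, show concurrentRestrictions in the sbt shell should list the Hydra limit alongside the sbt defaults.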