Scala compilation metrics

In addition to compiling faster, Hydra collects metrics about where compilation time is spent. There are two kinds of metrics that we collect:

  • Project-wide metrics, on every compilation
  • Per-file compilation metrics, on full builds only

Project-wide: the timings file

This file contains build times for all submodules in the current project, along with a few more data points. For sbt builds it is located at .hydra/sbt/timings.csv; other integrations use the same layout, with sbt replaced by the tool name.

Time                 Tag           Workers  Files  Duration  GC Time
2018/05/01 11:39:52  core/compile  2        2      1250      200

Here's a breakdown of metrics of interest:

Metric    Description
Tag       Project and configuration
Workers   The number of workers used during the build
Files     The total number of files that were compiled
Duration  Total compilation time in ms
GC Time   Time spent doing garbage collection while compiling (since 0.9.9)

Note that GC Time is a JVM-wide number and can't always be directly attributed to the Scala compiler. However, large numbers are a good indication that there is a lot of memory pressure during compilation.
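One way to spot memory pressure is to compare GC Time with Duration for each tag. The following is a minimal sketch, not part of Hydra: it assumes timings.csv is comma-separated with a header row that uses the column names shown above (the exact on-disk layout is not described here, so adjust the parsing if yours differs), and gc_share is a hypothetical helper name.

```python
import csv
import io

def gc_share(rows):
    """Return {tag: fraction of compilation time spent in GC}.

    `rows` is an iterable of dicts with "Tag", "Duration" and "GC Time"
    keys. Values for the same tag are summed across builds.
    """
    totals = {}
    for row in rows:
        duration, gc = totals.get(row["Tag"], (0, 0))
        totals[row["Tag"]] = (duration + int(row["Duration"]),
                              gc + int(row["GC Time"]))
    return {tag: gc / duration
            for tag, (duration, gc) in totals.items() if duration > 0}

# Assumed layout: comma-separated, header names as in the table above.
SAMPLE = """Time,Tag,Workers,Files,Duration,GC Time
2018/05/01 11:39:52,core/compile,2,2,1250,200
"""

shares = gc_share(csv.DictReader(io.StringIO(SAMPLE)))
for tag, share in shares.items():
    print(f"{tag}: {share:.0%} of compilation time in GC")
```

For the sample row this reports 16% for core/compile (200 out of 1250). Because GC Time is JVM-wide, the ratio is only a rough signal, not an exact attribution to the compiler.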

Per-file: unit timings file

Our metrics are low-overhead (around 2%) and are enabled for full builds only. Metrics are saved in one file per project and configuration, for example:

.hydra/sbt/core/compile/unit-timings.csv
.hydra/sbt/core/test/unit-timings.csv

This file contains detailed compilation metrics for each file in a full build. These metrics are particularly useful because they indicate where most of the time is spent and hint at possible issues.
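To find where the time goes across a whole build, the per-project files can be merged and sorted by compile time. This is a sketch under stated assumptions rather than a supported tool: it assumes each unit-timings.csv is comma-separated with a header row that includes "File" and "Total Time", and that each project sits exactly one directory below the tool directory, as in the paths above. slowest_files is a hypothetical helper name.

```python
import csv
from pathlib import Path

def slowest_files(hydra_dir, top=5):
    """Return up to `top` (time_ms, file, source_csv) tuples, slowest first."""
    entries = []
    # Assumed layout: <hydra_dir>/<project>/<configuration>/unit-timings.csv
    for path in Path(hydra_dir).glob("*/*/unit-timings.csv"):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                entries.append((int(row["Total Time"]), row["File"], str(path)))
    return sorted(entries, reverse=True)[:top]

# Prints nothing if the directory does not exist or holds no timing files.
for time_ms, name, source in slowest_files(".hydra/sbt"):
    print(f"{time_ms:>8} ms  {name}  ({source})")
```

Since these files are written on full builds only, the result reflects the most recent full build of each project and configuration.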

Worker ID  File                        Total Time  LateFile?  Spans  Parser nodes  Typer nodes  LoC   LoC/s
1          core/../SparkContext.scala  1421        false      1123   8012          10172        1506  1060
0          core/../JsonProtocol.scala  1400        false      906    5748          14903        878   627

Here's a breakdown of each metric of interest:

Metric        Description
Worker ID     The worker on which this file was compiled
File          The file name
Total Time    The time it took to compile this file, in ms
LateFile?     Not used, always false
Spans         The number of timing spans
Parser nodes  The number of AST nodes after parsing (since 0.9.12)
Typer nodes   The number of AST nodes after type-checking (since 0.9.12)
LoC           Lines of code in this file, excluding comments (since 0.9.12)
LoC/s         Compilation speed in compiled lines of code per second (since 0.9.12)

A large number of typer nodes compared to parser nodes indicates that there is a lot of macro expansion happening. This impacts compilation times in two ways: type-checking takes longer, and subsequent phases have a lot more code to generate.
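The comparison can be expressed as a ratio of typer nodes to parser nodes. The sketch below applies it to the two sample rows; expansion_ratio is a hypothetical helper, and no fixed threshold is implied, since what counts as "large" depends on the project.

```python
def expansion_ratio(parser_nodes, typer_nodes):
    """Typer nodes divided by parser nodes; larger means more generated code."""
    return typer_nodes / parser_nodes

# (file, parser nodes, typer nodes) taken from the sample rows above
rows = [
    ("core/../SparkContext.scala", 8012, 10172),
    ("core/../JsonProtocol.scala", 5748, 14903),
]

# Highest ratio first: the likeliest candidates for heavy macro expansion.
ranked = sorted(rows, key=lambda r: expansion_ratio(r[1], r[2]), reverse=True)
for name, parser, typer in ranked:
    print(f"{name}: {expansion_ratio(parser, typer):.2f}")
```

Here JsonProtocol.scala comes out at about 2.59 and SparkContext.scala at about 1.27, so JsonProtocol.scala would be the first file to inspect for macro-heavy code.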

Lines of code per second is a single-threaded metric: each file is assigned to one worker, so its compilation happens on a single thread. The value therefore shows how fast a single worker can compile one file. Typical values range between 500 LoC/s (for heavy macro or type-intensive code) and 2000 LoC/s, depending on project code style and Scala version.
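The sample rows are consistent with LoC/s being LoC divided by Total Time in seconds. That derivation is an assumption based on the numbers shown, not a documented formula; the sketch below checks it against both rows, using the hypothetical helper loc_per_second.

```python
def loc_per_second(loc, total_time_ms):
    """Lines of code compiled per second by a single worker."""
    return loc * 1000 / total_time_ms

# LoC and Total Time (ms) from the sample rows above
print(round(loc_per_second(1506, 1421)))  # SparkContext.scala
print(round(loc_per_second(878, 1400)))   # JsonProtocol.scala
```

Both results (1060 and 627) match the LoC/s column, and both fall inside the typical 500 to 2000 LoC/s range.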