Shade with SBT

2016-10-15Home

What will we learn in this post ?

  • What is / Why shade ?
  • How to shade with sbt-assembly ?
  • How to add shaded library to published maven pom ?

SBT (Simple Build Tool) is a build tool for Scala and Java. We write build definition in Scala and it seems we can do anything with the flexibility of the language. It also means sometimes we have to do it ourselves. Another downside is SBT has scarce documentation on its configuration keys so we have to dig into its hidden capabilities.

As a Scala project, Apache Gearpump (incubating) has been using SBT from the beginning. In this post, I'll share our experience working on the shaded libraries / dependencies of Gearpump.

What is / Why shade ?

Gearpump uses popular libraries like guava and puts them on classpath on launch. Guava is popular so it may happen to be depended on guava by a user application but unfortunately of a different version. At runtime, two versions of guava will be on the classpath but the system guava could be loaded rather than the users' and unexpectedly breaks the application. That's where shade comes in. It allows us, for example, to rename the system guava classes from com.google.* to org.apache.gearpump.google.*. While user codes will import com.google.*, Gearpump system code will have org.apache.gearpump.google.* such that they will never be accidentally loaded into a user application.

How to shade in SBT ?

There is no available shade plugin for SBT like Maven Shade Plugin. In the old time, we shade system libraries with Maven and host them at gearpump-shaded-repo as external dependencies. When Gearpump was moving to Apache, it's required they should be hosted at Gearpump's Apache repo. To the rescue, the sbt-assembly plugin has supported shade since its 0.14.0 release. sbt-assembly is to create a fat JAR of a project with all of its dependencies (akin to Maven Assembly Plugin). In Gearpump, we usually use it to assemble example projects. It can be used like Maven Shade Plugin as well.

We create empty SBT shaded projects with the corresponding original libraries as dependencies and define ShadeRule (check out more shading rules) inside assemblyShadeRules, e.g.

val shaded = Project(
  id = "gearpump-shaded",
  base = file("shaded")
  ).aggregate(shaded_akka_kryo, shaded_gs_collections, shaded_guava, shaded_metrics_graphite)
  
lazy val shaded_guava = Project(
  id = "gearpump-shaded-guava",
  base = file("shaded/guava"),
  settings = shadeAssemblySettings ++ addArtifact(Artifact("gearpump-shaded-guava"), sbtassembly.AssemblyKeys.assembly) ++
  Seq(
    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("com.google.**" -> "org.apache.gearpump.google.@1").inAll
    )
  ) ++
  Seq(
    libraryDependencies ++= Seq(
      "com.google.guava" % "guava" % guavaVersion
    )
  )
)

Note we declare this project to publish with addArtifact(Artifact("gearpump-shaded-akka-kryo"), sbtassembly.AssemblyKeys.assembly) (Please refer to Defining custom artifacts).

With shadeAssemblySettings we add more assembly options. The assemblyJarName and target in assembly is crucial for other projects to pick up the dependency.

val shadeAssemblySettings = Build.commonSettings ++ Seq(
  scalaVersion := Build.scalaVersionNumber,
  test in assembly := {},
  assemblyOption in assembly ~= {
    _.copy(includeScala = false)
  },
  assemblyJarName in assembly := {
    s"${name.value}_$scalaVersionMajor-${version.value}.jar"
  },
  target in assembly := baseDirectory.value.getParentFile / "target" / scalaVersionMajor
)

Remember it's the assembled shaded jar that will be depended, inter-project source dependency like the following won't work.

lazy val core = Project(
  id = "gearpump-core",
  base = file("core"),
  settings = commonSettings ++ javadocSettings ++ coreDependencies
).dependsOn(shaded_guava)

SBT provides an unmanagedJars setting such that we can add an arbitrary jar to the dependencies. The jar path must match that defined for shaded project.

lazy val core = Project(
  id = "gearpump-core",
  base = file("core"),
  settings = commonSettings ++ javadocSettings ++ coreDependencies ++ Seq(
    unmanagedJars in compile ++= Seq(
      shaded.base / "target" / scalaVersionMajor / s"gearpump-shaded-guava_$scalaVersionMajor-$gearpumpVersion.jar"
    )
  )
)

Now, we can compile with sbt assembly.

How to add shaded library to published Maven pom ?

The unmanagedJar approach doesn't add the shaded project to the published Maven pom of gearpump-core project. Although being transitive dependencies, they have to be explicitly added to the build file, which means more cumbersome for users. Are we at a dead end ?

Not yet. It's time to dig into SBT configuration keys! The nugget I found is pomPostProcess which "Transforms the generated POM". We traverse the XML tree and append the shaded dependency to the <dependencies></dependencies> tags.

lazy val core = Project(
  id = "gearpump-core",
  base = file("core"),
  settings = commonSettings ++ javadocSettings ++ coreDependencies ++ Seq(
    pomPostProcess := {
        (node: xml.Node) => addShadedDeps(List(
          <dependency>
            <groupId>{organization.value}</groupId>
            <artifactId>{shaded_guava.id}</artifactId>
            <version>{version.value}</version>
          </dependency>
        ), node)
    },
  
    unmanagedJars in compile ++= Seq(
      shaded.base / "target" / scalaVersionMajor / s"gearpump-shaded-guava_$scalaVersionMajor-$gearpumpVersion.jar"
    )
  )
)
  
private def addShadedDeps(deps: Seq[xml.Node], node: xml.Node): xml.Node = {
  node match {
    case elem: xml.Elem =>
      val child = if (elem.label == "dependencies") {
        elem.child ++ deps
      } else {
        elem.child.map(addShadedDeps(deps, _))
      }
      xml.Elem(elem.prefix, elem.label, elem.attributes, elem.scope, false, child: _*)
    case _ =>
      node
  }
}

Summary

There is not too much one can find when googling for the above problems. I hope this post can become a handy guide for SBT users. Please check out Gearpump build files for a complete example.