
Using Shapeless to Validate Typesafe Configuration Data

Posted by Carl Pulley on Sat, Feb 11, 2017

In this blog post we explain how Typesafe configuration data may be validated. To avoid unnecessary boilerplate code, we use Shapeless to aid the creation of a lightweight validation DSL. We finish off the post by showing how sealed abstract case classes can be leveraged to ensure that validation constraints remain invariant.

Overview of the Typesafe Config Library

Typesafe config is a popular and common format by which application configuration data may be specified. Configuration data is organised as a series of files containing HOCON formatted data. This allows configuration values (e.g. timeout values, hostnames, ports, etc.) to be stored against paths (cf. keys). These files may then be loaded and parsed into a set of path/value bindings using the Typesafe config library. Using include statements, configuration files may include other configuration files - thus allowing configuration data to be structured based on, for example, usage or library.

Differing environments often require different configuration settings specific to those environments. So, for example, local developer environments might require the path interface.http.hostname to be localhost, whilst staging and production environments require that the same path be some routable IP address. One common approach to this issue is to define environment specific configuration files. However, doing this can quickly lead to divergence in the configuration values that are available in each environment, and so produces a maintenance burden.

Typesafe config allows path values to be overridden by later-defined path values. If the later value is an optional substitution (written ${?...}) that turns out to be undefined, then it is ignored and the previously assigned value is kept. Moreover, Typesafe config allows environment variables to be used in such substitutions. The idea is that if the environment variable is defined, then its value will take precedence. So, by allowing different environments to set environment variables appropriately, we may keep one common set of configuration files for all deployment environments! With careful file structuring, we can even factor all environment variables into a single file (cf. your documented interface to your DevOps team).
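As a minimal sketch of this override behaviour, consider the snippet below. The object name and the overrides.* paths are made up for illustration: overrides.port stands in for an environment variable that is set, and overrides.host for one that is not. A real setup would reference something like ${?HTTP_PORT} instead.

```scala
import com.typesafe.config.ConfigFactory

object OverrideSketch {
  // Hypothetical configuration: `overrides.port` plays the role of a defined
  // environment variable, `overrides.host` the role of an undefined one.
  private val source =
    """
      |overrides.port = 8080
      |
      |http.port = 80
      |# defined, so this later value wins
      |http.port = ${?overrides.port}
      |
      |http.host = "localhost"
      |# undefined, so the earlier value is kept
      |http.host = ${?overrides.host}
      |""".stripMargin

  // Substitutions are only applied once the parsed config has been resolved.
  val config = ConfigFactory.parseString(source).resolve()

  val port: Int = config.getInt("http.port")
  val host: String = config.getString("http.host")
}
```

Here port picks up the overriding value, whilst host falls back to the default that was assigned first.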

Fail Early, Fail Hard

When building components for use in distributed architectures, it is absolutely essential that these deployed components should fail early, thus signaling the existence of some underlying issue. This is particularly true for data that is used to configure the application. So, what sort of critical problems might we encounter when working with Typesafe configuration data files?

Since Typesafe config allows the use of include statements, we might encounter issues at runtime with files that are included but do not exist. Fortunately, since version 1.3.0 of Typesafe config, it is possible to wrap the included file name in a required and so ensure that Typesafe config throws an exception should the file not be present.

Often, following the parsing and loading of the Typesafe config files, we place this configuration data into a structured collection of case classes. In doing this, we may perform additional validation of the configuration data (e.g. to ensure that port values are in given ranges, IP address strings match a given format, etc.) or require that values exist at given paths (e.g. to avoid case classes with null members). The Typesafe config library offers no tools for performing this type of validation.

Moreover, once we have validated the loaded configuration data and managed to transform it into a case class, we may still encounter issues related to data forgery. Scala case classes provide a simple and convenient way of defining abstract syntax trees (ASTs). However, they also define a number of auxiliary constructors, methods and factory functions (e.g. copy and the companion apply) that permit modified copies of a case class instance to be created without any validation. If this is allowed, then all that validated configuration data is meaningless!
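To make the forgery concern concrete, here is a small sketch using only the standard library. The Port case class and the fromInt smart constructor are hypothetical and not part of any library discussed here. Validation happens once, yet the generated copy and apply methods still produce instances that bypass it:

```scala
object ForgerySketch {
  // Hypothetical validated value: ports are meant to be non-negative.
  final case class Port(value: Int)

  // Smart constructor that performs the validation.
  def fromInt(value: Int): Either[String, Port] =
    if (value >= 0) Right(Port(value)) else Left(s"negative port: $value")

  val validated: Either[String, Port] = fromInt(8080)

  // Nothing stops later code from deriving an invalid instance from a valid one...
  val forgedByCopy: Either[String, Port] = validated.map(_.copy(value = -1))

  // ...or from skipping the smart constructor altogether.
  val forgedByApply: Port = Port(-1)
}
```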

Validated Configuration Data

validated-config is a library that provides a lightweight DSL for specifying how to build and validate case class instances from Typesafe config files. In order to demonstrate this library, let us imagine the following set of configuration files:

# application.conf
name = "test-data"
include required("env")
include required("http")

Here, application.conf will be our top-level configuration file from which all other configuration data is included. Notice how the use of the Typesafe config required function here forces the files env.conf and http.conf to be present for a successful load.

# env.conf
env {
  required {
    HEARTBEAT = "NOT_SET"
    HEARTBEAT = ${?HEARTBEAT}
  }
  optional {
    HTTP_ADDR = ${?HTTP_ADDR}
    HTTP_PORT = ${?HTTP_PORT}
  }
}

Notice how all our environment variables have been placed into the single configuration file env.conf. Good practice is to not provide any default values within this file. The one exception to this guideline is with paths whose values must be set (i.e. are required).

# http.conf
http {
  host = "localhost"
  host = ${?env.optional.HTTP_ADDR}
  port = 80
  port = ${?env.optional.HTTP_PORT}

  heartbeat = ${?env.required.HEARTBEAT}
}

Our aim is to take these Typesafe configuration files and, following successful validation, produce an instance of the case class Settings:

final case class HttpConfig(host: String, port: Int)
final case class Settings(name: String, heartbeat: FiniteDuration, http: HttpConfig)

validated-config provides the functions:

  • validateConfig for specifying the Typesafe config that we wish to load and validate
  • and buildUnsafe for grouping nested validation statements and specifying the case class that is to hold the successfully validated data (the reason for the Unsafe naming suffix will become clear shortly).

These functions are used as follows:

val settings: Try[Settings] = validateConfig("application.conf") { implicit config =>
  buildUnsafe[Settings](
    // TODO: name, heartbeat and http validation
    ???,
    ???,
    ???
  )
}

To define and validate the nested HttpConfig case class, we can use the via function to change the Config instance scope (cf. calling getConfig) and then build a validated HttpConfig instance:

val settings: Try[Settings] = validateConfig("application.conf") { implicit config =>
  buildUnsafe[Settings](
    // TODO: name and heartbeat validation
    ???,
    ???,
    via("http") { implicit config =>
      buildUnsafe[HttpConfig](
        // TODO: host and port validation
        ???,
        ???
      )
    }
  )
}

Note that it is very important that all Config implicits are named the same - doing this allows the implicit scope to be updated. If we were to use different names for the nested implicits then, because they all have the same type and are (effectively) all in scope, we would get a compile time error regarding ambiguous implicit values!
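The shadowing effect can be sketched in isolation using only the standard library (Scala 2 semantics assumed). Scope, via and currentPath below are hypothetical stand-ins for Config and the DSL functions, not the library's actual definitions:

```scala
object ImplicitScopeSketch {
  // Hypothetical stand-in for Config: it only records the current path.
  final case class Scope(path: String)

  // Hypothetical analogue of `via`: narrows the scope and hands it to the nested block.
  def via[A](segment: String)(inner: Scope => A)(implicit outer: Scope): A =
    inner(Scope(s"${outer.path}.$segment"))

  // Reports whichever Scope the compiler resolves implicitly.
  def currentPath(implicit scope: Scope): String = scope.path

  val resolved: String = {
    implicit val config: Scope = Scope("root")
    // The inner `config` shadows the outer one, so only one implicit is eligible here.
    via("http") { implicit config => currentPath }
  }
}
```

Because both implicits share the name config, the nested block sees only the narrowed scope. Renaming the inner parameter would leave both values eligible inside the block, which is the ambiguity described above.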

The Typesafe Config instance that results from loading application.conf is held in the outer implicit. The TODO marks where we need to add our validation code. The Settings case class has three members, and each of these may be validated in differing ways. For example:

  • the name parameter might require that the string value matches the regular expression "[a-z0-9_-]+"
  • the heartbeat parameter might be unchecked (e.g. because it is expected to be formatted as a duration), but should have a value defined
  • the port parameter of the HttpConfig case class should be non-negative.

For paths that need to be validated, validated-config provides the functions validate, unchecked and required. We can use these functions as follows:

val settings: Try[Settings] = validateConfig("application.conf") { implicit config =>
  buildUnsafe[Settings](
    validate[String]("name", NameShouldBeNonEmptyAndLowerCase)(_.matches("[a-z0-9_-]+")),
    unchecked[FiniteDuration](required("http.heartbeat", "NOT_SET")),
    via("http") { implicit config =>
      buildUnsafe[HttpConfig](
        unchecked[String]("host"),
        validate[Int]("port", ShouldNotBeNegative)(_ >= 0)
      )
    }
  )
}

The validation exceptions here are defined by:

case object NameShouldBeNonEmptyAndLowerCase extends Exception
case object ShouldNotBeNegative extends Exception

When working with paths that must have a value defined (i.e. they are required values), there are two implementation possibilities:

  • the Typesafe config path must exist and the value stored there should be non-null
  • or, the Typesafe config path should have a non-sentinel value assigned to it.

In both cases, we achieve this using the required function. The first parameter to required is the path at which we require a defined value. If no second parameter is specified, then we expect a non-null value at that path. Should a second (sentinel) parameter be specified, then we expect the value at that path to be defined and distinct from this sentinel value.

In cases where 3rd party libraries may already provide default or reference values for configuration paths, it can often be useful to use sentinel values to ensure the existence and non-trivial setting of required values.
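The two checks can be sketched directly against the Typesafe config API. The helper requireValue below is hypothetical and is not how validated-config implements required; it only illustrates the intended semantics using hasPath (which is false for missing or null values) and getString:

```scala
import com.typesafe.config.{Config, ConfigFactory}

object RequiredSketch {
  // Hypothetical helper: succeeds only when `path` holds a non-null value
  // that differs from the optional sentinel.
  def requireValue(path: String, sentinel: Option[String] = None)(
    implicit config: Config
  ): Either[String, String] =
    if (!config.hasPath(path)) Left(s"$path is missing or null")
    else {
      val value = config.getString(path)
      if (sentinel.contains(value)) Left(s"$path still holds the sentinel $value")
      else Right(value)
    }

  implicit val config: Config = ConfigFactory.parseString(
    """
      |heartbeat = "NOT_SET"
      |host = "localhost"
      |""".stripMargin
  )

  val host = requireValue("host")                            // defined, no sentinel check
  val heartbeat = requireValue("heartbeat", Some("NOT_SET")) // defined, but only as the sentinel
  val missing = requireValue("port")                         // not defined at all
}
```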

Shapeless Generics

Behind the scenes, Shapeless is used to convert HLists (representing parameter validation) into user validated case classes.

To see how this might be done, imagine that we have a list of functions of type Config => Either[ValueError, Any] - here ValueError is a specific validation failure, whilst the Right value is the validated parameter (its static type is lost, hence Any). We'll imagine one such function per case class parameter and fold the results of applying these validation functions into a pair of type (List[ValueError], HList), collecting the failures and the validated values, as follows:

def buildUnsafe[ValidConfig](
  validatedParams: (Config => Either[ValueError, Any])*
)(implicit config: Config
): Either[List[ValueError], ValidConfig] = {
  val failuresHList: (List[ValueError], HList) =
    validatedParams.map(_.apply(config)).foldRight[(List[ValueError], HList)]((Nil, HNil)) {
      case (Left(error), (failures, result)) =>
        (error +: failures, result)
      case (Right(value), (failures, result)) =>
        (failures, value :: result)
    }

  ???
}

Then, using Shapeless generics, we can convert our HList into the ValidConfig case class as follows:

// FIXME: do we really need to lose parameter typing?
def buildUnsafe[ValidConfig](
  validatedParams: (Config => Either[ValueError, Any])*
)(implicit config: Config,
  gen: Generic[ValidConfig]
): Either[List[ValueError], ValidConfig] = {
  val failuresHList: (List[ValueError], HList) =
    validatedParams.map(_.apply(config)).foldRight[(List[ValueError], HList)]((Nil, HNil)) {
      case (Left(error), (failures, result)) =>
        (error +: failures, result)
      case (Right(value), (failures, result)) =>
        (failures, value :: result)
    }

  failuresHList match {
    // FIXME: the following appears to lead to unnecessary runtime errors!
    case (Nil, result: (gen.Repr @unchecked)) =>
      Right(gen.from(result))
    case (failures, _) =>
      Left(failures)
  }
}

Whilst the code for buildUnsafe is relatively simple, we lose some type safety in its implementation (hence its name!). If the developer were to specify the use of (say) validate[Double](???, ???)(???), but that parameter of our validated case class actually had type Int, then we would get a runtime class cast exception. This is clearly the sort of error that we would ideally like to catch at compile time! In the next blog post, we will examine how we may use polymorphic functions to avoid these types of issues.
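The failure mode can be reproduced in a few lines. This is a sketch rather than the library's code: it derives a Generic for a plain (non-sealed) HttpConfig case class and feeds it an HList whose second element is a Double, mirroring the unchecked match in buildUnsafe:

```scala
import scala.util.Try
import shapeless._

object UnsafeCastSketch {
  final case class HttpConfig(host: String, port: Int)

  private val gen = Generic[HttpConfig]

  // Statically this is only known to be an HList, as in the fold within buildUnsafe.
  // The second element is a Double where HttpConfig expects an Int.
  private val mistyped: HList = "localhost" :: 80.0 :: HNil

  // The unchecked cast compiles; the mismatch only surfaces when the values are used.
  val attempt: Try[HttpConfig] = Try(gen.from(mistyped.asInstanceOf[gen.Repr]))
}
```

The compiler accepts the cast because the element types are erased, so attempt is expected to be a Failure wrapping a class cast exception rather than a compile time error.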

For more information on how Shapeless can DRY up your boilerplate (and other things!), have a look at the excellent The Type Astronaut's Guide to Shapeless by Dave Gurnell.

Invariant Validated Configuration Data

We have seen how we may use validated-config to define validated case class instances from Typesafe configuration data. However, the use of case classes allows the validated configuration data to be modified (e.g. by constructing new instances via the class constructor, the copy method or the companion object apply function). In doing so, any validation invariants that we would like to hold elsewhere within our code can no longer be considered to hold. This is clearly something we would like to avoid!

@tpolecat noticed that the use of sealed abstract case classes would allow the Scala compiler to build case class niceness such as field vals, hash function, equality and unapply - without generating the apply and copy methods (and, being abstract, the class cannot be instantiated directly). This allows developers to control how instances of the sealed abstract case classes are created. For example, by package protecting the code that generates validated case class instances, we can use the compiler to ensure that our validation constraints remain invariant!

So, consider the following code:

package cakesolutions.example

import cakesolutions.config._
import scala.concurrent.duration.FiniteDuration
import scala.util.Try
import shapeless._

object InvariantConfig {
  case object NameShouldBeNonEmptyAndLowerCase extends Exception
  case object ShouldNotBeNegative extends Exception

  sealed abstract case class HttpConfig(host: String, port: Int)
  sealed abstract case class Settings(name: String, heartbeat: FiniteDuration, http: HttpConfig)

  // Following allows Shapeless to create instances of our sealed abstract case classes
  private implicit val genHttpConfig: Generic[HttpConfig] = new Generic[HttpConfig] {
    type Repr = String :: Int :: HNil

    def to(t: HttpConfig): Repr =
      t.host :: t.port :: HNil
    def from(r: Repr): HttpConfig =
      new HttpConfig(r(0), r(1)) {}
  }
  private implicit val genSettings: Generic[Settings] = new Generic[Settings] {
    type Repr = String :: FiniteDuration :: HttpConfig :: HNil

    def to(t: Settings): Repr =
      t.name :: t.heartbeat :: t.http :: HNil
    def from(r: Repr): Settings =
      new Settings(r(0), r(1), r(2)) {}
  }

  def apply(): Try[Settings] = {
    validateConfig("application.conf") { implicit config =>
      buildUnsafe[Settings](
        validate[String]("name", NameShouldBeNonEmptyAndLowerCase)(_.matches("[a-z0-9_-]+")),
        unchecked[FiniteDuration](required("http.heartbeat", "NOT_SET")),
        via("http") { implicit config =>
          buildUnsafe[HttpConfig](
            unchecked[String]("host"),
            validate[Int]("port", ShouldNotBeNegative)(_ >= 0)
          )
        }
      )
    }
  }
}

Code outside of this package may create Settings instances using calls to cakesolutions.example.InvariantConfig(). However, the created Settings and HttpConfig instances can not be mutated, transformed or faked. With some minor coding discipline (that is easily checked during peer code review) the compiler now guarantees that our validation constraints remain invariant!
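The effect of the sealed abstract case class trick can be seen in a standalone sketch that needs neither Shapeless nor Typesafe config. The Port type and its fromInt smart constructor are hypothetical examples:

```scala
object SealedSketch {
  // Hypothetical example type: outside this file, `fromInt` is the only way to obtain a Port.
  sealed abstract case class Port(value: Int)

  object Port {
    def fromInt(value: Int): Either[String, Port] =
      if (value >= 0) Right(new Port(value) {}) else Left(s"negative port: $value")
  }

  val valid: Either[String, Port] = Port.fromInt(8080)
  val invalid: Either[String, Port] = Port.fromInt(-1)

  // Case class conveniences such as equality and pattern matching are still generated.
  val sameValue: Boolean = Port.fromInt(8080) == valid
  val extracted: Option[Int] = valid.toOption.map { case Port(v) => v }

  // By contrast, neither of the following compiles:
  //   Port(8080)                     // no generated apply
  //   valid.map(_.copy(value = -1))  // no generated copy
}
```

Since the only route to a Port goes through fromInt, any Port held elsewhere in the code base can be assumed to have passed the check.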

Conclusion

In this blog post we have seen how we may use the open source library validated-config to build validated case classes from Typesafe configuration data. Moreover, we have also shown how the use of sealed abstract case classes, with package level access control, can be used to ensure that validation constraints remain invariant (i.e. once we create a validated case class, we can not transform the data and break those validation invariants) with compile time enforcement of those invariants. If you would like to learn more about these types of techniques, have a look at Enforcing invariants in Scala datatypes by Jaakko Pallari.

In the next post, we will examine how polymorphic functions can be used to rewrite the buildUnsafe function (as presented here), so that we avoid the unsavoury class cast exceptions at runtime - so please stay tuned!

As usual, complete code used in this blog post is open sourced and available at:
