<img height="1" width="1" src="https://www.facebook.com/tr?id=1076094119157733&amp;ev=PageView &amp;noscript=1">

Akka Clustering and Remoting: Application Scaling

Posted by Carl Pulley on Mon, Jun 16, 2014

Introduction

In my previous post (Akka Clustering and Remoting: The Experiment), I defined a ping-pong application and deployed it into a local JVM cluster. In this post, I want to examine how we can scale this application into a vendor's cloud (e.g. Amazon or Rackspace).

However, when pushing into the cloud, it is wise to avoid any reliance upon a single cloud vendor. So, I'll also look at how that may be accomplished.

Nowadays, as organisations look to make their applications resilient, there are a number of competing technologies that can be used to deploy cloud compute nodes (in a vendor-agnostic manner).

Of these technologies, I will use JClouds for this post; in a follow-up post, I'll explore the use of δ-cloud.

The provisioning software space offers a slightly more restricted range of choices. Our provisioning requirements, however, are relatively simple:

  • deploy JVM-targeted applications
  • focus on application-based deployments (as opposed to provisioning complex OS environments).

Given this, I will opt to use Chef (which, coincidentally, JClouds also supports). I assume that you have previously registered for an Enterprise Chef Opscode account and have set up knife correctly.

Credential information will also need to be set up in resources/jclouds.conf (see resources/jclouds.conf.example for a template configuration file).

Application Modifications for Scaling

As one might hope, the actual modifications we need to make to our existing application code are quite minimal. The biggest change is related to how we provision worker nodes:

// Here we provision our cluster nodes using (behind the scenes) JClouds/Chef
def provisionNode(label: String): Unit = {
  val node = supplier match {
    case Rackspace =>
      new RackspaceProvisioner(label, joinAddress)
    case Amazon =>
      new AmazonProvisioner(label, joinAddress)
  }
  val metadata = node.bootstrap
    
  machines = machines + (node -> metadata)
}
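
By way of a usage sketch, the client-side controller can then provision one compute node per worker role (the role names here are assumptions based on the ping-pong application):

// Hypothetical call site: one compute node per worker role
provisionNode("ping")
provisionNode("pong")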

Deployment with JClouds

In order to interact with JClouds, I use class-based inheritance so that the separate concerns of OS flavours, cloud vendor seasoning and client compute node salting can be addressed. The following class diagram illustrates this:

[Class diagram: Image (abstract) → Ubuntu (OS flavour) → Rackspace/Amazon (vendor) → RackspaceProvisioner/AmazonProvisioner (client)]

For the most part, the Image abstract class defines how to provision compute nodes (in a vendor cloud) and how to bootstrap those nodes (e.g. using a Chef client):

// Child classes update this property to configure the compute instance that will be provisioned
protected[this] val template_builder: TemplateBuilder = client.templateBuilder()

// Child classes use this property to define the compute node's administrator login name and credentials
protected[this] val admin: LoginCredentials // Usage: 'override lazy val' in children

// Used to provision, bootstrap and configure a specific provider instance
def bootstrap(): NodeMetadata = {
  val template = template_builder.options(template_builder.build.getOptions.inboundPorts(ports.toArray : _*)).build()

  // ...

  val bootstrap_node = ??? // Constructed from the Chef bootstrap script (see the next section)
  // We only provision single machine instances here (though it is possible to provision more at the same time!)
  node = Some(client.createNodesInGroup(group, 1, template).head)
  client.runScriptOnNode(node.get.getId(), bootstrap_node, overrideLoginCredentials(admin))
  node.get
}
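
For orientation, here is a minimal sketch (not the verbatim source) of the remaining Image members that these excerpts rely upon; the exact types are assumptions inferred from the calls above:

import scala.collection.mutable
import com.google.common.collect.ImmutableList
import org.jclouds.compute.ComputeService
import org.jclouds.compute.domain.NodeMetadata
import org.jclouds.scriptbuilder.domain.Statement

abstract class Image {
  protected[this] val client: ComputeService                 // compute API, built from the vendor's client_context
  protected[this] val group: String                          // node group shared by JClouds and Chef
  protected[this] val ports = mutable.ListBuffer.empty[Int]  // inbound ports to open on the instance
  protected[this] var node: Option[NodeMetadata] = None      // the provisioned instance, once created
  protected[this] val bootstrap_builder = ImmutableList.builder[Statement]()  // bootstrap script statements
  // ... plus template_builder, admin, chef_runlist and chef_attributes, as shown in the excerpts
}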

Provisioning with Chef

Provisioning a compute node boils down to using the bootstrap script to install and run a Chef client. To facilitate this, I define mutable data structures that allow the Chef runlist and attributes to be redefined and extended in child classes:

// Child classes update these properties with the Chef recipes that are to be run
protected[this] val chef_runlist: RunListBuilder = new RunListBuilder()
protected[this] val chef_attributes = mutable.Map[String, JObject]()

// Used to provision, bootstrap and configure a specific provider instance
def bootstrap(): NodeMetadata = {
  // ...

  val chef_attrs: Map[String, JObject] = chef_attributes.toMap
  val chef_config = BootstrapConfig.builder().runList(chef_runlist.build()).attributes(new JsonBall(compact(render(chef_attrs)))).build()
  chef_context.getChefService().updateBootstrapConfigForGroup(group, chef_config)
  val chef_bootstrap = chef_context.getChefService().createBootstrapScriptForGroup(group)
  val bootstrap_node = new StatementList(bootstrap_builder.add(chef_bootstrap).build())

  // ...
}

Here we see some OS flavouring being applied to our base Image abstract class:

// Basic Ubuntu instance that we wish to provision (here we are cloud infrastructure agnostic)
abstract class Ubuntu(version: String) extends Image {
  template_builder
    .osFamily(OsFamily.UBUNTU)
    .osVersionMatches(version)
    .smallest()

  ports += 22 // SSH

  chef_runlist
    .addRecipe("apt") // ensure the package index is refreshed before other recipes run
}

and some vendor related seasoning (specifically, Rackspace seasoning) to our Ubuntu image:

abstract class Ubuntu(version: String) extends image.Ubuntu(version) {
  private[this] lazy val region = config.getString("rackspace.region")
  private[this] lazy val username = config.getString("rackspace.username")
  private[this] lazy val apikey = config.getString("rackspace.apikey")
  private[this] lazy val rackspace_private_key = scala.io.Source.fromFile(config.getString("rackspace.ssh.private")).mkString
  private[this] val rackspace_public_key = scala.io.Source.fromFile(config.getString("rackspace.ssh.public")).mkString

  override lazy val admin = LoginCredentials.builder()
    .user("root")
    .privateKey(rackspace_private_key)
    .authenticateSudo(false)
    .build()

  override lazy val client_context = ContextBuilder.newBuilder(s"rackspace-cloudservers-$region")
    .credentials(username, apikey)
    .modules(Set(new SshjSshClientModule()))
    .buildView(classOf[ComputeServiceContext])

  template_builder
    .options(template_builder.build.getOptions.asInstanceOf[NovaTemplateOptions].authorizePublicKey(rackspace_public_key))
}

Finally, our client node can salt the vendor class to taste:

class RackspaceProvisioner(role: String, seedNode: Address) extends provider.Rackspace.Ubuntu("12.04") {
  template_builder
    .minRam(2048)

  chef_runlist
    .addRecipe("java")
    .addRecipe("cluster")

  chef_attributes += ("cluster" -> ("role" -> role) ~ ("seedNode" -> seedNode.toString))

  ports += 2552 // Akka remoting
}
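
As an aside, the json4s expression above renders the cluster attributes to a JSON document along the following lines (the values shown are illustrative):

import org.json4s.JsonDSL._
import org.json4s.native.JsonMethods._

// Illustrative values: the real role and seed node come from the constructor arguments
val attrs = ("role" -> "worker") ~ ("seedNode" -> "akka.tcp://Cluster@10.0.0.1:2552")
compact(render(attrs))
// => {"role":"worker","seedNode":"akka.tcp://Cluster@10.0.0.1:2552"}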

However, in order for the above to work, we first need to ensure that:

  • an Enterprise Chef account has been configured
  • and that, via the knife command line utility, our cookbooks have been uploaded to the Chef server.

The following shows how, given that ~/.chef has been configured appropriately, this may be achieved (N.B. I have shifted to using sbt-native-packager in this post):

# Clone and checkout experiment-2
git clone https://github.com/carlpulley/jclouds-cluster
cd jclouds-cluster
git checkout experiment-2
# Assuming Chef has been configured appropriately, build our cluster recipe
sbt stage
sbt debian:packageBin
# Ensure that the built Debian package is added to our cluster recipe!
cp target/cluster-*.deb cookbook/cluster/files/default
# Upload cookbooks to the Chef server
for r in apt java cluster; do
  knife cookbook upload $r -o cookbook
done
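
For completeness, the sbt-native-packager side of this amounts to settings along the following lines in build.sbt (a sketch only: plugin syntax has varied across versions, and the maintainer value is hypothetical):

// build.sbt (sketch): package the application as a Debian archive
enablePlugins(JavaServerAppPackaging, DebianPlugin)

name := "cluster"
maintainer := "Carl Pulley <carl@example.com>"  // hypothetical contact details
packageSummary := "Akka ping-pong cluster node"
packageDescription := "Worker/client application for the JClouds cluster experiment"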

A Failed Experiment?

Should you wish to play with this code, it is available on the experiment-2 branch of my Github repository at https://github.com/carlpulley/jclouds-cluster.

As the setup here is more complex than usual, here's a short video showing:

  • the deployment of compute nodes in an Amazon cloud
  • JClouds calling the Chef provisioner to configure our compute nodes
  • the polling loop and/or scheduling behaviour of JClouds (a dip into the code is needed to discriminate correctly here)
  • the failure to build an Akka cluster.

However, notice that this experiment fails: the client node successfully joins the cluster, but no worker nodes are observed joining it. Moreover, when I ssh into our worker node instances, I can see that the worker actors have also failed to launch (though the actor system is correctly listening on port 2552).

A look at the upstart logs (i.e. /var/log/upstart/cluster.log) shows that the worker actor system has attempted to contact its seed node at a private, non-routable IP address.

To better understand the issues here, we need to think about how a typical Akka message (be it a system or user message) is transmitted from the client actor to the (remote) worker actor:

[Diagram: an Akka message in flight, showing actor-level (green/red) and transport-level (grey/white) source and destination addresses]

So imagine that the green actor wishes to communicate a message to the red actor. Clearly, the message that the actor sends must contain information regarding the source and destination actors. When this message is sent, it will be encapsulated in some form of transport (e.g. TCP/IP), which will also have source and destination addresses defined (here we represent these using the colours grey and white). Now this presents a series of networking-related hurdles:

  • what happens if the destination actor binds to an address other than the white address (e.g. it actually binds to a red address)?
  • what happens if the source or destination nodes use NAT (thus decoupling actor addresses and transport addresses)?

In short, messages do not get correctly routed by the actor systems, which in our case means that clusters do not get built and remote actors fail to launch!
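
To make that coupling concrete: a remote actor reference embeds the transport-level host and port directly in its path, so whatever address an actor system advertises becomes part of every actor's identity (the system name, host and path below are illustrative):

// Illustrative remote actor path (from within an actor): the advertised host
// and port are baked into the actor's address
val worker = context.actorSelection("akka.tcp://ClusterSystem@203.0.113.10:2552/user/worker")

If the advertised address is a private one, peers on the far side of the NAT boundary have no way to route messages back to it.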

One additional complication here is that actors (and specifically actor systems) need to communicate in a bidirectional manner.

So, how can we resolve these issues? One way of ensuring that actors bind to the correct (public) NICs is to specify their addresses when they are provisioned (see the sketch below). However, NATing (particularly at the client node end) appears to be a bit of a show-stopper here!
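
As a sketch of that first approach, each worker's actor system could be started with its public address injected into the Akka remoting configuration; I assume here that the public IP has already been discovered (e.g. from the provider's metadata service), so the literal value is illustrative:

import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

// Illustrative: in practice this would be looked up at provision time
val publicIp = "203.0.113.10"

val config = ConfigFactory.parseString(s"""
  akka.remote.netty.tcp.hostname = "$publicIp"
  akka.remote.netty.tcp.port = 2552
""").withFallback(ConfigFactory.load())

val system = ActorSystem("ClusterSystem", config)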

Conclusions

So, we've seen how we can easily prepare our ping-pong application for scaling into the cloud. We've also seen how we can use JClouds and Chef to deploy and provision worker nodes within an arbitrary vendor's cloud (or at least one that JClouds supports!). However, we have also uncovered a number of issues:

  • defining our deployment and provisioning configurations is code heavy
  • if the JClouds API blocks, then our application will also block and so cease to be reactive (Image methods that return futures backed by a thread-pool executor, as sketched below, can avoid these issues)
  • when NATing is used, clusters and remote Akka actor deployments need some extra TLC in order to function as we might wish or expect.
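
Here is a minimal sketch of that future-based approach, assuming the Image class from earlier (the pool size is arbitrary):

import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}
import org.jclouds.compute.domain.NodeMetadata

// Run blocking JClouds calls on a dedicated thread pool so that the actor
// system's dispatcher threads are never tied up waiting on the cloud API
val blockingEc: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4))

def bootstrapAsync(node: Image): Future[NodeMetadata] =
  Future(node.bootstrap())(blockingEc)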

Before we resolve the networking issues (in a later post), I will look at using the δ-cloud API in the next post.
