
Inspecting Akka's Supervision Strategy

Posted by Tamer M AbdulRadi on Tue, Feb 23, 2016

 

One of our projects needs to publish messages to a RabbitMQ message broker. The first time a message needs to be sent, an Akka actor is launched that holds a connection to RabbitMQ. Each subsequent message goes through the same actor, reusing the connection.

Due to occasional issues with our load balancer, the connection to RabbitMQ sometimes drops, causing an EOFException to be thrown. Under the default supervision strategy, the actor is restarted, which creates a fresh connection to RabbitMQ. But what if the connection issue is persistent, and the connection drops again every time the actor restarts? No worries, Akka has us covered: we can set the maxNrOfRetries parameter of the parent's OneForOneStrategy, and the actor will be stopped once that limit is exceeded.
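For reference, a strategy configured as described might look like the sketch below. The limits (3 restarts within 1 minute) and the PublisherSupervision object are illustrative assumptions, not values from our project; OneForOneStrategy, its maxNrOfRetries and withinTimeRange parameters, and the Restart/Escalate directives are Akka's own.

```scala
import java.io.EOFException
import scala.concurrent.duration._
import akka.actor.{OneForOneStrategy, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Escalate, Restart}

object PublisherSupervision {
  // Restart the publisher on EOFException, but at most 3 times within one
  // minute; once that budget is used up, OneForOneStrategy stops the child
  // instead of restarting it. Any other Exception is escalated.
  val strategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: EOFException => Restart
      case _: Exception    => Escalate
    }
}
```

A parent actor would then pick it up with override val supervisorStrategy = PublisherSupervision.strategy.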

However, this didn't help much in our case, because whenever another message needs to be sent to RabbitMQ, the actor is re-created, entering the same restart/stop loop. So we wanted to take a more aggressive action: instead of stopping the actor, we wanted to escalate to the parent, delegating the decision up the supervision hierarchy.
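The loop comes from creating the child on demand. A minimal sketch, assuming the parent looks its child up by name; the Publisher and PublisherParent classes and the "publisher" child name are hypothetical, and the actual RabbitMQ publishing is left out.

```scala
import akka.actor.{Actor, ActorRef, Props}

// Placeholder for the RabbitMQ publisher; connection handling is omitted.
class Publisher extends Actor {
  def receive: Receive = {
    case _: String => () // would publish the message over the held connection
  }
}

class PublisherParent extends Actor {
  // Reuse the child while it is still registered; once it has terminated
  // (for example after exceeding maxNrOfRetries) the next message creates a
  // new one, which opens a new connection and can fail all over again.
  private def publisher: ActorRef =
    context.child("publisher").getOrElse(context.actorOf(Props(new Publisher), "publisher"))

  def receive: Receive = {
    case msg: String => publisher forward msg
  }
}
```

Stopping the child therefore only defers the failure to the next message.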

Figure: AMQP supervision hierarchy

Problem

Akka's OneForOneStrategy has a maxNrOfRetries parameter; once that limit is exceeded, the child is stopped rather than restarted.
How can we Escalate instead?

Solution

I had two options:

  1. Keep track of the retry count in the parent
  2. Copy OneForOneStrategy and hack things!

The first solution duplicates the ChildRestartStats already collected by the supervision strategy, so I decided to go with the second.
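Option 1 could look roughly like the sketch below, where the parent's decider keeps its own counter. RestartBudget and CountingParent are hypothetical names. The counter is neither per child nor time-windowed, which is exactly the bookkeeping ChildRestartStats already does for us.

```scala
import java.io.EOFException
import akka.actor.{Actor, OneForOneStrategy, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Decider, Escalate, Restart}

// Hypothetical helper: the parent counts restarts itself (option 1).
final class RestartBudget(maxRestarts: Int) {
  private var used = 0

  val decider: Decider = {
    case _: EOFException if used < maxRestarts =>
      used += 1 // one restart consumed
      Restart
    case _: EOFException =>
      Escalate // budget exhausted: hand the failure up the hierarchy
  }
}

class CountingParent extends Actor {
  // maxNrOfRetries stays at its default (-1, unlimited) because the budget
  // above already makes the restart/escalate decision.
  override val supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy()(new RestartBudget(3).decider)

  def receive: Receive = Actor.emptyBehavior
}
```

Mutating the counter inside the decider is tolerable only because Akka invokes the decider from within the parent actor, so it is never called concurrently.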

Note: OneForOneStrategy is a case class, so I'd get shamed by the compiler if I extended it.

Inspecting OneForOneStrategy

It seems that processFailure is called whenever the supervision decider returns Restart or Stop; it then decides whether to actually restart or stop the child, taking the ChildRestartStats into account.

If I overrode this method, how would I escalate the exception? There must be a method in the superclass that I could call.

Inspecting SupervisorStrategy

By checking SupervisorStrategy, I discovered I was wrong: the escalation logic happens outside the strategy, when handleFailure returns false.
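The flow described in the last few paragraphs can be summarised with a small, Akka-free model. It is a simplification, not Akka's source: the names only mirror the roles of handleFailure and processFailure, and the restart permission is passed in as a plain Boolean instead of being computed from ChildRestartStats.

```scala
object FailureFlowModel {
  sealed trait Directive
  case object Resume extends Directive
  case object Restart extends Directive
  case object Stop extends Directive
  case object Escalate extends Directive

  // Role of OneForOneStrategy's processFailure: restart only if a restart
  // was requested AND the child still has restart permission.
  def processFailure(restart: Boolean, restartPermitted: Boolean): String =
    if (restart && restartPermitted) "restart child" else "stop child"

  // Role of handleFailure: true means the failure was handled here,
  // false means the supervising actor must escalate it.
  def handleFailure(directive: Directive, restartPermitted: Boolean): (Boolean, Option[String]) =
    directive match {
      case Resume   => (true, Some("resume child"))
      case Restart  => (true, Some(processFailure(true, restartPermitted)))
      case Stop     => (true, Some(processFailure(false, restartPermitted)))
      case Escalate => (false, None)
    }
}
```

Only the Escalate branch returns false, and a Restart past the limit quietly becomes a stop, so the limit check has to happen before processFailure is reached.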

It became obvious that we need to override handleFailure to return false when the limit is exceeded.

EscalatingOneForOneStrategy

By moving the requestRestartPermission check from processFailure to handleFailure, we can exit early by returning false, which makes the supervising actor escalate the failure.
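A sketch of such a strategy follows. It is a reconstruction under assumptions, not necessarily the exact class we use: it mirrors OneForOneStrategy's retry window, calls ChildRestartStats.requestRestartPermission only when the decider asks for a Restart (so Resume and Stop decisions do not use up the budget), and delegates everything else to SupervisorStrategy.handleFailure. It assumes the Akka 2.4-era SupervisorStrategy API (decider, handleFailure, processFailure, handleChildTerminated, restartChild).

```scala
import scala.concurrent.duration.Duration
import akka.actor.{ActorContext, ActorRef, ChildRestartStats, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Decider, Directive, Escalate, Restart}

// OneForOneStrategy variant that escalates, rather than stops, a child
// that has used up its restart budget.
final class EscalatingOneForOneStrategy(
    maxNrOfRetries: Int = -1,
    withinTimeRange: Duration = Duration.Inf
)(val decider: Decider) extends SupervisorStrategy {

  // Same shape as the window OneForOneStrategy hands to
  // ChildRestartStats.requestRestartPermission: None means "no limit".
  private val retriesWindow: (Option[Int], Option[Int]) = (
    if (maxNrOfRetries < 0) None else Some(maxNrOfRetries),
    if (withinTimeRange.isFinite && withinTimeRange >= Duration.Zero)
      Some(withinTimeRange.toMillis.toInt)
    else None
  )

  override def handleFailure(
      context: ActorContext,
      child: ActorRef,
      cause: Throwable,
      stats: ChildRestartStats,
      children: Iterable[ChildRestartStats]
  ): Boolean = {
    val directive: Directive =
      if (decider.isDefinedAt(cause)) decider(cause) else Escalate
    // Budget check moved here from processFailure; returning false makes
    // the supervising actor escalate the failure to its own parent.
    if (directive == Restart && !stats.requestRestartPermission(retriesWindow)) false
    else super.handleFailure(context, child, cause, stats, children)
  }

  // Permission was already granted above, so just restart or stop.
  override def processFailure(
      context: ActorContext,
      restart: Boolean,
      child: ActorRef,
      cause: Throwable,
      stats: ChildRestartStats,
      children: Iterable[ChildRestartStats]
  ): Unit =
    if (restart) restartChild(child, cause, suspendFirst = false)
    else context.stop(child)

  override def handleChildTerminated(
      context: ActorContext,
      child: ActorRef,
      children: Iterable[ActorRef]
  ): Unit = ()
}
```

A parent would install it with, for example, override val supervisorStrategy = new EscalatingOneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute)({ case _: EOFException => Restart }); the numbers are illustrative.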

Better Solution

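I wish the decider had access to the child's restart stats; then the escalation rule could be written directly in the decider instead of in a custom strategy.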
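Akka's Decider is a PartialFunction[Throwable, Directive] and never sees the restart stats, so the following is purely hypothetical: the StatsDecider type does not exist in Akka, and nothing in Akka would call it. It only illustrates how short the rule would be if the stats were passed in.

```scala
import java.io.EOFException
import akka.actor.ChildRestartStats
import akka.actor.SupervisorStrategy.{Directive, Escalate, Restart}

object WishedDecider {
  // HYPOTHETICAL type: Akka's Decider does not receive ChildRestartStats.
  type StatsDecider = PartialFunction[(Throwable, ChildRestartStats), Directive]

  // Restart on EOFException while the child has restarts left (at most 3,
  // no time window); escalate once they are used up. The guard consumes
  // one unit of the budget every time it is evaluated.
  val decider: StatsDecider = {
    case (_: EOFException, stats) if stats.requestRestartPermission((Some(3), None)) =>
      Restart
    case (_: EOFException, _) =>
      Escalate
  }
}
```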

Do you have a better solution? Please share with us in the comments below.

Topics: Akka, RabbitMQ
