One of our projects publishes messages to a RabbitMQ message broker. The first time a message needs to be sent, an Akka actor is launched, which holds a connection to RabbitMQ. Each time another message arrives, it goes through the same actor, reusing the connection.
Due to occasional issues with our load balancer, the connection to RabbitMQ drops, causing an EOFException to be thrown.
According to the default supervision strategy, the actor restarts, creating a fresh connection to RabbitMQ.
But what if the connection issue is persistent, and the connection drops again every time the actor restarts?
No worries, Akka has us covered: we can set the maxNrOfRetries parameter of OneForOneStrategy on the parent's supervision strategy, which stops the actor once the limit is exceeded.
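As a rough sketch of that setup (the PublisherSupervisor name, the limit of 3 retries per minute, and the eagerly created child are illustrative assumptions, not taken from our code base), the parent's strategy could look like this:

```scala
import java.io.EOFException

import akka.actor.{Actor, OneForOneStrategy, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.Restart

import scala.concurrent.duration._

object PublisherSupervisor {
  // Restart the child on EOFException, at most 3 times within one minute.
  // Once that limit is exceeded, Akka stops the child instead of restarting it.
  val strategy: OneForOneStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: EOFException => Restart
    }
}

// Hypothetical parent actor: it creates the publisher child up front
// (a simplification) and forwards every message to it.
class PublisherSupervisor(publisherProps: Props) extends Actor {

  override val supervisorStrategy: SupervisorStrategy = PublisherSupervisor.strategy

  private val publisher = context.actorOf(publisherProps, "publisher")

  def receive: Receive = {
    case msg => publisher.forward(msg)
  }
}
```

Any exception the decider does not match falls through to Akka's default behaviour for unhandled failures, which is to escalate.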
However, this didn't help much in our case: whenever another message needs to be sent to RabbitMQ, the actor is re-created and goes into the same restart/stop loop. So we wanted to take more aggressive action: instead of stopping the actor, we wanted to escalate to the parent, delegating the decision up the supervision hierarchy.
Akka's OneForOneStrategy has a maxNrOfRetries parameter that results in a Stop rather than a Restart once its limit is exceeded.
How can we escalate instead?
I had two options: keep my own count of the child's restarts, or write a custom supervision strategy.
The first solution duplicates the ChildRestartStats already collected by the supervision strategy, so I decided to go with the second.
Note: OneForOneStrategy is a case class, so I'd get shamed by the compiler if I extended it.
It seems that processFailure gets called whenever the supervision decider returns Restart or Stop, so it can decide whether to stop or restart, taking the ChildRestartStats into account.
If I overrode this method, how would I escalate the exception? There must be a method in the parent class to call.
By checking SupervisorStrategy, I discovered I was wrong: the escalation logic happens outside the strategy, when handleFailure returns false.
It became obvious that we needed to override handleFailure to return false when the limit is exceeded.
By moving the requestRestartPermission check out of processFailure and into handleFailure, we can exit early, returning false. This results in the failure being escalated to the parent.
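Here is a minimal sketch of that idea, not the exact code from our project. EscalatingOneForOneStrategy is a hypothetical name, and the retries window is rebuilt by hand on the assumption that the helpers Akka uses for it internally are not accessible from user code:

```scala
import akka.actor.{ActorContext, ActorRef, ChildRestartStats, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Decider, Escalate, Restart}

import scala.concurrent.duration.{Duration, FiniteDuration}

// Hypothetical strategy: behaves like a one-for-one strategy, but escalates
// instead of stopping the child once the restart limit is exceeded.
class EscalatingOneForOneStrategy(maxNrOfRetries: Int, withinTimeRange: Duration)(val decider: Decider)
    extends SupervisorStrategy {

  // (max retries, time window in millis): the shape requestRestartPermission expects.
  // A negative retry count or a non-finite window means "no limit" for that part.
  private val retriesWindow: (Option[Int], Option[Int]) = (
    if (maxNrOfRetries < 0) None else Some(maxNrOfRetries),
    withinTimeRange match {
      case d: FiniteDuration if d >= Duration.Zero => Some(d.toMillis.toInt)
      case _                                       => None
    }
  )

  override def handleChildTerminated(context: ActorContext, child: ActorRef, children: Iterable[ActorRef]): Unit = ()

  // No permission check here any more: it has moved into handleFailure.
  override def processFailure(context: ActorContext, restart: Boolean, child: ActorRef, cause: Throwable,
      stats: ChildRestartStats, children: Iterable[ChildRestartStats]): Unit =
    if (restart) restartChild(child, cause, suspendFirst = false)
    else context.stop(child)

  override def handleFailure(context: ActorContext, child: ActorRef, cause: Throwable,
      stats: ChildRestartStats, children: Iterable[ChildRestartStats]): Boolean = {
    // The decider is consulted here and again in super.handleFailure, so it should be side-effect free.
    val wantsRestart = decider.applyOrElse(cause, (_: Throwable) => Escalate) == Restart
    if (wantsRestart && !stats.requestRestartPermission(retriesWindow))
      false // limit exceeded: returning false makes the supervisor escalate the failure
    else
      super.handleFailure(context, child, cause, stats, children)
  }
}
```

The parent would use it in place of OneForOneStrategy, for example `new EscalatingOneForOneStrategy(3, 1.minute)({ case _: EOFException => Restart })`, leaving the final decision to its own supervisor.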
I wish decider had access to stats, so that the escalation could be expressed directly in the decider.
Do you have a better solution? Please share with us in the comments below.