<img height="1" width="1" src="https://www.facebook.com/tr?id=1076094119157733&amp;ev=PageView &amp;noscript=1">

Notify Slack Channels with Gatling Test Failures

Posted by Deepti Kinnerkar on Thu, Apr 12, 2018

Sending a Jenkins build status notification to a Slack channel is pretty straightforward. But what if the Jenkins build runs a set of system smoke tests that make calls to various API services? When a test fails, how do we notify the specific development team that owns the failing API?

Overview:

Recently I have had the opportunity to write smoke tests using the Gatling tool. Why (and how) use a performance-testing tool for smoke tests? That would be a blog for another day.

Ours is a multi-tenant, microservices-based system where each microservice is owned by a different development team.

In the smoke tests, each Simulation represents a user journey that involves making API calls to different services.

e.g., a simple scenario would consist of the steps below:

  1. Obtain a JWT token by calling the Token Service
  2. Make a Media API call using the above JWT token

Two services are involved in the above Simulation, i.e., the Token and Media services, and they are owned by two different development teams.

The Issue:

When the first set of these smoke tests started running via Jenkins, all the team leads were notified about the system smoke tests and about a common Slack channel where the build-failure notifications would be posted. We had these smoke tests very early in the development cycle, when some of the services were not live and still being developed.

Given the phase of development, it was inevitable that some services would break or that end-to-end integration would not work consistently. That is exactly what happened: there was a period when the Media API test failed continuously for a week. The Slack channel was flooded with Jenkins failure notifications. As the tests ran hourly, there were 24 notifications per day for almost a week for this one failing test. The other dev teams saw this as spam because the failure did not concern their API service, and many people left the Slack channel. Eventually the Slack notification had to be disabled to avoid unnecessary spamming. From then on, when a failure occurred, QA "manually" went through the reports to identify the failed API and reported it on the owning dev team's Slack channel.

Gatling:

Gatling logs its results into the 'simulation.log' file. This file contains the Simulation name, the API assertion results, the description given to each HTTP request builder, and any additional data extracted using Gatling's extraInfoExtractor. The log file was useful in identifying failures, but our tests were logging generic information, which didn't give the developers enough diagnostic detail to investigate the failures.

What we did next:

Below is the structure of the smoke-test project:

[Screenshot: smoke-test project directory structure]

1. In the application.conf file, we created a mapping between the API services and their Slack channels, as below:
SLACK_TOKEN_SERVICE = "token-service-dev"
SLACK_MEDIA_SERVICE = "media-services-dev"

These config values were extracted and assigned to variables as shown below:

// The channel mapping is loaded with Typesafe Config
import com.typesafe.config.ConfigFactory

object SmokeTestProperties {
  val config = ConfigFactory.load("application.conf")
  val commaDemarker = ","
  val slackTokenService = config.getString("SLACK_TOKEN_SERVICE")
  val slackMediaService = config.getString("SLACK_MEDIA_SERVICE")
}

2. In the HTTP builders for the Token and Media APIs, we added a meaningful API description, the owning dev team's Slack channel, and the resource URL, and we extracted one of the response headers:

// Token request: the request name embeds the description, the owning team's
// Slack channel and the resource, comma-separated, so they can be parsed
// back out of simulation.log later. (`host` and `ApplicationFormUrlEncoded`
// are defined elsewhere in the project.)
def getToken(description: String) = {
  http(description + commaDemarker + slackTokenService + commaDemarker + "POST:/token")
    .post("/token")
    .headers(Map("Content-Type" -> ApplicationFormUrlEncoded))
    .formParamMap(Map("subject_token_type" -> "urn:ietf:params:oauth:token-type:jwt"))
    .check(status is 200)
    .check(jsonPath("$.access_token").saveAs("accessToken"))
    .extraInfoExtractor(extraInfo => List(extraInfo.response.headers.get("tracking-id")))
}

// Media request: same naming convention; the saved access token is passed in
// via the Authorization header.
def getMedia(accessToken: String, description: String, mediaId: String) = {
  http(description + commaDemarker + slackMediaService + commaDemarker + s"GET:/media/$mediaId")
    .get(host + "/media/" + mediaId)
    .headers(Map("Authorization" -> accessToken, "Accept" -> "application/vnd.media-service+json; version=1"))
    .check(status is 200)
    .extraInfoExtractor(extraInfo => List(extraInfo.response.headers.get("tracking-id")))
}
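For context, here is a minimal sketch of how these two builders could be chained into a Simulation (Gatling 2.x syntax, with the builders above assumed to be in scope). The class name, base URL and injection profile are illustrative assumptions, not our exact production code:

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class FetchProtectedMediaSketch extends Simulation {

  // Hypothetical environment endpoint (assumption)
  val httpConf = http.baseURL("https://api.example.com")

  // The user journey: obtain a token, then fetch media with it. getToken
  // saves "accessToken" into the session via its jsonPath check, so getMedia
  // can reference it with Gatling's EL syntax.
  val fetchProtectedMedia = scenario("Fetch Protected Media")
    .exec(getToken("Get Token"))
    .exec(getMedia("${accessToken}", "Get Media", "345-abc-342"))

  // A single user per run is enough for a smoke test
  setUp(fetchProtectedMedia.inject(atOnceUsers(1))).protocols(httpConf)
}

With this naming convention in place, every request that Gatling writes to simulation.log carries the description, the owning team's Slack channel and the resource, which is exactly what the parsing step below relies on.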

3. When we ran the test, the simulation.log file contained the lines below (note that the comma-separated request name, i.e., description, Slack channel and resource, appears as a single field):

RUN    com.bamtech.smoketest.simulations.FetchProtectedMedia     fetchProtectedMedia  1522936790211     2.0
USER   Fetch Protected Media 1  START  1522936791827  1522936791827
REQUEST    Fetch Protected Media 1     Get Token,token-service-dev,POST:/token   1522936791857  1522936792517  OK   kkMOnhtGqD7C 
REQUEST    Fetch Protected Media 1     Get Media,media-services-dev,GET:/media/345-abc-342  1522936792884  1522936793221  KO status.find.is(200), but actually found 500    aOSlGhGMBhJm
USER   Fetch Protected Media 1  END    1522936791827  1522936793241

4. The details of the failed API calls were extracted using the shell script below:

#!/usr/bin/env bash
# For each Gatling simulation.log: read the Simulation class name from the
# RUN line, then emit one comma-separated line per failed (KO) request.
for f in $(find target/gatling -name 'simulation.log');
do
   className=$(awk -F "\t" '{if ($1 == "RUN") print $2}' "$f")
   awk -F "\t" -v classtoprint="$className" '{if ($8 == "KO") print classtoprint",",$5,$9,","$10}' "$f" >> jenkins_notifications.txt
done

The output was saved in the jenkins_notifications.txt file, which consists of the Simulation class name, API description, Slack channel, API resource URL, expected result, actual result, and the extracted response header:

com.bamtech.smoketest.simulations.FetchProtectedMedia, Get Media,media-services-dev,GET:/media/345-abc-342 status.find.is(200), but actually found 500 ,aOSlGhGMBhJm

5. The Jenkinsfile was enhanced to send the notifications:

node('nodeName') {
    checkout scm
    ansiColor('xterm') {
        try {
            setup()
            stage('Execute Smoke Tests') {
                try {
                    /*Run the tests using SBT*/
                } finally {
                    /* Collect and publish reports */
                }
            }
        } catch (error) {
            // Regenerate the failure summary from the Gatling simulation logs
            sh 'rm -f jenkins_notifications.txt'
            sh './process_simulation_log.sh'
            if (env.BRANCH_NAME == 'master') {
                readFile("jenkins_notifications.txt").eachLine{ line, count ->
                    def fields=line.split(',')

                    def headline = "Smoke Test Failed!"
                    def channel = fields[2]
                    def red = "#FF0000"
                    def errorMessage = fields[3] + fields[4]
                    def trackingId = fields[5]
                    def action = "Report Cause of Failure in #svcs-common-demo"
                    def jobInfo = "<${env.BUILD_URL}|${env.JOB_NAME} [${env.BUILD_NUMBER}]>"
                    def simulation = fields[0]
                    def testDescription = fields[1]

                    slackSend(channel: channel.toString(), color: red.toString(), message: "$headline \nError Message: $errorMessage \ntracking-Id: $trackingId \nAction: $action \nJob info: $jobInfo \nFailed Simulation: $simulation \nTest Description: $testDescription")
                }
            }
            throw error
        }
    }
}

Result:

The Jenkins failure notifications were now posted on the respective team's Slack channel. The dev teams reacted to failures faster than before, and the cause of the failure and the fix were communicated to QA almost immediately.

e.g., the notification on the Media Service's Slack channel:

[Screenshot: failure notification posted to the Media Service's Slack channel]

This is how we run lightweight smoke tests to ensure that some of the "happy path" scenarios work in the platform; any failures are reported to the appropriate teams, and the tests continue to offer value to the development teams.

Topics: Jenkins, Testing, Gatling
