Intermittent "Error 502 Service Temporarily Unavailable" errors following deployment


After making a configuration change (e.g. changing a configuration variable) and re-deploying an otherwise healthy application, customers may see a small percentage of requests fail intermittently with "502 Service Temporarily Unavailable" errors.


Root cause

By design, there is a race condition in the way Kubernetes de-provisions pods.

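In a nutshell, when you terminate a Pod, the removal of its endpoint and the termination signal to the kubelet are issued at the same time. As a result, requests can still be routed to a pod that has already started shutting down. See Graceful shutdown and zero downtime deployments in Kubernetes for more information.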


Solution Steps

One of the ways to address this in a raw Kubernetes implementation is to use a PreStop hook.

However, because of the abstraction layer introduced by the platform, a PreStop hook is not available in EYK. The same level of control can be achieved by using traps: a trap intercepts a signal and runs a series of actions (commands) in response, which gives complete control over the lifecycle of the process.

To achieve that we need to:

  1. Use dumb-init so that it always runs as PID 1 and passes signals to its children
  2. Create a bash script that uses trap to define a series of actions before it runs the second script
  3. Create a second bash script that uses dumb-init to map 15:0, i.e. rewrite TERM (15) to 0 so that dumb-init ignores the signal instead of forwarding it, and that includes the application run command

The above solution requires the following:

  1. dumb-init package
  2. trap (already available)
  3. bash script (sets the trap)
  4. bash script (starts the application)
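
The trap step can be tried out locally before touching the application. The following is a minimal sketch, not the platform's implementation: `sleep 30` stands in for the application, the names `run_with_term_trap`, `on_term`, and `drain_seconds` are hypothetical, and a child process plays the role of the platform by sending TERM. It assumes bash defers a trap until a foreground command finishes, which is why the stand-in runs in the background and the script waits on it.

```shell
#!/usr/bin/env bash
# Sketch of the trap pattern. All names here are hypothetical, and
# `sleep 30` is only a stand-in for the real application process.

run_with_term_trap() {
  local drain_seconds="$1"
  (
    on_term() {
      echo "SIGTERM received - sleeping ${drain_seconds} seconds"
      sleep "${drain_seconds}"   # delay the shutdown, as in the article's trap
      echo "Slept ${drain_seconds} seconds - stopping application"
      kill -TERM "${app_pid}" 2>/dev/null
      exit 0
    }
    trap on_term TERM

    # Run the stand-in application in the background and wait on it:
    # a trap is deferred while a foreground command is running, but the
    # `wait` builtin is interrupted as soon as the trapped signal arrives.
    sleep 30 >/dev/null &
    app_pid=$!

    # Simulate the platform: a child process sends TERM to this subshell.
    sh -c 'kill -TERM "$PPID"' &

    wait "${app_pid}"
  )
}
```

Calling `run_with_term_trap 1` prints the two messages one second apart and returns with status 0; in the real setup the delay would be the 30 seconds used in this article.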

Below are the specific details of the solution steps:

  1. Dockerfile

    Include dumb-init package

    RUN apt-get update && apt-get install -y dumb-init

    Although our implementation relies on a Procfile to pass the process instructions, you may also set dumb-init as the entrypoint in the Dockerfile:

    ENTRYPOINT ["/usr/bin/dumb-init","--"]

    Note that the signal mapping is not changed at this point.

  2. Procfile

    Here we will include both the use of dumb-init as well as the script:

    web: /usr/bin/dumb-init -- ./script/


    Here we are adding the trap followed by the script:

    #puma example
    trap "echo SIGTERM received - sleeping 30 seconds; sleep 30; echo Slept 30 Seconds - stopping Puma; pkill -TERM -f '^([^ ]*/)?puma '; exit 0" TERM
    #passenger example
    #trap "passenger-status; echo SIGTERM received - sleeping 30 seconds; sleep 30; echo Slept 30 Seconds - stopping Passenger; passenger stop --port 5000; exit 0" TERM


    As per the trap options above, once the pod receives the TERM signal (SIGTERM is the default signal that Docker and Kubernetes send during a scale down or rolling update), it runs the following commands in order:

    • echo SIGTERM received - sleeping 30 seconds

    An informational message that we have received SIGTERM

    • sleep 30

    Holds the next command for 30 seconds, so the application keeps serving requests while the pod's endpoint is being removed

    • echo Slept 30 Seconds - stopping Puma

    An informational message that we are about to stop puma

    • pkill -TERM -f '^([^ ]*/)?puma '

    The command that actually stops puma gracefully, by sending SIGTERM to the process that matches the pattern

    • exit 0

    Exits the trap, and with it the script, with a success status.
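
To see which processes the pkill pattern in the trap would select, the pattern can be checked against sample command lines. This is a small sketch under two assumptions: `grep -E` is used as a stand-in because `pkill -f` matches an extended regular expression against the full command line, and the sample command lines (including the one imitating a Puma process title) are hypothetical. `PUMA_PATTERN` and `matches_puma` are illustrative names.

```shell
#!/usr/bin/env bash
# Check which command lines the pattern from the trap would match.
# grep -E stands in for pkill -f; the sample command lines are hypothetical.

PUMA_PATTERN='^([^ ]*/)?puma '

# Succeeds when the given command line matches the pattern.
matches_puma() {
  printf '%s\n' "$1" | grep -Eq "${PUMA_PATTERN}"
}

for cmdline in \
  'puma 6.4.0 (tcp://0.0.0.0:3000) [app]' \
  '/usr/local/bin/puma -p 3000' \
  'ruby /app/bin/puma -p 3000' \
  'pumactl status'
do
  if matches_puma "${cmdline}"; then
    echo "match:    ${cmdline}"
  else
    echo "no match: ${cmdline}"
  fi
done
```

The first two samples match and the last two do not: the pattern requires the command line to start with puma, optionally preceded by a path without spaces, and to be followed by a space. If your application server shows up under a different command line, the pattern needs to be adjusted accordingly.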

  3. script/

    Here we are using dumb-init to ignore SIGTERM (by rewriting signal 15 to 0) and start puma the usual way:

    /usr/bin/dumb-init --rewrite 15:0 -- bundle exec puma -p 3000

    #Alternatively if you are using passenger
    #/usr/bin/dumb-init --rewrite 15:0 -- bundle exec passenger start --port 5000 --max-pool-size 2 --min-instances 2

    Finally, commit the changes and push them (git add/commit/push).

If you deploy the image with eyk pull, make sure the YAML string used to supply a Procfile to the application has been updated to match the Procfile above. In addition, because this solution adds a 30-second delay to the pod's termination process, it is advisable to also set a longer termination grace period by running this command:

