Overview
After making a configuration change (e.g. changing a configuration variable) and re-deploying an otherwise healthy application, customers may see a small percentage of requests fail intermittently with "502 Service Temporarily Unavailable" errors.
Solution
Root cause
By design, there is a race condition in the way Kubernetes de-provisions pods.
When you terminate a Pod, the removal of its endpoint and the termination signal to the kubelet are issued at the same time, so requests can still be routed to a Pod that is already shutting down. See Graceful shutdown and zero downtime deployments in Kubernetes for more information.
Solution Steps
One of the ways to address this in a raw Kubernetes implementation is to use a PreStop hook.
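For reference, a PreStop hook in a raw Kubernetes Deployment could look roughly like the fragment written out below. This is only a sketch: the file name prestop-fragment.yaml, the container name web, and the 30-second sleep are assumptions chosen for illustration, not values taken from EYK.

```shell
#!/bin/bash
# Write an illustrative Deployment fragment with a preStop hook.
# File name, container name and sleep duration are hypothetical.
fragment_file="${TMPDIR:-/tmp}/prestop-fragment.yaml"

cat > "$fragment_file" <<'EOF'
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: web
          lifecycle:
            preStop:
              exec:
                # Delay shutdown so endpoint removal can propagate first.
                command: ["sleep", "30"]
EOF

echo "wrote $fragment_file"
```

The hook runs before the container receives TERM, which is the same window the trap-based approach recreates.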
However, given the abstraction layer introduced by the platform, this is not possible in EYK. The same level of control can be achieved by using traps: a trap can step in and run a series of actions (commands), which gives complete control of the lifecycle of the process in EYK.
To achieve that we need to:
- Use dumb-init to make sure that it always takes PID 1 and passes the signals to its children
- Create an entrypoint.sh that uses trap to take a series of actions before it runs appcontrol.sh
- Create an appcontrol.sh that includes the application run command and uses dumb-init to rewrite 15:0, i.e. TERM (15) is rewritten to 0, which makes dumb-init drop the signal instead of passing it on to the application
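The point of the last bullet is that the application process never reacts to TERM directly; only the trap decides when it stops. The sketch below imitates that effect with plain bash (a child that ignores TERM) rather than with dumb-init itself, so treat it as an analogy. The sleep process is a stand-in for the application and app_pid is a hypothetical name.

```shell
#!/bin/bash
# Stand-in for the application: a child process that ignores TERM.
# (dumb-init --rewrite 15:0 gets a similar effect by not forwarding TERM.)
bash -c 'trap "" TERM; exec sleep 5' &
app_pid=$!

sleep 0.5                 # give the child time to install its trap
kill -TERM "$app_pid"     # what a pod shutdown would send
sleep 0.5

if kill -0 "$app_pid" 2>/dev/null; then
  survived=yes
else
  survived=no
fi
echo "child still running after TERM: $survived"

# Clean up the stand-in process.
kill -KILL "$app_pid" 2>/dev/null
wait "$app_pid" 2>/dev/null || true
```

Because the child keeps running after TERM, something else has to stop it deliberately, which is the role the trap in entrypoint.sh plays.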
Below are the specific details of the solution steps:
- Dockerfile
Include the dumb-init package:
RUN apt-get update && apt-get install -y dumb-init
Although in our implementation we rely on a Procfile to pass the process instructions, you may also add dumb-init as the entrypoint in the Dockerfile:
ENTRYPOINT ["/usr/bin/dumb-init","--"]
Note that the signal mapping is not changed here yet; that happens in appcontrol.sh.
- Procfile
Here we will include both the use of dumb-init as well as the entrypoint.sh script:
web: /usr/bin/dumb-init -- ./script/entrypoint.sh
- script/entrypoint.sh
Here we are adding the trap followed by the appcontrol.sh script:
#!/bin/bash
# puma example
trap "echo SIGTERM received - sleeping 30 seconds; sleep 30; echo Slept 30 Seconds - stopping Puma; pkill -TERM -f '^([^ ]*/)?puma '; exit 0" TERM
# passenger example
#trap "passenger-status; echo SIGTERM received - sleeping 30 seconds; sleep 30; echo Slept 30 Seconds - stopping Passenger; passenger stop --port 5000; exit 0" TERM
./script/appcontrol.sh
As per the trap above, once the pod receives TERM (SIGTERM, the default signal Docker and Kubernetes send during a scale-down or rolling update), the following commands run serially:
- echo SIGTERM received - sleeping 30 seconds
  An informational message that SIGTERM has been received.
- sleep 30
  Holds off the next command for 30 seconds.
- echo Slept 30 Seconds - stopping Puma
  An informational message that Puma is about to be stopped.
- pkill -TERM -f '^([^ ]*/)?puma '
  The command that actually stops Puma gracefully.
- exit 0
  Exits the trap, and with it the script, with status 0.
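The serial sequence above can be tried locally with a scaled-down sketch. The assumptions are that a background sleep stands in for Puma, DRAIN_SECONDS is shortened to 1 (the article uses 30), and the script sends TERM to itself instead of waiting for Kubernetes. The final exit 0 is left out so the recorded steps can be printed.

```shell
#!/bin/bash
DRAIN_SECONDS=1     # hypothetical short delay; the article uses 30
steps=()            # records the order in which the handler runs

sleep 60 &          # stand-in for the Puma process
server_pid=$!

drain() {
  steps+=("received")               # echo SIGTERM received ...
  sleep "$DRAIN_SECONDS"            # sleep 30 in the article
  steps+=("slept")                  # echo Slept 30 Seconds ...
  kill -TERM "$server_pid"          # pkill -TERM ... in the article
  wait "$server_pid" 2>/dev/null || true
  steps+=("stopped")
}
trap drain TERM

kill -TERM $$       # simulate the signal sent at pod termination
printf '%s\n' "${steps[@]}"
```

The handler only stops the stand-in server after the delay has elapsed, which mirrors the ordering the trap in entrypoint.sh relies on.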
- script/appcontrol.sh
Here we are using dumb-init to ignore SIGTERM (15:0) and start puma the usual way:
/usr/bin/dumb-init --rewrite 15:0 -- bundle exec puma -p 3000
#Alternatively if you are using passenger
#/usr/bin/dumb-init --rewrite 15:0 -- bundle exec passenger start --port 5000 --max-pool-size 2 --min-instances 2
Finally, we are good to git add/commit/push.
If you use eyk pull to deploy the image, make sure the YAML string that supplies the Procfile to the application has been updated to match the Procfile above. In addition, because this introduces a 30-second delay in the pod's termination process, it is advisable to also set a longer termination grace period by running this command:
eyk config:set KUBERNETES_POD_TERMINATION_GRACE_PERIOD_SECONDS=60 -a appname
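As a quick sanity check on these numbers: the grace period has to cover the 30-second sleep plus however long the application takes to stop, since Kubernetes sends SIGKILL once the grace period expires. The helper below is a hypothetical sketch; fits_grace_period and the 10-second shutdown estimate are assumptions, not part of EYK.

```shell
#!/bin/bash
# fits_grace_period DRAIN SHUTDOWN GRACE
# Succeeds when drain time plus shutdown time fits within the grace period.
fits_grace_period() {
  local drain=$1 shutdown=$2 grace=$3
  [ $((drain + shutdown)) -le "$grace" ]
}

DRAIN_SECONDS=30       # sleep used in the trap
SHUTDOWN_SECONDS=10    # assumed time for the app server to stop
GRACE_SECONDS=60       # KUBERNETES_POD_TERMINATION_GRACE_PERIOD_SECONDS

if fits_grace_period "$DRAIN_SECONDS" "$SHUTDOWN_SECONDS" "$GRACE_SECONDS"; then
  echo "ok: $((GRACE_SECONDS - DRAIN_SECONDS - SHUTDOWN_SECONDS))s of headroom"
else
  echo "grace period too short"
fi
```

If the sleep in the trap is ever lengthened, the grace period should be raised along with it.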