Wait for Complete Deployment on GitLab CI with FluxCD and Kubernetes
When running end-to-end (E2E) tests in GitLab CI against a Go application deployed on Kubernetes with FluxCD, I ran into an interesting issue: the tests were starting so quickly that they were actually running against the previous version of the application, not the one just deployed.
Because FluxCD uses a pull-based deployment model, it’s not straightforward to determine when the deployment has actually completed. You can’t just trigger a deployment and assume it’s done after a fixed delay.
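Instead of a fixed delay, the usual alternative is to poll a condition until it holds or a deadline passes. The helper below is a minimal sketch of that idea; the name `wait_until` and its argument order are my own illustration, not something FluxCD or kubectl provides.

```shell
# wait_until MAX_SECONDS POLL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it exits 0 or
# MAX_SECONDS have elapsed. Returns 0 on success, 1 on timeout.
wait_until() {
  local max_seconds="$1" poll_seconds="$2"
  shift 2
  local start now
  start=$(date +%s)
  while true; do
    # Running the command inside `if` keeps a failure from
    # aborting the script when `set -e` is active.
    if "$@"; then
      return 0
    fi
    now=$(date +%s)
    if [ $((now - start)) -ge "$max_seconds" ]; then
      return 1
    fi
    sleep "$poll_seconds"
  done
}
```

A call such as `wait_until 180 10 my_check` would then retry a (hypothetical) `my_check` function every 10 seconds for up to 3 minutes.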
Initial Attempt: kubectl wait
My first attempt to solve this involved using:
kubectl wait --for=condition=available deployment/test-application --timeout=180s
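However, this didn't solve the issue. With a rolling update and maxUnavailable: 0, the deployment stays available for the whole rollout, so the command can return while old pods are still serving traffic or terminating, or even before FluxCD has applied the new revision: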
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
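The arithmetic behind this strategy can be made concrete. During a rolling update, Kubernetes keeps at least `replicas - maxUnavailable` pods available and allows at most `replicas + maxSurge` pods in total. The sketch below computes both bounds; the function name `rollout_bounds` is hypothetical, it assumes absolute integer values rather than percentages, and the replica count of 3 in the usage note is only an example.

```shell
# rollout_bounds REPLICAS MAX_SURGE MAX_UNAVAILABLE
# Hypothetical helper: prints "<min available> <max total>" for a
# rolling update. Assumes integer values, not percentages.
rollout_bounds() {
  local replicas="$1" max_surge="$2" max_unavailable="$3"
  echo "$((replicas - max_unavailable)) $((replicas + max_surge))"
}
```

With an assumed 3 replicas, `rollout_bounds 3 1 0` prints `3 4`: the number of available pods never drops below the desired count, which is why the available condition does not signal that the rollout has finished.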
Working Solution: Verify Pod Images
The solution that actually worked was to poll the running pods and verify their images. Specifically, the script ensures that:
- FluxCD is explicitly asked to reconcile.
- The deployment becomes available.
- All running pods use the expected image corresponding to the current $COMMIT_SHA.
Here’s the script used in the GitLab job:
script:
  # ...
  - |
    # Full image reference expected on every pod once the rollout is done
    export IMAGE_TAG="000000000000.dkr.ecr.eu-west-1.amazonaws.com/test-application:$COMMIT_SHA"
    export MAX_SECONDS=180
    export POLL_SECONDS=10
    # APP_NS (the application's namespace) is expected to be defined elsewhere in the job.

    # Trigger FluxCD reconciliation manually
    if ! kubectl annotate gitrepository test-application "reconcile.fluxcd.io/requestedAt=$(date +%s)" --overwrite -n flux-system; then
      echo "Error: Failed to annotate gitrepository."
      exit 1
    fi
    echo "FluxCD reconciliation triggered."

    # Wait for deployment to become 'available'
    if ! kubectl wait --for=condition=available deployment/test-application -n "$APP_NS" --timeout="${MAX_SECONDS}s"; then
      echo "Error: Deployment did not become available within ${MAX_SECONDS}s."
      kubectl get pods -l app=test-application -n "$APP_NS" -o custom-columns="NAME:.metadata.name,STATUS:.status.phase,IMAGE:.spec.containers[0].image"
      exit 1
    fi
    echo "Deployment is available. Now verifying image version and old pod termination."

    start_time=$(date +%s)
    while true; do
      current_time=$(date +%s)
      elapsed_time=$((current_time - start_time))
      if [ "$elapsed_time" -ge "$MAX_SECONDS" ]; then
        echo "Timeout (${MAX_SECONDS}s) reached."
        exit 1
      fi
      echo "Checking pod images (${elapsed_time}s/${MAX_SECONDS}s)"

      # Tolerate a failed lookup here and retry on the next poll
      EXPECTED_REPLICAS=$(kubectl get deployment "test-application" -n "$APP_NS" -o jsonpath='{.spec.replicas}') || true
      if [ -z "$EXPECTED_REPLICAS" ]; then
        echo "Could not determine desired replicas."
        sleep "$POLL_SECONDS"
        continue
      elif [ "$EXPECTED_REPLICAS" -eq 0 ]; then
        echo "Desired replicas is 0."
        break
      fi

      # One image per line for every pod in phase Running (old pods included)
      IMAGES=$(kubectl get pods -l app=test-application -n "$APP_NS" \
        -o jsonpath="{range .items[?(@.status.phase=='Running')]}{.spec.containers[0].image}{'\n'}{end}")
      # Count non-empty lines so that an empty result yields 0, not 1
      TOTAL_PODS=$(echo "$IMAGES" | awk 'NF { n++ } END { print n + 0 }')
      READY_PODS=$(echo "$IMAGES" | awk -v tag="$IMAGE_TAG" 'BEGIN {count=0} { if ($0 == tag) count++ } END {print count}')

      if [ "$READY_PODS" -eq 0 ]; then
        echo "Could not find any pod with the expected image."
        sleep "$POLL_SECONDS"
        continue
      fi
      if [ "$READY_PODS" -ne "$EXPECTED_REPLICAS" ]; then
        echo "Unexpected number of pods with the expected image."
        sleep "$POLL_SECONDS"
        continue
      fi
      if [ "$READY_PODS" -eq "$TOTAL_PODS" ]; then
        echo "Deployment succeeded."
        break
      fi
      echo "Pods with a different image are still running."
      sleep "$POLL_SECONDS"
    done
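The decision at the heart of the loop can also be factored into a small function that reads the image list from stdin, which makes it testable without a cluster. This is a sketch under my own naming (`rollout_complete` is hypothetical), not part of the job above; it treats the rollout as complete only when every listed pod runs the expected image and the count matches the desired replicas.

```shell
# rollout_complete EXPECTED_IMAGE EXPECTED_REPLICAS
# Hypothetical helper: reads one image reference per line on stdin
# (one line per running pod). Exits 0 only when every non-empty line
# equals EXPECTED_IMAGE and the number of lines equals EXPECTED_REPLICAS.
rollout_complete() {
  local expected_image="$1" expected_replicas="$2"
  awk -v img="$expected_image" -v want="$expected_replicas" '
    BEGIN { total = 0; matching = 0 }
    NF == 0 { next }              # ignore blank lines
    { total++ }
    $0 == img { matching++ }
    END { exit !(total == want + 0 && matching == total) }
  '
}
```

Piping the pod image list into `rollout_complete "$IMAGE_TAG" "$EXPECTED_REPLICAS"` would then replace the three separate comparisons with a single exit status.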
Summary
If you're using FluxCD with GitLab CI and running E2E tests, it's crucial to make sure your test environment reflects the latest deployment. Simply waiting for the deployment to be available isn't always enough, especially with rolling updates.
By combining FluxCD reconciliation with explicit image verification, you can ensure that your tests only run after the new version is fully deployed.