Wait for Complete Deployment on GitLab CI with FluxCD and Kubernetes

When running end-to-end (E2E) tests in GitLab CI against a Go application deployed on Kubernetes with FluxCD, I ran into an issue: the tests started so quickly that they ran against the previous version of the application, not the one that had just been deployed.

Because FluxCD uses a pull-based deployment model, the CI job does not apply the manifests itself. The cluster pulls the change on its own schedule, so the pipeline has no direct signal that the deployment has completed. You can’t just trigger a deployment and assume it’s done after a fixed delay.

Initial Attempt: `kubectl wait`

My first attempt to solve this involved using:

kubectl wait --for=condition=available deployment/test-application --timeout=180s

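However, this didn’t solve the issue. The Available condition only requires a minimum number of available pods, and with `maxUnavailable: 0` the old pods keep satisfying it throughout the rollout. The command could therefore return while old pods were still running or terminating under the rolling update strategy: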

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
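
To see why the Available condition cannot signal a finished rollout, it helps to work through the arithmetic. A Deployment counts as Available as long as at least `replicas - maxUnavailable` pods are available, regardless of which version they run. The sketch below computes these bounds for the strategy above. The function names and the replica count of 3 are illustrative assumptions, and it only handles integer values, not percentages.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names); integer values only.

# Minimum number of available pods for the Deployment to count as "Available".
min_available() {
  replicas=$1
  max_unavailable=$2
  echo $((replicas - max_unavailable))
}

# Maximum number of pods (old + new) that may exist during a rolling update.
max_total_pods() {
  replicas=$1
  max_surge=$2
  echo $((replicas + max_surge))
}

# Example: 3 replicas (assumed) with maxSurge: 1 and maxUnavailable: 0.
echo "min available: $(min_available 3 0)"
echo "max total pods: $(max_total_pods 3 1)"
```

With three replicas, the three old pods alone are enough to satisfy the condition, so `kubectl wait --for=condition=available` can succeed before a single new pod is ready.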

Working Solution: Verify Pod Images

The solution that actually worked was to poll the running pods and verify their images. Specifically, the script ensures that:

FluxCD is explicitly asked to reconcile.
The deployment becomes available.
All running pods use the expected image corresponding to the current $COMMIT_SHA.

Here’s the script used in the GitLab job:

script:
  # ...
  - |
    export IMAGE_TAG="000000000000.dkr.ecr.eu-west-1.amazonaws.com/test-application:$COMMIT_SHA"
    export MAX_SECONDS=180
    export POLL_SECONDS=10

    # Trigger FluxCD reconciliation manually
    if ! kubectl annotate gitrepository test-application "reconcile.fluxcd.io/requestedAt=$(date +%s)" --overwrite -n flux-system; then
      echo "Error: Failed to annotate gitrepository."
      exit 1
    fi
    echo "FluxCD reconciliation triggered."

    # Wait for deployment to become 'available'
    # (APP_NS is assumed to be set in the part of the job elided above)
    if ! kubectl wait --for=condition=available deployment/test-application -n "$APP_NS" --timeout="${MAX_SECONDS}s"; then
      echo "Error: Deployment did not become available within ${MAX_SECONDS}s."
      kubectl get pods -l app=test-application -n "$APP_NS" -o custom-columns="NAME:.metadata.name,STATUS:.status.phase,IMAGE:.spec.containers[0].image"
      exit 1
    fi
    echo "Deployment is available. Now verifying image version and old pod termination."

    start_time=$(date +%s)
    while true; do
      current_time=$(date +%s)
      elapsed_time=$((current_time - start_time))

      if [ "$elapsed_time" -ge "$MAX_SECONDS" ]; then
        echo "Timeout (${MAX_SECONDS}s) reached."
        exit 1
      fi

      echo "Checking pod images (${elapsed_time}s/${MAX_SECONDS}s)"

      EXPECTED_REPLICAS=$(kubectl get deployment "test-application" -n "$APP_NS" -o jsonpath='{.spec.replicas}')
      if [ -z "$EXPECTED_REPLICAS" ]; then
        echo "Could not determine desired replicas."
        sleep "$POLL_SECONDS"
        continue
      elif [ "$EXPECTED_REPLICAS" -eq 0 ]; then
        echo "Desired replicas is 0."
        break
      fi

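      # Image of every pod in the Running phase, one per line.
      # Pods that are being terminated still report phase Running until their
      # containers exit, so old pods keep showing up here until they are gone.
      IMAGES=$(kubectl get pods -l app=test-application -n "$APP_NS" \
        -o jsonpath="{range .items[?(@.status.phase=='Running')]}{.spec.containers[0].image}{'\n'}{end}")

      # grep -c . counts non-empty lines, so an empty result yields 0 rather than 1
      TOTAL_PODS=$(echo "$IMAGES" | grep -c . || true)
      READY_PODS=$(echo "$IMAGES" | awk -v tag="$IMAGE_TAG" 'BEGIN {count=0} { if ($0 == tag) count++ } END {print count}')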

      if [ "$READY_PODS" -eq 0 ]; then
        echo "Could not find any pod with the expected image."
        sleep "$POLL_SECONDS"
        continue
      fi

      if [ "$READY_PODS" -ne "$EXPECTED_REPLICAS" ]; then
        echo "Unexpected number of pods with the expected image."
        sleep "$POLL_SECONDS"
        continue
      fi

      if [ "$READY_PODS" -eq "$TOTAL_PODS" ]; then
        echo "Deployment succeeded."
        break
      fi

      echo "Old pods are still running ($((TOTAL_PODS - READY_PODS)) remaining)."
      sleep "$POLL_SECONDS"
    done
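
The decision at the heart of the loop above can be isolated into a small function, which makes it possible to try the logic without a cluster. This is only a sketch: `rollout_complete` is a hypothetical helper that is not part of the job above, and it assumes the image list arrives on stdin, one image per running pod, in the same form the `kubectl get pods` query produces.

```shell
#!/bin/sh
# rollout_complete EXPECTED_IMAGE EXPECTED_REPLICAS
# Hypothetical helper: reads one image reference per line on stdin.
# Succeeds (exit 0) only when the number of non-empty lines equals
# EXPECTED_REPLICAS and every one of them equals EXPECTED_IMAGE.
rollout_complete() {
  expected_image=$1
  expected_replicas=$2
  awk -v img="$expected_image" -v want="$expected_replicas" '
    NF == 0   { next }        # ignore blank lines (no running pods)
              { total++ }     # every remaining line is one running pod
    $0 == img { matching++ }  # pod already runs the expected image
    END       { exit !((total + 0) == (want + 0) && (matching + 0) == (want + 0)) }
  '
}

# Example with made-up image names: two pods on the new image, two replicas wanted.
printf '%s\n' "repo/app:abc" "repo/app:abc" \
  | rollout_complete "repo/app:abc" 2 && echo "complete"

# One old pod is still around, so the rollout is not finished yet.
printf '%s\n' "repo/app:abc" "repo/app:old" \
  | rollout_complete "repo/app:abc" 2 || echo "still rolling"
```

In the job, the output of the `kubectl get pods` query could be piped into such a function inside the polling loop, replacing the three separate count comparisons with a single check.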

Summary

If you’re using FluxCD with GitLab CI and running E2E tests, make sure your test environment reflects the latest deployment before the tests start. Simply waiting for the deployment to be available isn’t always enough, especially with rolling updates, where old pods can keep the deployment available while the new version is still rolling out.

By combining FluxCD reconciliation with explicit image verification, you can ensure that your tests only run after the new version is fully deployed.
