CD (Continuous Delivery) — Part 6: Final validations, Manual Approval, and Deploy to Production


📚 Series: CI/CD and AI: From Theory to Practice

This is the final part of the CD block. It covers the consolidation of all pipeline results, informed human approval, and progressive deployment to production with minimum-risk strategies.

Parts of this block:

  • Final validations (E2E + Performance + Compliance + Security + Observability) and aggregated reports
  • Manual Approval: critical, human, and informed decision
  • Deploy to Production: Blue/Green, Canary, Progressive Delivery, and extreme risk control

Final validations (E2E + Performance + Compliance + Security + Observability) and aggregated reports

This stage does not run new tests, but rather aggregates, correlates, and evaluates the results of all previous phases to decide whether the release is “promotable” (whether a version has sufficient quality to advance to the next level).

The goal is to make an automatic and objective decision about whether the release is fit for production, based on:

  • E2E results in staging.
  • Performance smoke metrics.
  • Load/stress/chaos results (if they are blocking for this release).
  • Security status (CVE, dependencies, container, IaC).
  • Compliance (corporate policies, auditing, SBOM, signature).
  • Observability (errors, saturation, anomalous logs, traces).

This stage generates a unified report that feeds the Manual Approval step.

This stage includes

1. Result consolidation. Collects the artifacts produced by all previous stages: E2E reports from staging, performance smoke and load/stress/chaos results, security scans (CVE, dependencies, container, IaC), compliance evidence (signature, SBOM), and observability metrics.

2. Automatic gate evaluation. Each gate has clear rules:

| Gate | Criterion | Action |
| --- | --- | --- |
| Security | 0 CRITICAL CVEs, HIGH reviewed | Blocks |
| E2E | 100% of critical flows OK | Blocks |
| Performance | p95 < SLA, error rate < 1% | Blocks |
| Load/Stress | No severe degradation | Warning or blocks, depending on the release |
| Chaos | Automatic recovery | Warning or blocks, depending on policy |
| Compliance | Valid signature, SBOM present | Blocks |
| Observability | No critical anomalies | Warning or blocks |

3. Generation of a unified report. The pipeline produces an executive summary with:

  • Status of each gate
  • Key metrics
  • Detected risks
  • Recommendations
  • Links to artifacts

4. Automatic decision:

  • If all critical gates are OK → advances to Manual Approval
  • If there are critical failures → rollback + ticket
  • If there are warnings → mandatory Manual Approval with risk note

Best practices

  • Define clear and versioned gates: criteria must be in code (policy-as-code).
  • Do not mix functional validations with resilience validations: each has its own weight and policy.
  • Maintain full traceability: each release must have commit SHA, image tag, signature, SBOM, E2E reports, k6 reports, chaos logs.
  • Automate 90%: manual approval must be informed, not a blind “yes/no”.
  • Integrate with observability: correlate p95, CPU/memory saturation, 5xx errors, DB latency, MQ queues, GC (Garbage Collector).

Examples

Unified Validation Report (markdown generated by the pipeline):

## **Unified Validation Report – Release `<VERSION>`**
**Date:** `<DATE>`
**Commit SHA:** `<SHA>`
**Image:** `<REGISTRY>/<APP>:<TAG>`
**Environment:** Staging
**Generated by:** CD Pipeline
---
## 1. Artifact summary
- **SBOM (CycloneDX):** ✔️ Generated
- **Image signature (cosign):** ✔️ Verified
- **Image Scan (Trivy/Grype/Snyk):**
   - CRITICAL: `<N>`
   - HIGH: `<N>`
   - MEDIUM: `<N>`
   - LOW: `<N>`
- **E2E Tests:** `<PASSED>/<TOTAL>`
- **Performance Smoke:**
   - p95: `<X ms>`
   - Error rate: `<Y %>`
- **Load Test:**
   - Throughput: `<RPS>`
   - p95: `<ms>`
   - Error rate: `<%>`
- **Stress Test:**
   - Breaking point: `<VU>`
   - Recovery: `<seconds>`
- **Chaos Engineering:**
   - Experiments executed: `<N>`
   - Automatic recovery: ✔️ / ❌
- **Compliance:**
   - Signature policies: ✔️
   - SBOM policies: ✔️
   - Security policies: ✔️ / ❌
---
## 2. Security validation
### 2.1 Image scan
- CRITICAL: `<N>` — Result: **OK / FAIL**
### 2.2 Dependencies (SCA)
- Blocking vulnerabilities: `<N>` — Result: **OK / FAIL**
### 2.3 Compliance
- Signature verified: ✔️ | SBOM present: ✔️ | Approved licenses: ✔️ / ❌
---
## 3. Functional validation (E2E)
- Total tests: `<N>` | Passed: `<N>` | Failed: `<N>`
- Critical flows: ✔️ / ❌ — Result: **OK / FAIL**
---
## 4. Performance validation
### 4.1 Performance Smoke
- p95: `<X ms>` | p99: `<Y ms>` | Error rate: `<Z %>` — **OK / WARNING / FAIL**
### 4.2 Load Test
- Throughput: `<RPS>` | p95: `<ms>` | Error rate: `<%>` — **OK / WARNING / FAIL**
### 4.3 Stress Test
- Breaking point: `<VU>` | Recovery: `<s>` — **OK / WARNING / FAIL**
---
## 5. Resilience validation (Chaos Engineering)
- Experiments: `<N>` | Automatic recovery: ✔️ / ❌ — **OK / WARNING / FAIL**
---
## 6. Observability
- 5xx errors: `<N>` | CPU saturation: `<%>` | Memory saturation: `<%>`
- DB latency: `<ms>` | Anomalous logs: `<yes/no>` — **OK / WARNING / FAIL**
---
## 7. Pipeline automatic conclusion
**Final status:**
- ✔️ Fit for Manual Approval
- ⚠️ Fit with warnings
- ❌ Not fit (remediation required)
---
## 8. Links to artifacts
- SBOM | E2E Reports | k6 Reports | Chaos Logs | Application Logs | Prometheus/Grafana Dashboard | Release Notes

Example gate policy (Policy-as-Code YAML):

policies/gates.yaml

version: 1.0
gates:
  security:
    critical: 0                  # no CRITICAL CVEs allowed
    high: 5                      # up to 5 HIGH CVEs tolerated if reviewed
    fail_on_critical: true
    fail_on_high: false
  e2e:
    required_pass_rate: 1.0      # 100% of tests must pass
    allow_flaky: false
    critical_flows_must_pass: true
  performance_smoke:
    p95_ms: 500                  # p95 latency threshold (ms)
    error_rate: 0.01             # 1% maximum error rate
    fail_on_threshold: true      # violations block the release
  load_test:
    p95_ms: 800
    error_rate: 0.02
    fail_on_threshold: false     # violations warn instead of blocking
  stress_test:
    allow_breaking_point: true
    max_recovery_seconds: 30
    fail_on_recovery_violation: true
  chaos:
    require_recovery: true
    max_recovery_seconds: 60
    fail_on_recovery_violation: false
  compliance:
    require_sbom: true
    require_signature: true
    allowed_licenses:
      - MIT
      - Apache-2.0
      - BSD-3-Clause
  observability:
    max_5xx: 5                   # maximum 5xx errors observed in staging
    max_cpu_pct: 85
    max_mem_pct: 90
    fail_on_anomaly: false
decision:
  promote_if:                    # all of these gates must pass to promote
    - security
    - e2e
    - performance_smoke
    - compliance
  warn_if:                       # failures here add a risk note for the approver
    - load_test
    - stress_test
    - chaos
    - observability
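
As an illustration of how a pipeline can consume this policy, here is a GitHub Actions sketch. The evaluate_gates.py script and the artifact names are hypothetical; any CI system with artifact support follows the same pattern:

final-validation:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    # Download the reports produced by previous stages (names are illustrative)
    - uses: actions/download-artifact@v4
      with:
        path: reports/
    # Hypothetical script: checks every report against policies/gates.yaml,
    # writes the unified markdown report shown above, and exits non-zero
    # if any blocking gate fails (which stops the pipeline).
    - name: Evaluate gates
      run: >
        python scripts/evaluate_gates.py
        --policy policies/gates.yaml
        --reports reports/
        --out unified-report.md
    # Publish the unified report for the Manual Approval step
    - uses: actions/upload-artifact@v4
      with:
        name: unified-validation-report
        path: unified-report.md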

AI

AI contributes:

  • Automatic correlation between metrics, logs, and traces.
  • Anomaly detection (spikes, patterns, regressions).
  • Executive summary for the approver.
  • Risk prioritization.
  • Automatic ticket generation with suggested root cause.
  • Mitigation recommendations (timeouts, pool sizes, caching, retries).

AI DOES NOT:

  • Replace the final human decision.
  • Approve or reject a release on its own.
  • Run tests or manipulate environments.

Manual Approval: critical, human, and informed decision

This stage converts all the technical evidence generated by the pipeline into an informed human decision, with traceability, governance, and risk control.

It is the last step of CD before production. It is based on the automatically generated executive summary that consolidates security, performance, resilience, compliance, and risks. The approval is recorded for auditing and governance. If rejected, a remediation flow is activated.

The goal of Manual Approval is to allow a responsible person (PO, Tech Lead, SRE, Security, Compliance, etc.) to make a final decision based on:

  • Consolidated technical evidence
  • Known risks
  • Security, performance, resilience, and compliance status
  • Business context (deployment windows, impact, dependencies)

It is the last control before production and must not be a formality, but a well-founded decision.

This stage includes

  • Executive Summary presentation: the pipeline shows the approver a clear and actionable summary with the status of all gates, detected risks, technical recommendation, links to artifacts, runbook, and rollback.

  • Human review: the approver reviews security results, critical E2E, performance, load/stress, chaos, logs/metrics, compliance, and business risks.

  • Decision: the approver can:

    • Approve → the Deploy to Production job is executed
    • Reject → rollback, ticket creation, and pipeline re-execution after remediation
    • Request changes → an issue is generated and the release is blocked
  • Auditing: the decision is recorded with who approved, when, which version, which risks were accepted, and which evidence was reviewed (essential for compliance and traceability).

Best practices

  • Require multiple approvers depending on criticality:
    • Minor releases → 1 approval (Tech Lead)
    • Major releases → 2 approvals (Tech Lead + Security)
    • Critical releases → 3 approvals (Tech Lead + Security + PO)
  • Show only relevant information: no noise. Only key metrics and links.
  • Include a mandatory checklist: prevents “blind” approvals.
  • Set a timeout: if not approved within X hours → the release expires.
  • Integrate with runbooks: the approver must have immediate access to rollback, mitigations, and on-call contacts.
  • Keep the decision in the release system for auditing and compliance.
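
As a sketch of how this is wired in practice: GitHub Actions environments support required reviewers and wait timers, configured in the repository settings. A job bound to the environment stays pending until authorized reviewers approve, and GitHub records who approved, when, and for which commit. The environment name, URL, and deploy script below are assumptions:

deploy-production:
  needs: final-validation
  runs-on: ubuntu-latest
  # "production" is protected in the repository settings with required
  # reviewers (e.g., Tech Lead + Security for a major release); the job
  # pauses here until they approve, and the decision is kept for audit.
  environment:
    name: production
    url: https://app.example.com   # illustrative URL
  steps:
    - name: Deploy
      run: ./scripts/deploy.sh     # hypothetical deploy script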

Advantages

  • Risk control: allows reviewing technical and business evidence before executing an irreversible change in production.
  • Alignment between technology and business: the approver can evaluate whether the timing is appropriate (maintenance windows, campaigns, traffic spikes).
  • Auditing and regulatory compliance: essential in regulated sectors (finance, insurance, health, public administration).
  • Contextual validation: the pipeline does not know business priorities, reputational risks, or external dependencies. The approver does.

Disadvantages

  • Can introduce friction or delays: unavailable approvers or accumulated releases.
  • Risk of “blind” approvals: if the executive summary is not clear or the evidence is not reviewed.
  • Does not scale well without prior automation: if automatic gates do not filter correctly, the approver receives too much information.
  • Single point of failure: if only one person approves, it becomes a bottleneck.

AI

AI can:

  • Generate the executive summary, synthesizing security results, metrics, logs, traces, chaos, and risks.
  • Correlate weak signals: progressive degradations, correlations between latency spikes and GC, log anomalies.
  • Prioritize risks according to potential impact (availability, security, performance, compliance).
  • Generate recommendations (mitigations, configuration adjustments, preventive actions).
  • Prepare audit evidence (links, metrics, artifacts, relevant logs).

What it CANNOT do:

  • Cannot approve or reject a release.
  • Cannot make risk decisions.
  • Cannot replace human judgment.
  • Cannot execute changes in production on its own.
  • Cannot interpret business context (campaigns, dependencies, reputational impact).

Deploy to Production: Blue/Green, Canary, Progressive Delivery, and extreme risk control

The goal is to promote the same signed artifact that has passed all previous validations and deploy it to the production environment minimizing risk, maximizing observability, and guaranteeing immediate reversibility.

It is the final stage of CD and the point where governance, engineering, business, reliability, and user experience converge.

This stage includes

  • Final artifact selection: signed image (cosign), associated SBOM, release metadata (commit, tag, date, approvers).

  • Deployment strategy (see detail below): Blue/Green, Canary, Rolling Update, Feature Flags.

  • Reinforced observability: during deployment, p50/p95/p99, error rate, CPU/memory saturation, DB latency, anomalous logs, distributed traces, and impact on real users are monitored.

  • Production gates (example):

    • Error rate > X% → rollback
    • p95 > Y ms → rollback
    • 5xx > Z → rollback
    • CPU saturation > 90% → rollback
  • Automatic rollback: must be immediate, safe, reproducible, and without data loss.
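
Gates like these can be evaluated automatically during the rollout. A minimal sketch with Argo Rollouts and Prometheus, assuming both are installed and that the metric names match your instrumentation:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-gate
spec:
  metrics:
    - name: error-rate
      interval: 1m
      # Fail the analysis (triggering rollback) if more than 1% of requests error
      failureCondition: result[0] > 0.01
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090   # assumed address
          query: |
            sum(rate(http_requests_total{status=~"5.."}[5m]))
            /
            sum(rate(http_requests_total[5m]))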

Deployment strategies

Blue/Green:

  • Two identical environments (Blue and Green).
  • Deployed to the inactive one.
  • Instantaneous traffic switch.
  • Immediate rollback by switching back to the previous environment.
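
A minimal Kubernetes sketch of the switch, assuming the two versions run as separate Deployments labeled version: blue and version: green. The Service selector decides which one receives traffic; rollback is pointing it back:

apiVersion: v1
kind: Service
metadata:
  name: my-app               # hypothetical service name
spec:
  selector:
    app: my-app
    version: blue            # change to "green" to switch all traffic at once
  ports:
    - port: 80
      targetPort: 8080

# The switch (and the rollback) is a one-line patch:
# kubectl patch service my-app -p '{"spec":{"selector":{"app":"my-app","version":"green"}}}'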

Canary:

  • Deployed to a small percentage of users (1–5%).
  • Latency, errors, and saturation are monitored.
  • Progressively increased to 100%.
  • Automatic rollback if anomalies are detected.
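
A minimal sketch of this progression with Argo Rollouts, assuming the controller is installed; the weights, pauses, and the reference to the error-rate-gate analysis shown earlier are illustrative:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: <REGISTRY>/<APP>:<TAG>   # the same signed artifact promoted from staging
  strategy:
    canary:
      steps:
        - setWeight: 5                    # start with 5% of traffic
        - pause: {duration: 10m}          # observe metrics before continuing
        - analysis:
            templates:
              - templateName: error-rate-gate   # automatic rollback on failure
        - setWeight: 25
        - pause: {duration: 10m}
        - setWeight: 100                  # full promotion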

Rolling Update:

  • Gradual pod replacement.
  • Less control than Canary, but simpler.
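
In Kubernetes this is the default Deployment strategy; a minimal sketch of the relevant fields (selector and pod template omitted):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # at most one extra pod during the update
      maxUnavailable: 0      # never drop below the desired replica count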

Feature Flags:

  • Progressive feature activation.
  • Allows logical rollback without redeployment.
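
A minimal sketch of a percentage-based flag, using a hypothetical flags.yaml read by the application; systems like LaunchDarkly or Unleash expose the same idea:

flags:
  new-checkout-flow:          # hypothetical feature name
    enabled: true
    rollout_percentage: 10    # raised progressively toward 100
    # Logical rollback: set enabled to false; no redeployment needed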

Best practices

  • Always promote the same signed artifact: never rebuild images for production.
  • Use progressive strategies (prefer Canary over plain Rolling): they reduce risk and allow fast rollback.
  • Activate advanced observability: specific dashboards for the deployment.
  • Automate rollback: must not depend on human intervention.
  • Maintain parity with staging: resources, sidecars, limits, configuration.
  • Integrate feature flags: allows enabling/disabling features without redeployment.
  • Run smoke tests post-deploy: quick validation in production after the switch.
  • Record the release: approvers, metrics, artifacts, accepted risks, date and time, strategy used.
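
A minimal post-deploy smoke step, tied to the best practice above, as a GitHub Actions sketch; the health endpoint is an assumption:

- name: Post-deploy smoke test
  run: |
    # Fail the job (and trigger the rollback path) if production does not
    # answer quickly and successfully after the switch
    curl --fail --silent --max-time 5 https://app.example.com/health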

Advantages

  • Extremely controlled risk: with Blue/Green or Canary, the impact of a failure is minimal.
  • Immediate reversibility: rollback in seconds without affecting users.
  • Real observability on real traffic: allows detecting issues that do not appear in staging.
  • Integration with feature flags: allows activating features safely.
  • Full traceability: each deployment is recorded with technical and business evidence.

Disadvantages

  • Operational cost: Blue/Green requires duplicating infrastructure; Canary requires more complex traffic routing.
  • Dependency on mature observability: without reliable metrics, Canary is dangerous.
  • Rollback not always trivial: if there are irreversible data migrations, rollback becomes complicated.
  • Requires strict discipline: no manual changes in production. Everything must be declarative and immutable.

AI

  • Real-time analysis during canary: AI can detect latency anomalies, error spikes, saturation, degradation patterns, and correlations between logs and metrics.
  • Rollback or promotion recommendations based on trends, comparisons with previous releases, and historical patterns.
  • Dashboard and summary generation: temporary panels, impact summaries, Blue vs Green comparisons.
  • Detection of invisible regressions: slow degradation, memory leaks, latency in non-critical endpoints.

What AI CANNOT do:

  • Cannot execute the deployment on its own.
  • Cannot make decisions about business risks.
  • Cannot manipulate real traffic without supervision.
  • Cannot replace human governance.
