What AI Can and Cannot Do Today in Software Development

Without Marketing Hype or Catastrophism: The Real State of AI Across the Software Lifecycle

A standalone article from the series “AI in Business: Expectations vs. Reality”

There are two ways of talking about AI in software development that are both worth avoiding: unqualified enthusiasm (“AI does everything better and faster”) and paralyzing skepticism (“it’s all smoke and mirrors; in two years nobody will remember it”).

Today’s reality is more interesting than either position, and also more practical: AI is extraordinarily good at certain specific things and quite poor at others. The problem is that businesses and media tend to talk about AI as if it were a single, homogeneous thing — when in reality there’s an enormous difference between what AI does well and what people ask it to do.

This article aims to be that map of real capabilities.

The Five Levels of AI Integration in Development

To understand where AI fits in the software development lifecycle (SDLC), it’s useful to think in terms of depth of integration. Not every company is at the same level, and not every company needs to be at the highest one.

Level 1: Individual Productivity (The Copilot Tier)

AI lives in the developer’s code editor. The human has full control: AI suggests, humans decide.

What it does well: code autocompletion, generating functions from comments, explaining unfamiliar code, creating basic unit tests, writing initial documentation.

This is the level with the best benefit/risk ratio for getting started. The impact is immediate and errors are local — easy to catch before they reach the repository.

Level 2: Team Collaboration (The Peer Tier)

AI steps out of the developer’s machine and into the team’s shared space. Pull requests are reviewed by AI as well: it catches logical errors, summarizes changes, and suggests refactors.

Value: reduces the cognitive load on human reviewers and shortens the time until code is approved and merged. Human reviewers can focus on high-level concerns rather than mechanical details.

Risk: if the team relies too heavily on AI review and relaxes its own, errors that AI misses, especially those requiring an understanding of business context, can slip through.

Level 3: Quality Control in the CI/CD Pipeline (The Gatekeeper Tier)

AI integrates as a step in the CI/CD pipeline. Here it decides whether something advances to the next environment or gets blocked.

Key capabilities:

Predictive testing: instead of running all tests every time, AI predicts which tests are most likely to fail given the specific change, and runs only those. The result is a faster pipeline.
Intelligent quality gates: performance analysis to decide whether a build is fit to move to production.
Context-aware vulnerability detection: not just finding known vulnerabilities, but evaluating whether they’re exploitable in the specific context of the application.
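
As a rough illustration of predictive testing, the sketch below ranks tests by how often they failed in the past when the same files were changed. It is a deliberately simple stand-in for a trained model: the history format, the file names, and the test names are all hypothetical.

```python
from collections import defaultdict


def build_failure_index(history):
    """Count how often each test failed when a given file was changed.

    `history` is a list of (changed_files, failed_tests) pairs from past runs.
    """
    index = defaultdict(lambda: defaultdict(int))
    for changed_files, failed_tests in history:
        for path in changed_files:
            for test in failed_tests:
                index[path][test] += 1
    return index


def select_tests(index, changed_files, top_k=2):
    """Rank tests by past failures linked to the changed files; keep top_k."""
    scores = defaultdict(int)
    for path in changed_files:
        for test, count in index.get(path, {}).items():
            scores[test] += count
    # Highest score first; ties broken alphabetically for reproducibility.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


# Hypothetical history: (files changed in a commit, tests that failed on it).
HISTORY = [
    (["billing.py"], ["test_invoice", "test_tax"]),
    (["billing.py", "auth.py"], ["test_invoice"]),
    (["auth.py"], ["test_login"]),
]

index = build_failure_index(HISTORY)
selected = select_tests(index, ["billing.py"])
print(selected)  # ['test_invoice', 'test_tax']
```

Note that a file with no history yields an empty selection. In a real pipeline that case would most likely need to fall back to the full test suite, which is one of the gate settings a human has to decide.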

Risk: this is the first level where AI can block or approve code without direct human intervention on each decision. The responsibility for configuring those gates correctly is entirely human.

Level 4: Operations and Observability (The SRE Tier)

AI monitors the system once it’s in production — what’s known as AIOps.

Capabilities:

Detecting traffic anomalies before the system collapses.
Self-healing: restarting services or triggering automatic rollback if it detects degradation after a deployment.
FinOps: automatically right-sizing servers to avoid cloud waste.
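
To make the anomaly-detection idea concrete, here is a minimal sketch that flags a traffic sample when it deviates strongly from the samples just before it. The window size, the threshold, and the traffic figures are assumptions chosen for illustration; production AIOps tools use considerably more elaborate models.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return the indexes whose value deviates from the preceding `window`
    samples by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            continue  # flat baseline: the z-score is undefined, so skip
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical requests-per-minute series with one spike at index 7.
requests_per_minute = [100, 102, 98, 101, 99, 100, 103, 400, 101, 99]
print(find_anomalies(requests_per_minute))  # [7]
```

A self-healing setup would attach an action, such as a rollback, to the flagged indexes. Whether that action runs automatically or pages a human first is a policy choice, not something the detector can decide.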

This level delivers real value in high-scale environments. In a startup with moderate traffic, the cost of implementing it usually exceeds the benefit.

Level 5: AI at the Core of the Product (The Engine Tier)

The team no longer just uses AI as a tool — AI is part of the product itself. This includes:

RAG (Retrieval-Augmented Generation): connecting a company’s internal knowledge base to a language model so it can answer questions about up-to-date internal information.
Fine-tuning: training a model on your own data to specialize it in your specific domain.
Agent orchestration: systems where one AI calls other AIs to autonomously solve complex tasks.
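
The retrieval half of RAG can be sketched in a few lines. The example below uses plain word counts and cosine similarity as a stand-in for the embedding model and vector store a real system would use; the documents and the prompt wording are hypothetical, and the call to the language model itself is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, top_k=1):
    """Return the top_k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents):
    """Assemble the text that would be sent to the language model."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical internal knowledge base.
DOCS = [
    "Vacation requests must be approved by the team lead.",
    "The staging database is refreshed every Monday night.",
    "Expense reports are due on the last day of the month.",
]

question = "When is the staging database refreshed?"
print(build_prompt(question, DOCS))
```

The point of the pattern is that the model answers from the retrieved text rather than from its training data, so the quality of the answer depends heavily on the retrieval step.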

This is the most complex level, the most resource-intensive, and the one that can deliver the most differential value. It’s also the one most prone to going badly wrong if implemented without the right level of technical maturity.

What AI Does Well Today

These are the tasks where AI delivers real, measurable value in current software development:

Boilerplate and repetitive code generation: standard structures, configurations, project scaffolding. What used to take hours now takes minutes.

Automated tests: generating unit and integration tests from source code. AI is good at covering the obvious cases, though complex edge cases still require human judgment.

Documentation: generating technical documentation from code. Especially useful for legacy code that was never documented when it was written.

Architectural spikes: quick, throwaway experiments to evaluate a technical solution before committing to it. “Does Redis work better here than DynamoDB?” AI can generate both prototypes in hours for a human to compare.

Modernization and migrations: converting code from one framework version to another, updating dependencies, refactoring to follow modern patterns.

Explaining unfamiliar code: paste a complex code snippet and ask AI to explain what it does and why. Especially useful when joining an existing project.

Detecting known errors in code review: typical bugs, well-documented insecure patterns, style convention violations.

AI always increases speed. What varies is the quality of the result, and that difference depends almost entirely on whether human oversight exists:

🟢 Rising line: delivery speed. Always improves with greater AI integration.
🔴 Declining line: output quality without active human supervision. Falls as AI assumes more autonomy.
🔵 Stable line: quality with maintained human supervision. Holds at all levels.

The crossover point between speed and quality without supervision is exactly where many projects start accumulating invisible technical debt.

What AI Cannot Do Today (and Why)

This is the most important section of the article, and also the most frequently ignored.

Reason 1: AI Has No Skin in the Game

When a production pipeline fails at 3 a.m. and the company is losing money by the minute, AI doesn’t feel the pressure. It won’t be fired. It has no legal or professional responsibility for the outcome.

The human architect does. And that difference isn’t trivial: accountability generates a kind of judgment that can’t be replicated with probabilities. Faced with a collapsing system, incomplete information, and limited time, AI offers suggestions based on statistical patterns. Humans apply judgment and bear the consequences.

Today, in practical and legal terms, you cannot hold an algorithm accountable when your infrastructure collapses.

Reason 2: The “Internet Average” Problem

AI is trained on what already exists on the internet. It’s extraordinarily good at replicating standard solutions. The problem is that the development cycle of a real company is never standard.

Every company has its own technical debt, specific tools that have been in production for decades, peculiar integrations, and configurations that took their current shape for reasons nobody fully remembers. AI typically proposes the “theoretically perfect” solution that, in practice, collides with inherited reality.

The human architect knows where the “buried bodies” are in the codebase. AI doesn’t. And it can’t, because that knowledge isn’t in any training dataset — it lives in the memory of the people who’ve spent years working on that system.

Reason 3: The Security Hallucination

In a CI/CD context, a single-character error in a configuration file can expose entire databases to the internet. AI makes mistakes: it suggests parameters that don’t exist, configurations that look logical but are insecure, or simply “invents” library APIs with signatures that don’t exist.

A security architect or a senior DevOps engineer catches that kind of error almost by instinct, built up from years of experience with real systems that have failed in very specific ways. AI doesn’t have that lived experience — it has text about those experiences, which is not the same thing.

Blindly trusting AI to design a system’s security is, right now, a high-risk bet.
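
One cheap guard against the invented APIs and parameters described above is to check a suggested call against the installed library before running it. The sketch below does this with Python's standard `importlib` and `inspect` modules; the helper name `call_is_plausible` is hypothetical, and the check only proves that a name exists, not that the call is secure.

```python
import importlib
import inspect


def call_is_plausible(module_name, function_name, keyword_args=()):
    """Return (ok, reason) for a suggested call, without executing it."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False, f"module {module_name!r} is not installed"

    func = getattr(module, function_name, None)
    if not callable(func):
        return False, f"{module_name}.{function_name} does not exist"

    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        return True, "exists, but its signature cannot be inspected"

    # A **kwargs parameter accepts any keyword, so names cannot be checked.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True, "exists and accepts arbitrary keyword arguments"

    accepted = {
        name
        for name, p in params.items()
        if p.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        )
    }
    unknown = [k for k in keyword_args if k not in accepted]
    if unknown:
        return False, f"unknown parameters: {unknown}"
    return True, "exists with the given parameters"


print(call_is_plausible("shutil", "copyfile", ["follow_symlinks"]))
print(call_is_plausible("shutil", "copyfile", ["overwrite"]))
print(call_is_plausible("shutil", "copy_tree"))
```

The first call is accepted, the second is rejected because `shutil.copyfile` has no `overwrite` parameter, and the third is rejected because `shutil` has no `copy_tree` function. A configuration that exists and is insecure would pass this check, which is why it complements human review rather than replacing it.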

Reason 4: The Broken Telephone Effect

For AI to produce a useful architecture, it needs precise and well-articulated context. And here’s the trap: if someone doesn’t already have sufficient domain expertise, they don’t know how to formulate the right prompt. The result is a generic architecture that doesn’t solve the actual problem.

The role of the architect or senior developer is necessary precisely for extracting complex business requirements and translating them into precise technical specifications. That ability — asking the right questions, understanding what the business needs even when the stakeholder can’t articulate it — is something AI cannot do on its own.

AI needs someone who already knows what they’re doing in order to work well.

The Junior Paradox: How Does the Next Generation Learn?

This is probably the quietest and most dangerous consequence of mass AI adoption in development.

In the pyramid of a traditional development team, junior profiles learn by doing the most mechanical tasks: writing supporting code, fixing simple bugs, implementing tests. It’s unglamorous work, but it’s the process through which the technical judgment that makes a senior developer is forged.

AI does these tasks 90% faster and at a fraction of the cost of a junior human. The consequence is that entry-level positions are disappearing or transforming. And if there’s nowhere to practice the fundamentals, there’s no way to become a senior.

The long-term risk: if there are no entry points to the field where people learn by doing real things, in ten years there will be a shortage of seniors who understand systems in depth. Today’s seniors were forged by doing things by hand. Tomorrow’s — where will they be forged?

This isn’t an argument against AI. It’s an argument for companies and the industry to think about how to preserve real learning pathways, even in an environment where AI automates the basic tasks.

Cognitive Technical Debt: The Invisible Risk

“Technical debt” is a well-known concept in the industry: code that works but is written in a way that will be hard to maintain, extend, or understand. It accumulates when speed is prioritized over quality.

AI introduces a new variant: cognitive technical debt. It’s not just that the code is poorly written. It’s that nobody in the company understands why it does what it does, because “AI generated it.”

It’s a more dangerous debt than traditional technical debt because:

It’s invisible: the code can work perfectly for months without anyone detecting the problem.
It’s explosive: when it fails, it fails in unexpected ways and nobody has the context to fix it.
It drives away talent: good professionals don’t want to work on systems nobody understands. It’s frustrating and feels like wasted time.

The classic mantra “if it ain’t broke, don’t fix it” was already dangerous when humans said it. Applied to AI-generated code that nobody fully comprehends, it’s outright suicidal.

The Scientific Evidence: The Cognitive “Boiling Frog” Effect

The above might seem like a reasonable intuition that’s hard to measure. In April 2026, a team of researchers from Carnegie Mellon, Oxford, MIT, and UCLA published a study that quantifies it under controlled conditions.

The paper “AI Assistance Reduces Persistence and Hurts Independent Performance” (Liu et al., 2026) worked with 1,222 participants across three randomized experiments. The design was simple: one group solved problems with access to an AI assistant (GPT-5), while the control group solved them alone. Afterwards, AI access was removed from everyone and independent performance was measured.

The results:

Solution rate: the group that had used AI achieved 57% versus 73% in the control group. Using AI during the assistance phase reduced subsequent performance without it.
Problem abandonment: nearly doubled in the AI-assisted group (20% vs. 11%).
The important nuance: participants who used AI only for hints — not complete answers — showed less degradation. The greatest damage came from using AI as a complete substitute for one’s own reasoning.

The authors call this the cognitive “boiling frog” effect: the erosion is silent, incremental, and happens in minutes. The brain rapidly adapts to the availability of external assistance and reduces its own cognitive investment, even after that assistance disappears.

It reminds me of something most of us have experienced: “I used to know everyone’s phone number by heart, but ever since smartphones arrived to store them, I can’t remember a single one.” Why invest effort when a tool you know is within reach (a phone, an AI, a mechanical hammer) does the job far better? Is it laziness, a survival instinct to spend as few resources as possible, or the brain’s rapid adaptation to its environment?

Applied to a development context: a team that habitually accepts AI output without reasoning about it doesn’t just accumulate cognitive debt in systems they don’t understand. They also actively degrade their capacity to think independently when AI is unavailable or when it fails.

Multi-Agent Systems: Promise vs. Current Reality

AI agent systems are those where multiple AI models collaborate with each other to solve complex tasks: one agent acts as architect, another as QA, another as DevOps — they communicate and supervise each other.

The promise is powerful: automate the entire technical execution.

The reality is more sobering:

The evolutionary “echo chamber” problem: an agent system with memory learns from what it does. If one agent makes a subtle error and the others accept it as the norm, the system starts building on a defective foundation. Without a human applying external common sense, agents can drift toward an architecture that nobody outside the system can understand or fix.

The creativity inflection point: agent systems are excellent at optimizing what already exists, but very poor at inventing new paradigms. A company merger, an unexpected regulatory change, a strategic pivot — AI processes these as data points, but doesn’t “feel” the urgency or ambiguity surrounding those decisions.

Ultimate accountability: if the agent system makes an architectural decision that violates a privacy regulation, who bears responsibility? The company. And for that, it needs humans who understand what the system decided and why.

Security, Governance, and Sandboxes: What Isn’t Optional

If there’s one thing that AI tool marketing doesn’t mention often enough, it’s security in the use of agents capable of executing code, modifying files, and accessing external systems.

The minimum enterprise standard for working with AI agents should include:

The fundamental rule: no AI agent should have direct access to production. The same safeguards applied to a junior developer who just joined (code reviews, test environments, approvals) must apply to AI systems — with even greater rigor.

The cost of ignoring this is not hypothetical: security incidents caused by misconfigured AI tools are already documented in the industry.
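
The fundamental rule above can be expressed as a small policy check that every agent action passes through before it runs. This is a sketch under stated assumptions: the environment names, the `AgentAction` fields, and the rule that staging needs a human approval are all illustrative choices, not an established standard.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: production is deliberately absent from this allow-list.
ALLOWED_ENVIRONMENTS = {"sandbox", "staging"}


@dataclass(frozen=True)
class AgentAction:
    agent: str
    command: str
    environment: str
    approved_by: Optional[str] = None  # human reviewer, if any


def authorize(action):
    """Return (allowed, reason) for an action requested by an AI agent."""
    if action.environment not in ALLOWED_ENVIRONMENTS:
        return False, "agents have no direct access to this environment"
    if action.environment == "staging" and action.approved_by is None:
        return False, "staging changes need a human approval"
    return True, "allowed"


print(authorize(AgentAction("refactor-bot", "kubectl apply", "production")))
print(authorize(AgentAction("refactor-bot", "pytest", "sandbox")))
print(authorize(AgentAction("refactor-bot", "deploy", "staging", approved_by="alice")))
```

Using an allow-list rather than a deny-list means an environment nobody thought to name is blocked by default.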

Probabilistic CI/CD: A Quietly Shifting Paradigm

The classic CI/CD pipeline is deterministic: the test passes or fails, the build compiles or it doesn’t. The result is binary and reproducible.

When AI enters the pipeline — especially in the form of quality evaluations or automatically generated tests — the system becomes probabilistic: AI may give slightly different answers on successive runs. What was “correct” yesterday may be evaluated differently today, not because the code changed, but because the AI model has inherent variability.

This forces a rethinking of how software is validated. It’s no longer enough to say “the test passed”; you have to ask “is the test still relevant and accurate?” The usual response is an AI evaluation pipeline that checks not only whether the code compiles, but also whether the AI itself is still accurate and consistent at the task it has been given.

It’s a small problem today, but it will grow as AI takes on more responsibilities in the pipeline.
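
One way to handle the run-to-run variability described in this section is to call the AI evaluator several times and accept its verdict only when enough runs agree. The sketch below assumes a hypothetical `evaluate` callable that returns True or False; the number of runs and the agreement threshold are illustrative values.

```python
from itertools import cycle


def stable_verdict(evaluate, artifact, runs=5, min_agreement=0.8):
    """Call a non-deterministic evaluator several times.

    Returns "pass" or "fail" when at least `min_agreement` of the runs
    agree, and "needs_human_review" otherwise.
    """
    votes = [bool(evaluate(artifact)) for _ in range(runs)]
    pass_share = sum(votes) / runs
    if pass_share >= min_agreement:
        return "pass"
    if 1 - pass_share >= min_agreement:
        return "fail"
    return "needs_human_review"


def scripted_evaluator(answers):
    """Hypothetical stand-in for an AI check: replays a fixed verdict sequence."""
    sequence = cycle(answers)
    return lambda artifact: next(sequence)


consistent = scripted_evaluator([True])
flaky = scripted_evaluator([True, False])

print(stable_verdict(consistent, "build-123"))  # pass
print(stable_verdict(flaky, "build-123"))       # needs_human_review
```

The third outcome is the important one: when the evaluator disagrees with itself, the gate escalates to a person instead of silently picking a side.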

The Real Competitive Advantage

After everything above, the conclusion isn’t that AI is bad or not worth using. It’s that the real competitive advantage doesn’t come from having the most powerful AI or using it in more parts of the process.

The real competitive advantage is having the most capable team at auditing, understanding, questioning, and improving what AI proposes.

A company with developers who understand in depth what AI does, who detect when it’s wrong, who know when to trust it and when not to, and who keep technical judgment alive within the team — that company has a real, sustainable advantage.

A company that simply connects everything to an AI and shuts down internal critical thinking is buying short-term speed at the price of structural fragility.

Sources cited in this article:

Liu G., Christian B., Dumbalska T., Bakker M.A., Dubey R. (2026). AI Assistance Reduces Persistence and Hurts Independent Performance. arXiv:2604.04721.
Press coverage of the study: CMU — New Scientist coverage (PDF) · Time Magazine — “Are We Losing Our Minds to AI?” · Mobile Syrup
Recommended AI sandboxes: Docker AI Sandboxes · Modal Sandboxes
EU AI Act (regulatory reference): EUR-Lex — Regulation (EU) 2024/1689

2 thoughts on “What AI Can and Cannot Do Today in Software Development”

Carla Nitzsche says:

17/07/2026 at 13:51

The article’s warning about AI‑driven quality gates in CI/CD made me wonder how firms balance speed with the risk of hidden bugs—does integrating AI into automated pipelines require dedicated process‑automation expertise?

1. Ramón Invarato says:
  
  17/07/2026 at 21:06
  
  That’s a fundamental question. The short answer is yes: integrating AI into automated pipelines absolutely requires dedicated process-automation and DevOps expertise, perhaps even more than before.
  