
The state of AI in 2026: what changed and what did not

2026 was a year of real progress in AI capability and significant noise about what that progress means. Here is an honest accounting of what actually shifted, what stayed stuck, and what that implies for the year ahead.

By Ramiro Enriquez

Every year-end AI summary risks one of two failure modes: excessive optimism that reads the latest capability demonstrations as evidence of imminent transformation, or excessive skepticism that dismisses real progress because the most ambitious predictions did not pan out. The honest version is more granular. Some things changed significantly. Others did not change at all despite years of confident prediction. And some things changed in ways that were not predicted but matter more than the things that were.

Here is an attempt at the honest version for 2026.

What actually changed

Inference cost continued its decline. The cost per token of running frontier models dropped substantially again, continuing a trend that has been more consistent than almost any other dynamic in AI development. This matters because cost was a real constraint on production AI deployment at scale, and that constraint has loosened. Applications that were economically marginal in 2024 are economically straightforward in 2026. The cost curve has enabled a category of high-volume, lower-margin AI applications that could not exist before.

Agentic capability got meaningfully more reliable. The earliest wave of AI agents, circa 2023 and 2024, was impressive in demos and unreliable in production. Error rates that were acceptable in a single-step application compound badly when an agent is taking ten or twenty sequential actions. In 2026, the reliability picture improved enough that a class of genuinely autonomous AI workflows entered production at organizations that can tolerate some error rate. These are not the full-autonomy AGI-adjacent systems that generated the most coverage. They are narrow, bounded agents handling specific workflows where the task structure is predictable and recovery from errors is possible. The bar for what counts as reliable enough is still domain-specific, but it moved.
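To make the compounding concrete, here is a minimal sketch of the arithmetic. It assumes each step succeeds independently with the same probability, which is a simplification, and the 95% and 99% per-step figures are illustrative values rather than measurements from any real system. The function name is hypothetical.

```python
def chain_success_probability(per_step_success: float, steps: int) -> float:
    """Probability that every step in a sequence succeeds.

    Assumes steps fail independently and share one success rate,
    which is an idealization of how real agents behave.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


if __name__ == "__main__":
    # Illustrative per-step success rates, not measured values.
    for rate in (0.95, 0.99):
        for steps in (1, 10, 20):
            overall = chain_success_probability(rate, steps)
            print(f"per-step {rate:.0%}, {steps:>2} steps: {overall:.1%} end-to-end")
```

Under these assumptions, a step that succeeds 95% of the time yields an end-to-end success rate of roughly 60% over ten steps and about 36% over twenty. That is why modest per-step gains matter so much for multi-step workflows, and why the ability to recover from errors changes the picture.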

The evaluation ecosystem matured. In 2024 and 2025, most AI evaluation was ad-hoc: custom scripts, manual review, benchmarks that did not reflect production conditions. In 2026, a more systematic approach emerged. More teams built evaluation infrastructure as a first-class engineering concern. Better tooling made it easier to run structured evaluations against labeled datasets and monitor quality in production. This is unglamorous progress that does not generate headlines, but it is probably the highest-leverage shift in AI development practice that happened this year. Teams that can measure quality reliably can improve systematically; teams that cannot are guessing.
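As a rough illustration of what running a structured evaluation against a labeled dataset can look like at its simplest, consider the sketch below. Everything in it is hypothetical: `LabeledExample`, `evaluate`, the toy system, and the exact-match scoring rule are assumptions made for the example, not a description of any particular team's tooling. Real evaluations usually need richer scoring than string comparison.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LabeledExample:
    prompt: str
    expected: str


def evaluate(system: Callable[[str], str], dataset: list[LabeledExample]) -> dict:
    """Run the system on each labeled example and score by normalized exact match."""
    failures = []
    for example in dataset:
        output = system(example.prompt)
        if output.strip().lower() != example.expected.strip().lower():
            # Keep the failing cases so they can be reviewed, not just counted.
            failures.append((example.prompt, example.expected, output))
    total = len(dataset)
    accuracy = (total - len(failures)) / total if total else 0.0
    return {"total": total, "accuracy": accuracy, "failures": failures}


def toy_system(prompt: str) -> str:
    """Stand-in for a model call; a real system would invoke a model here."""
    canned = {"Capital of France?": "Paris", "2 + 2?": "4"}
    return canned.get(prompt, "unknown")


DATASET = [
    LabeledExample("Capital of France?", "paris"),
    LabeledExample("2 + 2?", "4"),
    LabeledExample("Largest planet?", "Jupiter"),
]

report = evaluate(toy_system, DATASET)

if __name__ == "__main__":
    print(f"accuracy: {report['accuracy']:.2f} on {report['total']} examples")
    for prompt, expected, output in report["failures"]:
        print(f"FAILED {prompt!r}: expected {expected!r}, got {output!r}")
```

The accuracy number is the less important output. Keeping the failing examples alongside the score is what lets a team rerun the same dataset after every change and see whether quality moved.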

Multimodal capability became practically useful. The ability to process images, audio, and documents alongside text moved from a research capability to something that ships in production applications. Document processing, in particular, saw significant real-world deployment. The ability to extract structured information from heterogeneous documents at scale is solving problems that previously required expensive human processing or brittle rule-based systems.

What did not change

Enterprise adoption pace remains slower than headlines suggest. The press releases about AI transformation continued at the same rate as prior years. The actual rate of AI systems reaching production in large enterprises continued to lag significantly behind the announcement rate. The gap is not primarily a technology problem. It is an organizational problem: risk tolerance, evaluation capacity, data quality, change management, and the difficulty of getting production AI right all constrain adoption more than model capability does. 2026 did not fix these organizational constraints, and 2027 will not either.

The data quality problem is still the bottleneck. Almost every team that tried to build AI on proprietary data hit data quality as the primary constraint. The data was incomplete, inconsistently formatted, or labeled with schemas that do not reflect the real structure of the problem. AI capability is no longer the bottleneck for most enterprise AI projects. Data readiness is. This has been true since at least 2024 and is arguably more true now that model capability has advanced beyond what data quality can support.

Hallucination is not solved. The major model providers made meaningful progress on reducing factual errors in specific, measurable ways. Hallucination rates on popular benchmarks dropped. Production hallucination rates in deployment, on the specific types of claims that matter for real applications, remained a live problem. The progress was real but narrower than coverage implied. The applications where hallucination is a serious risk still require mitigation architecture, not just better models.

The evaluation problem is not solved either. The ecosystem matured, but maturation is not a solution. Measuring whether an AI system is doing what you want it to do remains hard, expensive, and underinvested relative to how much it matters. Most organizations are still not measuring quality systematically. The teams that are measuring it are doing something closer to right, but the practice is not yet standard.

What changed in unexpected ways

The competitive landscape consolidated faster than predicted. In 2024, there were credible arguments for a dozen different foundation model providers as long-term players. By the end of 2026, the market had sorted into a structure that looks more like enterprise software markets than the open landscape of two years ago: a small number of providers with very large compute investments and distribution advantages, a layer of specialized providers for specific domains or modalities, and a much larger layer of application companies building on top. The middle layer is under pressure.

Open-weight models became a genuine alternative for more use cases. The capability gap between closed frontier models and the best open-weight alternatives narrowed substantially. For applications that do not require the absolute frontier, where data privacy or latency constraints make API calls problematic, or where the economics of high-volume inference favor on-premise deployment, open-weight models became a realistic option rather than a compromise. This changed the build-vs-buy calculus for a significant category of AI applications.

Regulation moved faster than the AI industry expected. The regulatory environment for AI in Europe, and to a lesser extent other jurisdictions, became a real operational constraint rather than a theoretical future concern. Legal teams are now involved in AI deployment decisions at large enterprises in ways they were not two years ago. This slows some deployments and forces documentation practices that the field had not standardized. It also, notably, is accelerating evaluation and auditing practices because those practices are becoming compliance requirements rather than engineering aspirations.

What 2027 is likely to bring

What was true going into 2026 is still true going into 2027: the organizations getting the most value from AI are the ones with the best evaluation infrastructure, the cleanest data, and the most disciplined approach to scoping AI to tasks where it is reliable. None of that will change.

What may change: inference costs will continue falling, enabling another tier of applications. Agent reliability will continue improving, enabling another set of workflows. The regulatory requirements will become clearer, which will be worse news for teams that have not built auditability into their systems and better news for teams that have.

The organizations that have spent 2026 building evaluation infrastructure, cleaning data, and deploying narrow AI reliably will be in a stronger position to take advantage of the next tier of capability. The organizations that spent 2026 running pilots and looking for transformative use cases are entering 2027 roughly where they were entering 2026, except with a year of organizational debt around AI expectations that did not materialize.

The honest summary is that 2026 was a year of real but incremental progress, significant organizational learning, and continued evidence that the gap between AI capability and AI deployment is not primarily a technology gap. The technology keeps improving. The organizational capability to deploy it well is what is still catching up.

