The AI skills gap is not what you think it is

The AI skills gap has become a staple of the business press: companies cannot find enough machine learning engineers, data scientists are in short supply, and organizations that cannot hire AI talent will fall behind. This framing shapes hiring strategies, training budgets, and acquisition decisions at companies across every sector.

It is also, for most organizations, the wrong diagnosis.

The skills that are actually scarce are not the ones that show up in job postings. The shortage that matters most for AI adoption is not ML engineering expertise. It is something less visible and harder to hire for: the ability to judge whether an AI system is producing good output, and the organizational capacity to define what “good” means before deploying.

What companies think they need

When organizations identify an AI skills gap, they typically describe it in terms of specific technical capabilities: deep learning expertise, Python fluency, experience with large language model fine-tuning, knowledge of vector databases. These are real skills. They are also concentrated in a small population, expensive to hire, and genuinely in demand.

But this framing assumes that the bottleneck to AI adoption is building AI systems. For most organizations, it is not. Foundation models are available as APIs. Infrastructure tooling has matured. Building a capable AI system requires less specialized expertise than it did three years ago, and the amount required keeps shrinking.

The real bottleneck is knowing what to build, knowing whether what was built is working, and knowing how to integrate AI outputs into processes that were designed around human judgment.

The actual scarce skill: evaluation

The single most scarce skill in AI adoption is the ability to evaluate AI output in a specific domain.

Evaluation sounds like a technical problem. It is partly a technical problem: you need infrastructure for running evals, logging outputs, sampling from production, and measuring quality at scale. But the hard part of evaluation is not technical. It is knowing what good output looks like in a specific context.

A legal team deploying an AI contract review tool needs to evaluate whether the AI is correctly identifying risk clauses. That judgment requires legal expertise, not ML expertise. A hospital using AI to flag abnormal lab results needs to evaluate whether the flagging threshold is calibrated correctly. That judgment requires clinical expertise. A customer support team using AI to draft responses needs to evaluate whether the drafts match the company’s voice and avoid committing to things the company cannot deliver. That judgment requires customer expertise.

None of these evaluation tasks require a data scientist. All of them require a domain expert who has been given the right framework for thinking about AI output quality. The shortage is not people who can build evaluation infrastructure. It is domain experts who have been trained to think systematically about AI failure modes in their area.
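One way to picture such a framework is a small rubric that a domain expert fills in for each output they review. The sketch below is illustrative only: the `Review` record, the failure-mode labels, and the `summarize` helper are hypothetical names chosen for this example, and the data is made up.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Review:
    """One domain expert's judgment of a single AI output (hypothetical schema)."""
    output_id: str
    acceptable: bool
    failure_modes: list[str] = field(default_factory=list)


def summarize(reviews: list[Review]) -> dict:
    """Return the acceptance rate and a count of each failure mode observed."""
    if not reviews:
        return {"reviewed": 0, "acceptance_rate": None, "failure_modes": {}}
    accepted = sum(1 for r in reviews if r.acceptable)
    modes = Counter(m for r in reviews for m in r.failure_modes)
    return {
        "reviewed": len(reviews),
        "acceptance_rate": accepted / len(reviews),
        "failure_modes": dict(modes),
    }


# Made-up labels from a contract reviewer, for illustration only.
reviews = [
    Review("c-001", True),
    Review("c-002", False, ["missed_indemnity_clause"]),
    Review("c-003", False, ["missed_indemnity_clause", "wrong_jurisdiction"]),
    Review("c-004", True),
]
summary = summarize(reviews)
print(summary)
```

The code is deliberately trivial. The value is in the failure-mode vocabulary, which only someone who knows the domain can write.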

Organizations that invest in building this evaluation capacity across their domain experts get dramatically better results than organizations that try to centralize AI quality assurance in a small technical team. The technical team cannot evaluate domain-specific outputs at the rate they are being produced. The domain experts are already in the organization; they need a framework, not a credential.

The organizational gap

Beyond individual skills, the more common failure mode is an organizational capability gap: companies have the technical talent to build AI systems, but lack the processes to use AI outputs effectively.

This shows up in a predictable pattern. A team builds an AI feature. It works well in testing. It is deployed to production. Within weeks or months, there are complaints: the AI is giving wrong information, users are ignoring its suggestions, the feature is not producing the expected business outcome. The technical team investigates and finds the model is performing roughly as expected. The problem is not the model; it is the organizational context around the model.

No one defined what success looked like before launch. No process exists for capturing feedback when the AI produces bad output. The people using the AI outputs were not involved in designing the system and have not been given guidance on when to trust the AI and when to override it. There is no feedback loop from production quality back to the team that built the system.

These are not technical problems. They are organizational problems. Hiring another ML engineer does not fix them. Fixing them requires product management practices for AI systems, change management for teams whose workflows now involve AI outputs, and operational processes for monitoring AI quality in production.
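The monitoring piece can start small. As a minimal sketch, assuming production outputs have identifiers and that the team has agreed on a target acceptance rate before launch, a feedback loop might sample a fraction of outputs for expert review and flag when quality drops below the target. The function names, the 5% sampling rate, and the 0.9 target are assumptions for illustration.

```python
import random


def sample_for_review(output_ids, rate, seed=None):
    """Pick a random fraction of production outputs for expert review."""
    if not 0 <= rate <= 1:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)
    k = round(len(output_ids) * rate)
    return rng.sample(list(output_ids), k)


def needs_attention(accepted, reviewed, target_rate):
    """True when the reviewed acceptance rate falls below the agreed target."""
    if reviewed == 0:
        return False  # nothing reviewed yet, so no signal either way
    return accepted / reviewed < target_rate


# Hypothetical week of production: 200 outputs, 5% sent to reviewers.
ids = [f"out-{i}" for i in range(200)]
batch = sample_for_review(ids, rate=0.05, seed=7)

# Suppose reviewers accepted 7 of the 10 sampled outputs; the target is 0.9.
alert = needs_attention(accepted=7, reviewed=len(batch), target_rate=0.9)
print(len(batch), alert)
```

The target rate is an input here, not something the code can discover. Someone has to define it before launch, which is the organizational step that is usually missing.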

Where ML expertise actually matters

None of this means ML expertise is irrelevant. For organizations building custom models, doing fine-tuning on proprietary data, or operating AI infrastructure at significant scale, deep technical expertise is essential and genuinely scarce.

The distinction is between organizations building AI capabilities and organizations deploying AI capabilities. Most organizations are in the second category. They are buying AI through APIs, integrating foundation models into their products and processes, and evaluating whether commercial AI systems meet their needs. For these organizations, the relevant expertise is integration, evaluation, and domain judgment. ML expertise is useful at the margins but is not the constraint.

Even for organizations building AI capabilities, the bottleneck is often not the ML engineering itself. It is data quality: the ability to identify, label, and maintain the training data that makes a model useful for a specific application. Data quality work requires domain knowledge more than it requires ML knowledge. Organizations that recognize this hire annotation teams with domain expertise, invest in data validation pipelines, and treat data quality as a continuous operational concern rather than a one-time project.

The second-order mistake

The skills gap narrative has a second-order effect that compounds the first. When companies diagnose their AI problem as a talent shortage, they respond by hiring. When hiring does not solve the problem quickly enough, they conclude that the talent shortage is more severe than they thought, and hire more aggressively.

What they often discover, after significant investment, is that the new hires are spending most of their time on problems that cannot be solved with ML expertise: getting stakeholder alignment on what the AI should do, building data pipelines for information that exists in formats the AI cannot use, defining success metrics that the business will accept, and integrating AI outputs into processes that were not designed for them.

These are problems that domain experts, product managers, and process engineers would have been better positioned to solve. The AI talent the company hired is underutilized or applied to problems that are not the actual constraint.

A more useful frame

A more useful frame for diagnosing AI readiness is to ask three questions before deciding what skills to hire for.

First, do we know what good output looks like? If the answer is no, the skill needed is domain analysis: working with domain experts to define quality criteria, failure modes, and acceptable error rates for specific AI applications. This is a product and domain question, not an engineering question.

Second, do we have a feedback loop from production to development? If the answer is no, the skill needed is operational: building monitoring, sampling, and feedback infrastructure that tells the team whether the AI is performing as expected after deployment. This requires engineering skills, but it is closer to platform engineering than ML engineering.

Third, have we changed the processes that interact with AI outputs? If the answer is no, the skill needed is organizational: change management, training, and workflow redesign for teams whose work now involves AI. This is not an engineering skill at all.
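The three questions can be read as a simple checklist. The sketch below encodes them directly; the function name and the wording of the returned labels are illustrative, not a formal assessment tool.

```python
def skills_needed(knows_good_output, has_feedback_loop, changed_processes):
    """Map a 'no' on each of the three questions to the skill gap it implies."""
    gaps = []
    if not knows_good_output:
        gaps.append("domain analysis: quality criteria, failure modes, error rates")
    if not has_feedback_loop:
        gaps.append("operational: monitoring, sampling, feedback infrastructure")
    if not changed_processes:
        gaps.append("organizational: change management, training, workflow redesign")
    return gaps


# Example: a team that has monitoring in place but neither of the other two.
gaps = skills_needed(
    knows_good_output=False,
    has_feedback_loop=True,
    changed_processes=False,
)
for gap in gaps:
    print(gap)
```

Note that ML engineering never appears in the output. That follows from the questions themselves, not from a limitation of the sketch.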

Most organizations that are struggling with AI adoption would answer no to at least two of these three questions. The gap is not ML engineering talent. It is the surrounding infrastructure of judgment, feedback, and process that makes AI systems work in practice rather than in demos.

The organizations that are getting real value from AI are not necessarily the ones with the most ML engineers. They are the ones that have figured out what they are trying to accomplish, built mechanisms for knowing whether they are accomplishing it, and changed how people work to account for the AI’s role. Those capabilities are teachable. They are often already present in the organization in latent form. And they are more reliably the bottleneck than the engineering talent that gets all the attention.
