The AI skeptic's guide to getting value anyway

If you have been around long enough to watch a few technology hype cycles, you have developed a productive instinct: when something is described as transformative, revolutionary, or a once-in-a-generation shift, be skeptical. The technology usually works less well than claimed, for fewer use cases than promoted, and the adoption curve is longer and harder than the boosters suggest.

This instinct is correct more often than not, and it applies to AI in specific, identifiable ways. A large fraction of what gets claimed about AI does not hold up under scrutiny. The demos that look impressive often rely on cherry-picked inputs. The accuracy numbers often omit how the failures are distributed. The productivity claims often ignore the overhead of reviewing, correcting, and managing AI output. The “AI will do X” claims often mean “AI will do part of X in favorable conditions.”

Where the skeptic’s instinct runs into trouble is the next step: concluding that because most AI claims are overblown, AI is not worth serious attention. That conclusion is wrong, and it leads skeptics to miss the specific applications that genuinely deliver value.

What skeptics get right

Skeptics are right that most AI projects underdeliver. The combination of high expectations, poor evaluation discipline, and the tendency to showcase successful cases while ignoring failures produces a systematic bias toward overestimating AI’s value. Skeptics who resist this bias make better decisions.

Skeptics are right that AI is genuinely hard to evaluate. The output looks confident and coherent whether it is correct or not. The failure modes are unfamiliar. The quality varies by input in ways that are not obvious from aggregate metrics. Organizations that treat AI like traditional software, where “it works” is a yes-or-no question, consistently discover problems after deployment that careful evaluation would have caught.

Skeptics are right that the organizational overhead of AI is often underestimated. Reviewing AI output takes time. Maintaining the integration takes time. Handling the cases where the AI is wrong takes time. The net productivity gain after accounting for these costs is often smaller than the gross productivity gain from having the AI do the first pass.

Skeptics are right that vendor demos are not representative. The examples in demos are selected to showcase the AI’s capabilities. The inputs that cause problems are not shown. The evaluation criteria are defined to make the AI look good. A demo that impresses you is evidence that the vendor knows what their AI does well, not evidence that it will work well for your use case.

Where wholesale skepticism goes wrong

Skeptics who use correct observations about AI’s limitations to justify ignoring AI entirely are making a different mistake.

The correct observation that most AI projects underdeliver does not imply that all AI projects underdeliver. Some specific applications produce reliable, measurable value. The question is which ones, not whether any exist.

The correct observation that AI is hard to evaluate does not imply that evaluation is impossible. It implies that evaluation requires more care than most teams apply, and that teams that invest in careful evaluation can find out whether a specific application works. The skeptic who will not evaluate because evaluation is hard is avoiding the one activity that would tell them whether their skepticism is warranted in a specific case.

The correct observation that vendor demos are not representative does not imply that the vendor’s tool is worthless. It implies that the demo is not the right basis for a purchase decision. Testing with your data, on your use cases, and against your quality criteria is the right basis. The skeptic who dismisses a tool because its demo was oversold is making the same epistemic error as the enthusiast who buys it because of that same demo.

The narrow slice that actually works

The AI applications that reliably deliver value across a wide range of contexts share a few characteristics that are worth knowing about.

Well-defined tasks with clear quality criteria. AI works well when the task is specific enough that you can evaluate whether the output is good. Summarizing a structured document. Classifying inputs into a defined set of categories. Generating a first draft of a form-based document. Extracting structured information from unstructured text. These tasks have clear enough quality criteria that you can measure whether the AI is doing them well.

High volume, low stakes per instance. AI is most reliably valuable when you have many instances of the same task and the cost of an individual error is low or correctable. Drafting many similar emails, classifying many similar support tickets, reviewing many similar documents for a specific feature: when errors are recoverable and volume is high, the value of AI assistance compounds while the risk stays contained.

Tasks where a reasonable first draft has high value. AI is often best used not to complete a task but to create a starting point that a human can review and improve. A first draft that needs editing is more valuable than a blank page. A suggested classification that needs confirmation is more valuable than deciding from scratch. Tasks where the hard part is getting started are well-suited to AI assistance.

Tasks where consistency matters more than perfection. Human output varies with attention, fatigue, and context; AI output does not vary for those reasons. For tasks where consistent output matters more than the best possible output in each case, AI has an advantage that is independent of its raw quality.

The skeptic’s practical approach

If you are skeptical about AI and want to find out whether there is specific value for your context without buying into hype, there is a tractable approach.

Define success before you start. Before you experiment with any AI tool or application, write down what success looks like in terms you can measure. Not “the AI helps people work faster” but “the time to complete this task decreases by at least 20% without an increase in error rate.” Defining success upfront prevents the post-hoc rationalization that consistently makes AI seem more valuable than it is in practice.
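One way to hold yourself to a criterion like this is to write it down as an explicit check before the trial starts, so the threshold cannot drift once results come in. The sketch below is a minimal illustration, assuming you record a per-task completion time and an error rate for a baseline period and for an AI-assisted period. TrialResult, meets_success_criterion, and the 20% default are hypothetical names chosen to mirror the example criterion above, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class TrialResult:
    """Measurements from one arm of a trial (hypothetical structure)."""
    minutes_per_task: float  # typical time to complete the task
    error_rate: float        # fraction of outputs with errors, 0.0 to 1.0


def meets_success_criterion(baseline: TrialResult, with_ai: TrialResult,
                            min_time_reduction: float = 0.20) -> bool:
    """Return True only if task time drops by at least min_time_reduction
    AND the error rate does not increase relative to the baseline."""
    time_reduction = (
        (baseline.minutes_per_task - with_ai.minutes_per_task)
        / baseline.minutes_per_task
    )
    no_more_errors = with_ai.error_rate <= baseline.error_rate
    return time_reduction >= min_time_reduction and no_more_errors


# Criterion fixed before the trial; results filled in afterwards.
baseline = TrialResult(minutes_per_task=30.0, error_rate=0.05)
with_ai = TrialResult(minutes_per_task=22.0, error_rate=0.05)
print(meets_success_criterion(baseline, with_ai))
```

Both conditions must hold: a faster workflow that produces more errors fails the check. A real comparison would also need enough samples to separate a genuine change from noise; this sketch only encodes the threshold.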

Test with your actual inputs. Do not evaluate AI based on the vendor’s examples or on examples you curated to make the AI look good. Collect a representative sample of your real inputs, including the edge cases and the hard ones, and test against those. The performance on representative inputs is what you will actually experience.

Measure the full cost. Count the time spent reviewing AI output, correcting errors, and handling the cases where the AI was wrong. Compare the total time with and without AI to get an honest productivity comparison. Many AI applications look like significant time savers when you measure only the AI’s contribution and look like marginal improvements when you account for the total workflow.
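The gap between the two views is simple arithmetic, sketched below under stated assumptions. WorkflowTimes and its four stages are a hypothetical breakdown of an AI-assisted workflow, and the numbers in the example are made up for illustration, not measurements from any real deployment.

```python
from dataclasses import dataclass


@dataclass
class WorkflowTimes:
    """Average minutes per task for each stage of an AI-assisted
    workflow (hypothetical breakdown)."""
    ai_generation: float     # prompting the AI and waiting for output
    review: float            # reading and checking the output
    correction: float        # fixing errors found during review
    failure_handling: float  # redoing work where the AI was wrong, averaged over all tasks

    def total(self) -> float:
        return self.ai_generation + self.review + self.correction + self.failure_handling


def gross_savings(manual_minutes: float, assisted: WorkflowTimes) -> float:
    """Fraction of time saved if you count only the AI step (the misleading view)."""
    return (manual_minutes - assisted.ai_generation) / manual_minutes


def net_savings(manual_minutes: float, assisted: WorkflowTimes) -> float:
    """Fraction of time saved after counting the whole workflow."""
    return (manual_minutes - assisted.total()) / manual_minutes


# Illustrative numbers only.
assisted = WorkflowTimes(ai_generation=2.0, review=10.0, correction=8.0, failure_handling=6.0)
print(f"gross: {gross_savings(30.0, assisted):.0%}, net: {net_savings(30.0, assisted):.0%}")
```

With these made-up inputs (30 minutes by hand; 2 minutes of generation plus 10 of review, 8 of correction, and 6 of failure handling), the AI-only view suggests roughly 93% savings while the full-workflow view shows roughly 13%.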

Start with the task you are most confident about. The AI skeptic’s mistake is often all-or-nothing thinking: either AI is transformative or it is not worth touching. A more productive approach is to find the single task in your workflow that most clearly fits the characteristics above, test it carefully, measure honestly, and decide based on results. If it works, expand; if it does not, you have learned something specific.

The skeptic’s advantage

One thing the skeptic has that the enthusiast does not is a better default for evaluation. The enthusiast wants the AI to work and is prone to accepting, without scrutiny, evidence that it is working. The skeptic assumes the AI does not work and requires evidence to the contrary.

This is the right prior for AI applications. Most AI applications do not work as well as they initially appear to. The skeptic’s prior pushes toward the careful evaluation that produces honest answers. The enthusiast’s prior pushes toward premature deployment of applications that work well in demos and less well in production.

The most effective practitioners of AI adoption are often people who started as skeptics and updated their views based on specific evidence from careful evaluation. They maintained their skepticism about AI broadly while finding specific applications where the evidence compelled a different conclusion. They ask harder questions than enthusiasts. They measure more carefully. They are more willing to conclude that an application is not worth continuing when the evidence does not support it.

Skepticism about AI is a starting point, not a conclusion. The conclusion should be based on evidence from careful evaluation. For some specific applications in your context, that evidence will support adoption. For many others, it will support continued skepticism. Both conclusions are useful. Only careful evaluation can tell you which is which.
