The AI vendor due diligence checklist
Buying AI software is different from buying traditional software. The evaluation criteria are different, the failure modes are different, and the questions vendors are used to answering are not always the ones that matter most. This is a practical guide to what to ask and how to verify the answers.
By Ramiro Enriquez
Buying AI software requires a different evaluation process than buying traditional software. Traditional software either does what it is specified to do or it does not. AI software works on a distribution: it performs well on some inputs and less well on others, quality varies in ways that are not always predictable, and the system can degrade over time as models change or input distributions shift.
The due diligence questions that matter for AI vendors are not the same as the ones that matter for traditional vendors. Many buyers find this out after signing, when the discrepancies between vendor claims and actual performance become apparent in production.
This checklist is organized around the dimensions that distinguish good AI vendor evaluation from superficial evaluation. Not every question applies to every purchase, but buyers who work through these categories tend to make better purchase decisions than those who rely primarily on demos and reference calls.
Model and capability transparency
What model or models does the product use? Some vendors are transparent about which foundation models power their product; others treat this as proprietary. The answer matters because model choice affects capability, latency, cost, and data handling.
Is the model proprietary, fine-tuned, or a direct API integration? Products built on top of foundation model APIs via thin wrappers have different risk profiles than products with proprietary fine-tuned models. The wrapper approach is faster to build and faster to update; the fine-tuned approach offers more capability control and customization potential.
How does the product handle model updates? Foundation model providers update their models, sometimes in ways that change behavior. Ask how the vendor manages these updates: Do they test new model versions before deploying? Do customers have advance notice of model changes? Can customers pin to a specific model version?
What happens if your AI provider changes pricing or availability? If the product depends on a single foundation model API, the vendor’s economics and availability are tied to that provider. Ask about contingency planning and whether the architecture supports model switching.
Quality and evaluation
How does the vendor measure quality? Every AI vendor claims their product produces high-quality outputs. The question is how they measure it. Ask for the specific quality metrics they track, how those metrics are defined, and what the current values are.
Can you see quality metrics segmented by input type? Aggregate quality metrics hide the distribution. A system that produces excellent outputs for 80% of inputs and poor outputs for 20% looks good in aggregate. Ask to see quality metrics segmented by input characteristics relevant to your use case.
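To see why aggregate numbers mislead, here is a minimal Python sketch that computes a pass rate per input segment. The segment names, the counts, and the `pass_rate_by_segment` helper are hypothetical and chosen only to illustrate the arithmetic; a real evaluation would use your own labeled inputs and your own pass/fail criteria.

```python
from collections import defaultdict

def pass_rate_by_segment(results):
    """Take (segment, passed) pairs and return {segment: (pass_rate, sample_count)}."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [passed_count, total_count]
    for segment, passed in results:
        totals[segment][0] += int(passed)
        totals[segment][1] += 1
    return {seg: (p / n, n) for seg, (p, n) in totals.items()}

# Hypothetical evaluation results: 80 typed invoices and 20 handwritten forms.
results = (
    [("typed_invoice", True)] * 78
    + [("typed_invoice", False)] * 2
    + [("handwritten_form", True)] * 6
    + [("handwritten_form", False)] * 14
)

# The aggregate pass rate looks acceptable on its own.
overall = sum(passed for _, passed in results) / len(results)
print(f"overall: {overall:.1%} of {len(results)} inputs")

# The per-segment view shows where the failures are concentrated.
by_segment = pass_rate_by_segment(results)
for segment, (rate, count) in by_segment.items():
    print(f"{segment}: {rate:.1%} of {count} inputs")
```

In this made-up sample the overall pass rate is 84%, while the handwritten segment passes only 30% of the time. If handwritten forms are most of your workload, the aggregate figure tells you very little.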
What is the evaluation methodology for the vendor’s accuracy or quality claims? Ask how the numbers in the vendor’s marketing materials were produced. What dataset was used? Was it a vendor-curated dataset or a representative sample of real customer inputs? Were the inputs selected to make the AI look good?
Will you run an evaluation on your data before purchasing? Most reputable AI vendors will support a proof-of-concept evaluation using your actual data. Vendors who resist this are often protecting claims that would not hold up under realistic evaluation. An evaluation on your own data is the single most important step in AI vendor due diligence.
How do you define and measure output quality for our specific use case? Quality means different things for different use cases. Make the vendor articulate what quality means for your use case specifically and how they measure it. Generic quality claims are not useful; specific, measurable criteria are.
Data handling and privacy
Where does our data go? This should be a specific, contractual answer: which systems process the data, in which regions, and under what access controls. “We take data privacy seriously” is not an answer.
Is our data used to train models? Many AI vendors, including foundation model providers, use customer data for training by default. Know whether this applies to your data and whether it can be opted out of contractually.
Who has access to our inputs and outputs? Customer support teams, model safety teams, and third-party providers may have access to data that passes through an AI system. Get a specific answer about who can access what.
How is data retained and for how long? AI systems often log inputs and outputs for quality monitoring and debugging. Understand the retention policy and ensure it aligns with your data governance requirements.
What certifications and compliance documentation does the vendor hold? Typical items are SOC 2 Type II and ISO 27001 audit reports, a HIPAA business associate agreement (if applicable), and GDPR compliance documentation. Request the actual reports, not just claims of compliance.
Reliability and operations
What is the SLA for availability? AI inference systems can have availability and latency characteristics that differ from traditional software. Get a specific SLA that includes both availability and latency percentiles, with remedies for breach.
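When comparing availability figures, it helps to convert the percentage into the downtime it permits. The sketch below does that conversion for a 30-day month; the `allowed_downtime_minutes` helper and the example percentages are illustrative assumptions, not figures from any particular vendor.

```python
def allowed_downtime_minutes(availability_pct, days=30):
    """Minutes of downtime permitted over `days` at a given availability percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - availability_pct) / 100

# Illustrative SLA levels only.
for pct in (99.5, 99.9, 99.99):
    minutes = allowed_downtime_minutes(pct)
    print(f"{pct}% availability allows about {minutes:.1f} minutes of downtime per 30 days")
```

A 99.9% commitment still permits roughly 43 minutes of downtime in a 30-day month, so check how the contract defines the measurement window and what counts as downtime.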
What are the p95 and p99 latencies? Average latency is not representative. Tail latency is what users experience in the worst cases and what affects the reliability of workflows that depend on the AI. Ask for percentile latency data from production.
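The gap between average and tail latency is easy to check once you have raw request timings, for example from your own proof-of-concept traffic. The following is a small sketch using the nearest-rank percentile definition; the `percentile` helper and the sample latencies are hypothetical, and vendors may compute percentiles with a different method.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds: mostly fast, with a slow tail.
latencies_ms = [120] * 90 + [400] * 8 + [4000] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.0f} ms, p50={p50} ms, p95={p95} ms, p99={p99} ms")
```

With this sample the mean is 220 ms, yet one request in fifty takes 4 seconds. Neither the mean nor the median reveals that on its own, which is why the percentile figures are worth asking for.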
How does the system degrade under load? AI inference systems that are at capacity can degrade in ways traditional systems do not: slower responses, lower quality, or outright failures. Understand the degradation behavior and what the vendor does to manage capacity.
What is the incident history for the last 12 months? Ask for the vendor’s incident history, including outages, quality degradations, and data incidents. How many incidents were there? How long did they last? How were customers notified?
What is the rollback or contingency plan if the AI produces bad outputs at scale? AI systems can have quality failures that are systemic rather than individual. If the AI starts producing consistently bad outputs for all users, what is the mechanism for rolling back or switching off that behavior?
Roadmap and stability
Is the product a core business or a feature? AI capabilities added to existing products are sometimes maintained with less investment than core products. Understand whether the AI capability you are buying is central to the vendor’s business or a feature on the roadmap of something else.
What is the vendor’s financial position? The AI vendor landscape is in a consolidation period. Vendors that seem strong today may be acquired, pivot, or shut down. Understand the vendor’s funding situation, revenue trajectory, and strategic position well enough to assess whether they will exist in their current form three years from now.
What is the model update cadence and how are customers involved? If the vendor improves or changes their model, does your use case get retested? Are customers notified of changes that might affect their workflows? Is there a customer advisory process for significant changes?
What happens to our data if the vendor shuts down or is acquired? Know the contractual answer to this question before signing, not after.
Pricing and total cost
What is the pricing model for AI usage? Seat-based, consumption-based, tiered usage: the pricing model has significant implications for cost predictability and total cost of ownership as usage grows.
What is the cost at 2x and 5x our current expected volume? AI vendors sometimes have pricing that looks reasonable at entry volume and becomes very expensive at scale. Model the cost at higher volumes before committing.
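Modeling cost at higher volumes can be as simple as a few lines of arithmetic. The sketch below assumes a hypothetical consumption plan with a flat monthly fee, an included request allowance, and a per-request overage charge; every price and the `monthly_cost_cents` helper are invented for illustration, so substitute the terms from the vendor's actual quote. Amounts are kept in integer cents to avoid floating-point rounding.

```python
def monthly_cost_cents(requests, base_fee_cents, included_requests, overage_cents_per_request):
    """Monthly cost in cents: flat fee plus a per-request charge above the included allowance."""
    overage = max(0, requests - included_requests)
    return base_fee_cents + overage * overage_cents_per_request

# Hypothetical plan: $2,000/month including 100,000 requests, then $0.03 per extra request.
BASE_FEE_CENTS = 200_000
INCLUDED_REQUESTS = 100_000
OVERAGE_CENTS = 3

expected_volume = 100_000
costs = {}
for multiple in (1, 2, 5):
    volume = expected_volume * multiple
    cost = monthly_cost_cents(volume, BASE_FEE_CENTS, INCLUDED_REQUESTS, OVERAGE_CENTS)
    costs[multiple] = cost
    print(f"{multiple}x: {volume:,} requests -> ${cost / 100:,.2f}/month "
          f"(${cost / 100 / volume:.4f} per request)")
```

Under these assumed terms, 5x the volume costs 7x the entry price, and the effective per-request price rises from $0.02 to $0.028. Running the same calculation with each vendor's real price schedule makes the scaling behavior directly comparable.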
Are there costs not in the base price? Overage charges, premium support, additional storage for logs, compliance features, API access: understand the full cost picture.
What are the contract terms for price changes? Some vendors include price protection in multi-year contracts; others reserve the right to reprice. Know what you are agreeing to.
References and validation
Can you speak with customers in our industry using the product for our use case? Vendor-provided references are selected to be positive. Ask specifically for customers in your industry using the product for a use case similar to yours. The overlap in industry and use case is what makes the reference useful.
Are there case studies with specific, measurable outcomes? “Our AI helped Company X improve their workflow” is not useful. “Our AI reduced Company X’s time to complete [specific task] by [specific percentage] as measured over [specific time period]” is useful. Ask for the specifics.
What is the vendor’s approach when the product does not meet expectations? How they answer this question is as important as what they say. Vendors who have a clear process for diagnosing and resolving quality problems are more reliable partners than those who attribute all problems to customer implementation.
Putting it together
The goal of AI vendor due diligence is not to find a vendor with no weaknesses. It is to understand the actual trade-offs of each option clearly enough to make a good decision. Every AI vendor has limitations; the ones who are transparent about them are more trustworthy than those who paper over them.
The most important step remains running an evaluation on your own data before committing. The questions above help you understand the vendor’s claims and risk profile; your own evaluation tells you whether those claims hold for your specific use case. Taken together, they give you the information to make a purchase decision you can defend and a baseline against which to measure actual performance after deployment.