
How to Evaluate an AI Development Agency — 8 Questions to Ask Before You Sign

The questions that separate legitimate AI engineering consultancies from agencies that will take your budget and deliver a GPT wrapper. A practical buyer's guide from someone on the other side of the table.

By IEEE-published AI researcher & founder of Zenith Labs

TL;DR

  • Most 'AI agencies' are prompt engineers or web developers who added 'AI' to their service list in 2023. Genuine AI engineering capability is identifiable: published research, production deployments, and specific model architectures they have worked with.
  • The most telling question is not 'can you build X?' — every agency will say yes. It's 'show me a system you built, what went wrong, and how you fixed it.' Process under pressure reveals competence.
  • Fixed-scope contracts with clear deliverables, defined success metrics, and IP transfer clauses are the structure that protects you. If an agency resists any of these, walk away.

Question 1: Can You Show Me Production AI Systems You've Built?

The distinction between an agency that has built real AI systems and one that has assembled demos is visible in the specifics. Ask for case studies that include: the data volume and format the system handled, the accuracy metrics it achieved on a held-out test set, the infrastructure it runs on in production, and what went wrong during development and how it was resolved.

Agencies that have done real work can answer these questions in granular detail — specific model architectures, specific failure modes, specific trade-offs made. Agencies that have built demos will describe outcomes in general terms and deflect to testimonials and logos. If you cannot get a specific answer to 'what accuracy did the model achieve on your validation set and how did you measure it,' you are talking to the wrong firm.

Question 2: Who Actually Does the Work?

The 'agency model' in AI consulting often works as follows: a senior engineer makes the sale, the proposal is written by that same engineer, and the delivery is handed to a team of junior developers following a template. Ask directly: who will be building the system? What is their background? Can you speak with them before signing?

The answer you want is: the person who scoped the project is the person who builds it, or is directly supervising the build. AI engineering requires deep contextual knowledge of the problem that does not transfer well through handoffs. A system designed by one person and built by another typically inherits all the design assumptions without the design judgement needed to handle the cases those assumptions missed.

Question 3: How Do You Define and Measure Success?

Before any contract is signed, success metrics should be agreed in writing. Not 'the system will classify documents accurately' — but 'the system will achieve ≥92% precision and ≥88% recall on a held-out test set of 500 documents sampled from your production data, evaluated before handover.' Vague success criteria are an indicator that the agency is not confident in its ability to deliver specific, measurable outcomes.

Ask how they will evaluate the system before delivery. Ask what happens if the system does not meet the agreed metrics. Ask what test data they will use and whether you can contribute to defining it. The answers tell you whether you are dealing with engineers who have shipped production systems or consultants who are confident they can figure it out.
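Acceptance criteria like the ones above can be written down as an executable check rather than a sentence in a contract appendix. The sketch below is illustrative, not from the article: the function names (`precision_recall`, `meets_acceptance`) are hypothetical, and the default thresholds simply mirror the example criteria quoted earlier (≥92% precision, ≥88% recall).

```python
# Minimal sketch of an acceptance check on a held-out test set.
# Labels are assumed to be binary (1 = positive class).

def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def meets_acceptance(y_true, y_pred, min_precision=0.92, min_recall=0.88):
    """True only if BOTH agreed thresholds are met on the held-out set."""
    p, r = precision_recall(y_true, y_pred)
    return p >= min_precision and r >= min_recall
```

The point of a script like this is that both parties can run it on the same agreed test set before handover, which removes any ambiguity about whether the contract's success criteria were met.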

Questions 4–8: Contract, IP, and Ongoing Relationship

Question 4: Is the contract fixed-scope or time-and-materials? Fixed-scope contracts with a defined deliverable protect you from scope creep and budget overruns. Time-and-materials billing incentivises the agency to take longer.

Question 5: Who owns the code and model weights after delivery? The answer should be unambiguous: you do. Any arrangement where the agency retains rights to the model or requires ongoing licensing for you to use your own system is a red flag.

Question 6: What does maintenance look like post-delivery? AI systems require monitoring — model drift, data distribution shifts, and edge cases that were not in the training set emerge over time. Ask specifically what the handover package includes: documentation, retraining instructions, monitoring dashboards, and a defined support period.
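To make "monitoring for drift" concrete: one common way to flag a data distribution shift is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time against what the live system is seeing. The sketch below is illustrative and assumes a single numeric feature; the smoothing constant and the function name `psi` are my choices, not the article's, and the usual rule-of-thumb alert thresholds (roughly 0.1 for "investigate", 0.25 for "significant shift") come from industry convention.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample ('expected',
    e.g. training data) and a live sample ('actual') of one numeric feature.
    0 means identical binned distributions; larger means more drift."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[0], edges[-1] = float("-inf"), float("inf")  # catch out-of-range values

    def bin_fractions(xs):
        counts = [0] * bins
        for x in xs:
            for i in range(bins):
                if edges[i] <= x < edges[i + 1]:
                    counts[i] += 1
                    break
        n = len(xs)
        # Smooth empty bins with a 0.5 pseudo-count so the log is defined.
        return [(c if c else 0.5) / n for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A handover package worth paying for includes something like this wired into a scheduled job, plus documentation of which features are monitored and what threshold triggers retraining.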

Question 7: Can you speak to a previous client whose project was similar to yours? Reference checks are standard in procurement for a reason. An agency confident in its work will facilitate this without resistance.

Question 8: What do you not do? The clearest signal of a competent specialist is a well-defined scope of what they decline. Agencies that claim to do everything — AI, mobile, web, blockchain, cybersecurity, marketing — are generalists reselling commodity services. Specialists who say 'we don't do X because it's not our core' are telling you they take quality seriously enough to stay in their lane.

