What you're actually paying for in an AI project

When someone asks me what a production AI project costs, the honest answer is that the question is pointed at the wrong thing. The cost of the AI part of an AI system is almost never where the budget goes, and teams that quote based on “we need AI” and not “we need a pipeline around an AI call” end up surprised in both directions — sometimes by how cheap the model is, more often by how much engineering sits around it before any of it can be called production-ready.

This post is about what you’re actually paying for when you build something like an extraction pipeline, a classification service, or an automated document workflow. I’m writing it from the perspective of someone who ships these systems — in Romania and for clients elsewhere — and who has watched teams underestimate the same 80% of the work project after project.

The three layers of work

There are three layers in any production AI system, and they are not equal in effort.

AI integration is the layer people think they’re buying. It’s connecting to a model API, structuring prompts, handling responses, managing context windows, and dealing with rate limits and retries. It is real work, and it has to be done correctly, but it is not where most of the time goes.
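To make the shape of that layer concrete, here is a minimal sketch of the deterministic parts around a model call: structuring the prompt and validating the response. The function names and the JSON-only prompt shape are illustrative assumptions, not any specific provider's API.

```python
import json

def build_prompt(document_text: str, fields: list[str]) -> str:
    """Structure the instruction so the model returns machine-readable JSON."""
    return (
        "Extract the following fields from the document and reply with "
        f"JSON only, keys: {', '.join(fields)}.\n\n{document_text}"
    )

def parse_response(raw: str, fields: list[str]) -> dict:
    """Handle the response: parse the JSON and fail loudly if the model
    drifted from the requested shape, instead of passing bad data on."""
    data = json.loads(raw)
    missing = [f for f in fields if f not in data]
    if missing:
        raise ValueError(f"model omitted fields: {missing}")
    return data
```

The point of the sketch is that even the "AI part" is mostly deterministic code: the model call itself is one line between these two functions.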

Pipeline engineering is the layer people don’t think about until their prototype has been in production for a week. This is validation before and after the model call, queueing so that work survives a restart, error handling that distinguishes recoverable failures from permanent ones, retries with backoff, dead letter queues for work that can’t be processed, logging with enough context to debug after the fact, and observability so you can see what the system is doing without reading log files. Most AI prototypes fail in production because this layer is either thin or missing.
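A compressed sketch of what "distinguishing recoverable from permanent failures" looks like in code, assuming hypothetical exception types for the two failure classes. Real pipelines put the dead-letter queue in durable storage, not a local list; this only shows the control flow.

```python
import time

class RateLimitError(Exception):
    """Recoverable: the provider asked us to slow down."""

class PermanentError(Exception):
    """Permanent: the input will never process; retrying wastes money."""

def process_with_retries(item, handler, max_attempts=4, base_delay=1.0):
    """Run `handler` on one unit of work. Recoverable failures are
    retried with exponential backoff; permanent failures (and work
    that exhausts its retries) go to a dead-letter list for a human,
    instead of looping forever."""
    dead_letter = []
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(item), dead_letter
        except PermanentError:
            dead_letter.append(item)  # don't retry: it will never succeed
            return None, dead_letter
        except RateLimitError:
            if attempt == max_attempts:
                dead_letter.append(item)  # retries exhausted
                return None, dead_letter
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Every branch in that function is a decision a prototype usually hasn't made, and each one is cheap to write upfront and expensive to discover in production.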

System integration is connecting the pipeline to everything else you already have — authentication, webhooks, databases, file storage, monitoring, and deployment. The surface area here depends entirely on how many systems the pipeline has to touch, and each integration adds its own authentication, error handling, and monitoring surface.

Rough shape of the work: the AI integration is about twenty percent, and the other eighty percent is the engineering that makes it reliable.

Engagement types that can actually be priced

Fixed-scope pricing is only possible when the work is well-defined, and the engagement types I run all require the scope to be clear before the work starts. If the scope is not clear yet, the first engagement is defining the scope — not building the system on a guess and hoping the budget holds.

Architecture review and hardening is a written assessment of an existing AI system with specific fixes for the reliability gaps. You send the repository, diagrams if you have them, and access to a running instance. I trace the pipeline end to end, map where AI is doing work that deterministic code should be doing, list every failure mode and whether it’s handled, and deliver a prioritized list of fixes ordered by risk. This fits teams who have a prototype they want to take to real users, or a system in production that they’re not confident handles failure correctly.

End-to-end AI pipeline is a production system where AI does one specific thing — extract fields from unstructured input, classify content, make a routing decision, generate a structured document — inside a deterministic pipeline that handles everything else. The design includes the failure modes upfront, the implementation covers validation, retries, logging, and observability, the integration with your existing systems is part of the scope, and you get deployment scripts and documentation at the end. This is the right shape for projects with a clear input and a clear output where AI is doing one well-defined job.
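"AI does one specific thing inside a deterministic pipeline" means the validation around the model call is plain code with no judgment in it. A minimal sketch, with hypothetical field names, of the deterministic check that runs after an extraction step:

```python
def validate_extraction(raw: dict, required: dict) -> list[str]:
    """Deterministic post-validation of a model's extraction output.
    `required` maps field name -> expected Python type; returns a list
    of problems (empty means the output may proceed down the pipeline)."""
    problems = []
    for field, expected_type in required.items():
        if field not in raw:
            problems.append(f"missing field: {field}")
        elif not isinstance(raw[field], expected_type):
            problems.append(
                f"wrong type for {field}: {type(raw[field]).__name__}"
            )
    return problems
```

A non-empty result routes the item to retry or review; the model is never asked to check its own work.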

System integration and automation is connecting disparate systems with real-time data flows when the main challenge is getting the pieces to talk to each other reliably. API clients, webhooks, data transformation layers, error handling, monitoring, and alerting. AI may or may not be involved, depending on whether any of the decisions require judgment the way classification does.

What actually moves the price

Scope clarity comes first, and everything else compounds on top of it. If you can describe the input, the output, and the success criteria in two sentences, the engagement can be priced. If you’re still figuring out what problem you’re solving, it can’t — not honestly.

After that, integration surface area is the biggest variable. Connecting to two external APIs costs materially less than connecting to eight, because each integration adds authentication, error handling, retry logic, and monitoring of its own. Data complexity is the next variable: parsing a well-structured JSON payload is simpler than extracting fields from scanned PDFs with inconsistent layouts, and the more variance in input, the more validation and edge-case handling the pipeline needs to do. And finally, reliability requirements matter — a prototype you’ll demo once has different engineering than a system handling customer-facing traffic around the clock, because production systems need observability, retries, dead letter queues, and deployment automation that a demo doesn’t.

What isn’t included

Fixed-scope pricing covers the work of building the system; a few things sit outside it. Hosting and infrastructure are yours to run — you own the code and deployment scripts, and the ongoing runtime cost is on you. Model API costs are pay-per-use through the provider directly, and although the system is built to minimize calls, the Claude or Deepgram or OpenAI bill is between you and them. Feature expansion after delivery is a separate engagement: fixed scope means the system does what was agreed, and adding new workflows later is a new scope. Training and change management is documentation and a walkthrough by default, and if your team needs dedicated onboarding sessions, that’s priced separately.

Why fixed scope works here

Most AI consulting is billed hourly because the work is exploratory — the team is paying to figure out what’s possible. Fixed scope works when you already know what needs to happen: the input is defined, the output is defined, the success criteria are clear, and the remaining question is execution, not discovery.

That does not mean the solution is obvious. It means the problem is well-formed enough that the work can be scoped, priced, and delivered without expanding indefinitely. If you’re not sure what you need, the architecture review is the right first step, because it turns an unclear situation into a clear scope. If you know exactly what you need, a fixed-scope build gets you a working system without paying for iteration on top of iteration.

How to get a scope estimate

Send me a description of what you’re building: what the input is, what the output should be, what the AI needs to do, what systems it has to integrate with, and what success looks like. You’ll get back a scope outline with deliverables, timeline, and price — or a note explaining what has to be clarified first before any of it can be priced honestly.

The email is contact@bogdan-ivanov.ro.