Build vs. Buy in AI: A Framework for Decisions Nobody Wants to Make

The build vs. buy decision in AI is the one that generates the most heated arguments on my team. Engineers want to build. Product managers want to buy. Finance wants whichever is cheaper. And everyone is right, depending on which lens you’re looking through.

I’ve made this decision dozens of times across different organisations. I’ve gotten it right about 60% of the time, and that is a generous self-assessment. The 40% I got wrong taught me more than the successes did, and those mistakes cost real money and real time.

Here’s what I’ve learned about how to think about this.

Why the Default Answer Is Usually Wrong

Engineers default to build. It’s not because they’re wrong about their ability to build it. They usually can. It’s because building things is intrinsically rewarding, and buying things means admitting someone else solved your problem better. There’s an ego component to this that nobody talks about openly, but it’s real.

Product and business leaders default to buy. They want the fastest path to value, and a vendor demo showing a working product is more convincing than an engineering estimate of “six to nine months.” They’re not wrong about speed. They’re often wrong about how well the vendor’s product fits their specific needs.

I’ve fallen into both traps. At Westpac, we spent eight months building an internal document processing system when an off-the-shelf solution would have covered 80% of our requirements in two weeks. The remaining 20% we needed to customise wasn’t worth the engineering time we burned.

On the flip side, I’ve watched organisations buy expensive AI platforms that required so much customisation to work with their data and processes that they effectively built a system anyway, just on someone else’s foundation with someone else’s technical debt.

The Framework I Actually Use

I evaluate build vs. buy across five dimensions. They don’t carry equal weight, and the weighting depends on context.

flowchart TD
    Start[AI Capability Needed] --> Q1{Core differentiator?}
    Q1 -->|Yes| BuildLean[Lean towards Build]
    Q1 -->|No| Q2{Need it in under 3 months?}
    Q2 -->|Yes| BuyLean[Lean towards Buy]
    Q2 -->|No| Q3{Team to build AND maintain?}
    Q3 -->|Yes| Q4{High exit cost if buying?}
    Q3 -->|No| BuyLean
    Q4 -->|Yes| BuildLean
    Q4 -->|No| Hybrid[Hybrid: Buy components,\nbuild orchestration]
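
Read as code, the flowchart is a short chain of checks evaluated in order. The following is a minimal Python sketch of that reading. The class, field, and function names are illustrative assumptions, and the returned strings mirror the flowchart's "lean towards" wording, so treat the result as a starting position rather than a verdict.

```python
from dataclasses import dataclass


@dataclass
class Capability:
    """Yes/no answers to the four flowchart questions (field names are illustrative)."""
    core_differentiator: bool
    needed_within_3_months: bool
    team_can_build_and_maintain: bool
    high_exit_cost_if_buying: bool


def recommend(c: Capability) -> str:
    """Walk the flowchart top to bottom and return the leaning it ends on."""
    if c.core_differentiator:
        return "lean build"
    if c.needed_within_3_months:
        return "lean buy"
    if not c.team_can_build_and_maintain:
        return "lean buy"
    if c.high_exit_cost_if_buying:
        return "lean build"
    # Not a differentiator, no rush, team available, low exit cost.
    return "hybrid: buy components, build orchestration"
```

The order of the checks matters: a core differentiator leans towards build even when the deadline is tight, because that question is asked first.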

Is this a core differentiator? If the AI capability is central to your competitive advantage, build it. If it’s a supporting function, buy it. At Cochlear, anything that directly touches our core product and patient outcomes is something we want to own and understand deeply. But our internal productivity tools? Off the shelf every time.

This sounds simple, but companies routinely misjudge what’s a differentiator. I’ve seen teams build custom data catalogues, custom BI tools, and custom ML feature stores because they thought their requirements were unique. In almost every case, their requirements were 90% standard and 10% unique, and they spent months building the 90% that already existed as a product.

Do you have the team to build and maintain it? Building is a one-time cost. Maintaining is forever. A system you build needs engineers to keep it running, update dependencies, fix bugs, add features, and handle scale. If your AI team has six people and they’re all working on core product features, building an internal tool means pulling someone off customer-facing work.

I’ve seen teams build impressive internal tools and then watch them decay because nobody had time to maintain them. Two years later the tool is running on deprecated libraries, the person who built it has left, and migrating away is painful.

What’s the true total cost? Buy decisions look expensive upfront because you see the vendor price tag. Build decisions look cheap because people undercount the cost. You need to include engineering salaries for the build period, ongoing maintenance cost, opportunity cost of what those engineers could have built instead, and the cost of getting it wrong and having to rebuild or buy later anyway.

I keep a simple spreadsheet for this. Three-year total cost of ownership. Build column includes salaries, infrastructure, maintenance, and opportunity cost. Buy column includes license fees, implementation, customisation, and integration. The real build number is almost always higher than the initial estimate, so I apply a 1.5x multiplier to build estimates, based on how accurate past estimates have turned out to be.
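
The same comparison can be sketched in a few lines of Python. The cost categories and the 1.5x multiplier come from the description above. Everything else is an assumption made for illustration: the split into one-off and annual costs, applying the multiplier to the whole build total, and the placeholder figures, which are not real data.

```python
from dataclasses import dataclass

YEARS = 3
BUILD_OVERRUN_MULTIPLIER = 1.5  # adjustment applied to build estimates


@dataclass
class BuildEstimate:
    build_salaries: float         # one-off: salaries during the build period
    opportunity_cost: float       # one-off: value of the work not done instead
    annual_infrastructure: float
    annual_maintenance: float


@dataclass
class BuyEstimate:
    annual_license: float
    implementation: float         # one-off
    customisation: float          # one-off
    integration: float            # one-off


def build_tco(e: BuildEstimate) -> float:
    """Three-year build cost, scaled up by the overrun multiplier."""
    recurring = YEARS * (e.annual_infrastructure + e.annual_maintenance)
    raw = e.build_salaries + e.opportunity_cost + recurring
    return raw * BUILD_OVERRUN_MULTIPLIER


def buy_tco(e: BuyEstimate) -> float:
    """Three-year buy cost: license fees plus one-off setup work."""
    one_off = e.implementation + e.customisation + e.integration
    return YEARS * e.annual_license + one_off


# Placeholder figures for illustration only.
build = BuildEstimate(600_000, 200_000, 50_000, 150_000)
buy = BuyEstimate(250_000, 100_000, 80_000, 60_000)
print(f"build: {build_tco(build):,.0f}  buy: {buy_tco(buy):,.0f}")
```

Keeping the multiplier as a named constant makes it easy to revisit once you have your own record of estimates against actuals.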

How fast do you need it? If the business needs a capability in production within three months, buying is usually the answer. Most build efforts for non-trivial AI systems take six to twelve months to reach production quality. If you have the luxury of time, building gives you more control and customisation. But be honest about whether you actually have that time.

What’s your exit cost? Every buy decision creates vendor dependency. How painful is it to switch later? If the vendor’s product holds your data in a proprietary format, if your processes become deeply integrated with their workflow, or if switching costs would consume a quarter of engineering effort, that’s a real risk.

We evaluate this explicitly now. Before signing any vendor contract, I ask: “If we needed to move off this platform in 18 months, what would that take?” If the answer is “months of engineering effort,” we either negotiate data portability terms or we seriously reconsider.

The Hybrid Approach Nobody Talks About

The most effective pattern I’ve found isn’t pure build or pure buy. It’s using bought components as building blocks.

We buy infrastructure services, pre-trained models, and commodity capabilities. We build the orchestration, the business logic, the domain-specific fine-tuning, and the integration layer. This gives us speed from the bought components and differentiation from the custom layer.

For example, we use a commercial vector database rather than building our own. We use a commercial LLM API rather than training a foundation model. But the retrieval logic, the prompt engineering, the evaluation framework, and the integration with our specific data sources are all custom. The bought components saved us months. The custom components give us something no off-the-shelf product could.
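
One way to picture that split is to keep the bought components behind narrow interfaces and put the custom logic in a layer that depends only on those interfaces. The Python sketch below is hypothetical: `VectorStore`, `LLM`, `Answerer`, and the in-memory stand-ins are invented for illustration and do not correspond to any vendor's SDK.

```python
from typing import Protocol


class VectorStore(Protocol):
    """What the custom layer needs from a bought vector database."""
    def search(self, query: str, k: int) -> list[str]: ...


class LLM(Protocol):
    """What the custom layer needs from a bought LLM API."""
    def complete(self, prompt: str) -> str: ...


class Answerer:
    """Custom layer: retrieval and prompt construction live here, not in vendor code."""

    def __init__(self, store: VectorStore, llm: LLM, top_k: int = 3) -> None:
        self.store = store
        self.llm = llm
        self.top_k = top_k

    def answer(self, question: str) -> str:
        passages = self.store.search(question, self.top_k)
        context = "\n".join(f"- {p}" for p in passages)
        prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
        return self.llm.complete(prompt)


class FakeStore:
    """In-memory stand-in that ranks documents by word overlap with the query."""

    def __init__(self, docs: list[str]) -> None:
        self.docs = docs

    def search(self, query: str, k: int) -> list[str]:
        words = set(query.lower().split())
        ranked = sorted(self.docs, key=lambda d: -len(words & set(d.lower().split())))
        return ranked[:k]


class EchoLLM:
    """Stand-in that only reports how much prompt text it received."""

    def complete(self, prompt: str) -> str:
        return f"[{len(prompt)} chars of prompt received]"


store = FakeStore(["refund policy is 30 days", "shipping takes a week"])
print(Answerer(store, EchoLLM(), top_k=1).answer("what is the refund policy"))
```

Because `Answerer` only knows about the two protocols, swapping a vendor means writing a new adapter rather than rewriting the retrieval and prompt logic, which also keeps the exit cost lower.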

Decisions I Got Wrong and What I Learned

We once built a custom model monitoring system because we thought our requirements were unique. They weren’t. After six months of development, we had something that did roughly what three commercial products already did, but worse. We eventually migrated to a commercial product and put those engineers back onto actual product work.

Conversely, we bought a “complete AI platform” from a large vendor that promised end-to-end model lifecycle management. Eighteen months in, we were using maybe 20% of its features, fighting its assumptions about how AI teams work, and spending more time working around its limitations than we would have spent building the specific capabilities we needed.

The lesson from both: be specific about what you need before you decide. Not “we need model monitoring” but “we need to track prediction drift on seven models, alert when accuracy drops below a threshold, and integrate with our existing PagerDuty setup.” That level of specificity makes the build vs. buy decision much clearer.
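
Writing the requirements down as data makes that specificity easy to check against a vendor's feature list or a build proposal. This is a small hypothetical sketch: the three requirements are the ones named above, while the `must_have` flag, the `coverage` function, and the example feature set are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    name: str
    must_have: bool


REQUIREMENTS = [
    Requirement("track prediction drift on seven models", must_have=True),
    Requirement("alert when accuracy drops below a threshold", must_have=True),
    Requirement("integrate with existing PagerDuty setup", must_have=True),
]


def coverage(requirements: list[Requirement], supported: set[str]) -> tuple[float, list[str]]:
    """Return the fraction of requirements covered and the must-haves still missing."""
    covered = [r for r in requirements if r.name in supported]
    missing = [r.name for r in requirements if r.must_have and r.name not in supported]
    fraction = len(covered) / len(requirements) if requirements else 0.0
    return fraction, missing


# Hypothetical option that supports two of the three requirements.
option_features = {
    "track prediction drift on seven models",
    "alert when accuracy drops below a threshold",
}
fraction, missing = coverage(REQUIREMENTS, option_features)
print(f"covered: {fraction:.0%}, missing must-haves: {missing}")
```

A high coverage figure with a missing must-have is the case to watch, since that gap is where the customisation effort tends to go.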

Make the Decision Reversible When Possible

The best advice I can give: when in doubt, make the decision that’s easiest to reverse. Start with a bought solution and build a custom replacement later if you outgrow it. Or build a prototype to understand your requirements, then evaluate whether a vendor product meets them.

The worst outcome isn’t picking wrong. It’s picking wrong and being stuck with it for years because the switching cost is too high. Design for the ability to change your mind.
