
5 Weird Results I Got When Trying To Build Enterprise AI Agents

Jack Rudenko, CTO of MadAppGang

Have you ever wondered why some AI agents thrive in enterprises while others struggle and fail?

I've spent months grappling with this issue, working with companies trying to implement effective AI solutions. Here's something nobody tells you: it's not about the technology. It's about psychology, politics, and perception.

Recently, I've been delving into the work of Assaf Elovic, Head of AI at Monday.com (and the brains behind GPT Researcher). His insights on enterprise deployments hit me like a truck: 'The agents that succeed aren't the smartest ones. They're the ones that understand corporate fear.'

He's right. It's changed how I think about everything.

TL;DR: 5 key lessons

  1. Fear beats logic: Success isn't about how smart your agent is — it’s about how safe it feels.
  2. Determinism wins: Enterprises prefer predictable scripts over clever flexibility.
  3. Transparency sells: Control, audit logs, and "pause buttons" close deals — not just features.
  4. Reversibility is key: Agents that generate drafts (not actions) avoid disaster and get adopted.
  5. Ambient > chat: The future of enterprise AI isn’t conversation — it’s event-triggered, behind-the-scenes automation.

Why do enterprise AI agents fail?

Most agents don’t fail because they’re dumb. They fail because they’re unpredictable, irreversible, or scary to deploy. The real blockers are risk, fear, and trust — not logic or performance. Here’s what I learned from building and watching agents get killed.

Let's take each of these results in turn.

Result #1: Enterprise AI adoption hinges on risk - not just ROI


Here's the brutal maths that actually drives enterprise adoption:

(Probability of success × Value when right) – (Probability of failure × Cost when wrong) > Cost to run.
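To see how that inequality plays out, here's a minimal sketch in Python (the numbers are invented purely for illustration):

```python
def worth_deploying(p_success: float, value_when_right: float,
                    cost_when_wrong: float, cost_to_run: float) -> bool:
    """Return True if the agent's expected value clears its running cost."""
    p_failure = 1 - p_success
    expected_value = p_success * value_when_right - p_failure * cost_when_wrong
    return expected_value > cost_to_run

# A 95%-accurate agent looks great until the failure cost dominates:
print(worth_deploying(0.95, value_when_right=100, cost_when_wrong=50_000, cost_to_run=10))  # False
print(worth_deploying(0.95, value_when_right=100, cost_when_wrong=500, cost_to_run=10))     # True
```

With toy numbers like these, capping the cost of being wrong flips the decision far faster than squeezing out another point of accuracy, which is exactly the point of what follows.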

Simple? Yes, but here's where it gets interesting.

Most builders obsess over the first part. They demonstrate their agent performing an impressive feat — perhaps writing perfect legal briefs in seconds or analysing 10,000 documents in minutes. Everyone's impressed. The POC goes well.

Then it dies in committee.

Why? Because, while you were demonstrating the 95% success rate, a middle manager was doing the sums differently. They were calculating the consequences of the 5% failure rate. What if the agent sends the wrong contract to a client? What if confidential data is exposed? What if, what if, what if...

What about the successful agents I've seen deployed at scale? They didn't just maximise value. They systematically eliminated catastrophic failure modes.

Take Harvey in the legal sector, for example. They're not just successful because they can draft contracts quickly. They're successful because, when they make a mistake, the consequences are contained. A bad first draft remains just that. It doesn't get sent to opposing counsel at 3 a.m.

The lesson? Stop optimising for peak performance. Start optimising for worst-case scenarios.

Result #2: Building predictable AI workflows beats being smart


This one really got to me as an engineer.

I built a wonderful agent that could dynamically determine workflows. Given a task, it could reason through the steps, adapt to edge cases, and handle unexpected scenarios. It was like having a brilliant intern who understood context.

But do you know what enterprises wanted? A simple script that performed the same 10 steps every time.

The thing is, predictability is valuable, but we consistently underestimate it. When a process always goes A → B → C, you can:

  • audit it
  • explain it to regulators
  • train people on it
  • build other processes around it
  • sleep at night knowing it won't catch you off guard.

This is why the most successful enterprise agents aren't pure agents at all. They're workflow-agent hybrids. Think of them as railways with optional detours, rather than open-world exploration.

I started using LangGraph for this very reason. It enables you to hardcode the parts that should be deterministic while retaining flexibility where it adds value.

For example, an invoice processing agent that I built is:

  1. Deterministic: It always extracts these seven fields, always validates against these three databases, and always routes to this approval chain.
  2. Agent-like: It handles weird formats, asks clarifying questions for ambiguous entries, and suggests categorisations.

The enterprise loved it. Not because it was clever, but because it was consistently clever.
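To give a feel for the shape (this is a stripped-down sketch, not the production agent; the state fields and node bodies are placeholders), here's roughly how that railway-with-detours looks in LangGraph:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

# Illustrative state for an invoice flow; the real agent tracks more fields.
class InvoiceState(TypedDict):
    raw_text: str
    fields: dict
    needs_clarification: bool

# Deterministic steps: these always run, in this order.
def extract_fields(state: InvoiceState) -> dict:
    # In practice this calls an LLM or parser to pull out the fixed set of fields.
    return {"fields": {"amount": None, "vendor": None}, "needs_clarification": False}

def validate(state: InvoiceState) -> dict:
    # Always checked the same way; flag anything ambiguous.
    return {"needs_clarification": state["fields"]["amount"] is None}

# Agent-like step: only invoked when the deterministic path can't resolve something.
def clarify(state: InvoiceState) -> dict:
    return {"needs_clarification": False}

def route_for_approval(state: InvoiceState) -> dict:
    return {}

graph = StateGraph(InvoiceState)
graph.add_node("extract", extract_fields)
graph.add_node("validate", validate)
graph.add_node("clarify", clarify)
graph.add_node("approve", route_for_approval)

# The railway: a fixed spine...
graph.add_edge(START, "extract")
graph.add_edge("extract", "validate")
# ...with one optional detour for the messy cases.
graph.add_conditional_edges(
    "validate",
    lambda s: "clarify" if s["needs_clarification"] else "approve",
)
graph.add_edge("clarify", "approve")
graph.add_edge("approve", END)

app = graph.compile()
```

Invoking the compiled graph walks the same spine every time; the only branching the agent is allowed is the clarification detour.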

Result #3: Trust in AI agents is built through perception, not performance


Let me tell you about one illustrative case.

Six months ago, I witnessed a brilliant agent die during a review board meeting. The technology was flawless. The ROI calculations were impressive. The pilot users loved it.

It was killed in 22 minutes.

What happened? Someone asked, "But what if it goes rogue like that car dealership chatbot that gave away free trucks?"

Game over.

Here's what I learned: In enterprise sales, you're not selling capabilities. You're selling confidence. And confidence comes from transparency and control.

The agent that succeeded in the same company two months later? Technically inferior. But it had:

  • a full audit log of every decision
  • a "pause and review" mode for sensitive operations
  • rollback capabilities for every action
  • real-time monitoring dashboards that non-technical people could understand.

The kicker? They've never used most of these features. But knowing they exist has changed everything.

Pro tip: build your observability layer first, not last. Make it customer-facing as well as developer-facing. I use LangSmith, but the tool matters less than the mindset. Show people what the agent is actually doing; the reality is less scary than whatever they imagine.
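The tooling matters less than the pattern, so here's a deliberately library-free sketch of the two ideas that did the selling: an append-only audit log and a pause-and-review gate for sensitive actions. The function names and the action list are made up for illustration:

```python
import json
import time
from pathlib import Path

AUDIT_LOG = Path("agent_audit.jsonl")                     # append-only trail of every decision
SENSITIVE_ACTIONS = {"send_contract", "modify_records"}   # hypothetical examples

def record(action: str, payload: dict, decision: str) -> None:
    # Every decision gets written down, whether it was executed or blocked.
    entry = {"ts": time.time(), "action": action, "payload": payload, "decision": decision}
    with AUDIT_LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def run_action(action: str, payload: dict, execute) -> None:
    # Pause-and-review: sensitive operations wait for a human before anything happens.
    if action in SENSITIVE_ACTIONS:
        approved = input(f"Approve {action} with {payload}? [y/N] ").strip().lower() == "y"
        if not approved:
            record(action, payload, "rejected_by_reviewer")
            return
    record(action, payload, "executed")
    execute(payload)

# Usage: a routine action runs straight through; a sensitive one pauses for a human.
run_action("tag_document", {"doc_id": 42}, lambda p: print("tagged", p))
```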

Result #4: Code AI agents won because they're forgiving

Why are code agents receiving all the funding? Many people think it's because models are trained on GitHub.

Nope. It's because code has built-in forgiveness.

Think about it:

  • Make a mistake in the code? Revert the commit!
  • Bad function? Delete it!
  • Broken production? Roll back to the last working version.

Now compare that to:

  • an agent that sends emails to your entire customer base
  • an agent that modifies financial records
  • an agent that publishes content on your website.

See the difference? The cost of failure isn't just different in magnitude — it's a different kind of cost altogether. Some mistakes can't be undone.

This is why all successful non-code agents share one trait: they create reversible artefacts. They generate drafts, not final versions. They propose actions, rather than executing them. They suggest, not decide.

The pattern I see working is this:

  1. The agent does the work.
  2. It creates a "preview" (draft email, proposed changes, suggested response).
  3. A human reviews and approves it.
  4. Only then does it become "real".

It's not about trust. It's about insurance.
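A minimal sketch of that loop, with an in-memory store and hypothetical helpers standing in for whatever your stack actually uses:

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Draft:
    """A reversible artefact: nothing leaves the system until approve() is called."""
    description: str
    execute: Callable[[], None]
    id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    status: str = "pending_review"

PENDING: dict[str, Draft] = {}

def propose(description: str, execute: Callable[[], None]) -> Draft:
    # Steps 1-2: the agent does the work and produces a preview, not an action.
    draft = Draft(description, execute)
    PENDING[draft.id] = draft
    return draft

def approve(draft_id: str) -> None:
    # Steps 3-4: only an explicit human approval makes it "real".
    draft = PENDING.pop(draft_id)
    draft.status = "approved"
    draft.execute()

def reject(draft_id: str) -> None:
    # A bad draft costs nothing to throw away; that's the insurance.
    PENDING.pop(draft_id).status = "discarded"

# Usage: the agent drafts a reply, a human decides what becomes real.
d = propose("Reply to the renewal enquiry", lambda: print("email actually sent"))
approve(d.id)   # or reject(d.id); the failure mode is a bad draft, not a bad send
```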

Result #5: Rethinking enterprise AI adoption - ambient agents vs. chatbots

That's where things get exciting!


Everyone's developing chatbots. But chat doesn't scale. I can only talk to one, maybe two agents at a time. My attention is the bottleneck.

The agents that will transform enterprises won't wait for a prompt. Events will trigger them:

  • New email arrives → Agent processes it
  • Document uploaded → Agent analyses it
  • Meeting ends → Agent creates action items
  • Metric crosses threshold → Agent investigates

I call these "ambient agents" - they work in the background, at scale, without constant human initiation.

However, and this is crucial, "ambient" doesn't mean "autonomous".

I built an email agent for myself. It's embarrassingly simple, but it works! It:

  • monitors my inbox
  • drafts responses to common queries
  • schedules meetings based on email requests
  • extracts and tracks action items.

However, it NEVER sends an email without my approval. It never modifies my calendar without confirmation. It never makes promises that I haven't reviewed.
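Here's a toy version of that event-driven shape. The Event type and handle_* helpers are placeholders, not a real inbox API; the point is that events trigger the agent and everything it produces lands in a review queue rather than going out directly:

```python
from dataclasses import dataclass

@dataclass
class Event:
    kind: str        # e.g. "email_received", "meeting_ended"
    payload: dict

REVIEW_QUEUE: list[dict] = []   # everything the agent produces waits here for approval

def handle_email(payload: dict) -> None:
    # Drafts a reply; a human still clicks send.
    REVIEW_QUEUE.append({"type": "draft_reply", "to": payload["sender"], "body": "..."})

def handle_meeting(payload: dict) -> None:
    # Turns meeting notes into proposed action items, not commitments.
    REVIEW_QUEUE.append({"type": "action_items", "items": payload.get("notes", [])})

HANDLERS = {
    "email_received": handle_email,
    "meeting_ended": handle_meeting,
}

def on_event(event: Event) -> None:
    # No prompt, no chat window: the event itself is the trigger.
    handler = HANDLERS.get(event.kind)
    if handler:
        handler(event.payload)

on_event(Event("email_received", {"sender": "client@example.com"}))
print(REVIEW_QUEUE)   # one drafted reply, waiting for my approval
```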

The magic is in the multiplication. Instead of me:

  • Reading 100 emails → Agent reads 100 emails, and I review 10 critical summaries.
  • Scheduling five meetings → Agent proposes times, I click approve.
  • Tracking 20 action items → Agent tracks them all, surfaces the three that need attention today.

The mental model shift: Stop thinking "AI assistant" and start thinking "AI workforce." You're not replacing yourself. You're just cloning the boring parts of yourself.

The uncomfortable truth about enterprise adoption of AI agents


Here's the knowledge I've gained from three years of developing enterprise agents:

The best agent often loses.

I've seen brilliant agents that could have saved companies millions make way for mediocre ones that made buyers feel safe. I've seen companies choose a 10% automation solution over a 90% one because the less automated solution had an undo button.

And you know what? They're not wrong.

The thing about enterprises is that they're not startups. They can't afford to move fast and break things. They have regulatory requirements and audit trails. Most importantly, they have people whose entire careers depend on nothing going catastrophically wrong on their watch.

So, if you're developing software for enterprises, stop asking, "How can I make this more capable?"

Instead, start asking:

  • How can I make this more reversible?
  • How can I make failures visible before they matter?
  • How can I give control to people who may never use it, but who need to know it's there?
  • How can I mitigate the worst-case scenario so that it becomes merely annoying rather than catastrophic?

What this means for enterprise AI builders

If you're developing enterprise agents, here's your checklist:

  1. Design for catastrophe and optimise for the common case: Ensure failure modes are contained, then focus on performance.
  2. Embrace the hybrid: Pure agents are great for demos. Workflow-agent hybrids are production systems.
  3. Transparency isn't a feature; it's a basic requirement: If users can't see what the agent is doing, they won't allow it to perform any actions.
  4. Draft everything: The difference between “agent generates final output” and “agent generates draft for review” is the difference between rejection and adoption.
  5. Event-driven > conversation-driven: Build for scale from day one. Ambient agents are the future.
  6. “Human-in-the-loop” is a feature, not a bug: The goal isn't to remove humans. It's to make humans superhuman.

My theory and final piece of advice

My theory is that the seemingly “absurd” reasons for rejecting AI agents in companies are the real reasons. These reasons act as organisational antibodies, revealing how businesses really think about risk, change, and control.

If you are having difficulty getting your agent accepted, remember that you are not just creating technology; you are building trust. And trust is built gradually, one reversible decision at a time.

Earlier, our CTO, Jack Rudenko, discussed a hidden crisis in AI that few people talk about, but which could affect all of us. Read this article to find out what caused it and how we can address it.