Conversational AI

Introducing Conversation Intelligence: the carrier advantage nobody else has

Simon Woodhead

2nd July 2026

Part 4 of 12 – Conversation Intelligence Platform series

Voice calls have historically been the least visible channel in business communication. Emails are indexed. Chat is logged. CRM records are structured. But conversations – the ones where the most important things happen, where customers express real intent, where compliance obligations are met or missed, where fraud is attempted, where emergencies are declared – have produced little more than a recording that nobody has time to review.

That’s a solvable problem. The question is who’s best placed to solve it, and the answer isn’t the third-party analytics vendor bolting a transcription service onto your call recordings after the fact. The answer is the carrier.

And before we get into the mechanics: this isn’t just about telephone calls. Via the Simwood Potato, any voice passing through our platform is in scope – Teams calls, WhatsApp Calling, any channel we carry. If the media passes through Simwood, the intelligence platform applies. That matters because the world has already moved well beyond the PSTN, and an intelligence layer that only understood telephone calls would be building for the past. We’ve written a lot about Conversational AI and where we think voice is headed – Conversation Intelligence is the observational and analytical complement to that active capability.

The platform is built around two distinct capabilities, each with a clearly bounded role.

Agents are active participants in conversations. They take calls, conduct dialogues, gather information, perform tasks. Simwood has been running a Conversational AI platform with production Agents for some time now – if you haven’t seen what they can do, start here. Agents are the voice of automation in a conversation: they respond, ask, listen, and act.

Operators are observers. They watch the transcript as a conversation unfolds, apply structured reasoning to what’s said, and emit typed signals. Not raw text – typed, structured results. A fraud confidence score as it climbs. An intent detected. A compliance deviation flagged. A sentiment shifting hostile. Operators do not participate in the conversation at all; the caller doesn’t know they’re there. They’re silent, continuous, and impartial.

An Operator is defined by two things: a natural language prompt describing what to look for, and a declared output type. Because the output is always structured and bounded, every downstream system can work with any Operator’s results without bespoke handling – including Operators that customers define themselves for their own specific use cases.
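To make the two-part definition concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the platform's API: `Operator`, `ScoreOutput`, `LabelOutput` and `accepts` are hypothetical names, and the real definition format is the subject of Part 6.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ScoreOutput:
    """Hypothetical bounded numeric output, e.g. a confidence from 0 to 1."""
    minimum: float = 0.0
    maximum: float = 1.0

    def validate(self, value: object) -> bool:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return (
            isinstance(value, (int, float))
            and not isinstance(value, bool)
            and self.minimum <= value <= self.maximum
        )


@dataclass(frozen=True)
class LabelOutput:
    """Hypothetical output drawn from a fixed set of labels."""
    labels: frozenset

    def validate(self, value: object) -> bool:
        return isinstance(value, str) and value in self.labels


@dataclass(frozen=True)
class Operator:
    """An Operator as described above: a prompt plus a declared output type."""
    name: str
    prompt: str
    output: Union[ScoreOutput, LabelOutput]

    def accepts(self, value: object) -> bool:
        # Downstream code only needs this one check, whatever the Operator is.
        return self.output.validate(value)


fraud = Operator(
    name="fraud_confidence",
    prompt="Estimate how likely it is that the caller is attempting fraud.",
    output=ScoreOutput(),
)
sentiment = Operator(
    name="sentiment",
    prompt="Classify the caller's current sentiment.",
    output=LabelOutput(frozenset({"positive", "neutral", "hostile"})),
)
```

Because every result is checked against a declared type, a consumer can handle a customer-defined Operator with the same code path as a built-in one.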

Signals are only useful if something happens as a result, which is where Configurations come in.

A Configuration defines which Operators run on a session and what actions to take when they fire. Two action types matter most right now.

A webhook delivers a structured event to the customer platform in real time, carrying the full context needed to act without a follow-up call. The customer platform receives the signal while the call is still in progress and can surface it on the handler’s screen, trigger an escalation, update the CRM, or flag the account. We’ve been delivering real-time signals to fraud handlers in major banks since the pre-AI days – this model of real-time mid-call webhook delivery is proven at scale, and Conversation Intelligence extends it with AI-driven signal generation rather than rule-based triggers.

A whisper synthesises guidance as audio and injects it into the call, heard only by the intended recipient. In most scenarios that’s the handler rather than the caller – but in an outbound calling context it might equally be the caller receiving guidance. The point is that audio injection is targeted and private. The conversation continues without interruption.
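One way to picture a Configuration is as a small routing table from signals to actions. The sketch below is an assumption-laden illustration: `Signal`, `Action`, `Configuration`, `actions_for` and the example URL are all hypothetical, and it only decides which actions would fire rather than delivering anything.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Signal:
    operator: str   # name of the Operator that emitted the signal
    value: object   # its typed result


@dataclass(frozen=True)
class Action:
    kind: str                       # "webhook" or "whisper"
    target: str                     # a URL, or the call leg that hears the whisper
    operator: str                   # which Operator's signals this action listens to
    when: Callable[[object], bool]  # condition on the signal value


@dataclass
class Configuration:
    operators: List[str]
    actions: List[Action]

    def actions_for(self, signal: Signal) -> List[Action]:
        # Ignore signals from Operators this Configuration does not run.
        if signal.operator not in self.operators:
            return []
        return [
            action
            for action in self.actions
            if action.operator == signal.operator and action.when(signal.value)
        ]


def step_missed(value: object) -> bool:
    return value == "step_missed"


config = Configuration(
    operators=["protocol_adherence"],
    actions=[
        # Placeholder URL, not a real endpoint.
        Action("webhook", "https://example.invalid/hooks/ci", "protocol_adherence", step_missed),
        Action("whisper", "handler", "protocol_adherence", step_missed),
    ],
)

fired = config.actions_for(Signal("protocol_adherence", "step_missed"))
```

Here a single missed-step signal selects both a webhook and a whisper, while a signal from an Operator outside the Configuration selects nothing.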

The part of this design that I find most elegant is what Agents and Operators don’t need to know about each other.

A conversation happens – between two humans, between a human and an Agent, between some combination. Operators watch it. Agents conduct it without any knowledge of what’s being observed or what might fire. Neither is coupled to the other. The signal stream is the only connection.

The implication: Operators work identically regardless of who is in the conversation. An Operator monitoring for fraud signals doesn’t care whether it’s watching a human call handler or an AI Agent. An Operator tracking protocol adherence in an emergency services call applies the same analysis whether the triage is conducted by a person or an automated system. The transcript is the transcript. Channel, medium, and participant type are irrelevant to the analysis.

This means the platform is genuinely unified rather than a collection of point solutions. The same Operator that monitors a human agent can monitor an AI Agent. The same analytical layer applies across the entire estate of conversations the platform handles.
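The decoupling can be sketched in a few lines of Python. Everything here is hypothetical: `Turn`, `observe` and `escalation_request` are invented for illustration, and a keyword check stands in for the model-driven reasoning a real Operator would apply. The point is only that the observer never reads who, or what, is handling the call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class Turn:
    speaker: str  # "caller" or "handler"
    kind: str     # "human" or "agent"; carried along but never read by observers
    text: str


Observer = Callable[[Turn], Optional[str]]


def escalation_request(turn: Turn) -> Optional[str]:
    # Stand-in for real analysis: a simple keyword check on the caller's words.
    if turn.speaker == "caller" and "manager" in turn.text.lower():
        return "escalation_requested"
    return None


def observe(turns: Iterable[Turn], observers: List[Observer]) -> Iterator[str]:
    # The signal stream is the only output; the conversation is never touched.
    for turn in turns:
        for observer in observers:
            signal = observer(turn)
            if signal is not None:
                yield signal


human_call = [
    Turn("caller", "human", "I want to speak to a manager."),
    Turn("handler", "human", "Let me look into that for you."),
]
agent_call = [
    Turn("caller", "human", "I want to speak to a manager."),
    Turn("handler", "agent", "Let me look into that for you."),
]

human_signals = list(observe(human_call, [escalation_request]))
agent_signals = list(observe(agent_call, [escalation_request]))
```

Both transcripts yield the same signal list, because nothing in the observer depends on the `kind` field.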

A few scenarios to make it concrete:

Emergency services. An Agent or human handler takes an emergency call. Operators immediately begin classifying the incident type, tracking the caller’s emotional state, monitoring adherence to response protocol. When the handler misses a step, a webhook fires to the customer platform with the missed step and suggested guidance – surfaced on the handler’s screen – while a whisper simultaneously delivers spoken guidance into the handler’s ear. If the caller’s state deteriorates, a second action fires a high-priority escalation alert.

Fraud prevention. A customer receives an inbound call. Before the conversation begins, Conversation Memory surfaces prior signals associated with that number. Operators monitor in real time, scoring fraud confidence as evidence accumulates. When the score crosses a threshold, a webhook delivers the context to the customer platform while the call is still in progress.

Contact centre quality. Every conversation handled by Agents or human handlers is monitored for sentiment, intent, compliance, and resolution. When a customer turns hostile or an escalation request goes unacknowledged, an action fires to the customer platform. After the call, structured results feed dashboards and QA workflows without manual review.

Sales and retention. Operators detect buying signals, objections, and cancellation intent. Configurations surface the right knowledge base content to the handler at the right moment. Post-call data feeds the CRM automatically.

None of these require routing calls to a cloud transcription service and waiting. This is happening in the call path, at the carrier layer, in real time.
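The fraud prevention scenario turns on a score crossing a threshold mid-call. A minimal sketch of that trigger logic follows, assuming a stream of confidence scores between 0 and 1. The function name, the scores and the 0.8 threshold are illustrative assumptions, not platform behaviour.

```python
from typing import Iterable, Iterator, Tuple


def threshold_crossings(
    scores: Iterable[float], threshold: float
) -> Iterator[Tuple[int, float]]:
    """Yield (index, score) each time the score rises to or above the threshold.

    Firing on the upward crossing only, rather than on every update above the
    threshold, avoids flooding the receiving platform with duplicate events.
    """
    above = False
    for index, score in enumerate(scores):
        if score >= threshold and not above:
            yield index, score
        above = score >= threshold


# Hypothetical fraud confidence as evidence accumulates during a call.
scores = [0.10, 0.35, 0.62, 0.81, 0.90, 0.40, 0.85]
events = list(threshold_crossings(scores, threshold=0.8))
```

With these example scores the trigger fires at update 3 and again at update 6, after the score has dropped back below the threshold in between.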

For customers whose traffic is locked with carriers who’ve invested in nothing for years: BYoC means Conversation Intelligence is available to you without number portability or changing your primary carrier. Route your media through Simwood and the intelligence platform applies.

Conversation Intelligence is currently in beta via the carrier services beta programme. If you’re not already enrolled, get in touch.

Part 5 covers the technical architecture of the carrier-layer media tap. Part 6 goes deep on Operators – how you define them, what the output types look like, how Configurations wire everything together.

Previous: Part 3 – Simwood has always shipped early

Next: Part 5 – The media tap
