Real-time intelligence at the carrier layer: how the media tap works

Part 5 of 12 – Conversation Intelligence Platform series

The “carrier advantage” described in Part 4 is not a marketing claim. It has a specific technical basis. This post explains what that is.

When someone mentions AI-powered call analysis, most people assume it works something like this: call ends, recording gets uploaded somewhere, transcription runs, analysis follows. That’s the third-party model, and it’s not wrong exactly – it’s just slow and structurally limited. You’re always analysing the past. The fraud is already committed. The compliance deviation has already happened. The escalation moment has already been missed.

Simwood’s position as the carrier means we can do something different: tap the media in real time, while the call is in progress, at the network layer, without touching the call itself.

The media tap is a one-way fork. When a session is configured for analysis, the RTP media stream is mirrored – not redirected, mirrored. The call continues entirely normally and the parties involved are unaffected. In parallel, the mirrored stream feeds the analytics pipeline, where transcription, analysis, and Operator evaluation run continuously as the conversation develops.

Internally, we subscribe to the stream directly using RTPEngine’s native subscribe/unsubscribe mechanism. That gives us carrier-layer access with minimal overhead, without wrapping the media in an extra protocol layer (such as the SIPREC fork offered in Tier 3 below) that would add nothing inside our own pipeline.
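For readers unfamiliar with RTPEngine’s control interface: commands are sent as a cookie followed by a bencoded dictionary, and the subscription flow uses the `subscribe request`, `subscribe answer` and `unsubscribe` commands. The minimal sketch below builds two of those messages. It assumes the message format described in the public rtpengine NG protocol documentation and is not an excerpt from our pipeline; the helper names, cookie, call ID and tag values are hypothetical, and key names should be checked against the rtpengine version you run.

```python
def bencode(value) -> bytes:
    """Minimal bencode encoder covering the types the NG protocol uses."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must be emitted in sorted order.
        encoded = {(k.encode("utf-8") if isinstance(k, str) else k): v for k, v in value.items()}
        return b"d" + b"".join(bencode(k) + bencode(encoded[k]) for k in sorted(encoded)) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def ng_message(cookie: str, command: str, **params: str) -> bytes:
    """Build '<cookie> <bencoded dict>'; keyword names become dashed keys (call_id -> call-id)."""
    body = {"command": command}
    body.update({name.replace("_", "-"): v for name, v in params.items()})
    return cookie.encode("ascii") + b" " + bencode(body)


def subscribe_request(cookie: str, call_id: str, from_tag: str) -> bytes:
    # Ask rtpengine for a copy of the media sent by the party identified by from_tag.
    return ng_message(cookie, "subscribe request", call_id=call_id, from_tag=from_tag)


def unsubscribe(cookie: str, call_id: str, to_tag: str) -> bytes:
    # to_tag identifies the subscription (assumed: returned in the reply to subscribe request).
    return ng_message(cookie, "unsubscribe", call_id=call_id, to_tag=to_tag)
```

For example, `subscribe_request("abc123", "call-1", "tag-a")` yields `b"abc123 d7:call-id6:call-17:command17:subscribe request8:from-tag5:tag-ae"`, which would be sent as a UDP datagram to rtpengine’s NG control port. As we read the documentation, the reply carries an SDP offer describing the copied media, and the subscriber completes the exchange with `subscribe answer`.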

The one-way nature is important. The tap receives media but sends none. The analysis side adds no latency to the call path and has no way to affect call quality. The call is not routed through an AI system; a copy of the audio is sent to one.
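The one-way property can be pictured with a conceptual model. In the sketch below, a hypothetical `MediaFork` forwards each packet on the call path first, then offers a copy to a bounded analytics queue without ever blocking; if the analytics consumer falls behind, copies are dropped rather than the call being delayed. This models the principle in Python under those assumptions only; it is not a description of how RTPEngine implements mirroring.

```python
import queue
from typing import Callable


class MediaFork:
    """Conceptual one-way fork: the call leg always gets the packet; analytics gets a copy if it keeps up."""

    def __init__(self, forward: Callable[[bytes], None], maxsize: int = 256) -> None:
        self._forward = forward                       # delivers the packet on the real call path
        self.tap: queue.Queue = queue.Queue(maxsize)  # analytics only reads from this; the fork never sends its contents anywhere
        self.dropped = 0

    def on_packet(self, packet: bytes) -> None:
        self._forward(packet)            # the call path is served first and never waits on analytics
        try:
            self.tap.put_nowait(packet)  # mirror, not redirect: the same bytes go to a second destination
        except queue.Full:
            self.dropped += 1            # a slow consumer loses copies; the call loses nothing
```

The design choice that matters is `put_nowait`: the forwarding path has no code path that waits on, or receives data from, the analytics side.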

The two-way tap – which is the basis for the whisper action described in Part 6 – adds the ability to inject synthesised audio into the call. The mechanism is the same RTPEngine infrastructure, operating bidirectionally. Audio synthesised from Operator outputs can be delivered into the call in real time, heard only by the intended recipient. The caller continues the conversation unaware.
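One way to picture “heard only by the intended recipient” is as mixing applied to one direction of the call and not the other. The sketch below assumes both streams are already decoded to 16-bit linear PCM at the same sample rate and that the recipient is the agent; `mix_whisper` and `route_frame` are hypothetical names, and codec handling, packet timing and RTPEngine’s own injection mechanism are deliberately left out.

```python
from array import array

INT16_MIN, INT16_MAX = -32768, 32767


def mix_whisper(agent_bound: array, whisper: array) -> array:
    """Add whisper samples into the agent-bound frame, clamping to the 16-bit range."""
    mixed = array("h", agent_bound)  # copy, so the original frame is left untouched
    for i in range(min(len(mixed), len(whisper))):
        mixed[i] = max(INT16_MIN, min(INT16_MAX, mixed[i] + whisper[i]))
    return mixed


def route_frame(caller_to_agent: array, agent_to_caller: array, whisper: array) -> tuple[array, array]:
    """Return (frame delivered to the agent, frame delivered to the caller)."""
    # Only the agent-bound direction carries the whisper; the caller-bound direction passes through unchanged.
    return mix_whisper(caller_to_agent, whisper), agent_to_caller
```

Because the caller-bound frame is never touched, nothing synthesised can reach the caller, which is the property the whisper action relies on.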

The platform offers three service tiers, which determine how much of the integration work sits on your side.

Tier 1 – Fully Managed. Operators are configured against a trunk and every call passing over that trunk is analysed using the same configuration. When an Operator fires, a Configuration delivers the results to your webhook endpoint. Call-to-agent association is your responsibility on receipt, using the call ID and SIP-layer metadata. Best for customers who want per-trunk Operator granularity with minimal integration effort. Trunk-level configuration is currently available via the API; portal self-service for this is coming with the portal launch.
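In Tier 1, association on receipt might look like the sketch below: match the event by call ID first, and fall back to SIP-layer metadata if the call ID is unknown. The payload shape (a `call_id` field and a `sip` object with a `to` URI) and the two lookup tables, which your own PBX or contact-centre platform would populate, are hypothetical assumptions for illustration, not the documented webhook format.

```python
import json
from typing import Optional


def associate_event(
    body: bytes,
    agents_by_call_id: dict[str, str],
    agents_by_sip_uri: dict[str, str],
) -> Optional[str]:
    """Return the agent an Operator event belongs to, or None if it cannot be matched."""
    event = json.loads(body)
    # Primary key: the call ID carried on the event (field name assumed).
    agent = agents_by_call_id.get(event.get("call_id", ""))
    if agent is None:
        # Fallback: SIP-layer metadata, e.g. the To URI of the leg that reached the agent (shape assumed).
        agent = agents_by_sip_uri.get(event.get("sip", {}).get("to", ""))
    return agent
```

If the fallback is reached during a transfer or on a shared extension, this lookup can only guess, which is the ambiguity Tier 2 is designed to remove.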

Tier 2 – API-Driven. Your platform creates a session via the Intelligence API at the start of each call, passing your own metadata and specifying which Operators to run. You receive Operator events back via webhook or event stream, with your own identifiers echoed on every event. This inverts the control model: you drive session creation rather than us inferring it from call signalling, so there is never any doubt about which call an event belongs to. Critically, you can also call PATCH /sessions/{id} mid-call to swap Operators on transfer, something static trunk configuration cannot do.
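A minimal sketch of the two calls, building the requests with Python’s standard library only. Only `PATCH /sessions/{id}` comes from this post; the base URL, the `POST /sessions` creation endpoint, the bearer-token header, the `operators`/`metadata` field names and the Operator names used below are hypothetical placeholders until checked against the Intelligence API reference.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://intelligence.example.com/v1"  # hypothetical base URL


def _json_request(method: str, path: str, token: str, payload: dict) -> urllib.request.Request:
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


def create_session(token: str, operators: list[str], metadata: dict) -> urllib.request.Request:
    # At call start: choose the Operators and attach your own identifiers, to be echoed on every event.
    return _json_request("POST", "/sessions", token, {"operators": operators, "metadata": metadata})


def swap_operators(token: str, session_id: str, operators: list[str]) -> urllib.request.Request:
    # Mid-call, e.g. on transfer: replace the Operators running on an existing session.
    path = "/sessions/" + urllib.parse.quote(session_id, safe="")
    return _json_request("PATCH", path, token, {"operators": operators})
```

Each function returns an unsent request; `urllib.request.urlopen(create_session(token, ["fraud-watch"], {"agent_id": "a-17"}))` would send one. Keeping construction separate from sending makes the request shape easy to test before pointing it at a real endpoint.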

Tier 3 – Media Tap Only. You receive a raw SIPREC media fork to your own platform and run your own stack against it – your own transcription, your own Operators, your own storage. We are purely the carrier-grade media delivery layer. The right choice for customers who already have analytics infrastructure and simply need reliable carrier-layer media access. SIPREC is also the format available here for customers wanting to feed a direct stream into existing third-party recording or analytics solutions.
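On the receiving side, a SIPREC recording session (RFC 7866) arrives as a SIP INVITE whose body is `multipart/mixed`, carrying an SDP part that describes the forked media and an `application/rs-metadata+xml` part describing the recorded call. The sketch below splits such a body with Python’s standard `email` package. It assumes your SIP stack has already handed you the body and its Content-Type header; the function name and sample values are illustrative only.

```python
from email import message_from_bytes
from email.policy import default


def split_siprec_body(content_type: str, body: bytes) -> dict[str, bytes]:
    """Split a SIPREC INVITE's multipart/mixed body into its parts, keyed by content type."""
    # Prepend the SIP Content-Type header so the MIME parser can find the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    msg = message_from_bytes(raw, policy=default)
    if not msg.is_multipart():
        raise ValueError("expected a multipart/mixed SIPREC body")
    # Typically one application/sdp part and one application/rs-metadata+xml part.
    return {part.get_content_type(): part.get_payload(decode=True) for part in msg.iter_parts()}
```

The SDP part tells your platform where and how the media will arrive; the metadata part is what lets an existing recorder associate the streams with participants.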

The architecture separates these concerns so cleanly that the choice between tiers is largely operational rather than architectural. The carrier-layer tap is the same mechanism throughout; what differs is where in the pipeline your involvement begins.

The practical implication of Tier 2 is worth dwelling on. Tier 1 is simple but coarse – you get analysis, but Simwood infers session context from call signalling and you inherit any ambiguity in that. Tier 2 is the model for any customer who needs precise control: you know the call has started because you created the session, your identifiers are echoed on every event, and you can change which Operators are running mid-call when something changes (a transfer, a mode switch, a new context). The information asymmetry that plagues event-driven integrations – “did that event relate to this call?” – is eliminated.
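Because your identifiers come back on every event, correlating a Tier 2 event needs no lookup table and no inference, unlike Tier 1 association by call ID and SIP-layer metadata. In the sketch below, the event fields (`session_id`, `operator`, and a `metadata` object echoing what you sent at session creation) and the `agent_id`/`ticket_id` identifiers are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorEvent:
    session_id: str
    operator: str
    agent_id: str   # your identifier, echoed back unchanged
    ticket_id: str  # your identifier, echoed back unchanged


def parse_event(body: bytes) -> OperatorEvent:
    """Read the echoed identifiers straight off the event: no inference step."""
    event = json.loads(body)
    echoed = event["metadata"]
    return OperatorEvent(
        session_id=event["session_id"],
        operator=event["operator"],
        agent_id=echoed["agent_id"],
        ticket_id=echoed["ticket_id"],
    )
```

A missing identifier raises `KeyError` rather than silently matching the wrong call, which is usually the behaviour you want from a correlation step.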

The combination of passive real-time observation and targeted active intervention, both at the carrier layer, is what makes this something a third-party vendor cannot replicate. A vendor sitting outside the call path can analyse recordings. They cannot passively tap media without redirecting the call through their infrastructure, which adds latency and creates a dependency. They cannot inject audio without doing the same.

We can, because we’re the carrier and this is our network.

Previous: Part 4 – Introducing Conversation Intelligence

Next: Part 6 – Operators