Last week, we signalled our progressive pivot to AI. This doesn’t diminish our commitment to existing services – unlike most, we’ll continue to invest in them – but AI, for voice and messaging, is going to become more central to our offer; I believe it is our third chapter. In that post, we described how open the market is: despite, in one case, an $11bn valuation, existing technical solutions are at best mixed and at worst woeful. We’re committed to fixing that, bringing more on-net and optimising our stack for both performance and price – to give you room to add value and still compete well in the marketplace. Literally nobody else is doing that: they want your customers all for themselves, and they’re raising telephone-number sums to get them.
When we talk about performance optimisation, we mean a few things. The first is quality: the accuracy of any transcription, and the naturalness of any synthesised voice. Another is latency. Nobody wants to wait four seconds for a robot to speak gibberish, and we know from telephony that latencies above about 160ms are perceptible to the human ear and feel a bit odd. If you’ve ever spoken over a satellite phone, or terrestrially over geographical extremes, you’ll have felt this. Staying below that threshold is currently a tough ask in AI voice, but small tweaks add up and get us closer.
I’m therefore delighted to say that the first enhancement to our v2 agents is a range of new voices. These not only sound awesome but also dramatically cut latency. A possibly new acronym for you is TTFB (Time to First Byte): we’ve reduced it by 300ms compared with the ElevenLabs voices we previously used. Combined with better network placement, our voices now have a TTFB of 150ms, versus 500–600ms before. There’s more to do – that’s the opportunity – but as the first of many enhancements in the pipeline, this is a great start.
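If you’d like to sanity-check numbers like these yourself, TTFB is simply the gap between sending a synthesis request and receiving the first bytes of streamed audio back. Here’s a minimal sketch of measuring it; the endpoint URL and payload fields are purely illustrative placeholders, not our actual API:

```python
import time
import requests

# Hypothetical streaming TTS endpoint and payload -- illustrative only.
TTS_URL = "https://api.example.com/v2/tts/stream"
PAYLOAD = {"voice": "example-voice", "text": "Hello, how can I help you today?"}

def measure_ttfb(url: str, body: dict) -> float:
    """Return seconds from sending the request to receiving the first audio chunk."""
    start = time.perf_counter()
    with requests.post(url, json=body, stream=True, timeout=10) as resp:
        resp.raise_for_status()
        for chunk in resp.iter_content(chunk_size=1024):
            if chunk:  # first non-empty chunk of audio
                return time.perf_counter() - start
    raise RuntimeError("stream ended before any audio arrived")

print(f"TTFB: {measure_ttfb(TTS_URL, PAYLOAD) * 1000:.0f} ms")
```

Run against any streaming voice endpoint you have access to, this gives a like-for-like number you can compare across providers.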
Any new agents you create will have all of these voices available, with the old ones deprecated. Existing v2 agents can switch to the new voices – a one-way change – and at some point during the beta we’ll make that switch mandatory. Please check them out and let us know what you think.
Lastly, optimisation also relates to cost. Pricing AI properly is proving incredibly complicated. We want to make it simple, as existing models are frankly stupid, and your beta testing is giving us the data to do so. A judicious choice of best-of-breed elements, in the best place, on the best network, helps minimise the input components. So I’m delighted to say these new voices alone should shave $0.11/min off the old ones – and voice was the dominant part of the cost equation. That’s huge, although perhaps it makes us worth $11bn less in the real world!
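For the curious, the “cost equation” here is just the sum of a call’s per-minute input components – transcription, the language model, the voice, and carrying the call – so trimming the dominant component moves the total the most. A tiny sketch of that arithmetic; the component names and figures below are hypothetical placeholders, not our pricing, and only the $0.11/min voice saving comes from this post:

```python
from dataclasses import dataclass

@dataclass
class PerMinuteCost:
    """Per-minute input components of a voice agent call, in USD/min.

    Breakdown and figures are illustrative only; the sole number taken
    from the post is the $0.11/min reduction in the voice component.
    """
    stt: float        # speech-to-text
    llm: float        # language model
    tts: float        # text-to-speech (the voice)
    telephony: float  # carrying the call

    def total(self) -> float:
        return self.stt + self.llm + self.tts + self.telephony

# Hypothetical figures purely to show the arithmetic.
old = PerMinuteCost(stt=0.02, llm=0.03, tts=0.15, telephony=0.01)
new = PerMinuteCost(stt=0.02, llm=0.03, tts=old.tts - 0.11, telephony=0.01)

print(f"old total: ${old.total():.2f}/min, new total: ${new.total():.2f}/min")
print(f"saving:    ${old.total() - new.total():.2f}/min")
```

The point of the sketch is simply that when the voice is the biggest slice, a saving there passes almost straight through to the per-minute price you can offer.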
