Anycast SIP

By Simon Woodhead

It has been a few months since we let you know about an early alpha of our anycasted SIP proxy. We’ve been busy and it is now a late alpha. That means we will break it and there are bugs, but we’d really appreciate you playing with it and giving us feedback. We expect to release it as a beta this month, once we’ve fixed some known issues!

This is quite a big deal and we’re excited.

Why?

We’ve learned over the years that customers using hostnames makes life a lot easier, especially when those hostnames have suitably configured SRV records behind them, as ours do. We can modify DNS to route around issues, and customer equipment knows where to fail over to. But there are two problems with this:

Some customers cannot or will not follow our interop, and either insist on hardcoding a specific IP address or have no choice but to do so. To be fair, this is what they’re used to when mapped to a specific magic box elsewhere, so we’re the odd ones out here.
Others follow our interop but have a user agent that performs a DNS lookup only at start-up and caches the result until the next restart, even after its TTL has expired. Asterisk is particularly bad at this, depending on version.
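To make the second problem concrete, here is a minimal Python sketch of the behaviour a user agent ideally has: a cache that honours the TTL on a DNS answer and re-resolves once it expires, rather than holding the start-up answer until a restart. The `resolver` callable and its `(addresses, ttl)` return shape are hypothetical stand-ins for a real DNS client; this is not how Asterisk or any particular user agent is implemented.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical resolver: takes a hostname, returns (addresses, ttl_seconds)
Resolver = Callable[[str], Tuple[List[str], int]]


class TtlCache:
    """Caches DNS answers only until their TTL expires, then re-resolves."""

    def __init__(self, resolver: Resolver, clock: Callable[[], float] = time.monotonic):
        self._resolver = resolver
        self._clock = clock
        # name -> (addresses, absolute expiry time on self._clock)
        self._entries: Dict[str, Tuple[List[str], float]] = {}

    def lookup(self, name: str) -> List[str]:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh: reuse the cached answer
        # Missing or expired: ask DNS again so changes are picked up
        addresses, ttl = self._resolver(name)
        self._entries[name] = (addresses, now + ttl)
        return addresses
```

A user agent that caches "until the next restart" behaves as if `ttl` were infinite, which is why DNS changes made around maintenance never reach it.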

Neither group gets any benefit from what we’ve engineered around multi-site failover, automatic DNS updates and so on. Instead, unless they’ve engineered their own failover to other sites, they blast traffic at sites undergoing planned maintenance or experiencing issues that we’ve routed other customers around.
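For readers less familiar with SRV, the failover mentioned above comes from the way clients order SRV records. The sketch below follows the selection rules in RFC 2782: the lowest priority value is tried first, and within a priority a record is chosen by weighted random selection. The hostnames and values are hypothetical examples, not Simwood’s actual records.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower value = preferred
    weight: int    # relative share of traffic within the same priority
    port: int
    target: str


def order_srv(records, rng=random):
    """Return SRV records in the order a client should try them (RFC 2782)."""
    ordered = []
    for prio in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == prio]
        # RFC 2782: zero-weight records go at the start before selection
        group.sort(key=lambda r: r.weight != 0)
        while group:
            total = sum(r.weight for r in group)
            pick = rng.randint(0, total)  # inclusive on both ends
            running = 0
            for i, r in enumerate(group):
                running += r.weight
                if running >= pick:
                    ordered.append(group.pop(i))
                    break
    return ordered
```

If the first target fails, a well-behaved client moves down this list, which is how DNS alone can steer traffic away from a site.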

One solution to this is to never have any maintenance and to hope for no outages, and this seems to be the approach some of our contemporaries take. It feels lame and rather dirty though!

A better solution is one global hostname, mapped to a single IP address that never needs to change, so it doesn’t matter if people force the use of the IP address. That address should exist in every availability zone, and traffic should reach the one that is closest. That decision should be dynamic, seamlessly handling availability zones, servers or services coming and going, and it should adapt as network conditions change or as nomadic users step off planes, so they always reach the best (closest available) node for service at that time. As a focal point for attacks, it should also be highly resilient, leveraging the network to spread load across many, many instances rather than the typical single ‘active’ node. Naturally, because we vehemently disagree with the suggestion that ‘encryption is pointless’, it should fully support encryption of signalling and not get in the way of media encryption.

That is what we’ve built, adding to the massive amount of work we’ve put into anycast in 2017 and have presented on throughout the year. As we mentioned in those talks, we started from the inside out with this.
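As a rough mental model of the ‘closest available node’ behaviour, the toy Python function below picks, for one client, the lowest-cost site among those currently announcing the shared address. In reality this choice is made by BGP routing across the internet, not by application code; the site names and path costs are hypothetical.

```python
from typing import Dict, Optional, Set


def anycast_target(path_cost: Dict[str, int], announcing: Set[str]) -> Optional[str]:
    """Toy model: traffic from one client reaches the announcing site with
    the lowest routing cost. A site under maintenance simply withdraws its
    announcement and traffic shifts to the next best site."""
    candidates = [site for site in path_cost if site in announcing]
    if not candidates:
        return None  # nobody announces the prefix: unreachable
    # Tie-break on name so the result is deterministic
    return min(candidates, key=lambda site: (path_cost[site], site))


costs = {"site-a": 2, "site-b": 3, "site-c": 9}  # hypothetical costs from one client
print(anycast_target(costs, {"site-a", "site-b", "site-c"}))
print(anycast_target(costs, {"site-b", "site-c"}))  # site-a withdrawn
```

The client never changes the address it uses; only which site answers it changes.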

The host any.simwood.com can be used in place of outbound proxies for normal IP-authenticated or credential-based trunks, or indeed as the endpoint to REGISTER with for our Registration Proxy. It supports udp/5060, tcp/5060 and tls/5061, although we continue to strongly recommend TCP or TLS, and UDP is not supported at all for SIP registration.
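As a convenience, here is a minimal sketch of how the transport and port options above map onto SIP URIs, encoding the rule that UDP can’t be used for registration. The helper name `proxy_uri` is hypothetical, and writing the proxy with the RFC 3261 `transport` URI parameter is an assumption; check your user agent’s documentation for the form it expects (some prefer a `sips:` URI for TLS).

```python
# Transports and ports listed for any.simwood.com
PORTS = {"udp": 5060, "tcp": 5060, "tls": 5061}


def proxy_uri(transport: str, registering: bool = False, host: str = "any.simwood.com") -> str:
    """Build an outbound-proxy SIP URI for the given transport."""
    transport = transport.lower()
    if transport not in PORTS:
        raise ValueError(f"unsupported transport: {transport}")
    if registering and transport == "udp":
        # UDP is not supported at all for SIP registration
        raise ValueError("UDP is not supported for SIP registration")
    return f"sip:{host}:{PORTS[transport]};transport={transport}"


print(proxy_uri("tcp"))
print(proxy_uri("tls", registering=True))
```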

You’ve told me that before

Indeed, we made the above case in December 2017 and I, along with a handful of customers, have been using it since. But we’ve now completely re-architected it.

The nature of anycast is that the route taken by a call’s packets will probably change during the life of a 3 or 4 hour phone call, so simply anycasting an edge proxy is not enough. Our previous solution therefore relied on redirecting the caller to the closest available unicast proxy, but this requires support in the calling user agent. It is catered for in RFCs, but then so is DNS!
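For illustration, redirection of this kind can be done with a standard SIP 3xx response: the anycast proxy answers with a `302 Moved Temporarily` whose `Contact` points at a unicast proxy, and the caller has to retry there. The Python sketch below builds such a response, assuming a plain 302 was the mechanism; the unicast host name is hypothetical, compact header forms are ignored, and this is not Simwood’s previous implementation.

```python
import secrets
from typing import List, Tuple


def redirect_302(request_headers: List[Tuple[str, str]], unicast_proxy: str) -> str:
    """Build a minimal SIP 302 response steering the caller to a unicast proxy."""
    copied = {"via", "from", "to", "call-id", "cseq"}  # echoed from the request
    lines = ["SIP/2.0 302 Moved Temporarily"]
    for name, value in request_headers:  # preserves Via order
        if name.lower() not in copied:
            continue
        if name.lower() == "to" and ";tag=" not in value.lower():
            # The responding UAS adds a To tag if the request had none
            value += f";tag={secrets.token_hex(4)}"
        lines.append(f"{name}: {value}")
    lines.append(f"Contact: <sip:{unicast_proxy};transport=tcp>")
    lines.append("Content-Length: 0")
    return "\r\n".join(lines) + "\r\n\r\n"
```

The weakness described above is visible here: nothing happens unless the calling user agent honours the `Contact` and re-sends its request.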

There are other techniques, but they similarly rely on RFC support that we cannot be certain exists universally, and since we know few, if any, others use them, that support is highly likely not to exist in practice. Combining several solutions in the hope that each user agent supports at least one of them would be bad.

We’ve therefore thrown away the rule-book and Simwood-ised it. The result is a proxy that can exist in hundreds of instances around the edge of the Simwood network and during the life of a call packets can hit any one of those instances, catering well for internet routing changes and indeed nodes failing. Calls magically still end up in the right place.

The technique we’ve used has applications behind the edge, building towards the Simwood network being one global highly-available SBC! That’s a very exciting prospect.
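The exact technique isn’t described here. Purely as an illustration of one generic way that stateless instances can agree on where a call belongs without sharing any state, here is rendezvous (highest-random-weight) hashing keyed on the SIP Call-ID. The backend names are hypothetical, and this is not a description of Simwood’s method.

```python
import hashlib
from typing import Iterable


def pick_backend(call_id: str, backends: Iterable[str]) -> str:
    """Rendezvous hashing: every instance that sees the same Call-ID and
    backend list computes the same answer, so packets for one call can hit
    any edge instance and still reach the same place."""
    def score(backend: str) -> int:
        # SHA-256 rather than Python's hash(), which is salted per process
        digest = hashlib.sha256(f"{call_id}|{backend}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(backends, key=score)


backends = ["core-1", "core-2", "core-3"]
print(pick_backend("a84b4c76e66710", backends))
```

A useful property of this scheme is that removing a backend only moves the calls that were mapped to it; every other call keeps its destination.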

Remember it is an alpha, but do have a play and let us know what you think. We need to test it and prove it at scale.