By Simon Woodhead
There’s a lot that goes on at Simwood and we move fast. Quite often, what’d be earth-shattering global PR justifying an entire propaganda department elsewhere is just business as usual for us. That’s how we like it, but it means we often don’t sing our own praises and, as we have a policy of not buying awards, things can go unnoticed. Our customers do notice in uptime and performance, of course, but more often than not have no clue as to the detail under the hood because we keep it quiet.
There’s one such little data point I wanted to share today. It is nothing newsworthy (for us) and entirely anti-climactic if you’re not a network geek, but leads me somewhere else relevant!
Our network is large and distributed, across multiple availability zones running on ultra-low-latency high-density hardware, as we’ve talked about before. We also favour fibre where sensible, rather than relying on other networks to carry our packets between locations or using low-capacity last-century wired connections in them.
Outwardly we have a very permissive (but not open) peering policy towards other networks, which means 90% of traffic coming onto and leaving the network is a single network hop from the 600-odd other networks we face. That is a crazily high ratio, but there are subtleties that’d be fairly easy to emulate at a surface level. Let’s face it: if someone lies about the existence of their “network”, they’ll sure as hell lie about its composition.
We’re members of numerous peering exchanges and have at least one peering exchange port in every Availability Zone, enabling traffic to be handed off directly and locally. We use the exchange’s route-servers as standard which, whilst some would intellectually masturbate about the theoretical (but in practice non-existent) stupidity of doing so, means anyone new joining those exchanges and using the route-servers gets local Simwood routes, and we get theirs – this is where our policy is permissive, as that is zero work on both sides.
Where someone wants a dedicated peering session [which makes no difference at all to the actual traffic flows, it just means we configure a dedicated BGP session over the exchange rather than using the one with the route-server, which gives finer control and visibility], then we expect a minimum traffic level to justify the added overhead – this is where I say our policy is permissive but not open.
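As a rough illustration of the policy described above, here is a hypothetical Python sketch of the decision: route-server sessions are automatic, while dedicated sessions are only worth configuring above some minimum traffic level. The threshold value is invented for illustration – our actual figure isn’t part of this post.

```python
# Hypothetical sketch of a permissive-but-not-open peering policy.
# The threshold below is an invented illustration, not a real figure.

MIN_DEDICATED_MBPS = 500  # assumed minimum to justify the overhead


def peering_arrangement(uses_route_server: bool, traffic_mbps: float) -> str:
    """Return the arrangement offered to a prospective peer."""
    if uses_route_server:
        # Permissive: anyone on the exchange's route-server peers
        # automatically - zero work on both sides.
        return "route-server"
    if traffic_mbps >= MIN_DEDICATED_MBPS:
        # Worth the configuration overhead: finer control and visibility,
        # though the traffic still flows over the same exchange port.
        return "dedicated session"
    # Not open: low-traffic dedicated requests are declined.
    return "declined"


print(peering_arrangement(True, 10))      # route-server
print(peering_arrangement(False, 800))    # dedicated session
print(peering_arrangement(False, 50))     # declined
```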
Counterintuitively, those peering exchange ports make for the most expensive traffic landing on the network and in fact just buying IP Transit off another ISP would be way way cheaper. We only do that for the 10% of our traffic we can’t establish peering with however and thus this exercise raises cost in order to improve quality – a worthy trade in our view.
Of course, some sessions on those exchanges are going to represent the majority of the traffic. For most ISPs this is the FANG networks – Facebook, Amazon, Netflix, Google – although some may consider Apple to be the A. Our mix is a little different given we have mostly voice traffic and few consumer eyeballs. Regardless, when a session reaches a certain size it makes sense to peel traffic off, away from the exchange port, to a dedicated cross-connect between networks, or often several.
This is business as usual for us and means we can run our transit ports virtually empty, and our peering exchange ports very lightly loaded, giving massive headroom on the network for events such as the current scourge of DDoS attacks against VoIP providers. Generally, where we’re adding direct cross-connects, these are individually bigger than our individual peering exchange or transit ports, giving a massive fat pipe (or pipes) to the peer network and, of course, their customers – one that is immune from events that could affect the transit or peering exchange ports.
We’re therefore continually trading up the quality of the port that traffic from a given peer flows over – peering exchange is better than transit, direct fibre is better than peering exchange, multiple direct fibres are better than one, and so on. These are also cumulative in that just because we have a direct fibre in from a peer ISP, it doesn’t preclude traffic flowing over the internet exchange or even transit, subject to how we and they configure priorities. The networks are just better meshed, with more routing options, and generally set to favour the highest quality route available.
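In BGP terms, “favour the highest quality route available” is typically implemented with local preference: the higher the value, the more preferred the path. A minimal Python sketch of that tiering – the preference values here are illustrative assumptions, not our actual configuration:

```python
# Sketch of quality-tiered route selection, BGP local-preference style.
# Values are invented for illustration; higher wins.

LOCAL_PREF = {
    "transit": 100,                 # last resort
    "exchange_route_server": 200,   # shared exchange fabric
    "exchange_dedicated": 300,      # dedicated session, same port
    "direct_fibre": 400,            # preferred when available
}


def best_route(available: list[str]) -> str:
    """Pick the highest-preference route type from those currently up."""
    return max(available, key=LOCAL_PREF.__getitem__)


# Normally the direct fibre wins...
print(best_route(["transit", "exchange_route_server", "direct_fibre"]))
# ...but the other sessions remain as fallbacks if it goes down.
print(best_route(["transit", "exchange_route_server"]))
```

The point of the cumulative mesh is visible in the second call: withdrawing the direct fibre doesn’t strand traffic, it just falls back to the next tier.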
This is highly relevant at the moment as we expounded in the DDoS blogs. If you have one port to one transit provider, you have one point of control and failure. If the port is full, you’re sunk. Abstract peering and it helps spread that risk but it is possible a peering port could be full too. Abstract the peering ports to direct ports for customers and peers and it is a lot less likely they will be collateral damage. Amongst the many preparations we have for a coming DDoS is shutting the network down from the outside in as ports become congested. Given the abundant capacity we have at every level this is far less likely than many networks, but the greater abstraction gives greater control which leads to a greater level of availability whatever happens.
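“Shutting the network down from the outside in” can be sketched as shedding the lowest-quality port classes first as they congest, so that direct customer and peer fibres survive the longest. A hypothetical Python illustration – the port classes and utilisation threshold are assumptions for the sketch, not our real tooling:

```python
# Hypothetical outside-in shedding under DDoS congestion.
# Port classes are ordered from outermost (shed first) to innermost
# (shed last); the 90% threshold is an invented illustration.

SHED_ORDER = ["transit", "exchange", "direct"]  # outside -> in


def ports_to_shed(utilisation: dict[str, float],
                  threshold: float = 0.9) -> list[str]:
    """Return congested port classes in the order they would be shut."""
    return [p for p in SHED_ORDER
            if utilisation.get(p, 0.0) >= threshold]


# Transit is saturated but everything else has headroom:
print(ports_to_shed({"transit": 0.97, "exchange": 0.40, "direct": 0.15}))
```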
A little known fact is that a surprising proportion of our traffic comes to and from Amazon, not shopping, but customers hosted on AWS. The AWS network is vast and has a peering policy not dissimilar to our own. We’ve long had dedicated peering sessions with AWS across every peering exchange we’re mutually on and they see a fair bit of traffic. The change is that today we’re delighted to bring up dual dedicated fibres between Simwood and AWS.
I said this was anticlimactic and business as usual because we do this very often with very many networks, but this is slightly different and I think it is a bit of a win. AWS are very nuanced about different levels of connectivity because their business relies on it – the dual fibres we’ve now put between us would be charged to an enterprise customer at thousands per month, with not a lot of change, and this is something I wanted to explore.
Some of our customers have wanted what they describe as “AWS peering” which in our book we’ve always had. What they mean is they want the equivalent of AWS’ Direct Connect product, without paying for it! Some have paid for it and it amounts to fibres coming into our racks from AWS and a gateway device to translate their RFC1918 (aka internal) IP addresses to something routable on our network, and we’re happy to accommodate that for those it makes sense for. Heck, we’ve even had customers want to avoid the gateway device and take an allocation of Simwood IP addresses into their AWS set-up – they had the clout with both us and AWS for that to make sense!
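The gateway device exists because RFC1918 addresses aren’t routable on the public internet – something has to translate them. A quick way to see whether a given address falls in those ranges, using Python’s standard library:

```python
# Check whether an address is RFC1918 (private) and would therefore
# need translating to something routable. Example addresses below are
# illustrative only.

import ipaddress

RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def needs_gateway(addr: str) -> bool:
    """True if addr is an RFC1918 address requiring translation."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


print(needs_gateway("10.0.1.5"))   # True - typical VPC-style address
print(needs_gateway("8.8.8.8"))    # False - publicly routable
```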
For those who don’t want to spend thousands a month to reach a voice carrier, we’re now in functionally the same position in that there are redundant dedicated fibres, with significantly higher throughput than an individual Direct Connect would achieve, with no need for an on-net gateway appliance and no monthly charges. The sole compromise for those arguable wins is that traffic on the AWS side enters their VPC from the public side, but that is a logical difference rather than a physical one, I believe.
So if you’re an AWS customer, you’ll continue to see traffic flowing directly between Simwood and AWS, with no intermediary, giving maximum quality through minimised latency, lowest jitter, and bags and bags of capacity. It is just that all of those things have improved massively!
Oh, BTW, we peer with every other major cloud provider too, as well as every sentient access network (the main exception needing a blog post of its own to explain the double-think!).