Introducing our 100% uptime SLA!

by Simon Woodhead

November wasn’t a great month for our competitors (or their resellers who pretend to be competitors) with multiple outages amongst them.

Anyone who says they’re immune from failure is lying or simply naive. Hardware breaks, diggers pull cables out of the ground, and then there are so-called ‘Acts of God’. We’ve had more than our fair share of these over the years – I’m the guy who lost a rack full of equipment in a data centre explosion last millennium, had both primary and secondary SANs (Storage Area Networks – generally the most reliable, fault-tolerant component) melted in a data centre power surge this millennium, and in fact got through 7 SANs in 6 years for all manner of crazy reasons. So we’ve been quite unlucky in many respects, but an ‘outage’, a failure of our service such that people cannot make calls in and out? We haven’t had one of those since 2010.

I remember it extremely well, as a data centre floor is a very lonely place when routine maintenance results in a router giving the equivalent of “operating system not found”. It was a bug, triggered on reload by a line of config that had sat there quite happily for months. I found that out by spending the entire night rekeying hundreds of lines of config by hand to find the point at which it broke, and then keying in slightly fewer until it worked. A major airport began preparations for grounding flights the next day, as it transpired one of our customers was a supplier to a critical support service – added ‘motivation’ I didn’t really need. Thankfully, things were working again by 4am and that massive single point of failure was fixed pretty darn quickly. Thanks to successive network upgrades, if that happened today you wouldn’t notice, as there is so much redundancy engineered in. I often tell this story to those who buy a router and think they’ve rid themselves of third-party issues – no, you’ve added a great big single point of failure and have a long way to go (certainly way beyond the second one!) until you start improving things.

So, we’re scared of downtime. Personally, I’m so scared of downtime after so many traumatic issues like the above, mostly in the early days of this business when I was tackling them alone, that alerts provoke a physical reaction in me. Talk to me privately about other life experiences and you’ll know I handle extreme stress pretty well, but I do not like outages and we work extremely hard to avoid them. Thankfully, nowadays it isn’t just me and we have a great team responding to and preventing failures but, rest assured, I still see every single one at the very least and, in the event of something big, lead the incident.

And therein lies a huge difference between Simwood and others. We’re architected so a failure (which will happen) doesn’t lead to an outage. We don’t operate out of multiple datacentres or have the cost of fibre in the ground for giggles or vanity. Our whole network effectively functions as one big SBC, with customers able to send calls anywhere and get a predictable response from a system we mostly built and we entirely operate. If it breaks, we fix it, with no service providers involved. But we try to make darn sure that if it breaks, you only know about it when we tell you it happened.

Contrast that with our peers who, to differing degrees, rely on things not breaking: that the SQL Server 2000 machine will keep working as it has for 20 years, or that the magic box won’t break. Now, magic boxes are an interesting one – the default way for a legacy provider to offer VoIP. They add boxes to their network and point a bunch of customers to one, and a bunch to another, all manually configured. And then things break and there’s an outage. Run around lots, call the service provider to raise a support ticket so they can tell you what button to click, and hope for the best. Some may point customers to two magic boxes, sometimes in different places, and both manually configured. Or maybe magic boxes from multiple vendors, with different support processes. What could possibly go wrong!?

We’re very different and we hope it shows. I’m reminded how different when, like recently, we had to do a deep technical dive with a major prospect from the other side of the world and they’re blown away. Perhaps we don’t make enough of our differences and do the ‘marketing thing’ right. Indeed ‘me too’ have offered to help us with commercial acumen in the past, in between their outages. But it could just be words, and we know how others abuse words to misrepresent their place in the world or what they offer.

So instead we’re putting our money where our mouth is and launching the first (to our knowledge) 100% uptime SLA on wholesale voice services. If we have an actual outage that means you cannot make calls out or receive calls in, we’ll pay you 25x the channel rental you paid for the period affected. So if we’re down for a day, as one of our contemporaries was recently, we’re paying you for almost the whole month!
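
For those who like to see the sums, here is a rough sketch of that arithmetic in Python. It is an illustration only, not the formal terms: the function names, the 30-day month and the assumption that channel rental is apportioned pro rata by time are all ours for the example, and the formal version of the SLA is what actually governs.

```python
# Illustrative only: a back-of-the-envelope view of the 25x promise.
# Assumptions (not taken from the formal SLA): rental is apportioned
# pro rata by time, a month is 30 days, and no cap is applied.

CREDIT_MULTIPLIER = 25  # from the post: 25x the rental for the affected period


def rental_for_period(monthly_rental: float, outage_hours: float,
                      days_in_month: int = 30) -> float:
    """Channel rental attributable to the outage, pro rata by time."""
    if outage_hours < 0 or days_in_month <= 0:
        raise ValueError("outage_hours must be >= 0 and days_in_month > 0")
    return monthly_rental * outage_hours / (days_in_month * 24)


def sla_credit(monthly_rental: float, outage_hours: float,
               days_in_month: int = 30) -> float:
    """Credit due: the multiplier times the rental for the affected period."""
    return CREDIT_MULTIPLIER * rental_for_period(
        monthly_rental, outage_hours, days_in_month)


def credit_as_share_of_month(outage_hours: float,
                             days_in_month: int = 30) -> float:
    """Credit expressed as a fraction of one month's rental."""
    return CREDIT_MULTIPLIER * outage_hours / (days_in_month * 24)
```

On those assumptions, a 24-hour outage in a 30-day month works out at 25/30 of the month’s channel rental, roughly 83%, which is where “almost the whole month” comes from.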

This, of course, requires you to configure your own equipment as we ask and advise – configure us like a.n.other provider with magic boxes and – like them – a failure will be an outage. You need to use the redundancy we’ve built in, and we’re very happy to help with that. We’re also unable to provide muppet insurance so, as you’ll hopefully find reasonable, we’re not underwriting every other operator out there – this is in relation to issues on our network, not other people’s. That’s not a get-out for issues with our “service provider” or “service partner” because, remember, we do not have any. If stuff breaks, it is our fault and we will now pay you!

There is, of course, a formal version of this promise which sets out the fine points. We hope you like it and we’d love your feedback.

Your move ‘me too’, when you get off the phone to magic box support.