Massive storage upgrades to our vSphere environments

6th August 2013

In short

If you don’t wish to read the whole story, the takeaway is that we’re replacing our general-purpose vSphere storage with two tiers. New Tier 1 storage, which can be attached to Simwood VMs, is capable of many tens of thousands of IOPS, ultra-low latency and throughput in the hundreds of MB/s. Our general-purpose Tier 2 storage has also got faster for most use cases.

Background

We operate large vSphere clusters at our Manchester and Slough sites. These host 100% of our own applications that don’t involve media, including all billing, APIs, load-balancing and fax services. We’ve done this for many years now and, as a result, also host critical infrastructure for a number of customers. Outside the voice space, this includes some major retail brands transacting tens of millions in revenue per year through platforms hosted on Simwood.

In every case we have been able to dramatically increase the performance of virtualised applications because our approach is different. Some might say commodity providers see “cloud” as an opportunity to sell the same unit many times and, by definition, offer less than bare-metal performance. By contrast, we provision 200% of the resources sold, so each virtual machine gets bare-metal performance (there’s a tiny overhead for the hypervisor) and we have the capacity to run VMs without contention in the event of a hardware or site outage. Thus, as with our network, no host is ever loaded beyond 50% and VMs should never be contended for RAM or CPU.
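The arithmetic behind the 50% rule can be sketched as a simple headroom check. This is a minimal illustration under our own assumptions, not our actual capacity-planning tooling: the `Host` fields, the host names and the 48 GHz / 256 GB figures are hypothetical, and spare capacity is pooled across surviving hosts rather than modelled VM by VM.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    site: str
    cpu_ghz: float   # total CPU capacity
    ram_gb: float    # total RAM
    used_cpu_ghz: float = 0.0
    used_ram_gb: float = 0.0


def within_load_limit(hosts, limit=0.5):
    """True if no host is loaded beyond `limit` of its CPU or RAM."""
    return all(
        h.used_cpu_ghz <= limit * h.cpu_ghz and h.used_ram_gb <= limit * h.ram_gb
        for h in hosts
    )


def survives_site_loss(hosts):
    """True if, whichever site fails, the other sites have spare CPU and RAM for its load."""
    for lost in {h.site for h in hosts}:
        failed = [h for h in hosts if h.site == lost]
        survivors = [h for h in hosts if h.site != lost]
        spare_cpu = sum(h.cpu_ghz - h.used_cpu_ghz for h in survivors)
        spare_ram = sum(h.ram_gb - h.used_ram_gb for h in survivors)
        if sum(h.used_cpu_ghz for h in failed) > spare_cpu:
            return False
        if sum(h.used_ram_gb for h in failed) > spare_ram:
            return False
    return True


# Two hypothetical hosts, one per site, each loaded to exactly 50%.
hosts = [
    Host("mcr-1", "Manchester", cpu_ghz=48, ram_gb=256, used_cpu_ghz=24, used_ram_gb=128),
    Host("slo-1", "Slough", cpu_ghz=48, ram_gb=256, used_cpu_ghz=24, used_ram_gb=128),
]
print(within_load_limit(hosts), survives_site_loss(hosts))  # True True
```

At exactly 50% load the surviving site is left at 100%, which is the point: provisioning 200% of what is sold is what makes a full site failover fit without contention.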

When it comes to disk access, vSphere has previously required shared storage, which has meant using SANs rather than on-host disks. The commodity “cloud” providers, by contrast, tend to use on-host disks, which limits performance to the disks in each host and also exposes IO to contention: the noisy neighbour syndrome. Some, such as Amazon, offer access to SAN-like services such as EBS, but whilst the underlying SANs are clearly huge, the capability per customer is pretty limited. Amazon, for example, recently doubled IO capability to 2,000 IOPS, yet we have VMs doing more than that on our general SAN storage. Others use software-based solutions that mimic SANs by replicating between local storage, with severe limitations; such technology was intended to make entry-level deployments easy, not to enhance provider margins.

Over the several years we’ve been operating vSphere, the SANs have got bigger, more expensive and, frankly, scarier. Each VM typically enjoys the throughput of 32 SAS spindles on our general storage and can burst to many times the per-volume capability of Amazon EBS. We’ve also recently enabled IO control to avoid ‘noisy neighbour’ issues. We’re still not happy with SANs, though, and are changing things further.
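IO control of this kind generally works by giving each VM a share of the datastore when it is contended, so one busy VM cannot starve the rest. The sketch below illustrates the idea with weighted max-min fairness; it is an assumption-laden illustration, not VMware’s actual Storage I/O Control algorithm, and `allocate_iops` and the VM names are hypothetical.

```python
def allocate_iops(capacity, demands, shares):
    """Split `capacity` IOPS between VMs by weighted max-min fairness.

    demands: dict of VM name -> IOPS requested
    shares:  dict of VM name -> relative weight (higher = bigger slice)
    VMs asking for less than their weighted slice get exactly what they asked
    for; whatever they leave unused is redistributed among the rest.
    """
    alloc = {vm: 0.0 for vm in demands}
    active = {vm for vm, d in demands.items() if d > 0}
    remaining = float(capacity)
    while active and remaining > 0:
        total = sum(shares[vm] for vm in active)
        slices = {vm: remaining * shares[vm] / total for vm in active}
        satisfied = {vm for vm in active if demands[vm] - alloc[vm] <= slices[vm]}
        if not satisfied:
            # Everyone wants more than their slice: hand out the slices and stop.
            for vm in active:
                alloc[vm] += slices[vm]
            break
        for vm in satisfied:
            remaining -= demands[vm] - alloc[vm]
            alloc[vm] = float(demands[vm])
        active -= satisfied
    return alloc


# A quiet web VM and two noisy database VMs sharing a hypothetical 10,000 IOPS.
print(allocate_iops(10_000,
                    {"web": 1_000, "db-a": 8_000, "db-b": 8_000},
                    {"web": 1, "db-a": 1, "db-b": 1}))
```

The quiet VM gets everything it asked for and the two noisy neighbours split the remainder evenly, rather than one of them grabbing the lot. When there is no contention, every VM simply gets its demand.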

Why we hate SANs

SANs are big, expensive and one of the few areas where you still get what you pay for. Given that decent SANs cost millions of pounds, most providers are at the lower end. Although we’re far from the bottom and have invested serious money in several generations of SANs in recent years, we’re nowhere near the top end, unlike in other areas of our infrastructure.

The good thing about shared storage is that it provides a reliable, central store off which you can hang more commoditised hosts. The problem is that, in our experience, SANs are not as reliable as they should be. We’ve had some terrifying experiences with them, ranging from one model being withdrawn from sale globally with immediate effect, and the MD on a plane to the UK within an hour, after critical bugs were discovered, to possibly the largest storage distributor in the UK repeatedly shipping components that are unsupported by the manufacturer. SANs are the only element of our entire infrastructure that maintains an air of mystique and a lack of control, and we don’t like that. Whilst our customers have never lost a byte to a failed SAN, we find them unnecessarily stressful!

The future of storage

The future has to be storage that ‘just works’, with performance that is software defined. One VM might require high-availability storage and another massive IOPS, but they should be able to co-exist on the same underlying hardware. That hardware will comprise storage in each compute node rather than in a central store. This brings simplicity and improved performance, plus massive scalability, without any of the scary elements that accompany a large SAN.

Whilst commodity cloud providers use local storage, as we have mentioned, there are massive challenges in doing this in a high-performance environment. SANs are highly tuned pieces of kit, but there are finite limits to the throughput of a spinning disk, so overall throughput comes down to how many disks you have and how efficiently you access them. Software-defined storage therefore needs to combine minimal overhead with massive scalability.
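The physics of a spinning disk can be put into numbers with the usual back-of-envelope estimate: each small random IO waits for a seek plus, on average, half a revolution. The 15k rpm speed and 3.5 ms seek below are illustrative assumptions, not measurements of our arrays.

```python
def disk_random_iops(rpm, avg_seek_ms):
    """Rough ceiling on small random IOs per second for one spinning disk.

    Each random IO waits for an average seek plus, on average, half a
    revolution of the platter; transfer time for small blocks is ignored.
    """
    half_rotation_ms = 0.5 * 60_000 / rpm
    return 1_000 / (avg_seek_ms + half_rotation_ms)


def array_random_iops(spindles, rpm, avg_seek_ms):
    # Best case: IO spread perfectly across every spindle. Ignores RAID write
    # penalties, controller limits and cache, so real arrays land below this.
    return spindles * disk_random_iops(rpm, avg_seek_ms)


# Illustrative figures for a 15k rpm SAS drive with a ~3.5 ms average seek.
print(round(disk_random_iops(15_000, 3.5)))         # 182
print(round(array_random_iops(32, 15_000, 3.5)))    # 5818
```

Under these assumptions even 32 spindles top out at a few thousand random IOPS, which is why spindle count dominates SAN design and why an SSD rated in the tens of thousands changes the picture entirely.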

This is the future as we see it, and we’re pleased to see that VMware is working on it with an initiative likely to be called vSAN. This is quite distinct from their Storage Appliance, which we referred to above as an entry-level product misused by some “cloud” providers. vSAN involves creating a software-defined SAN at the hypervisor level, with dedicated 10Gb NICs between hosts. It aims to offer a distributed SAN using all of the disks available within the hosts, configured to suit the software-defined requirements of each individual VM. VMware hasn’t announced a release date yet, but we foresee that our next generation of hosts will be able to leverage it at some point, allowing us to reduce our use of SANs.
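To make “software-defined requirements of an individual VM” concrete, here is a hypothetical per-VM policy for a generic mirror-plus-witness distributed store. vSAN’s final design and policy names were not public at the time of writing, so `StoragePolicy` and its fields are our own illustrative assumptions, not VMware’s API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    failures_to_tolerate: int  # host failures a VM's disks must survive
    stripe_width: int = 1      # disks each copy is striped over, for throughput


def copies(policy):
    # Tolerating n failures needs n + 1 full copies of the data.
    return policy.failures_to_tolerate + 1


def min_hosts(policy):
    # n + 1 copies plus n witnesses, so a majority survives any n failures.
    return 2 * policy.failures_to_tolerate + 1


def raw_capacity_gb(disk_gb, policy):
    return disk_gb * copies(policy)


def min_disks(policy):
    return copies(policy) * policy.stripe_width


ha = StoragePolicy(failures_to_tolerate=1)                     # a VM needing high availability
fast = StoragePolicy(failures_to_tolerate=0, stripe_width=4)   # a VM needing raw IOPS
print(copies(ha), min_hosts(ha), raw_capacity_gb(100, ha))     # 2 3 200
print(copies(fast), min_hosts(fast), min_disks(fast))          # 1 1 4
```

Both policies can live on the same pool of host disks: the first trades raw capacity for availability, the second trades availability for striping across more spindles or SSDs.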

We really like SSDs

Many years ago, pre-virtualisation, we got fed up with having to buy database servers with more and more disks as the business grew. Experimentally (very experimentally at the time), we bought smaller servers with SSDs in them and the problem was solved. Nowadays our VM-based database servers outperform even those early SSD machines, but both are vastly faster than those based on spinning disks. Since that epiphany, every physical machine, including media servers, has been deployed with SSDs. As well as being ultra-fast, they are also low-power and extremely reliable. The only spinning disks remaining are in our SANs and one old database server that is still to be retired.

So we love SSDs and need no convincing of their merits. We love them so much that we even deploy SSDs to accelerate the NAS units we use for backups at each site! They’re very expensive, but that is more than repaid in performance, power savings and reduced stress!

We recently introduced some new SANs alongside our general-purpose SANs and, whilst much smaller, they were deployed with SSDs. They have far fewer spinning disks per unit, but each has multiple SSDs and automatically migrates “hot” blocks to SSD. The result is amazing. We find that 95-98% of IO operations hit the SSDs, benefitting from lower latency and higher throughput than a large array of spinning disks could achieve. Whilst bulk migration of cold data is slower, due to the physics of having fewer spinning disks, this is rare and limited, since those disks now handle only a fraction of the work they did before. These units are smaller, use a lot less power and, by virtue of the SSDs, should be more reliable.
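Why a small SSD tier catches most IO can be shown with a toy model: count accesses per block, promote the hottest blocks that fit, and see what fraction of IO lands on SSD. Real arrays track heat over time windows and relocate data in larger chunks on a schedule; this sketch just ranks a single trace, and the trace and latency figures are invented for illustration.

```python
from collections import Counter


def choose_hot_blocks(accesses, ssd_capacity_blocks):
    """Return the IDs of the most frequently accessed blocks that fit on SSD."""
    counts = Counter(accesses)
    return {block for block, _ in counts.most_common(ssd_capacity_blocks)}


def ssd_hit_ratio(accesses, hot_blocks):
    # Fraction of IOs that would be served from SSD once hot blocks are promoted.
    hits = sum(1 for block in accesses if block in hot_blocks)
    return hits / len(accesses)


def effective_latency_ms(hit_ratio, ssd_ms, hdd_ms):
    # Average latency, weighted by where each IO is served from.
    return hit_ratio * ssd_ms + (1 - hit_ratio) * hdd_ms


# Toy trace: two blocks take almost all the IO, as with a busy database index.
trace = [1] * 50 + [2] * 45 + [3] * 3 + [4] * 2
hot = choose_hot_blocks(trace, ssd_capacity_blocks=2)
ratio = ssd_hit_ratio(trace, hot)
print(sorted(hot), ratio)                      # [1, 2] 0.95
print(effective_latency_ms(ratio, 0.1, 5.0))
```

With assumed latencies of 0.1 ms for SSD and 5 ms for disk, a 95% hit ratio brings average latency to roughly 0.35 ms, which is why far fewer spinning disks are needed behind the SSDs.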

Changes we’re making

In the light of this experience and our expectations for the future we’re making a few changes to our vSphere stack:

Our current general-purpose SAS-based SANs will be progressively retired. Two are less than 18 months old, so if you are interested in buying them, please get in touch.

They will be replaced by our newer, smaller SANs with auto-tiering onto SSDs. Because these units are smaller, we can treat them more as commodities, avoiding the problems of a centralised point of failure, and apply the same utilisation ratio (i.e. below 50%) that we maintain for compute. Rather than a single general-purpose tier of storage, these will form our new Tier 2 storage. Whilst we’re calling it ‘Tier 2’, the IOPS per VM far exceeds that of commodity cloud, including Amazon’s EBS and typical local storage; we’re still talking about many SAS drives, just fewer than 32, as they’re accelerated by the SSDs.

The really high-IO VMs we host tend to be customer database servers that haven’t yet migrated to RAM-based solutions such as Redis. For them IO is critical and, whilst our VMs outperform the physical hosts they replace in every case where we have compared them, we can do more. Our Tier 1 storage will therefore now take the form of the epic Intel DC S3700 SSDs. These are being deployed locally in the newest generation of hosts, offering massive IO for the VMs that need it.

We will also be retro-fitting existing hosts with Intel DC S3500 SSDs, which offer similar read performance but lower (though still amazing) write performance. Intel pitches them at cloud computing, web servers and similar workloads.

Local storage?

It is important to note that whilst our Tier 2 storage will remain SAN-based and therefore shared, Tier 1 will be local, so the failure of the host it is on will cause an outage for those VMs. We can still migrate running VMs from one host to another, just as we can with shared storage, and we still take extensive backups at the VM level. Because of this dependency, however, this tier of storage suits VMs where the customer runs multiple instances and redundancy is provided at the application or OS level.
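If you are putting an application on Tier 1, the thing to check is that its instances are not all sitting on one host. A hypothetical placement check is sketched below; `apps_exposed_to_host_failure` and the VM, application and host names are illustrative only, not part of any Simwood or VMware tooling.

```python
from collections import defaultdict


def apps_exposed_to_host_failure(placement):
    """placement maps VM name -> (application, host).

    Returns the applications whose Tier 1 VMs all sit on a single host, so the
    loss of that host would take the whole application down.
    """
    hosts_by_app = defaultdict(set)
    for app, host in placement.values():
        hosts_by_app[app].add(host)
    return sorted(app for app, hosts in hosts_by_app.items() if len(hosts) < 2)


placement = {
    "db-1": ("shop-db", "host-a"),
    "db-2": ("shop-db", "host-b"),     # replica on a different host: fine
    "cache-1": ("cache", "host-c"),
    "cache-2": ("cache", "host-c"),    # both on host-c: exposed
}
print(apps_exposed_to_host_failure(placement))  # ['cache']
```

In vSphere terms this is the job of an anti-affinity rule; the sketch simply shows the condition such a rule needs to guarantee.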

That said, whilst shared storage is in theory more redundant than local storage, our experience differs: our solid-state compute hosts have been far more reliable than SANs of spinning disks, even though we have far more of them. The additional throughput also enables much quicker backups, so near-CDP (Continuous Data Protection) is possible through rapid replication of changes to the VM disks using other technologies we use. We actually think this is a step forward for availability over SANs, and a massive step forward for performance, until vSAN realises our storage dreams!
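The principle behind near-CDP is to ship only what has changed since the last pass, so each pass is small and can run often. The sketch below illustrates that with block fingerprints; it is not VMware’s changed block tracking and not the unnamed technologies we mention above. Hashing reads the whole disk each pass, which real change tracking avoids, and the 4 KiB block size and function names are assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # bytes; an illustrative choice


def block_hashes(disk, block_size=BLOCK_SIZE):
    """Fingerprint every block of a disk image."""
    return [hashlib.sha256(disk[i:i + block_size]).digest()
            for i in range(0, len(disk), block_size)]


def changed_blocks(previous_hashes, disk, block_size=BLOCK_SIZE):
    """Yield (block index, data) for blocks that differ since the last pass."""
    for idx, digest in enumerate(block_hashes(disk, block_size)):
        if idx >= len(previous_hashes) or previous_hashes[idx] != digest:
            yield idx, disk[idx * block_size:(idx + 1) * block_size]


def apply_changes(replica, changes, block_size=BLOCK_SIZE):
    """Write changed blocks into a same-sized replica image."""
    out = bytearray(replica)
    for idx, data in changes:
        start = idx * block_size
        out[start:start + len(data)] = data
    return bytes(out)


primary = bytes(4 * BLOCK_SIZE)          # a tiny 16 KiB "disk"
replica = primary
baseline = block_hashes(primary)

modified = bytearray(primary)
modified[5000] = 0xFF                    # one write lands in block 1
primary = bytes(modified)

changes = list(changed_blocks(baseline, primary))
replica = apply_changes(replica, changes)
print([idx for idx, _ in changes], replica == primary)  # [1] True
```

Only one 4 KiB block crosses the wire for this write; faster local storage shortens each pass, which is what allows passes to run often enough to approach continuous protection.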

This trade-off, if you believe there is one, will exist only for as long as vSAN is unavailable and undeployed. As soon as we’re confident that we can maintain performance across a distributed software-defined SAN with increased availability, we will deploy it.

New generation of hosts

We’ve just placed orders for our new generation of hosts. With all of the above in mind, each is built with 24 cores of 2GHz Intel Xeon CPUs, 256GB of RAM, hot-plug Intel data-centre SSDs and the capability to add hot-plug SAS drives as vSAN becomes available. We’re also standardising on 10Gb networking, with each host having 4 x 10Gb NICs. We’re very excited by the increase in compute power per host (our present generation has 196GB of RAM) but mostly by the potential to finally ditch SANs within the lifetime of these hosts!

The IO capability of the on-board SSDs is simply mind-blowing and lends itself superbly to database work, VDI or simply to accelerating modest IO to stunning levels. If you haven’t deployed SSDs in a server yet, you’ll be amazed at how every operation seems quicker, even those you didn’t think involved disk IO; the reality is that most do, even if only for logging, and this is where the reduced latency of SSDs really comes into its own. We’ll be offering the two types of SSD on a first-come, first-served basis, favouring customers with existing VMs, but expect to have ample capacity available. If you’d like to reserve yours, please get in touch.

Footnote: Can I virtualise my voice application?

In answer to the question we always receive when we talk about vSphere, the answer is ‘yes and no’. For signalling, absolutely; for passing voice, it depends.

We know there are companies that consider themselves competitors in wholesale voice and operate entirely on the commodity cloud. The potential issues relate to CPU contention and timing, and we’d say that in the commodity cloud the degree to which this works well is a matter of luck: if you happen to be on a quiet host with idle neighbours, it might be ok.
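Timing problems from a contended host show up as packets being sent or forwarded late, and one standard way to measure that is the interarrival jitter estimate from RFC 3550, which RTCP receiver reports carry. The sketch below is a simplified version that works in milliseconds rather than RTP timestamp units; the packet timings are invented for illustration.

```python
def interarrival_jitter(send_ms, recv_ms):
    """Running interarrival jitter estimate as defined in RFC 3550.

    send_ms / recv_ms: per-packet send timestamps and arrival times, in ms.
    For each consecutive pair, D is the change in transit time; the estimate
    moves 1/16 of the way towards |D| on every packet.
    """
    jitter = 0.0
    for i in range(1, len(send_ms)):
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter


send = [0, 20, 40, 60]                               # 20 ms packetisation
print(interarrival_jitter(send, [5, 25, 45, 65]))    # 0.0 (perfectly paced)
print(interarrival_jitter(send, [5, 25, 55, 65]))    # 1.2109375 (one packet held up 10 ms)
```

A constant delay costs nothing here; it is the variation, such as a packet stuck behind a busy neighbour for 10 ms, that drives the estimate up and eventually forces larger jitter buffers or audible gaps.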

vSphere is far better at delivering bare-metal performance, and we only ever run it to 50% utilisation. On our vSphere, therefore, the potential issues should never materialise.

We do run low-level voice in our virtual stack. By low-level we mean things like our internal PBX, our fax services and experimental services. All media gateways are physical but, as we use hardware DSPs more and more, the remaining functions may well be virtualised in the future.

For efficient software, we think you can virtualise your voice application on us just fine. As you scale, you’d want to load it more lightly than a physical machine and scale out sooner. At that stage you would likely wish to co-locate media gateways anyway and, of course, we fully support mixing physical and virtual machines on the same customer network.
