Reading current blogs on clouds and storage, it’s impossible not to conclude that most cloud users have given up hope of tuning system performance and are simply ignoring the topic. The reality is that our cloud models struggle with performance issues. For example, a server can hold roughly 1,000 virtual machines.
With an SSD giving 40K IOPS, that’s just 40 IOPS per VM. This is on the low side for many use cases, but now let’s move to Docker containers, using the next generation of server. Compute power and, more importantly, DRAM space increase to match the 4,000 containers in the system, but IOPS drops to just 10 per container.
And this is the best that we can get with typical instances: one local instance drive, with all the rest being networked I/O. The problem is that network storage is also pooled, and this limits storage availability to any single instance. The numbers are not brilliant!
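The per-instance arithmetic above is easy to rerun for other densities. Here is a minimal sketch in Python, assuming the drive's IOPS budget is split evenly across instances (real workloads are burstier than that); the function name `iops_per_instance` is made up for illustration.

```python
def iops_per_instance(device_iops: int, instances: int) -> float:
    """Evenly divide a shared device's IOPS budget across instances.

    Assumption: every instance gets an equal share of the device.
    """
    if instances <= 0:
        raise ValueError("instances must be positive")
    return device_iops / instances


SSD_IOPS = 40_000  # the 40K IOPS figure quoted in the text

vm_share = iops_per_instance(SSD_IOPS, 1_000)         # roughly 1,000 VMs
container_share = iops_per_instance(SSD_IOPS, 4_000)  # 4,000 containers

print(f"per VM: {vm_share:.0f} IOPS")
print(f"per container: {container_share:.0f} IOPS")
```

Plugging in a different drive rating or instance count shows how quickly the per-instance share shrinks as density goes up.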
We see potential bottlenecks everywhere. Data can be halfway across a datacenter instead of localized to a rack where compute instances are accessing it. Ideally, the data is local (possible with a hyper-converged architecture) so that it avoids crossing multiple switches and routers. This may be impossible to achieve, especially if diverse datasets are being used for an app.
Networks choke, and that includes the VLANs used in cloud clusters. The problem with container-based systems is that the instances and VLANs involved have often been closed down by the time you get a notification. That’s the downside of agility!
Apps choke, too, and so do microservices. The fact that these often exist only for short periods makes debugging both a glorious challenge and very frustrating. Understanding why a given node or instance runs slower than the rest of the pack can expose a hidden bottleneck that slows completion of the whole job stream.
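One rough way to spot the slow member of a pack is to compare each instance's completion time against the median of the group. The sketch below assumes you have already captured per-instance durations before the instances disappeared; the function name `find_stragglers`, the 1.5x threshold, and the sample figures are all hypothetical, not measurements.

```python
from statistics import median


def find_stragglers(durations, threshold=1.5):
    """Return the names of instances whose duration exceeds
    `threshold` times the median duration of the group."""
    if not durations:
        return []
    typical = median(durations.values())
    return sorted(
        name for name, secs in durations.items() if secs > threshold * typical
    )


# Hypothetical task durations in seconds, keyed by instance name.
sample = {"node-a": 41.0, "node-b": 39.5, "node-c": 40.2, "node-d": 97.3}

stragglers = find_stragglers(sample)
print(stragglers)
```

The median is used rather than the mean so that one very slow instance does not drag the baseline up and hide itself. Knowing which instance lagged is only the starting point; the cause still has to be tracked down.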
Hybrid clouds add a new complexity. Typically, these are heterogeneous. The cloud stack in the private segment is likely to be OpenStack, though Azure Stack promises to be an alternative. The public cloud will most likely be one of AWS, Azure or Google. This means two separate environments, very different from each other in operation, syntax and billing, plus an interface between the two.