We hear lots of hype today about millions of IOPS from someone’s latest flash offering. It’s true that these units are very fast, but using the products often yields much weaker performance than the marketing would lead you to expect. That’s because most vendors measure their performance using highly tweaked benchmark software, and with this type of code, the devil is in the details.
That may sound a bit extreme, but all benchmarks can be tuned for optimal performance, and we never hear about the other, slower results.
What eats up all of that performance? In the real world, events are not as smoothly sequenced as they are in a benchmark. Data requests are not evenly spread over all the storage drives, nor are they evenly spread in time. In fact, I/O goes where the apps direct, which means some files get much more access, making the drives they are on work hard but leaving other drives nearly idling.
Recognized as a major issue, bottlenecks of this type can bring real-world performance down by large factors. Attempts were made with hard-drive RAID arrays to spread the data over as many drives as possible, but the approach was typically limited by the small number of drives in a typical virtual drive (LUN - Logical Unit Number) and, more importantly, by the fact that spreading out the load created many more I/O operations, each of which used up a piece of the available performance. The net result was just a limited recovery of theoretical performance.
Fast forward to using flash/SSD instead of spinning rust. The new drives have little access latency beyond network traffic and address compute time, so many of the penalties for spreading data out go away. There is still a cost for handling 10 I/Os instead of 1, incurred as processing time in the drive and host. Newer data integrity approaches such as erasure coding manage spreading out data more efficiently, too.
Overall, then, we can avoid the drive bottleneck much more easily with SSD or flash, though only if data is spread over many drives.
The spiky nature of I/O is a different problem. A file may be quiescent one minute and hammered the next. It’s the nature of storage that usage brings all the elements of an app together in the same LUN or bucket and, traditionally, the LUN was restricted to a fixed set of drives. This type of access pattern can leave much of the I/O pool idling while the remainder is limited by drive or interface maximums, which caps the performance of the appliance cluster or flash unit. The reduction in effective throughput can be large. If apps are hitting a 6-drive LUN in a 60-drive array, the maximum performance is just 10% of what the array could deliver.
Again, one solution is to spread data out. If the access pattern is extending run times significantly, creating replicas of the files or using large erasure code settings can help, but analyzing the bottlenecks and spreading the files over more drive sets is a better solution.
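The 10% figure is simple arithmetic, and a minimal sketch makes the assumptions explicit. The function name `usable_fraction` is made up for illustration, and the model assumes identical drives with the whole workload landing on a single LUN.

```python
def usable_fraction(lun_drives: int, array_drives: int) -> float:
    """Share of the array's aggregate drive performance reachable via one LUN.

    Assumes every drive is identical and all I/O targets this one LUN,
    so the drives outside the LUN contribute nothing.
    """
    if not 0 < lun_drives <= array_drives:
        raise ValueError("LUN must use between 1 and array_drives drives")
    return lun_drives / array_drives


if __name__ == "__main__":
    # The example from the text: a 6-drive LUN in a 60-drive array.
    print(f"{usable_fraction(6, 60):.0%}")
```

Real arrays will deviate from this because of controller, cache, and interface limits, so treat the result as an upper bound on what the LUN can reach.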
All of this presupposes that the storage pool is working properly. Even a single slow or failed drive can throw a wrench in the works. Take an erasure-coded object, typically spread over 10+6 drives. In reality, this is 16 drives with the effective capacity of 10, but the protection scheme allows up to 6 drives, or the 6 appliances that host them, to fail without data being lost.
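The trade-off in a 10+6 layout can be worked through in a few lines. This is a sketch only: the class `ErasureLayout` and its fields are hypothetical names, and it assumes a code (Reed-Solomon is a common example) in which any 10 of the 16 fragments are enough to reconstruct the object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErasureLayout:
    data: int    # fragments holding data (10 in the example)
    parity: int  # extra protection fragments (6 in the example)

    @property
    def total(self) -> int:
        """Number of drives the object is spread over."""
        return self.data + self.parity

    @property
    def efficiency(self) -> float:
        """Usable capacity as a fraction of raw capacity."""
        return self.data / self.total

    def survives(self, failed_drives: int) -> bool:
        """True if the object is still recoverable after this many failures."""
        return 0 <= failed_drives <= self.parity


if __name__ == "__main__":
    layout = ErasureLayout(data=10, parity=6)
    print(layout.total, f"{layout.efficiency:.1%}")
    print(layout.survives(6), layout.survives(7))
```

For the 10+6 case this gives 16 drives at 62.5% capacity efficiency, with the seventh failure being the one that loses data.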
When data is read, all 16 drives are accessed. If one is running slow, it doesn’t matter how fast the others are. The slow drive determines the access latency, but it won’t be flagged by today’s storage software and so the app chokes.
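To see why one laggard dominates, here is a rough model of the read described above, in which the request completes only when the slowest fragment arrives. It also shows the kind of simple check monitoring software could run. The function names, the latency numbers, and the 3x-median threshold are all assumptions for illustration, not any product's behavior.

```python
from statistics import median


def read_latency_ms(fragment_latencies) -> float:
    """The read finishes when the slowest fragment arrives."""
    return max(fragment_latencies)


def flag_slow_drives(latency_by_drive: dict, factor: float = 3.0) -> list:
    """Flag drives whose latency exceeds `factor` times the group median."""
    typical = median(latency_by_drive.values())
    return sorted(d for d, ms in latency_by_drive.items() if ms > factor * typical)


if __name__ == "__main__":
    # 16 drives at 0.2 ms each, with one degraded drive at 5 ms (made-up values).
    latencies = {f"drive{i:02d}": 0.2 for i in range(16)}
    latencies["drive07"] = 5.0
    print(read_latency_ms(latencies.values()))
    print(flag_slow_drives(latencies))
```

The median is used rather than the mean so that the slow drive itself does not drag the baseline upward and hide its own problem.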
A failure of a drive has even more effect. Though up to 6 drives in the example can fail, when just 1 goes down, the erasure group has to be rebuilt from the remaining 15 and this slows I/O. The system will recover automatically by setting up a replacement drive and copying data, but this is slow.
Companies such as Enmotus have realized that significant analytics and automation can make much of the pain of bottlenecks and slow or failed drives go away. They are taking a leading role in setting a direction and standards of operation. Enmotus is building a framework for future software development, applicable to virtualized clusters and private and public clouds.
The first step is real-time monitoring of performance and events, but the crucial step is to apply a variety of analytic approaches to the problem.
Consider this to be a Big Data issue, with a relatively large dataflow of metrics. There are both structured and unstructured queries to be performed on this datastream, and SQL and Big Data tools make sense as part of a monitoring toolset. Incidentally, taking the view of this as a datastream, much like IoT, allows for the microservices to be open-ended and extensible and will encourage ISVs to enter the market.
The next part of the toolset is an interface to storage orchestration. This interface allows a much more rapid response to the cluster’s issues, avoiding bottlenecks quickly by adding resources or even moving datasets around the pool of storage.
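As a small illustration of a structured query over such a metrics stream, the sketch below loads per-drive samples into an in-memory SQLite table and ranks drives by average IOPS to expose hot spots. The table layout, the column names, and the function `busiest_drives` are invented for this example; a production toolset would use whatever schema and datastore its collectors provide.

```python
import sqlite3


def busiest_drives(samples, limit=3):
    """Return (drive, average IOPS) rows, busiest first.

    `samples` is an iterable of (timestamp, drive, iops) tuples.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE metrics (ts INTEGER, drive TEXT, iops INTEGER)")
        conn.executemany("INSERT INTO metrics VALUES (?, ?, ?)", samples)
        return conn.execute(
            "SELECT drive, AVG(iops) AS avg_iops FROM metrics "
            "GROUP BY drive ORDER BY avg_iops DESC, drive LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Made-up samples: d1 is being hammered while d2 sits nearly idle.
    samples = [
        (1, "d0", 100), (2, "d0", 300),
        (1, "d1", 5000), (2, "d1", 7000),
        (1, "d2", 50),
    ]
    for drive, avg_iops in busiest_drives(samples, limit=2):
        print(drive, avg_iops)
```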
The automation feature also allows policy-driven operations in storage, such as elevating a dataset to a higher, faster storage tier in anticipation of an app using the data. This can be driven by timestamps or, more elegantly, by AI approaches to analyzing storage usage.
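The timestamp-driven variant of such a policy can be sketched very simply: given the times at which apps are expected to start using each dataset, pick the datasets that should be promoted now. Everything here is hypothetical, including the function name `datasets_to_promote`, the schedule format, and the idea of a fixed lead time; the actual call into an orchestration layer is left out.

```python
from datetime import datetime, timedelta


def datasets_to_promote(schedule: dict, now: datetime, lead_time: timedelta) -> list:
    """Return datasets whose expected use falls within the next `lead_time`.

    `schedule` maps a dataset name to the datetime an app is expected to need it.
    """
    deadline = now + lead_time
    return sorted(name for name, start in schedule.items() if now <= start <= deadline)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 8, 0)
    schedule = {
        "payroll": datetime(2024, 1, 1, 8, 20),    # needed in 20 minutes
        "analytics": datetime(2024, 1, 1, 14, 0),  # needed this afternoon
        "backups": datetime(2024, 1, 1, 7, 0),     # already started
    }
    print(datasets_to_promote(schedule, now, timedelta(minutes=30)))
```

An AI-based approach would replace the fixed schedule with predicted start times learned from past usage, but the promotion decision could keep the same shape.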
The beauty of this analytics approach is that it facilitates and encourages third-party software developers to create a rich ecosystem of tools. This will improve the value of the storage pool and move performance much closer to the theoretical values seen in benchmarks.
Today’s storage appliances and software are heading in the direction of bottleneck detection and remediation.