Enmotus Blog

Using advanced analytics to admin a storage pool

Posted by Adam Zagorski on Sep 25, 2017 1:43:50 PM

Manual administration of a virtualized storage pool is impossible. The pace of change is too fast, and the information returned from any metrication too complex, for a human to understand it and respond in anything close to an acceptable timeframe.

Storage analytics sort through the metrics from the storage pool and distil useful information from a tremendous amount of near-real-time data. The aim of the analytics is to present information about a resolvable issue in a form that is easy to understand, uncluttered by extraneous data on unimportant events.

Let’s take detecting a failed drive as an example. In the early days of storage, understanding a drive failure involved a whole series of CLI steps to get to the drive and read status data in chunks. This was often complicated by the drive being part of a RAID drive-set. This approach worked for the 24 drives on your server, but what happens when we have 256 drives and 10 RAID boxes, or 100 RAID boxes? You get the problem.

In fact, GUI tools were added to RAID management software precisely because of this problem, but they still involved quite a bit of drilling down through graphical screens to get to the real issue. That approach falls apart when we have highly automated and orchestrated storage pools, such as we see with virtual clusters and private clouds.

The cloud epitomizes the issue. Instances come and go very rapidly; the cloud is a swirling mass of chaos, by choice. Storage is virtualized, too, and virtual volumes are attached to and detached from compute instances very frequently. Moreover, the same virtual volume can be attached to different applications, each in its own instance, and these interact with each other by competing for the IOPS available from each volume.

These interactions can create bottlenecks in the data flow to any one of the sharing instances. Spiky workloads can create collisions that slow access. As we migrate more time-critical applications to cloud environments, we’ll find that meeting the latency needs of, say, a financial services app may turn out to be difficult, simply because we can’t see what is happening fast enough and can’t react in time to deliver a meaningful response.

This is where analytics enter the picture. The first step is metricating the storage system to gather salient data. Now “salient” is an open-ended term meaning any data that matters, so a good analytics tool should have extensible data gathering.

Metrication has to run in near-real-time and it’s going to generate a ton of raw data, most of which is totally irrelevant to making a decision at any point in time, though it might be useful in identifying trends and setting baselines. Data ranges from drive status details from the SMART system to IOPS rates and latencies on a per-instance, per-server or per-drive basis. Traffic between instances or nodes over VLANs is measured too, so that malformed connections or fabric path overloads can be spotted.
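To make the baseline idea concrete, here is a minimal Python sketch. It assumes latency samples arrive as plain numbers in milliseconds and that "outside the baseline" means more than three standard deviations from the historical mean. The function names, the sample values and the three-sigma rule are all illustrative assumptions, not a description of how any particular analytics product works.

```python
from statistics import mean, pstdev

def build_baseline(history):
    """Summarize past samples as (mean, population standard deviation)."""
    return mean(history), pstdev(history)

def is_outlier(sample, baseline, k=3.0):
    """True when the sample sits more than k standard deviations from the mean.

    k=3.0 is an assumed threshold chosen for illustration only.
    """
    avg, spread = baseline
    return abs(sample - avg) > k * spread

# Hypothetical latency history for one drive, in milliseconds.
latency_history_ms = [1.0, 1.2, 0.8, 1.0]
baseline = build_baseline(latency_history_ms)

print(is_outlier(1.1, baseline))  # close to the historical mean
print(is_outlier(2.5, baseline))  # well outside the historical band
```

Most raw samples would pass through a check like this without raising anything, which is the point: the bulk of the data only serves to keep the baseline current.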

The second part of any good analytics system is an analytics engine that can be queried from the third element, the GUI, to answer specific questions. Mostly, these queries take the form of standing “traps” such as “Identify all drives running at more than 80 percent capacity” or “Flag drives with more than 90 percent of capacity used”, extending to “Applications with high traffic levels” or “Latencies outside of SLA”.
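One way to picture standing traps is as named conditions that are evaluated against every incoming metrics record. The Python sketch below assumes a simple per-drive record; the field names, the trap names and the 5 ms SLA figure are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class DriveMetrics:
    # Hypothetical record layout; a real metrics feed will differ.
    drive_id: str
    capacity_used_pct: float
    latency_ms: float

Trap = Callable[[DriveMetrics], bool]

SLA_LATENCY_MS = 5.0  # assumed SLA threshold, for illustration only

# Each standing trap is a name plus a condition over one record.
TRAPS: Dict[str, Trap] = {
    "capacity_over_90_pct": lambda m: m.capacity_used_pct > 90.0,
    "latency_outside_sla": lambda m: m.latency_ms > SLA_LATENCY_MS,
}

def run_traps(samples: List[DriveMetrics]) -> Dict[str, List[str]]:
    """Return, for each trap, the IDs of the drives that tripped it."""
    hits: Dict[str, List[str]] = {name: [] for name in TRAPS}
    for m in samples:
        for name, trap in TRAPS.items():
            if trap(m):
                hits[name].append(m.drive_id)
    return hits

samples = [
    DriveMetrics("drive-a", 95.0, 2.0),
    DriveMetrics("drive-b", 40.0, 12.5),
    DriveMetrics("drive-c", 55.0, 1.1),
]
print(run_traps(samples))
```

Keeping the traps in a dictionary means a new one can be added without touching the evaluation loop, which suits a list of queries that is open-ended.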

Clearly, the list is open-ended and handling such queries requires a database query approach. We have choices at this point. We can use a structured database, but are likely to find this too inflexible in what will be a dynamic environment with new data types appearing frequently. More likely, a big data approach works better, allowing broad questions and multi-faceted searches.

With a big data approach, the metrics engine presents a dataset and query system that can be accessed by a variety of tools. This allows a user to move beyond the toolset that an analytics software vendor provides and take analysis to a new level.

One way this could work is to use a suite of analytics tools that each focus on a subset of the whole storage management issue. One might track flash cell wear by vendor and drive model to inform better drive purchases in the future.
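As a rough illustration of the flash wear example, the Python sketch below groups wear readings by vendor and model and averages them. The record fields, vendor names and wear percentages are made up for the example; real wear indicators vary by drive and by how the metrics are collected.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical wear readings, one per drive (percent of rated wear consumed).
wear_samples = [
    {"vendor": "VendorA", "model": "X1", "wear_pct": 12.0},
    {"vendor": "VendorA", "model": "X1", "wear_pct": 18.0},
    {"vendor": "VendorB", "model": "Y2", "wear_pct": 40.0},
]

def mean_wear_by_model(samples):
    """Average wear for each (vendor, model) pair."""
    groups = defaultdict(list)
    for s in samples:
        groups[(s["vendor"], s["model"])].append(s["wear_pct"])
    return {key: mean(values) for key, values in groups.items()}

for (vendor, model), wear in sorted(mean_wear_by_model(wear_samples).items()):
    print(f"{vendor} {model}: {wear:.1f}% average wear")
```

A purchasing decision would need far more than an average, such as drive age and write volume, but the grouping step is the same.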

Another might be the application of artificial intelligence to administer the storage pools, while yet another might be an API to the orchestration engine in a cloud, improving the utilization and efficiency of operations and providing early detection of potential hardware failures to reduce service brown-outs.

Adding Software-Defined Storage (SDS) to the mix increases the pressure to have advanced analytics. Storage is now a virtual pool, which increases agility by a quantum jump but makes it harder to relate symptoms and events to actionable problems. Tools like Enmotus Storage Analytics are addressing this, bridging across all the storage elements in a cloud or cluster, from NVDIMMs to cloud storage, to gather and present the metrics in a useful form.

Certainly, advanced analytics is a new tool for the storage world and still in its early stages, but the approach is essential for scale-out control and we can expect a rapid evolution to powerful tools in the next few years.

Topics: NVMe, big data, All Flash Array, Data Center, data analytics, cloud storage