The Greek philosopher Heraclitus said, “The only thing that is constant is change.” This adage rings true in most modern data centers. The demands placed on workloads tend to be unpredictable, which creates constant change. At any given point in time, an application can have very few demands placed on it, and at a moment's notice its workload can spike. Satisfying these fluctuations in demand is a serious challenge for data centers, and solving it translates to significant cost savings, amounting to millions of dollars.
Traditionally, data centers have thrown more hardware at this problem. Ultimately, they overprovision to make sure they have enough performance to satisfy peak periods of demand. This includes scaling out with more and more servers filled with hard drives, quite often short-stroking the hard drives (using only their outer tracks) to minimize latency. While hard drive costs are reasonable, this massive scale-out increases power, cooling, and management costs. The figure below shows an example of the disparity between capacity requirements and performance requirements. Achieving capacity goals with HDDs is quite easy, but given that an individual high-performance HDD can deliver only about 200 random IOPS, it takes quite a few HDDs to meet the performance goals of modern database applications.
Today, storage companies are pushing all-flash arrays as the solution to this challenge. This addresses both the performance issue and the power and cooling costs, but now massive amounts of non-active (cold) data are stored on your most expensive storage media. In addition, not all applications need flash performance. Going all-flash is just another form of overprovisioning, with a significantly higher cost penalty.
The ideal solution needs the ability to match workload demands with appropriate storage resources on the fly, as they are needed. Depending on the application, this could be memory-class storage, high-performance flash such as NVMe, or cost-effective SAS/SATA SSDs. The fast media would be allocated in the proper amount to match the size of the active working set, while keeping the remaining data stored cost-efficiently. In essence, this is the virtualization of data center storage. You can draw an analogy with server virtualization, which parcels out available processor resources to multiple virtual machines; data center storage virtualization parcels out available performance storage resources to multiple physical machines.
This brings up the major challenge: how do you identify the active working set? Without the ability to do this, and to do it in real time, matching fluctuating workloads with the appropriate storage resources is impossible; we are left guessing. For example, based on history, a typical application has 5-10% of its data active. With the Internet of Things and all of the new data constantly being created, it will not be safe to bet on history; as we have already stated, “The only thing that is constant is change.” The optimal solution not only identifies actual working sets in real time, but also dynamically provisions the workloads that need them.
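One simple way to picture real-time working-set identification is a sliding time window over block accesses: any block touched within the window is counted as "active." The sketch below is purely illustrative (class and parameter names are invented, and this is not the Enmotus implementation), but it shows why the working set can be measured rather than guessed.

```python
# Hypothetical sketch: estimate a volume's active working set by tracking
# which blocks were touched within a recent time window. Illustrative only.
import time
from collections import deque

class WorkingSetEstimator:
    def __init__(self, total_blocks, window_seconds=300):
        self.total_blocks = total_blocks
        self.window = window_seconds
        self.accesses = deque()   # (timestamp, block) pairs, oldest first
        self.counts = {}          # block -> number of accesses inside the window

    def record_access(self, block, now=None):
        now = time.time() if now is None else now
        self.accesses.append((now, block))
        self.counts[block] = self.counts.get(block, 0) + 1
        self._expire(now)

    def _expire(self, now):
        # Drop accesses that have aged out of the window.
        while self.accesses and now - self.accesses[0][0] > self.window:
            _, block = self.accesses.popleft()
            self.counts[block] -= 1
            if self.counts[block] == 0:
                del self.counts[block]

    def working_set_fraction(self):
        # Fraction of the volume touched within the window.
        return len(self.counts) / self.total_blocks

est = WorkingSetEstimator(total_blocks=1000, window_seconds=60)
for blk in [1, 2, 3, 2, 1]:
    est.record_access(blk, now=100.0)
print(est.working_set_fraction())   # 3 of 1000 blocks active -> 0.003
```

With a measurement like this, fast media can be sized to the observed working set instead of a historical rule of thumb.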
Enmotus is developing Storage Automation and Analytics (SAA) software, a new class of software that dynamically identifies the actual active data set of applications and then automatically applies the appropriate storage resources in order to optimize performance while containing costs.
The video below demonstrates a virtual storage volume adapting to a new workload, and shows how flash storage resources are allocated to the working set. This example depicts an entire volume divided into 64 segments. The blue segments are the portion of the volume mapped to cost-effective capacity media, and the red segments are mapped to flash performance media. As the volume becomes active, the location of the activity can be seen, and flash resources are allocated to the areas of the volume with the activity.
Monitoring this activity with spatial and temporal analytics yields the actual active data usage. Once the initial baseline is determined, just the right amount of fast storage media can be added to, or removed from, the active data volume.
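The allocation step described above can be sketched as ranking the volume's segments by recent activity and mapping the hottest ones onto the available flash budget. The segment count, budget, and function names below are assumptions made for the example, not the product's actual API.

```python
# Illustrative sketch: map the hottest segments of a 64-segment volume onto
# a fixed flash budget; everything else stays on capacity media.

def allocate_flash(heat, flash_segments):
    """Return the set of segment indices that should live on flash.

    heat: per-segment activity scores (e.g. I/Os in the last window)
    flash_segments: how many segments the flash tier can hold
    """
    ranked = sorted(range(len(heat)), key=lambda i: heat[i], reverse=True)
    return set(ranked[:flash_segments])

# A 64-segment volume where activity is concentrated in segments 10-13.
heat = [0] * 64
heat[10], heat[11], heat[12], heat[13] = 500, 400, 300, 200
hot = allocate_flash(heat, flash_segments=4)
print(sorted(hot))   # [10, 11, 12, 13]
```

Re-running this allocation as the heat scores change is what lets the red (flash) regions in the demonstration follow the workload around the volume.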
The foundation of SAA is a device analytics engine that monitors and collects metrics at the device (drive) and node level. In a data center, it can collect device-level metrics including SMART information, node performance metrics, and capacity usage. This information is invaluable for managing physical devices and identifying performance issues and potential failures. Managing drives over their lifetime is one of a data center manager's biggest pain points. At the node level, performance can be recorded in a centralized database. When thresholds are exceeded, alerts are sent so that action can be taken to increase performance.
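A minimal sketch of the threshold-and-alert step might look like the following. The metric names and threshold values are invented for illustration; a real deployment would feed these from the centralized analytics database rather than a hard-coded dictionary.

```python
# Hypothetical sketch of threshold-based alerting on collected node metrics.
# Metric names and limits are made up for this example.

THRESHOLDS = {
    "latency_ms": 10.0,         # alert if average I/O latency exceeds this
    "capacity_used_pct": 85.0,  # alert if the node is nearly full
}

def check_node(metrics):
    """Return a list of alert strings for any metric over its threshold."""
    alerts = []
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            alerts.append(f"{name}={value} exceeds threshold {limit}")
    return alerts

alerts = check_node({"latency_ms": 14.2, "capacity_used_pct": 60.0})
print(alerts)   # one alert: latency over its threshold
```

In practice, such alerts would trigger the resource-allocation logic described earlier, closing the loop between monitoring and provisioning.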
Knowledge is power. Understanding change as it occurs, and adapting to it dynamically, has great benefits for data centers. Simplifying management saves operating expenses. Knowing exactly how much performance each application needs, and allocating the right amount of resources to it, prevents overprovisioning and saves capital.