Automated device management and performance monitoring analytics are necessary to control the costs of web-scale data centers, especially as most organizations continually ask their employees to do more with fewer resources.
Big Data and massive data growth are at the forefront of datacenter growth. Imagine what it takes to manage the datacenters that provide us with this information.
According to research conducted by Seagate, time-consuming drive management activities represent the largest storage-related pain point for datacenter managers. In addition to managing potential failures across all of the disk drives, managers must also monitor the performance of multiple servers. As indicated by Seagate, there are tremendous opportunities for cost savings if the timing of disk drive retirement can be optimized. Significant savings can also result from streamlining the management process.
While there is no such thing as a typical datacenter, for the purpose of discussion we will assume that a typical micro-datacenter contains about 10,000 servers, while a large-scale data center contains on the order of 100,000 servers. In a web-scale hyperconverged environment, if each server houses 15 devices (hard drives and/or flash drives), a datacenter contains anywhere from 150,000 to 1.5 million devices. That is an enormous number of servers and devices to manage. Even if we scaled back by a couple of orders of magnitude, to 50 servers and 750 drives for example, managing a data center would still be a daunting task.
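The device counts above follow from a single multiplication. As a quick check, here is a minimal Python sketch that reproduces the arithmetic. The scenario labels and the `device_count` helper are illustrative names; only the server counts and the 15-devices-per-server figure come from the discussion above.

```python
# Assumption from the text: every server houses 15 storage devices.
DEVICES_PER_SERVER = 15


def device_count(servers: int, devices_per_server: int = DEVICES_PER_SERVER) -> int:
    """Total number of storage devices for a given number of servers."""
    return servers * devices_per_server


# Server counts taken from the discussion; the labels are illustrative.
scenarios = {
    "scaled-back deployment": 50,
    "micro-datacenter": 10_000,
    "large-scale data center": 100_000,
}

for name, servers in scenarios.items():
    print(f"{name}: {servers:,} servers, {device_count(servers):,} devices")
```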
There is a marketing push towards an all-flash data center, centered on the long-term cost of ownership benefits associated with all-flash, but these benefits cannot yet be realized by large scale-out data centers. This means that for the foreseeable future, these data centers will continue to use traditional hard drives, most likely in combination with flash. Even in the distant future, when flash prices approach HDD costs and fab capacity catches up to demand, we will still see distinct classes of flash drives in scale-out data centers, namely performance (expensive, e.g. Intel’s Optane SSDs) and capacity (cost-effective, e.g. bulk 3D NAND). These disparate classes of drives further complicate management because they have different failure rates. Additionally, deciding which class of drive best suits which application is itself something that needs to be managed.
Automation tools must be able to monitor performance and health at the device level, the node level, and the rack or cluster level. Starting from the lowest level, Server Node Optimization software needs to be able to collect critical device telemetry, which can then be aggregated by an analytics server or appliance. Additionally, the Node Optimization software needs the ability to identify the size of the working set for each application. Knowing this, datacenter managers can virtualize the devices within the node and blend the optimal amount of fast storage media with capacity media. Real-time automated tiering within the node will ensure that the working set remains on the fast storage media and adapts to any changes on the fly. The final key ingredient is the ability to monitor performance and easily make modifications if necessary.
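To make the working-set idea concrete, here is a minimal Python sketch of one possible way to estimate a working set from an access trace and decide which blocks belong on fast media. It is an illustration under stated assumptions, not a description of any particular product: the coverage-based definition of "working set", the block-level trace, and the names `hot_blocks` and `fast_tier` are all hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Set


def hot_blocks(trace: Iterable[Hashable], coverage: float = 0.9) -> List[Hashable]:
    """Return the most frequently accessed blocks, hottest first, that
    together account for at least `coverage` of all accesses in the trace.

    Assumption: the working set is approximated by access frequency alone.
    """
    counts = Counter(trace)
    total = sum(counts.values())
    hot, covered = [], 0
    for block, hits in counts.most_common():
        if covered >= coverage * total:
            break
        hot.append(block)
        covered += hits
    return hot


def fast_tier(trace: Iterable[Hashable], fast_capacity_blocks: int,
              coverage: float = 0.9) -> Set[Hashable]:
    """Pick the blocks to keep on fast media, limited by its capacity.

    Everything not returned here would stay on capacity media.
    """
    return set(hot_blocks(trace, coverage)[:fast_capacity_blocks])


# Hypothetical trace: block 1 is read six times, block 2 three times, block 3 once.
trace = [1] * 6 + [2] * 3 + [3]
print(hot_blocks(trace))                         # estimated working set
print(fast_tier(trace, fast_capacity_blocks=1))  # what fits on fast media
```

Re-running the estimate over a sliding window of recent accesses is one simple way to let the placement follow changes in the workload, in the spirit of the real-time tiering described above.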
A dedicated analytics server is becoming increasingly important for aggregating the telemetry from the devices and nodes, enabling system-wide monitoring and control at the rack or cluster level. Its primary purpose is to analyze device-level behavior and workloads so that the system can automatically adjust to changing usage patterns. In addition, system-wide programmable alerts from the analytics server can notify datacenter managers of potential device failures as well as performance issues, and allow drill-down to specific devices. As new nodes come online, the analytics server should be able to automatically discover and classify the new server nodes, along with new classes of storage devices as they are introduced.
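As an illustration of programmable alerts with drill-down, here is a minimal Python sketch. Everything in it is hypothetical: the `Sample` fields, the `AlertRule` type, the rule names, and the thresholds are placeholders chosen for the example, not the telemetry schema or alerting interface of any real analytics server.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Sample:
    """One telemetry sample reported by a node (hypothetical schema)."""
    node: str
    device: str
    device_class: str      # e.g. "hdd", "capacity-flash", "performance-flash"
    media_errors: int
    p99_latency_ms: float


@dataclass(frozen=True)
class AlertRule:
    """A programmable alert: a name plus a predicate over one sample."""
    name: str
    applies: Callable[[Sample], bool]


def evaluate(samples: Iterable[Sample],
             rules: Iterable[AlertRule]) -> Dict[str, List[Tuple[str, str]]]:
    """Group triggered alerts by node so an operator can drill down
    from a node to the specific (device, rule) pairs that fired."""
    rules = list(rules)
    alerts: Dict[str, List[Tuple[str, str]]] = {}
    for sample in samples:
        for rule in rules:
            if rule.applies(sample):
                alerts.setdefault(sample.node, []).append((sample.device, rule.name))
    return alerts


# Illustrative thresholds only; real values depend on the drive class.
rules = [
    AlertRule("possible-failure", lambda s: s.media_errors > 10),
    AlertRule("slow-hdd", lambda s: s.device_class == "hdd" and s.p99_latency_ms > 50.0),
]
samples = [
    Sample("node-1", "sda", "hdd", media_errors=12, p99_latency_ms=80.0),
    Sample("node-1", "nvme0", "performance-flash", media_errors=0, p99_latency_ms=0.2),
    Sample("node-2", "sdb", "hdd", media_errors=0, p99_latency_ms=20.0),
]
print(evaluate(samples, rules))
```

Keeping each rule as a plain predicate means a rule can be scoped to a class of drive, which matters because the different drive classes have different failure and latency characteristics.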
Modern datacenters are complex infrastructures that require automated device management and performance monitoring analysis. Enmotus is developing a new category of self-optimizing, machine-learning-oriented storage analytics for next-generation data centers. Stay tuned!