Software-defined storage (SDS) is part of the drive to virtualize infrastructure by separating the control logic software (the control plane) from low-level data management (the data plane). In the process, the control plane becomes a virtual instance that can run anywhere in the compute cluster.
The SDS approach allows the control micro-services to be scaled to meet increased demand and to be chained for more complex operations (index + compress + encrypt, for example), while making systems largely hardware-agnostic. It is no longer necessary to buy storage units with a fixed set of functions, only to face a forklift upgrade when new features are needed.
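Here is a minimal Python sketch of the chaining idea. It assumes each micro-service can be modeled as a function that takes a block (a dict holding the payload and its metadata) and returns a new one. The names `index_stage`, `compress_stage` and `chain` are illustrative only and do not come from any SDS product. An encryption stage is left out because the Python standard library has no block cipher. With a real cryptography library, it would slot in as one more function.

```python
import hashlib
import zlib
from functools import reduce
from typing import Callable, Dict

# Hypothetical block format: the payload under "data" plus any metadata
# that earlier stages have added.
Block = Dict[str, object]
Stage = Callable[[Block], Block]


def index_stage(block: Block) -> Block:
    """Record a SHA-256 content hash so the block can be located or deduplicated."""
    return {**block, "sha256": hashlib.sha256(block["data"]).hexdigest()}


def compress_stage(block: Block) -> Block:
    """Compress the payload with zlib and remember its original size."""
    data = block["data"]
    return {**block, "data": zlib.compress(data), "raw_size": len(data)}


def chain(*stages: Stage) -> Stage:
    """Compose stages left to right into a single pipeline stage."""
    def run(block: Block) -> Block:
        return reduce(lambda acc, stage: stage(acc), stages, block)
    return run


# Index first (hash of the raw data), then compress.
pipeline = chain(index_stage, compress_stage)

block = pipeline({"data": b"hello " * 1000})
print(block["raw_size"], len(block["data"]) < block["raw_size"])  # 6000 True
```

Each stage only depends on the shared block format. That means stages can be reordered, added or scaled out independently, which is the flexibility the SDS approach is after.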
SDS systems are very dynamic, with mashups of micro-services that may survive only for a few blocks of data. This brings new challenges:
- Data flow - Network VLAN paths are transient, with rerouting happening continuously for new operations, failure recovery and load balancing.
- Failure detection - Hard failures are readily detectable, so a replacement instance can be started and recovery can occur quickly. Soft failures are the problem: intermittent errors need to be trapped, analyzed and mitigated.
- Bottlenecks - Slowdowns occur in many different places. Code is not perfect, nor is it ever tested to be 100 percent bug-free. In complex storage systems, we’ll see path or device slowdowns on the storage side, and instance or app issues on the server side. Problems may also reside in the network, caused by collisions both at the endpoints of a VLAN and in the intermediate routing nodes.
- Everything is virtual - The abstraction of the planes complicates root-cause analysis tremendously.
- Automation - There is little human intervention in the operation of SDS. Reconnecting and analyzing manually is very difficult, especially in real time.
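One way to trap soft failures is a sliding-window error counter. This minimal Python sketch makes two assumptions. First, hard failures are caught elsewhere. Second, a component that logs `threshold` or more recoverable errors within a `window_s`-second window should be flagged for analysis. The class, its parameters and component names such as `vol-7` are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class SoftFailureDetector:
    """Flags components whose recoverable errors cluster within a time window.

    Hard failures (a component that stops responding) are assumed to be
    detected elsewhere; this class only watches intermittent errors.
    Errors are assumed to be recorded in time order.
    """

    def __init__(self, window_s: float = 60.0, threshold: int = 3) -> None:
        self.window_s = window_s
        self.threshold = threshold
        self._errors: Dict[str, Deque[float]] = defaultdict(deque)

    def record_error(self, component: str, t: float) -> bool:
        """Trap an error seen at time t; return True if the component is now suspect."""
        q = self._errors[component]
        q.append(t)
        # Discard errors that have aged out of the sliding window.
        while q and t - q[0] > self.window_s:
            q.popleft()
        return len(q) >= self.threshold

    def suspects(self, now: float) -> List[str]:
        """Components with at least `threshold` errors in the window ending at `now`."""
        return sorted(
            name
            for name, q in self._errors.items()
            if sum(1 for t in q if 0 <= now - t <= self.window_s) >= self.threshold
        )
```

Consider three errors on `vol-7` at 0, 30 and 50 seconds: the third call to `record_error` returns `True`. Two errors 100 seconds apart never reach the threshold. In practice, the detector's output would feed the analysis and mitigation steps rather than trigger a replacement directly.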
A good analogy is managing our roads at rush hour. It’s easy for a human to handle a single intersection, but SDS is like handling all of Los Angeles traffic. To achieve even a semblance of smooth flow, there are sensors and cameras everywhere on the main arteries (the freeways), but coverage of side roads is much weaker. Moreover, there is no connection between the driver and the traffic control system.
This is where we are with SDS today. We are flying blind, and that will inevitably limit efficiency and cost money. We need much more metrication of the data to provide adequate coverage of all the pain points highlighted above. The second step is to develop analytics that act on those issues.
Metrication sounds easy, but good metrication requires that as much of the reporting as possible be proactive, with sensors pushing their data rather than waiting to be asked. Laboriously polling all the data is tedious and slow. If we think of metrication as a Big Data class of problem, we will likely conclude that a quasi-unstructured database is needed to store the output from each sensor point. It is only “quasi” unstructured because each class of sensor creates data that is structured within itself.
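Here is a minimal Python sketch of this arrangement. It uses an in-memory stand-in for the database and two made-up sensor classes, `LatencySample` and `LinkErrorSample`. Each sensor class has a fixed dataclass schema, while the store keeps one schema-free bucket per class. Sensors call `push` themselves, so nothing has to be polled. All names here are assumptions for illustration.

```python
import time
from collections import defaultdict
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


# Each (hypothetical) sensor class has its own fixed schema:
# the data is "structured within itself".
@dataclass
class LatencySample:
    app: str
    latency_ms: float


@dataclass
class LinkErrorSample:
    vlan: int
    node: str
    crc_errors: int


class MetricStore:
    """Quasi-unstructured store: one schema-free bucket per sensor class."""

    def __init__(self) -> None:
        self._buckets: Dict[str, List[Dict[str, Any]]] = defaultdict(list)

    def push(self, sample: Any, ts: Optional[float] = None) -> None:
        """Called by the sensor itself (push model), so no one has to poll it."""
        record = asdict(sample)
        record["ts"] = time.time() if ts is None else ts
        # The bucket is keyed by the sample's class name, e.g. "LatencySample".
        self._buckets[type(sample).__name__].append(record)

    def query(self, sensor_class: str) -> List[Dict[str, Any]]:
        """Return a copy of every record pushed by one sensor class."""
        return list(self._buckets.get(sensor_class, []))
```

The store never needs a global schema, yet any consumer that reads one bucket knows exactly which fields to expect.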
The nature of the database is important when it comes to analytics and the resulting remedial steps. Because the sensor classes are somewhat orthogonal, micro-services can be created that specialize in a specific problem class, using the data from a subset of sensors. Moreover, these micro-services are scalable and agile, running inside the SDS constellation in virtual instances.
With care, analytics micro-services can overlap in the data they use. For example, a micro-service specializing in VDI traffic would need to know about physical and virtual network issues around the VDI server instances. Retaining all metrics for a significant period, rather than erasing them once read, should mitigate most issues.
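Overlapping data usage can be sketched as a publish/subscribe bus. In this hypothetical Python sketch, each analytics micro-service declares the sensor classes it consumes. Every record is delivered to all matching subscribers, and a retained log lets a service added later replay the history. The class names and sensor-class labels (`vdi_latency`, `net_errors`) are assumptions for illustration, and the sketch omits any retention-expiry policy.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Record = Dict[str, object]


class AnalyticsService:
    """Hypothetical analytics micro-service consuming a subset of sensor classes."""

    def __init__(self, name: str, sensor_classes: Set[str]) -> None:
        self.name = name
        self.sensor_classes = sensor_classes
        self.seen: List[Record] = []

    def handle(self, sensor_class: str, record: Record) -> None:
        # A real service would analyze the record; here it is just collected.
        self.seen.append({"class": sensor_class, **record})


class MetricsBus:
    """Delivers each record to every service subscribed to its sensor class.

    Records are kept in a retained log instead of being erased after delivery,
    so overlapping or late-joining services can still read them.
    """

    def __init__(self) -> None:
        self._subs: Dict[str, List[AnalyticsService]] = defaultdict(list)
        self.log: List[Tuple[str, Record]] = []

    def subscribe(self, service: AnalyticsService, replay: bool = True) -> None:
        for cls in service.sensor_classes:
            self._subs[cls].append(service)
        if replay:
            # Let a newly added service catch up on retained history.
            for cls, record in self.log:
                if cls in service.sensor_classes:
                    service.handle(cls, record)

    def publish(self, sensor_class: str, record: Record) -> None:
        self.log.append((sensor_class, record))
        for service in self._subs.get(sensor_class, []):
            service.handle(sensor_class, record)
```

Suppose a VDI service subscribes to both `vdi_latency` and `net_errors`, and a network service subscribes only to `net_errors`. Both receive the same network record without either one consuming it away from the other.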
Clearly, it’s possible to treat this data pile as a Big Data problem and use Hadoop and other parallel tools for unstructured reduction, answering sophisticated queries such as “How many apps have latencies exceeding 3-sigma of the average?”
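The example query can be expressed in map/reduce style. Plain Python stands in for Hadoop here, so this is a sketch of the logic rather than a distributed job. It assumes raw samples arrive as `(app, latency_ms)` pairs. It interprets the query as: which apps have a mean latency more than three population standard deviations above the mean across all apps?

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def per_app_mean(samples: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """'Map' step: group raw (app, latency_ms) samples by app and average them."""
    by_app: Dict[str, List[float]] = defaultdict(list)
    for app, latency_ms in samples:
        by_app[app].append(latency_ms)
    return {app: statistics.mean(values) for app, values in by_app.items()}


def apps_beyond_3_sigma(samples: Iterable[Tuple[str, float]]) -> List[str]:
    """'Reduce' step: apps whose mean latency exceeds mu + 3*sigma across apps."""
    means = per_app_mean(samples)
    values = list(means.values())
    if len(values) < 2:
        return []
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return sorted(app for app, m in means.items() if m > mu + 3 * sigma)
```

Taking `len()` of the result answers the question as posed. One caveat: by Samuelson's inequality, no value can lie more than √(n−1) population standard deviations from the mean of n values. The query therefore cannot flag anything until at least 11 apps are reporting.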
Looking to the future, in city traffic and in SDS alike, artificial intelligence (AI) is the solution of choice. With detailed metrication, and a host of micro-service bots filtering and collating data at the front end, AI could be tasked with maximizing efficiency across the cluster. With clusters scaling to tens of millions of nodes and billions of instances, Google, AWS and Microsoft Azure are all already on the AI bandwagon, and one result is AI instances for rent.
AI is not easy, and in SDS we are just learning to crawl. The tasks of the next five years are adequate metrication plus a first level of micro-services and GUI tools. We have a way to go in defining the APIs that allow data sharing among micro-services, and the standard way for storage analytics to instruct the orchestration software of the whole cluster. For example, how do you tell OpenStack Neutron to reroute a VLAN, or move data to another volume?
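No such standard exists yet, so the following is only a hypothetical Python sketch of what an orchestrator-neutral request from an analytics micro-service might look like. `RemediationIntent`, its fields and the two action names are invented for illustration. They are not OpenStack APIs, and an adapter would still have to translate each intent into real Neutron (or volume-service) calls.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict

# Hypothetical action vocabulary; a real standard would define its own.
ALLOWED_ACTIONS = {"reroute_vlan", "migrate_volume"}


@dataclass
class RemediationIntent:
    """Hypothetical request from storage analytics to cluster orchestration."""

    action: str   # one of ALLOWED_ACTIONS
    target: str   # the VLAN or volume the action applies to
    reason: str   # evidence, so the orchestrator can log or veto the action
    params: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject anything outside the agreed vocabulary up front.
        if self.action not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported action: {self.action}")

    def to_json(self) -> str:
        """Serialize to a stable JSON form for whatever transport is used."""
        return json.dumps(asdict(self), sort_keys=True)
```

Carrying the evidence in `reason` lets the orchestrator log or veto an action rather than execute it blindly. That matters in a system with little human intervention.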
Pioneering work by companies such as Enmotus has already made inroads into the problem. Such tools will be picked up by systems vendors as part of their SDS offerings and will surface in both hyperconverged systems and more traditional clusters.
Oh, and yes! I’m looking forward to the day that Los Angeles has an AI cop that directs my self-driving car to the fastest routes automatically!