The Future of Hyper-converged Infrastructure
Based on the idea that current generation flash-based storage nodes and compact servers have many common features, hyper-converged infrastructure (HCI) melds the two types of unit into a single common node type. This is achieved by making the storage sharing software a microservice housed in a virtual machine instance, so that the server engine also handles storage functionality.
One result is that the HCI nodes share storage in what is usually called a virtual SAN, though this has connotations of only supporting block I/O and not object or filer access modes. The virtual SAN runs on the Ethernet fabric that connects all the cluster nodes together, so that scale-out to large sizes can be realized.
First generation HCI suffered from slow network performance and overall did not match up to the available alternatives. All traffic between nodes involved a stately dance between storage, DRAM and the network on both ends of the transfer, together with a good deal of CPU overhead. Migrating to RDMA transfers eases some of the overhead and the recent surge in interest in Ethernet RDMA has made lower cost solutions available that easily co-exist with the cluster networking scheme.
Challenges still remain, however, especially in CPU overhead and the complexity of the path data follows from drives to the network.
The next step in the evolution of HCI will be the use of NVMe transfer methods for the drives. This approach uses direct memory access over PCIe to move data between the drives and the host server memory, with very low overhead. A solid technology, NVMe is increasingly the interface for performance SSDs, and even commodity M.2-class units. NVMe drives can reach throughput of 10 gigabytes per second.
Another version of flash storage, the NVDIMM, brings non-volatile terabyte-level storage to a DIMM, using the very fast DRAM bus for transfers. The result of NVDIMM use will be servers with an effective memory size of several terabytes. This NVDIMM storage also needs to be sharable between nodes in HCI, but the transfer path is much less complicated, with data being readable by the NIC directly from DIMM memory.
Networking is getting much faster. Today’s sweet spot of 10 GbE is about to give way to 25 GbE and even to dual-link 50 GbE and quad-link 100 GbE configurations. The new speeds improve total bandwidth, the addition of RDMA cuts overhead and latency, and coupling in NVMe also helps.
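To put those link speeds in perspective, here is a rough back-of-the-envelope sketch of how long it takes to move one terabyte between nodes at each line rate. The function name and the `efficiency` parameter are illustrative assumptions, and the figures ignore protocol overhead, congestion and storage latency, so treat them as best-case numbers only.

```python
# Best-case transfer times at the Ethernet line rates named in the text.
# Assumption: 1 Gb = 10**9 bits and 1 TB = 10**12 bytes (decimal units).

LINK_SPEEDS_GBPS = {
    "10 GbE": 10,
    "25 GbE": 25,
    "50 GbE (dual-link)": 50,
    "100 GbE (quad-link)": 100,
}


def transfer_seconds(payload_bytes: int, link_gbps: float, efficiency: float = 1.0) -> float:
    """Seconds needed to move payload_bytes over a link.

    efficiency is a hypothetical knob for the fraction of line rate
    actually usable by the payload (1.0 means perfect line rate).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return payload_bytes * 8 / usable_bits_per_second


if __name__ == "__main__":
    one_terabyte = 10**12
    for name, gbps in LINK_SPEEDS_GBPS.items():
        seconds = transfer_seconds(one_terabyte, gbps)
        print(f"{name}: {seconds:.0f} s per TB at line rate")
```

Even at perfect line rate, a terabyte takes over 13 minutes on 10 GbE and well over a minute on 100 GbE, which is why cutting per-transfer overhead matters as much as raw link speed.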
Very soon we’ll have HCI clusters with several tiers of storage. In-memory databases, NVDIMM memory extensions and NVRamdisks, primary NVMe ultrafast SSD storage and secondary bulk storage (initially HDD but giving way beginning in 2017 to SSDs) will all be shareable across nodes.
A multi-tier storage pool needs a good auto-tiering approach to be efficient, or else the overhead will eat up performance. Enmotus has developed a micro-tiering approach that optimizes north-south transfers in the storage tiering, and this will extend across all the memory classes.
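As a generic illustration of what auto-tiering has to do, the sketch below ranks blocks by access count and places the hottest ones on the fastest tier. This is not Enmotus's micro-tiering method, which the text does not describe; the tier names, the capacities and the `place_blocks` function are all made-up assumptions for the example.

```python
from collections import Counter

# Hypothetical tiers, ordered fastest to slowest, with capacity in blocks.
# The names and sizes are illustrative only.
TIERS = [("nvdimm", 2), ("nvme", 3), ("bulk", 1000)]


def place_blocks(access_counts, tiers=TIERS):
    """Map each block id to a tier name, hottest blocks on the fastest tier.

    Blocks that do not fit within the total capacity are left out of
    the returned mapping.
    """
    # Rank by descending access count; break ties by block id so the
    # result is deterministic.
    ranked = sorted(access_counts, key=lambda block: (-access_counts[block], block))
    placement = {}
    position = 0
    for tier_name, capacity in tiers:
        for block in ranked[position:position + capacity]:
            placement[block] = tier_name
        position += capacity
    return placement


if __name__ == "__main__":
    # A toy access trace: block "a" is read five times, "b" four times, and so on.
    trace = ["a"] * 5 + ["b"] * 4 + ["c"] * 3 + ["d"] * 2 + ["e", "f"]
    for block, tier in sorted(place_blocks(Counter(trace)).items()):
        print(block, "->", tier)
```

A real tiering engine also has to weigh the cost of moving data against the benefit, since re-sorting and migrating blocks too eagerly is exactly the overhead that eats up performance.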
Longer term futures
The next step in evolution is to unify transfer structures. There are several very similar developments in progress, such as the work of the Gen-Z consortium, to create a next-generation system architecture. The common idea is that storage connects through a common fabric within the server. This fabric replaces the parallel memory bus with many serial links, and so raises memory bandwidth into the terabytes per second range. NV flash memory and the server SSD complement all connect to this new fabric, allowing direct access to the external network.
This external network could either be an extension of the internal fabric or it could be translated from that fabric to Ethernet or InfiniBand. Ethernet looks to be the likely winner, though, since this is the usual inter-node fabric for cloud clusters today. Mellanox and others are already architecting router cards from internal fabrics to Ethernet.
The migration to a new fabric and serially connected “DIMM” structure means that we’ll see much more compact servers. M.2 form-factor SSDs are very small, and overall a 75 percent shrink in appliance size looks possible, even as the node gets much more horsepower.
Next-generation cloud clusters will utilize a “software-defined” approach to controlling infrastructure such as networks and storage pools. In essence, these infrastructure elements will join servers as virtual resources, scaling by creating new instances of the microservices that manage the real, underlying hardware. Networking is already well along this path, while storage is perhaps two years behind networking and still in its early days of evolution towards an agile software-defined storage model.
HCI clusters, using containers, will be very dynamic, with instances, microservices and VLAN/VSAN connections changing rapidly. Tracking all of this and identifying bottlenecks and app or OS failure points is no small challenge and will require heavy, and smart, automation. The first step is to get adequate analytics from the storage pool. Here, Enmotus again demonstrates its expertise, with an approach to storage analytics designed for the new, complex tiering and the highly dynamic operations of tomorrow’s hybrid clouds.
We are moving to a world of pre-canned policies and automated exception detection and handling. The old CLI approach to storage will soon be a memory, and toolsets for the new generation environment will be worth their weight in gold!