For over three decades, we lived with a boring truth: disk-drive performance was stuck in a rut, only roughly doubling over all that time. One consequence was that storage architecture froze, with little real innovation. RAID added a boost, but at a high price. We didn’t get a break until SSDs arrived on the scene.
SSDs really upset the applecart. Per-drive performance increased 1,000X in just a few years, and at that point all bets were off. Little did we realize that the potential of SSDs reached stratospheric levels: millions of IOPS per drive.
All of this performance broke the traditional SCSI model of the storage stack in the operating system. An interrupt-driven, verbose stack with up to seven levels of address translation simply can’t sustain the I/O rates needed. The answer is the NVMe stack, which consolidates I/Os and interrupts efficiently and, over fabrics, uses the power of RDMA to reduce round-trip counts and overhead dramatically. Rates in excess of 20 million IOPS have been demonstrated, and there is still room to speed up the protocol.
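To see why consolidating I/Os and interrupts matters so much, here is a toy model contrasting one interrupt per completed I/O with an NVMe-style completion queue that drains a whole batch per wakeup. The function names and batch size are illustrative, not taken from any real driver:

```python
def interrupt_per_io(completions):
    """Legacy model: one handler invocation per completed I/O."""
    wakeups = 0
    for _ in completions:
        wakeups += 1          # the CPU is interrupted for every single I/O
    return wakeups

def coalesced_completions(completions, batch=32):
    """NVMe-style model: one wakeup reaps up to `batch` queued completions."""
    wakeups = 0
    i = 0
    while i < len(completions):
        wakeups += 1          # a single poll drains a whole batch
        i += batch
    return wakeups

ios = list(range(1_000_000))
print(interrupt_per_io(ios))        # -> 1000000
print(coalesced_completions(ios))   # -> 31250
```

The per-I/O overhead shrinks by the batch factor, which is one reason the NVMe queue-pair design scales where the SCSI midlayer did not.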
The NVMe story is now being extended to fabric connections, which should knock HCI architectures out of the ballpark. But all this firepower in the interconnect raises questions about the other parts of the storage stack. Many use cases still carry a very complex file structure, and this slows I/O enormously. REST-like systems can flatten the I/O path, but offer limited file-sharing capability, while traditional filer models have too many layers.
The likely answer is to extend storage metadata significantly, paired with a flat-file data store. Metadata can define the connections between data owners and how operations interact, while the flat file allows operations such as erasure coding and replication to be deployed between the metadata images. The metadata approach is a good fit for software-defined storage, given its flexibility and potential for open structure.
Clearly, there are still loose ends. Data integrity needs a rethink. Do we use lazy writes between local and HCI SSDs, for example, or do we mirror in-server, which is faster but loses the benefit of appliance-level redundancy? This gets much more complex if multiple tiers are deployed, as will be the case on most servers.
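The trade-off can be sketched in a few lines. Here the "devices" are plain Python lists and both writer classes are invented for illustration: the mirrored path makes two copies durable before acknowledging, while the lazy path acknowledges after the local write and leaves a crash window until the deferred flush runs:

```python
class MirroredWriter:
    """In-server mirror: ack only after both local copies are written."""
    def __init__(self):
        self.primary, self.mirror = [], []

    def write(self, block):
        self.primary.append(block)
        self.mirror.append(block)     # both writes complete before the ack
        return "acked"

class LazyWriter:
    """Lazy write-back: ack after the local write, replicate later."""
    def __init__(self):
        self.local, self.remote, self.dirty = [], [], []

    def write(self, block):
        self.local.append(block)
        self.dirty.append(block)      # remote copy deferred: a crash here loses it
        return "acked"

    def flush(self):
        """Background replication to the shared/HCI tier."""
        self.remote.extend(self.dirty)
        self.dirty.clear()
```

The mirror is simple and fast to ack but confines redundancy to one box; the lazy writer regains appliance-level redundancy at the cost of a window of dirty, unreplicated data.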
This is a good point to remind everyone that new-age storage just keeps giving and giving. NVDIMM memory is just maturing in the market, and a concept of even faster persistent storage needs to be added to the armamentarium. The problem is that NVDIMMs are small, roughly DRAM-sized, and that doesn’t fit a drive model well.
The answer is a cache-like model, but persistence complicates the job of maintaining performance. This means a very smart cache, with the ability to lock segments, a wide variety of eviction algorithms and, especially in the early days of deployment, the means to experiment and tune the results.
The operations of this smart cache will need to be very agile and highly automated … we are talking billions of IOPS here! The analytics behind the cache are perhaps even more important, since they must provide insight into essentially unknown territory.
This will become even more critical as NVDIMMs evolve to byte-addressability. Then we’ll face nano-granularity and new opportunities to build structures at speed. Clearly, we can expect a good few more years of evolution yet.