Part 1 … the server and cluster
Since time immemorial, we have used the SCSI-based file stack to define how we talk to drives. Mature, but very verbose, it was an ideal match to single-core CPUs and slow interfaces to very slow hard drives. With this stack, it was perfectly acceptable to initiate an I/O and then swap processes, since the I/O took many milliseconds to complete.
The arrival of flash drives upset this applecart completely. IOPS per drive grew by 1000X in short order and neither SCSI-based SAS nor SATA could keep up. The problem continues to get worse, with the most recent flash card leader, Smart IOPS, delivering 1.7 million IOPS, a 10-fold further increase.
The industry’s answer to this performance issue is replacing SAS and SATA with PCIe and the protocol with NVMe. This gives us a solution where multiple ring-buffers contain queues of storage operations, with these queues being contexted to cores or even apps. This allows a bunch of operations to be pulled from the queue and processed by the drive using RDMA techniques. On the return side, response queues are likewise built up and serviced by the appropriate host. Interrupts are concatenated so that one interrupt services many responses.