The Art of “Storage-as-a-Service”
Today, most enterprise datacenters are considering the hybrid cloud model for their future deployments. Agile and flexible, the model is expected to yield higher efficiencies than traditional setups, while allowing a datacenter to be sized to average, rather than peak, workloads.
In reality, both porting apps between clouds and reacting rapidly to workload increases run up against a data placement problem. The promise of agility fails if the data is in the wrong cloud when a burst is needed, and the problem is exacerbated by the new containers approach, which can start a new instance in a few milliseconds.
Data placement is in fact the most critical issue in hybrid cloud deployment. Pre-emptively staging data in the right cloud before firing up the instances that use it is the only way to secure the expected efficiency gains.
A number of approaches have been tried, with varying success, but none are truly easy to implement and all require heavy manual intervention. Let’s look at some of these approaches:
- Sharding the dataset – By identifying the hottest segment of the dataset (e.g. names beginning with S), this approach places a snapshot of those files in the public cloud and periodically updates it. When a cloudburst is needed, locks for any files being changed are passed over to the public cloud and the in-house versions of the files are blocked from updating. The public cloud files are then updated and the locks cleared.
Done well, this is a smooth process that can take just a few minutes, but it’s also complicated, especially if locking is more granular, as with a database; record-level locks are harder to hand off cleanly, although once the burst is initiated they don’t impede operation at all.
- Maintaining the data pool in a telco facility midway between the clouds - Pioneered by NetApp and Verizon, this seeks to balance latencies by keeping data in the telco data barn, where there are fiber links into the public cloud.
- Splitting apps into sets where one set can run slowly if necessary while the primary set needs consistent performance – This sounds easy, but it requires a lot of upfront analysis and continuous monitoring. Obviously, the “slow” stuff gets run in the public cloud!
This approach can easily negate much of the benefit of moving to the cloud, since the private cloud either needs extra headroom or must keep a portion of its capacity dormant in reserve.
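The shard-and-handoff flow in the first approach above can be sketched in a few lines. This is a hypothetical illustration, not any vendor’s API; the class and method names (`ShardedBurst`, `begin_burst`, etc.) are invented for the example.

```python
# Hypothetical sketch of the shard snapshot + lock-handoff flow.
# All names here are illustrative assumptions, not a real product API.

class ShardedBurst:
    """Tracks the hot shard's cloud copy and the files locked during a burst."""

    def __init__(self, hot_shard):
        self.hot_shard = set(hot_shard)   # e.g. files whose names begin with "S"
        self.cloud_copy = {}              # file -> last snapshotted version
        self.locked = set()               # files frozen in-house during handoff

    def snapshot(self, files):
        """Periodically refresh the public-cloud copy of the hot shard."""
        for name, version in files.items():
            if name in self.hot_shard:
                self.cloud_copy[name] = version

    def begin_burst(self, dirty_files):
        """Hand locks for in-flight files to the cloud; block local updates."""
        for name in dirty_files:
            if name in self.hot_shard:
                self.locked.add(name)     # in-house version frozen
        return sorted(self.locked)        # the cloud side now owns these locks

    def end_burst(self):
        """Cloud copies brought up to date; clear the locks."""
        released = sorted(self.locked)
        self.locked.clear()
        return released
```

The point of the sketch is the handoff sequence: snapshot, freeze, transfer locks, update, release, in that order.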
This is where Storage-as-a-Service enters the picture. StaaS vendors offer a solution to the hybrid data placement problem, but in a somewhat counterintuitive way. Their approach is to place all of the data in the cloud and provide a cache in-house that is designed to offset the cloud latency. When data is written, it goes to a persistent cache and then drops down through the storage hierarchy until a copy is written to the cloud.
On read-back, recently written data is still in a local layer and rapidly retrieved, while the caching software anticipates what cloud data could be needed in the near future. This implies fairly large caches at the local level, with some data in fast SSDs, but this will get easier as multi-terabyte NVDIMMs begin deploying in 2017.
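The write path just described — writes land in a persistent local cache, trickle down the hierarchy, and a copy always reaches the cloud, while reads are served from the fastest tier that holds the block — can be sketched as follows. The tier names, capacities, and eviction policy are assumptions for illustration only.

```python
# Minimal sketch of the StaaS write/read path, assuming a three-tier
# hierarchy and a naive eviction policy. Not a real implementation.

class Tier:
    def __init__(self, name, capacity):
        self.name, self.capacity, self.blocks = name, capacity, {}

class StaasCache:
    def __init__(self):
        # fastest to slowest; the cloud tier is treated as unbounded
        self.tiers = [Tier("nvdimm", 2), Tier("ssd", 4), Tier("cloud", float("inf"))]

    def write(self, key, data):
        """New data enters the persistent top tier, evicting downward as needed."""
        self._insert(0, key, data)
        self.tiers[-1].blocks[key] = data   # a copy always lands in the cloud

    def _insert(self, level, key, data):
        tier = self.tiers[level]
        if len(tier.blocks) >= tier.capacity:
            victim = next(iter(tier.blocks))            # naive: evict oldest entry
            self._insert(level + 1, victim, tier.blocks.pop(victim))
        tier.blocks[key] = data

    def read(self, key):
        """Serve from the fastest tier holding the block."""
        for tier in self.tiers:
            if key in tier.blocks:
                return tier.name, tier.blocks[key]
        raise KeyError(key)
```

A production system would add the predictive prefetching the article mentions; here the shape of the hierarchy is the point.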
Making sure the right data is near the top of the stack isn’t easy; that, in fact, is the essence of a StaaS system. Part of the answer is very efficient auto-tiering, down at the block rather than object level. Why the granularity? Limited network bandwidth! A good solution will shuffle data continuously in the background between DRAM caches (augmented with NVRAM in 2017), NVMe SSDs, networked secondary bulk storage, and the cloud. To get a perspective on the value of this, visit Enmotus, the leader in micro-tiering.
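The bandwidth argument for block-level tiering can be made concrete with a toy heat-tracker: only the hottest *blocks* of a large object get promoted, so the network moves kilobytes instead of whole files. Everything here (`MicroTierer`, the capacity, the rebalance rule) is a hypothetical illustration, not how any particular micro-tiering product works.

```python
# Toy block-level auto-tiering: keep the N hottest blocks in the fast tier,
# demoting the rest. Names and policy are illustrative assumptions.

from collections import Counter

class MicroTierer:
    def __init__(self, fast_capacity):
        self.heat = Counter()          # per-block access counts
        self.fast = set()              # block ids resident in the fast tier
        self.fast_capacity = fast_capacity

    def touch(self, block_id):
        """Record an access, then rebalance the fast tier."""
        self.heat[block_id] += 1
        return self.rebalance()

    def rebalance(self):
        """Promote the hottest blocks; demote whatever fell out of the top N."""
        hottest = {b for b, _ in self.heat.most_common(self.fast_capacity)}
        promoted = hottest - self.fast      # would be fetched from slower tiers
        demoted = self.fast - hottest       # would be flushed down the hierarchy
        self.fast = hottest
        return promoted, demoted
```

Because the unit of movement is a block, a promotion transfers only the hot fraction of an object, which is exactly why granularity matters when bandwidth is the scarce resource.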
A careful choice of hardware is also necessary, so that there are no bottlenecks between storage layers or overloaded networks. RDMA protocols bring a lot of performance to this problem and are the preferred connection scheme. The good news is that they are dropping rapidly in price and are feasible on 100 and 200 GbE links.
StaaS and micro-tiering are compatible with the new hyper-converged systems approach, since the local caches can be made available to every server in a cluster. Micro-tiering and in-memory caching can really speed up operations in these systems.
StaaS vendors have some twists on the story. For instance, one company, Zadara, can provide on-premises drives and systems as well as a cloud solution. They and the other StaaS vendors all offer a pay-as-you-go approach to storage capacity, the ability to scale up or down on demand, and lower cost than on-premises and even public cloud storage.
These plans shift expenditures from capital to operating expense, and in an era of rapidly expanding capacity, the ability to right-size to the immediate need removes the buy-ahead implications of purchasing gear for a planned 5-year life. Most plans include technology refreshes, so vendor lock-in issues and forklift upgrade costs are a thing of the past.
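The buy-ahead versus pay-as-you-go trade-off reduces to simple arithmetic. The numbers below are purely illustrative assumptions, not any vendor’s pricing; the point is the shape of the comparison.

```python
# Back-of-envelope capex vs. opex comparison. All figures are made-up
# assumptions chosen only to illustrate the right-sizing argument.

def capex_buy_ahead(peak_tb, cost_per_tb):
    """Buy gear sized to the year-5 peak capacity on day one."""
    return peak_tb * cost_per_tb

def opex_pay_as_you_go(yearly_tb_used, cost_per_tb_year):
    """Pay only for the capacity actually consumed each year."""
    return sum(tb * cost_per_tb_year for tb in yearly_tb_used)

# A dataset growing from 100 TB to 500 TB over five years:
usage = [100, 200, 300, 400, 500]
capex = capex_buy_ahead(peak_tb=500, cost_per_tb=300)     # 150,000 up front
opex = opex_pay_as_you_go(usage, cost_per_tb_year=60)     # 90,000, spread out
```

Even before counting the deferred spend and the included technology refreshes, paying for average rather than peak capacity is where the savings come from.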
Where most of the vendors are still lacking is in the analytics needed to understand the very dynamic storage environment of a cloud. This area is still in its infancy, and the software to do a really good tracking job is still missing; that should resolve over 2017 and 2018.
Storage as a service may fit your hybrid plans really well. If you are currently in an evaluation/pilot hybrid project, sign up for a trial and get a sense of what can be done. I think you will be pleasantly surprised.