Enmotus Blog

Optimizing Dataflow in Next-Gen Clusters

Posted by Jim O'Reilly on Sep 6, 2017 10:57:55 AM

We are on the edge of some dramatic changes in computing infrastructure. New packaging methods, ultra-dense SSDs and high core counts will change what a cluster looks like. Can you imagine a 1U box having 60 cores and a raw SSD capacity of 1 petabyte? What about drives using 25GbE interfaces (with RDMA and NVMe over Fabrics), accessed by any server in the cluster?

Consider Intel’s new “ruler” drive, the P4500 (shown below with a concept server). It’s easy to see 32 to 40 TB of capacity per drive, which means that the 32 drives in their concept storage appliance give a petabyte of raw capacity (and over 5PB compressed). It’s a relatively easy step to replace those two controllers with ARM-based data movers, which reduce system overhead dramatically and bring performance closer to what the drives can deliver, but the likely next step is to replace the ARM units with merchant-class GbE switches and talk directly to the drives.
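As a rough check on those capacity figures, here is a minimal back-of-envelope sketch. The drive count and per-drive capacities come from the paragraph above; the use of decimal units (1 PB = 1000 TB) and the reading of "over 5PB compressed" as an implied compression ratio are assumptions made for illustration.

```python
# Back-of-envelope capacity for the 32-drive concept appliance.
DRIVES_PER_APPLIANCE = 32   # from the text
TB_PER_PB = 1000            # assumption: decimal units


def raw_capacity_pb(drive_tb: float, drives: int = DRIVES_PER_APPLIANCE) -> float:
    """Raw capacity in PB for `drives` drives of `drive_tb` TB each."""
    return drive_tb * drives / TB_PER_PB


low = raw_capacity_pb(32)    # 32 TB drives: about 1.02 PB raw
high = raw_capacity_pb(40)   # 40 TB drives: 1.28 PB raw

# Assumption: "over 5PB compressed" on roughly 1 PB raw implies
# a compression ratio of a little under 5:1.
implied_ratio = 5 / low

print(f"raw capacity: {low:.3f} to {high:.3f} PB")
print(f"implied compression ratio: about {implied_ratio:.1f}:1")
```

Real compression ratios depend heavily on the data, so the 5:1 figure is best treated as an optimistic planning number rather than a guarantee.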

I can imagine a few of these units at the top of each rack with a bunch of 25/50 GbE links to physically compact, but powerful, servers (2 or 4 per rack U) which use NVDIMM as close-in persistent memory.

The clear benefit is that admins can scale performance and bulk storage to match the cluster’s changing needs, independently of the compute horsepower deployed. This is very important as storage moves from low-capacity structured data to huge-capacity unstructured big data.

It could be argued that modular, Lego-like solutions such as Gen-Z allow the drives to be distributed across the servers in an HCI structure, but vendors generally sell locked configurations, which makes independent scaling of storage and compute impossible.

So what new challenges does this next-gen architecture bring? We are talking mind-blowing metrics in performance, capacity and density. With data compression, a single rack could easily hold 16 PB of erasure-coded data, together with 120 compact servers, for a total of 7200 cores. That’s a monster! But there’s more! Add a GPU per server, and the parallel processing that companies like Brytlyt and Nyriad bring makes these racks enormously powerful.
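The rack-level numbers can be sanity-checked with a few lines of arithmetic. In this sketch the server count, the 60 cores per server and the "2 or 4 per rack U" densities come from the post; the 42U rack height is an assumption (a common size, not something the post states).

```python
import math

SERVERS_PER_RACK = 120   # from the text
CORES_PER_SERVER = 60    # from the 1U example earlier in the post
RACK_HEIGHT_U = 42       # assumption: a common full-height rack

total_cores = SERVERS_PER_RACK * CORES_PER_SERVER


def rack_units_needed(servers: int, servers_per_u: int) -> int:
    """Rack units consumed by `servers` at a given packing density."""
    return math.ceil(servers / servers_per_u)


units_at_4_per_u = rack_units_needed(SERVERS_PER_RACK, 4)
units_at_2_per_u = rack_units_needed(SERVERS_PER_RACK, 2)

print(f"total cores: {total_cores}")
print(f"4 servers per U: {units_at_4_per_u}U of {RACK_HEIGHT_U}U")
print(f"2 servers per U: {units_at_2_per_u}U of {RACK_HEIGHT_U}U")
```

Under the 42U assumption, 120 servers only fit at the denser 4-per-U packing, which leaves about a dozen rack units for the storage appliances and switches.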

It’s in IOPS and throughput that the numbers really hit: 400 million IOPS and 200 GBPS are both realistic. These levels demand smart management approaches. Systems at this scale need automated, analytics-driven management tools to stay stable and reach their potential.

Let’s take some cases. These clusters are ideal for private clouds and so will support high container counts. Suppose an app replicated across a bunch of containers is malformed and essentially creates a DDoS-type problem … the next-neighbor issue. Manual methods would rely on long run times and incomplete jobs as the warning, but those might not show up until hours after the original roadblock. Then there’s the task of root-causing the issue.

An automated approach looks for anomalies. When a job slows down, the tools focus on its local environment in near-real-time to spot the culprit. Remediation is automated too. The malformed app can be stopped, relocated or sandboxed, and the tools will verify the return of stable operations.
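To make the detect, locate and remediate loop concrete, here is a minimal sketch using only Python’s standard library. Everything in it is hypothetical: the z-score threshold, the function names, the container names and the sample numbers are illustrative assumptions, and a real tool would call into the cluster’s orchestrator rather than return a string.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it is more than `threshold` standard
    deviations above the mean of `history` (a simple z-score test)."""
    if len(history) < 2:
        return False            # not enough data to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest > mu      # flat history: any increase stands out
    return (latest - mu) / sigma > threshold


def find_culprit(io_by_container):
    """Return the container issuing the most I/O requests."""
    return max(io_by_container, key=io_by_container.get)


def remediate(container, action="sandbox"):
    """Placeholder for an orchestrator call (hypothetical)."""
    if action not in {"stop", "relocate", "sandbox"}:
        raise ValueError(f"unknown action: {action}")
    return f"{action}:{container}"


# Illustrative sample data: job latency in ms, then a sudden slowdown.
latencies_ms = [10.2, 9.8, 10.5, 10.1, 9.9]
latest_ms = 48.0
io_requests = {"app-a": 1200, "app-b": 950, "app-c": 98000}

decision = None
if is_anomalous(latencies_ms, latest_ms):
    decision = remediate(find_culprit(io_requests))

print(decision)
```

The verification step mentioned above would amount to running the same anomaly check on fresh samples after the remediation has taken effect.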

This view of future orchestration applies to storage, networks and servers as well as the apps, and covers both hardware and software issues. But, again, there’s more! It isn’t enough to just keep things running. Cluster automation is the answer to system tuning too, so that the cluster minimizes job run time and maximizes performance metrics.

In fact, the complex system changes needed to shift workload priorities between tenants can be totally automated, which is well beyond the crude “add more instances” method of today.
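One simple way to picture automated priority changes is a weighted split of a shared resource budget. The sketch below is an assumption-laden illustration, not a description of any particular product: the tenant names, the integer weights and the use of the 400 million IOPS figure as the budget are all hypothetical.

```python
def allocate_iops(total_iops, weights):
    """Split an IOPS budget across tenants in proportion to
    their integer priority weights (rounded down)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return {tenant: total_iops * w // total_weight
            for tenant, w in weights.items()}


RACK_IOPS = 400_000_000   # illustrative budget, taken from the rack-level figure

# Daytime: the web tenant gets priority (hypothetical policy).
day = allocate_iops(RACK_IOPS, {"web": 3, "analytics": 1})

# Overnight: priorities flip toward batch analytics.
night = allocate_iops(RACK_IOPS, {"web": 1, "analytics": 3})

print(day)
print(night)
```

Changing a tenant’s priority then becomes a one-line policy update that the automation pushes out as new limits, rather than a manual round of adding instances.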


Topics: All Flash Array, Intel Optane, Data Center, NVMe over Fibre, data analytics