Enmotus Blog

A.I. For Storage

Posted by Jim O'Reilly on Dec 18, 2017 2:12:46 PM

As we saw in the previous part of this two-part series, “Storage for A.I.”, the performance demands of A.I. will combine with technical advances in non-volatile memory to dramatically increase performance and scale within the storage pool and also move addressing of data to a much finer granularity, the byte level rather than the 4KB block. This all creates a manageability challenge that must be resolved if we are to attain the potential of A.I. systems (and next-gen computing in general).

Simply put, storage is getting complex and will become ever more so as we expand the size and use of Big Data. Rapid and agile monetization of data will be the mantra of the next decade. Consequently, the IT industry is starting to look for ways to migrate from today’s essentially manual storage management paradigms to emulate and exceed the automation of control demonstrated in public clouds.

Read More

Topics: NVMe, Data Center, NVMe over Fibre, enmotus, data analytics, NVDIMM, artificial intelligence

Information Storage – A truly novel concept

Posted by Jim O'Reilly on Oct 17, 2017 9:37:29 AM

When you see “storage” mentioned it’s often “data storage”. The implication is that there is nothing in the “data” that is informational, which even at a verbatim read is clearly no longer true. Open the storage up, of course, and the content is a vast source of information, both mined and unmined, but our worldview of storage has been to treat objects as essentially dumb, inanimate things.

This 1970s view of storage’s mission is beginning to change. The dumb storage appliance is turning into smart software-defined storage services running in virtual clusters or clouds, with direct access to storage drives. As this evolution to SDS has picked up momentum, pioneers in the industry are taking a step beyond. They are looking at ways to extract useful information from what is stored and use it to manage the information lifecycle, protect integrity and security, and provide information-centric guidance to assist processing and guide the other activities around the object.

Read More

Topics: big data, SSD, Data Center

Using advanced analytics to admin a storage pool

Posted by Adam Zagorski on Sep 25, 2017 1:43:50 PM

Manual administration of a virtualized storage pool is impossible. The pace of change is too fast, and the information returned by the metrics too complex, for a human to understand and respond to in anything close to an acceptable timeframe.

Storage analytics sort through the metrics from the storage pool and distil useful information from a tremendous amount of near-real-time data. The aim of the analytics is to present information about a resolvable issue in a form that is easy to understand, uncluttered by extraneous data on non-important events.

Let’s take detecting a failed drive as an example. In the early days of storage, understanding a drive failure involved a whole series of CLI steps to get to the drive and read status data in chunks. This was often complicated by the drive being in a RAID array drive-set. This approach worked for the 24 drives on your server, but what happens when we have 256 drives and 10 RAID boxes, or 100 RAID boxes…get the problem?

Read More

Topics: NVMe, big data, All Flash Array, Data Center, data analytics, cloud storage

Optimizing Dataflow in Next-Gen Clusters

Posted by Jim O'Reilly on Sep 6, 2017 10:57:55 AM

We are on the edge of some dramatic changes in computing infrastructure. New packaging methods, ultra-dense SSDs and high core counts will change what a cluster looks like. Can you imagine a 1U box having 60 cores and a raw SSD capacity of 1 petabyte? What about drives using 25GbE interfaces (with RDMA and NVMe over Fabrics), accessed by any server in the cluster?

Consider Intel’s new “ruler” drive, the P4500 (shown below with a concept server). It’s easy to see 32 to 40 TB of capacity per drive, which means that the 32 drives in their concept storage appliance give a petabyte of raw capacity (and over 5PB compressed). It’s a relatively easy step to see those two controllers replaced by ARM-based data movers which reduce system overhead dramatically and boost performance nearer to available drive performance, but the likely next step is to replace the ARM units with merchant class GbE switches and talk directly to the drives.
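As a quick check on the capacity figures above, here is a minimal sketch of the arithmetic. The drive count and per-drive capacity come from the text; the 5:1 compression ratio is an assumption inferred from the "over 5PB compressed" figure, not a published specification.

```python
# Back-of-envelope capacity check for the concept appliance.
DRIVES_PER_APPLIANCE = 32    # from the text
TB_PER_DRIVE = 32            # low end of the 32 to 40 TB range in the text
ASSUMED_COMPRESSION = 5.0    # assumption: inferred from "over 5PB compressed"


def raw_capacity_pb(drives: int, tb_per_drive: float) -> float:
    """Raw capacity in petabytes, using 1 PB = 1000 TB."""
    return drives * tb_per_drive / 1000


raw = raw_capacity_pb(DRIVES_PER_APPLIANCE, TB_PER_DRIVE)
effective = raw * ASSUMED_COMPRESSION
print(f"raw: {raw:.2f} PB, effective: {effective:.2f} PB")
```

Even at the low end of the per-drive range, the appliance clears a petabyte raw.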

I can imagine a few of these units at the top of each rack with a bunch of 25/50 GbE links to physically compact, but powerful, servers (2 or 4 per rack U) which use NVDIMM as close-in persistent memory.

The clear benefit is that admins can react to the changing needs of the cluster for performance and bulk storage independently of the compute horsepower deployed. This is very important as storage moves from low-capacity structured to huge capacity big-data unstructured.

Read More

Topics: All Flash Array, Intel Optane, Data Center, NVMe over Fibre, data analytics

How Many IOPS Do You Need For Real-World Storage Performance?

Posted by Adam Zagorski on Aug 22, 2017 11:12:17 AM

We hear lots of hype today about millions of IOPS from someone’s latest flash offering. It’s true that these units are very fast, but using the products often yields much weaker performance than the marketing would lead you to expect. That’s because most vendors measure their performance using highly tweaked benchmark software. With this type of code, the devil is in the details.

A bit extreme, perhaps, but all benchmarks can be tuned for optimal performance, while we never hear about the other, slower, results.

What eats up all of that performance? In the real world, events are not as smoothly sequenced as they are in a benchmark. Data requests are not evenly spread over all the storage drives, nor are they evenly spread in time. In fact, I/O goes where the apps direct, which means some files get much more access, making the drives they are on work hard but leaving other drives nearly idling.
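To illustrate why uneven access hurts, here is a minimal sketch of a pool whose throughput is capped by its busiest drive. The per-drive rating and the request weights are hypothetical numbers chosen for illustration, and the model ignores caching and queuing effects.

```python
def deliverable_iops(per_drive_iops: int, weights: list[int]) -> float:
    """Highest total request rate the pool sustains before the busiest drive saturates.

    weights[i] is the relative share of requests landing on drive i.
    """
    total = sum(weights)
    busiest = max(weights)
    # The busiest drive receives busiest/total of all requests, so it
    # saturates when total_rate * busiest / total == per_drive_iops.
    return per_drive_iops * total / busiest


PER_DRIVE_IOPS = 100_000  # hypothetical per-drive rating

even = deliverable_iops(PER_DRIVE_IOPS, [1] * 10)          # benchmark-style spread
skewed = deliverable_iops(PER_DRIVE_IOPS, [9] + [1] * 9)   # one drive takes half the requests
print(f"even spread:   {even:,.0f} IOPS")
print(f"one hot drive: {skewed:,.0f} IOPS")
```

With these assumed numbers, sending half the requests to a single drive cuts the deliverable rate of a ten-drive pool from 1,000,000 to 200,000 IOPS, while the other nine drives sit mostly idle.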

Read More

Topics: NVMe, big data, Data Center, hyperconverged, storage analytics

Storage Analytics and SDS

Posted by Jim O'Reilly on Aug 17, 2017 11:30:36 AM

Software-defined storage (SDS) is a part of the drive to make infrastructure virtual by providing an abstraction of the control logic software (the control plane) from the low-level data management (data plane). In the process, the control plane becomes a virtual instance that can reside in any instance in the computer cluster.

The SDS approach allows the control micro-services to be scaled for increased demand, to be chained for more complex operations (Index+compress+encrypt, for example), while making the systems generally hardware agnostic. No longer is it necessary to buy storage units with a given set of functions only to face a forklift upgrade if new features are needed.

SDS systems are very dynamic, with mashups of micro-services that may survive only for a few blocks of data. This brings new challenges:

  • Data flow - Network VLAN paths are transient, with rerouting continuously happening for new operations, for failure recovery and for load balancing
  • Failure detection - Hard failures are readily detectable, allowing a replacement instance and recovery to occur quickly. Soft failures are the problem. Intermittent errors need to be trapped and analyzed, and mitigation applied
  • Bottlenecks - Slowdowns occur in many different places. Code is not perfect, nor is it 100 percent tested bug-free. In complex storage systems, we’ll see path or device slowdowns on the storage side, and instance or app issues on the server side. Moreover, problems may reside in the network, caused by collisions both at the endpoints of a VLAN and in the intermediate routing nodes
  • Everything is virtual - The abstraction of the planes complicates root cause analysis tremendously
  • Automation - There is little human intervention in the operation of SDS. Reconnecting and analyzing manually is naturally very difficult, especially in real-time

Read More

Topics: Data Center, software defined storage, storage analytics, SDS

Content driven tiering using storage analytics

Posted by Adam Zagorski on Aug 9, 2017 10:05:00 AM

IT has used auto-tiering for years as a way to move data from expensive fast storage to cheaper and slower secondary bulk storage. The approach was at best a crude approximation, being only able to distinguish between objects on the basis of age or lack of use. This meant, for instance, that documents and files stayed much longer in expensive storage than was warranted. There simply was no mechanism for sending such files automatically to cheap storage.

Now, to make life even more complicated, we’ve added a new tier of storage at each end of the food chain. At the fast end, we now have ultra-fast NVDIMM offering an even more expensive and, more importantly, space-limited way to boost access speed, while at the other end of the spectrum the cloud is reducing the need for in-house long-term storage even more. Simple auto-tiering doesn’t do enough to optimize the spectrum of storage in a 4-tier system like this. We need to get much savvier about where we keep things.

The successor to auto-tiering has to take into account traffic patterns for objects and plan their lifecycle accordingly. For example, a Word document may be stored as a fully editable file in today’s solutions, but the reality is that most of these documents, once fully edited, become read-only objects moved in their entirety to be read. If changes occur, a new, renamed, version of the document is created and the old one kept intact.

Read More

Topics: autotiering, big data, Data Center, NVMe over Fibre, enmotus, data analytics

The Evolution of Hyper-servers

Posted by Jim O'Reilly on Aug 2, 2017 2:51:09 PM

A few months ago, the CTO of one of the companies involved in system design asked me how I would handle and store the literal flood of data expected from the Square-Kilometer Array, the world’s most ambitious radio-astronomy program. The answer was in two parts. First, data needs compression, which isn’t trivial with astronomy data and this implies a new way to process at upwards of 100 Gigabyte speed. Second, the hardware platforms become very parallel in design, especially in communications.

We are talking designs where each ultra-fast NVMe drive has access to network bandwidth capable of keeping up with the stream. This implies having at least one 50GbE link per drive, since drives with 80 gigabit/second streaming speeds are entering volume production today.

Read More

Topics: NVMe, big data, Intel Optane, Data Center

Automating Storage Performance in Hybrid and Private Clouds

Posted by Jim O'Reilly on Jun 27, 2017 10:10:00 AM

Reading current blogs on clouds and storage, it’s impossible not to conclude that most cloud users have abandoned hope of tuning system performance and are just ignoring the topic. The reality is that our cloud models struggle with performance issues. For example, a server can hold roughly 1000 virtual machines.

With an SSD giving 40K IOPS, that’s just 40 IOPS per VM. This is on the low side for many use cases, but now let’s move to Docker containers, using the next generation of server. The compute power and, more importantly, DRAM space increased to match the 4,000 containers in the system, but IOPS dropped to just 10/container.
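The per-instance figures above follow from simple division. A minimal sketch using the numbers in the text (the function name is just for illustration):

```python
def iops_per_instance(device_iops: int, instances: int) -> float:
    """Average IOPS each instance gets when one device is shared evenly."""
    return device_iops / instances


SSD_IOPS = 40_000  # from the text

per_vm = iops_per_instance(SSD_IOPS, 1_000)         # 1000 VMs per server
per_container = iops_per_instance(SSD_IOPS, 4_000)  # 4,000 containers per server
print(f"per VM: {per_vm:.0f} IOPS, per container: {per_container:.0f} IOPS")
```

This is an even-share average, so any instance with a bursty workload will see less than this whenever its neighbors are busy.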

Now this is the best that we can get with typical instances. One local instance drive and all the rest is networked I/O. The problem is that network storage is also pooled and this limits storage availability to any instance. The numbers are not brilliant!

We see potential bottlenecks everywhere. Data can be halfway across a datacenter instead of localized to a rack where compute instances are accessing it. Ideally, the data is local (possible with a hyper-converged architecture) so that it avoids crossing multiple switches and routers. This may be impossible to achieve, especially if diverse datasets are being used for an app.

Networks choke and that is true of VLANs used in cloud clusters. The problem with container-based systems is that the instances and VLANs involved are often closed down by the time you get a notification. That’s the downside of agility!

Apps choke, too, and microservices likewise. The fact that these often only exist for short periods makes debugging both a glorious challenge and very frustrating. Being able to understand why a given node or instance runs slower than the rest in a pack can fix a hidden bottleneck that slows completion of the whole job stream.

Hybrid clouds add a new complexity. Typically, these are heterogeneous. The cloud stack in the private segment likely is OpenStack though Azure Stack promises to be an alternative. The public cloud will be one of AWS, Azure or Google, most likely. This means two separate environments, very different from each other in operation, syntax and billing, and an interface between the two.

Read More

Topics: Data Center, data analytics, cloud storage

How To Prevent Over-Provisioning - Dynamically Match Workloads With Storage Resources

Posted by Adam Zagorski on Jun 25, 2017 10:05:00 AM

The Greek philosopher Heraclitus said, “The only thing that is constant is change.” This adage rings true today in most modern datacenters. The demands on workloads tend to be unpredictable, which creates constant change. At any given point in time, an application can have very few demands placed on it, and at a moment’s notice the workload demands spike. Satisfying the fluctuations in demand is a serious challenge for datacenters. Solving this challenge will translate to significant cost savings amounting to millions of dollars for data centers.

Traditionally, data centers have thrown more hardware at this problem. Ultimately, they over-provision to make sure they have enough performance to satisfy peak periods of demand. This includes scaling out with more and more servers filled with hard drives, quite often short-stroking the hard drives to minimize latency. While hard drive costs are reasonable, this massive scale-out increases power, cooling and management costs. The figure below shows an example of the disparity between capacity requirements and performance requirements. Achieving capacity goals with HDDs is quite easy, but given that individual high-performance HDDs are only able to achieve about 200 random IOPS, it takes quite a few HDDs to meet the performance goals of modern database applications.
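The gap between capacity and performance sizing can be sketched in a few lines. The 200 random IOPS per HDD comes from the text; the workload (100 TB, 50,000 IOPS) and the 10 TB drive size are hypothetical values chosen only to show the shape of the problem.

```python
import math

HDD_RANDOM_IOPS = 200  # from the text: roughly 200 random IOPS per high-performance HDD


def drives_needed(capacity_tb: float, iops: int, tb_per_drive: float,
                  iops_per_drive: int = HDD_RANDOM_IOPS) -> tuple[int, int]:
    """Return (drives required for capacity, drives required for IOPS)."""
    for_capacity = math.ceil(capacity_tb / tb_per_drive)
    for_iops = math.ceil(iops / iops_per_drive)
    return for_capacity, for_iops


# Hypothetical workload: 100 TB of data that must sustain 50,000 random IOPS
# on assumed 10 TB drives.
for_capacity, for_iops = drives_needed(capacity_tb=100, iops=50_000, tb_per_drive=10)
print(f"drives for capacity: {for_capacity}, drives for IOPS: {for_iops}")
```

Under these assumptions the pool needs 10 drives to hold the data but 250 to serve it, which is the over-provisioning the paragraph describes.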

Today, storage companies are pushing all-flash arrays as the solution to this challenge. This addresses both the performance issue and the power and cooling costs, but now massive amounts of non-active (cold) data are stored on your most expensive storage media. In addition, not all applications need flash performance. Adding all-flash is just another form of over-provisioning with a significantly higher cost penalty.

Read More

Topics: NVMe, autotiering, big data, All Flash Array, SSD, Data Center, NVMe over Fibre, data analytics

Delivering Data Faster

Accelerating cloud, enterprise and high performance computing

Enmotus FuzeDrive accelerates your hot data when you need it, stores it on cost effective media when you don't, and does it all automatically so you don't have to.

 

  • Visual performance monitoring
  • Graphical management interface
  • Best in class performance/capacity
