Enmotus Blog

Using advanced analytics to admin a storage pool

Posted by Adam Zagorski on Sep 25, 2017 1:43:50 PM

Manual administration of a virtualized storage pool is impossible. The pace of change and the complexity of the information returned by the metrics are too great for a human to understand and respond to in anything close to an acceptable timeframe.

Storage analytics sort through the metrics from the storage pool and distil useful information from a tremendous amount of near-real-time data. The aim of the analytics is to present information about a resolvable issue in a form that is easy to understand, uncluttered by extraneous data on unimportant events.

Let’s take detecting a failed drive as an example. In the early days of storage, understanding a drive failure involved a whole series of CLI steps to get to the drive and read status data in chunks. This was often complicated by the drive being in a RAID array drive-set. This approach worked for the 24 drives on your server, but what happens when we have 256 drives and 10 RAID boxes, or 100 RAID boxes…get the problem?
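As a minimal sketch of the kind of distillation described above, the snippet below reduces a flat stream of drive status records to only the drives that need attention, grouped by RAID box. The record layout (`box_id`, `slot`, `state`) and the state names are hypothetical, chosen for illustration; they are not taken from any particular product.

```python
from collections import defaultdict


def failed_drives_by_box(statuses):
    """Group every drive whose state is not "ok" under its RAID box.

    statuses: iterable of (box_id, slot, state) tuples (hypothetical format).
    Returns a dict mapping box_id -> list of (slot, state) needing attention.
    """
    report = defaultdict(list)
    for box_id, slot, state in statuses:
        if state != "ok":
            report[box_id].append((slot, state))
    return dict(report)


# Example: four status records collapse to two actionable entries.
sample = [
    ("raid-03", 6, "ok"),
    ("raid-03", 7, "failed"),
    ("raid-08", 1, "ok"),
    ("raid-08", 2, "predicted-failure"),
]
print(failed_drives_by_box(sample))
```

The point of the sketch is only that healthy drives drop out of the report entirely, so the output stays short whether there are 24 drives or several thousand.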

Read More

Topics: NVMe, big data, All Flash Array, Data Center, data analytics, cloud storage

How Many IOPS Do You Need For Real-World Storage Performance?

Posted by Adam Zagorski on Aug 22, 2017 11:12:17 AM

We hear lots of hype today about millions of IOPS from someone’s latest flash offering. It’s true that these units are very fast, but using the products often yields much weaker performance than the marketing would lead you to expect. That’s because most vendors measure their performance using highly tweaked benchmark software. With this type of code, the devil is in the details.

A bit extreme, perhaps, but all benchmarks can be tuned for optimal performance, while we never hear about the other, slower, results.

What eats up all of that performance? In the real world, events are not as smoothly sequenced as they are in a benchmark. Data requests are not evenly spread over all the storage drives, nor are they evenly spread in time. In fact, I/O goes where the apps direct, which means some files get much more access, making the drives they are on work hard but leaving other drives nearly idling.
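A rough way to see the effect of uneven access is to assume that the pool saturates as soon as its busiest drive does. The sketch below uses that simplified model; the per-drive IOPS figure and the access weights are illustrative assumptions, not measurements.

```python
def achievable_iops(per_drive_iops, access_shares):
    """Aggregate IOPS the pool can sustain before its hottest drive saturates.

    per_drive_iops: maximum IOPS of one drive (all drives assumed identical).
    access_shares: relative share of requests landing on each drive.
    Simplified model: the request mix is fixed, so total throughput is capped
    at the point where the drive with the largest share reaches its limit.
    """
    total = sum(access_shares)
    hottest = max(access_shares)
    return per_drive_iops * total / hottest


uniform = [1] * 10            # benchmark-style: load spread evenly
skewed = [9] + [1] * 9        # one drive receives half of all requests

print(achievable_iops(100_000, uniform))
print(achievable_iops(100_000, skewed))
```

Under these assumptions, ten drives rated at 100,000 IOPS each deliver 1,000,000 IOPS with an even spread but only 200,000 when one drive takes half the traffic, which is the gap between the benchmark figure and what an application may actually see.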

Read More

Topics: NVMe, big data, Data Center, hyperconverged, storage analytics

The Evolution of Hyper-servers

Posted by Jim O'Reilly on Aug 2, 2017 2:51:09 PM

A few months ago, the CTO of one of the companies involved in system design asked me how I would handle and store the literal flood of data expected from the Square-Kilometer Array, the world’s most ambitious radio-astronomy program. The answer was in two parts. First, data needs compression, which isn’t trivial with astronomy data, and this implies a new way to process at upwards of 100 Gigabyte speed. Second, the hardware platforms become very parallel in design, especially in communications.

We are talking designs where each ultra-fast NVMe drive has access to network bandwidth capable of keeping up with the stream. This implies having at least one 50GbE link per drive, since drives with 80 gigabit/second streaming speeds are entering volume production today.
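Taking the figures quoted above at face value, the link count per drive is a simple ceiling division. The sketch below ignores protocol overhead and assumes the full line rate of each link is usable, so it gives a lower bound rather than a sizing recommendation.

```python
import math


def links_needed(drive_gbps, link_gbps):
    """Minimum number of network links required to carry one drive's stream.

    Assumes ideal links with no protocol overhead (a simplifying assumption).
    """
    return math.ceil(drive_gbps / link_gbps)


# An 80 gigabit/second drive on 50GbE links.
print(links_needed(80, 50))
```

With these numbers the "at least" matters: a drive streaming at 80 gigabit/second needs two 50GbE links to avoid becoming network-bound.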

Read More

Topics: NVMe, big data, Intel Optane, Data Center

How To Prevent Over-Provisioning - Dynamically Match Workloads With Storage Resources

Posted by Adam Zagorski on Jun 25, 2017 10:05:00 AM

The Greek philosopher Heraclitus said, “The only thing that is constant is change.” This adage rings true today in most modern datacenters. The demands on workloads tend to be unpredictable, which creates constant change. At any given point in time, an application can have very few demands placed on it, and at a moment’s notice the workload demands can spike. Satisfying the fluctuations in demand is a serious challenge for datacenters. Solving this challenge will translate to significant cost savings amounting to millions of dollars for data centers.

Traditionally, data centers have thrown more hardware at this problem. Ultimately, they over-provision to make sure they have enough performance to satisfy peak periods of demand. This includes scaling out with more and more servers filled with hard drives, quite often short-stroking the hard drives to minimize latency. While hard drive costs are reasonable, this massive scale-out increases power, cooling and management costs. The figure below shows an example of the disparity between capacity requirements and performance requirements. Achieving capacity goals with HDDs is quite easy, but given that individual high-performance HDDs are only able to achieve about 200 random IOPS, it takes quite a few HDDs to meet the performance goals of modern database applications.
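The capacity-versus-performance disparity can be made concrete with a small calculation. The 200 random IOPS per HDD comes from the text above; the 4 TB drive size and the 40 TB / 50,000 IOPS workload are hypothetical values picked purely for illustration.

```python
import math

HDD_RANDOM_IOPS = 200   # per-drive figure quoted in the text
HDD_CAPACITY_TB = 4     # assumed drive size (hypothetical)


def hdds_for_capacity(required_tb):
    """Drives needed to hold the data set."""
    return math.ceil(required_tb / HDD_CAPACITY_TB)


def hdds_for_performance(required_iops):
    """Drives needed to serve the random I/O load."""
    return math.ceil(required_iops / HDD_RANDOM_IOPS)


# Hypothetical database workload: 40 TB of data, 50,000 random IOPS at peak.
print(hdds_for_capacity(40))
print(hdds_for_performance(50_000))
```

Under these assumptions, 10 drives satisfy the capacity goal while 250 are needed for the performance goal, so almost all of the purchased capacity exists only to supply spindles.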

Today, storage companies are pushing all-flash arrays as the solution to this challenge. This addresses the performance issue as well as power and cooling, but now massive amounts of non-active (cold) data are stored on your most expensive storage media. In addition, not all applications need flash performance. Adding all flash is just another form of over-provisioning with a significantly higher cost penalty.

Read More

Topics: NVMe, autotiering, big data, All Flash Array, SSD, Data Center, NVMe over Fibre, data analytics

Storage Automation In Next Generation Data Centers

Posted by Adam Zagorski on Jan 31, 2017 1:04:37 PM

Automation of device management and performance monitoring analytics are necessary to control the costs of web-scale data centers, especially as most organizations continually ask their employees to do more with fewer resources.

Big Data and massive data growth are at the forefront of datacenter growth. Imagine what it takes to manage the datacenters that provide us with this information.

According to research conducted by Seagate, time-consuming drive management activities represent the largest storage-related pain points for datacenter managers. In addition to trying to manage potential failures of all of the disk drives, managers must monitor the performance of multiple servers as well. As indicated by Seagate, there are tremendous opportunities for cost savings if the timing of retiring disk drives can be optimized. Significant savings can also result from streamlining the management process.

While there is no such thing as a typical datacenter, for the purpose of discussion, we will assume that a typical micro-datacenter contains about 10,000 servers while a large-scale data center contains on the order of 100,000 servers. In a webscale hyperconverged environment, if each server housed 15 devices (hard drives and/or flash drives), a datacenter contains anywhere from 150,000 to 1.5 million devices. That is an enormous number of servers and devices to manage. Even if we scaled back by an order of magnitude or two, to 50 servers and 750 drives for example, managing a data center is a daunting task.

Read More

Topics: NVMe, big data, All Flash Array, hyperconverged, NVMe over Fibre

Storage Visions 2017

Posted by Jim O'Reilly on Jan 18, 2017 2:22:42 PM

Here it is. A new year opens up in front of us. This one is going to be lively and storage is no exception. In fact, 2017 should see some real fireworks as we break away from old approaches and move on to some new technologies and software.

Read More

Topics: NVMe, SSD, Data Center, data analytics, NVMe over Fibre

Flash Tiering: The Future of Hyper-converged Infrastructure

Posted by Adam Zagorski on Jan 12, 2017 1:04:00 PM

Read More

Topics: NVMe, big data, 3D Xpoint, SSD, Intel Optane, Data Center, hyperconverged

The Art of “Storage-as-a-Service”

Posted by Jim O'Reilly on Jan 9, 2017 2:24:50 PM

Most enterprise datacenters are today considering the hybrid cloud model for their future deployments. Agile and flexible, the model is expected to yield higher efficiencies than traditional setups, while allowing a datacenter to be sized to average, as opposed to peak, workloads.

In reality, achieving portability of apps between clouds and reacting rapidly to workload increases both run up against a data placement problem. The agility idea fails when data is in the wrong cloud when a burst is needed. This is exacerbated by the new containers approach, which can start up a new instance in a few milliseconds.

Data placement is in fact the most critical issue in hybrid cloud deployment. Pre-emptively providing data in the right cloud prior to firing up the instances that use it is the only way to assure those expected efficiency gains.

A number of approaches have been tried, with varying success, but none are truly easy to implement and all require heavy manual intervention. Let’s look at some of these approaches:

  1. Sharding the dataset – By identifying the hottest segment of the dataset (e.g., names beginning with S), this approach places a snapshot of those files in the public cloud and periodically updates it. When a cloudburst is needed, locks for any files being changed are passed over to the public cloud and the in-house versions of the files are blocked from updating. The public cloud files are then updated and the locks cleared.

Read More

Topics: NVMe, autotiering, big data, SSD, hyperconverged

Hot Trends In Storage

Posted by Adam Zagorski on Dec 13, 2016 2:02:41 PM

Storage continues to be a volatile segment of IT. Hot areas trending in the news this month include NVMe over Fibre Channel, which is being hyped heavily now that the Broadcom acquisition of Brocade is a done deal. Another hot segment is the hyper-converged space, complemented by activity in software-defined storage from several vendors.

Flash is now running ahead of enterprise hard drives in the market; that demand, together with foundry changeovers to 3D NAND, is temporarily putting upward pressure on SSD pricing. High-performance storage solutions built on COTS platforms have been announced, too, which will create more pressure to reduce appliance prices.

Let’s cover these topics and more in detail:

  1. NVMe over Fibre-Channel is in full hype mode right now. This solution is a major step away from traditional FC insofar as it no longer encapsulates the SCSI block-IO protocol. Instead, it uses a now-standard direct-memory access approach to reduce overhead and speed up performance significantly.

Read More

Topics: NVMe, SSD, hyperconverged, NVMe over Fibre

Why Auto-Tiering is Critical

Posted by Jim O'Reilly on Sep 22, 2016 9:39:46 AM

Storage in IT comes in multiple flavors. We have super-fast NVDIMMs, fast and slow SSDs and snail-paced hard drives. Add in the complexities of networked versus local connection, price and capacity, and figuring out the optimum configuration is no fun. Economics and performance goals guarantee that any enterprise configuration will be a hybrid of several storage types.

Enter auto-tiering. This is a deceptively simple concept. Auto-tiering moves data back and forth between the layers of storage, running in the background. This should keep the hottest data on the most accessible tier of storage, while relegating old, cold data to the most distant layer of storage.

A simplistic approach isn’t quite good enough, unfortunately. Computers think in microseconds, while job queues often have a daily or weekly cycle. Data that the computer thinks is cold may suddenly get hotter than Hades when that job hits the system. Similarly, admins know that certain files are created, stored and never seen again.

This layer of user knowledge is handled by incorporating a policy engine into auto-tiering, allowing an admin to anticipate data needs and promote data through the tiers in advance of need.
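The interplay between access-driven tiering and an admin policy can be sketched in a few lines. This is an illustrative toy, not a description of any vendor's engine: the `Extent` type, the `promote` and `demote` policy sets, and the slot-based fast tier are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    name: str
    recent_accesses: int   # how often the data was touched recently


def choose_fast_tier(extents, fast_slots, promote=(), demote=()):
    """Pick which extents live on the fast tier.

    promote: names the admin wants on the fast tier ahead of need.
    demote:  names the admin knows will not be read again.
    Remaining fast-tier slots are filled by recent access count.
    """
    promote, demote = set(promote), set(demote)
    # Policy first: promoted extents are placed regardless of access history.
    pinned = [e.name for e in extents
              if e.name in promote and e.name not in demote]
    # Then the simplistic rule: hottest data first, skipping demoted extents.
    by_heat = sorted(
        (e for e in extents if e.name not in promote and e.name not in demote),
        key=lambda e: e.recent_accesses,
        reverse=True,
    )
    return (pinned + [e.name for e in by_heat])[:fast_slots]


extents = [
    Extent("logs", 500),
    Extent("weekly_report", 0),
    Extent("db_index", 900),
    Extent("archive", 5),
]
print(choose_fast_tier(extents, fast_slots=2))
print(choose_fast_tier(extents, fast_slots=2,
                       promote={"weekly_report"}, demote={"logs"}))
```

Without a policy, the two hottest extents (`db_index`, `logs`) win the fast tier. With the policy, the currently cold `weekly_report` is promoted before its weekly job arrives, and the write-once `logs` extent is kept off the fast tier even though it looks hot.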

Read More

Topics: NVMe, autotiering, big data