Enmotus Blog

Evolution of Storage Software

Posted by Adam Zagorski on Apr 4, 2018 12:37:47 PM

Part 1 … the server and cluster


Since time immemorial, we have used the SCSI-based file stack to define how we talk to drives. Mature, but very verbose, it was an ideal match for single-core CPUs and slow interfaces to very slow hard drives. With this stack, it was perfectly acceptable to initiate an I/O and then swap processes, since the I/O took many milliseconds to complete.

The arrival of flash drives upset the applecart completely. IOPS per drive grew by 1000X in short order, and neither SCSI-based SAS nor SATA could keep up. The problem continues to get worse, with the most recent flash card leader, Smart IOPS, delivering 1.7 million IOPS, a further 10-fold increase.

The industry’s answer to this performance issue is to replace SAS and SATA with PCIe, and the protocol with NVMe. This gives us a solution where multiple ring buffers hold queues of storage operations, with each queue bound to a core or even an app. A batch of operations can then be pulled from a queue and processed by the drive using RDMA techniques. On the return side, response queues are likewise built up and serviced by the appropriate host. Interrupts are coalesced so that one interrupt services many responses.
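To make the queue mechanics concrete, here is a minimal toy model in Python. It is only a sketch: the class and method names are invented for illustration, a `deque` stands in for a fixed-size ring buffer, and nothing here reflects a real NVMe driver API. It shows a host filling a submission queue, the drive pulling commands in batches, and one interrupt covering each batch of completions.

```python
from collections import deque


class QueuePair:
    """Toy submission/completion queue pair, as might be bound to one core.

    Hypothetical model: a deque stands in for a fixed-size ring buffer.
    """

    def __init__(self, depth):
        self.depth = depth          # maximum outstanding commands
        self.submission = deque()   # host -> drive
        self.completion = deque()   # drive -> host
        self.interrupts = 0         # interrupts raised so far

    def submit(self, command):
        """Host side: queue a command, or refuse if the ring is full."""
        if len(self.submission) >= self.depth:
            return False
        self.submission.append(command)
        return True

    def device_process(self, batch_size):
        """Drive side: pull up to batch_size commands and post completions.

        One interrupt is raised per non-empty batch, not per command.
        """
        n = min(batch_size, len(self.submission))
        for _ in range(n):
            cmd = self.submission.popleft()
            self.completion.append((cmd, "ok"))
        if n:
            self.interrupts += 1
        return n

    def reap(self):
        """Host side: drain every completion that has been posted."""
        done = list(self.completion)
        self.completion.clear()
        return done


qp = QueuePair(depth=8)
for lba in range(6):
    qp.submit(("read", lba))

qp.device_process(batch_size=4)   # first batch: 4 commands
qp.device_process(batch_size=4)   # second batch: remaining 2 commands
results = qp.reap()
print(len(results), "completions serviced by", qp.interrupts, "interrupts")
```

In this toy run, six reads complete with only two interrupts, which is the whole point of batching completions rather than interrupting the host once per I/O.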

Read More

Topics: NVMe, big data, Intel Optane, software defined storage

Information Storage – A truly novel concept

Posted by Jim O'Reilly on Oct 17, 2017 9:37:29 AM

When you see “storage” mentioned, it’s often “data storage”. The implication is that there is nothing in the “data” that is informational, which, even read literally, is clearly no longer true. Open the storage up, of course, and the content is a vast source of information, both mined and unmined, but our worldview of storage has been to treat objects as essentially dumb, inanimate things.

This 1970s view of storage’s mission is beginning to change. The dumb storage appliance is turning into smart software-defined storage (SDS) services running in virtual clusters or clouds, with direct access to storage drives. As this evolution to SDS has picked up momentum, pioneers in the industry are taking a step beyond. They are looking at ways to extract useful information from what is stored and use it to manage the information lifecycle, protect integrity and security, and provide information-centric guidance that assists processing and guides the other activities around the object.

Read More

Topics: big data, SSD, Data Center

Using advanced analytics to admin a storage pool

Posted by Adam Zagorski on Sep 25, 2017 1:43:50 PM

Manual administration of a virtualized storage pool is impossible. The pace of change is too fast, and the metrics returned are too complex, for a human to understand and respond to in anything close to an acceptable timeframe.

Storage analytics sort through the metrics from the storage pool and distil useful information from a tremendous amount of near-real-time data. The aim of the analytics is to present information about a resolvable issue in a form that is easy to understand, uncluttered by extraneous data on unimportant events.

Let’s take detecting a failed drive as an example. In the early days of storage, understanding a drive failure involved a whole series of CLI steps to get to the drive and read status data in chunks. This was often complicated by the drive being in a RAID array drive-set. This approach worked for the 24 drives on your server, but what happens when we have 256 drives and 10 RAID boxes, or 100 RAID boxes…get the problem?
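As a rough illustration of what analytics buys you here, the sketch below scans a whole pool and surfaces only the drives that need attention. Everything in it is an assumption made for the example: the `DriveStatus` fields, the reallocated-sector threshold and the synthetic 10-box pool are hypothetical, not taken from any particular product or CLI.

```python
from dataclasses import dataclass


@dataclass
class DriveStatus:
    """Hypothetical per-drive health record collected from the pool."""
    box: str
    slot: int
    healthy: bool
    reallocated_sectors: int


def drives_needing_attention(statuses, sector_limit=50):
    """Return only failed drives or drives past the (assumed) sector limit."""
    return [
        d for d in statuses
        if not d.healthy or d.reallocated_sectors > sector_limit
    ]


# Synthetic pool: 10 RAID boxes with 24 slots each, all healthy to start.
pool = [
    DriveStatus(f"raid-{b:02d}", s, True, 0)
    for b in range(10)
    for s in range(24)
]

# Inject one failed drive and one drive that is wearing out.
pool[37].healthy = False
pool[200].reallocated_sectors = 120

alerts = drives_needing_attention(pool)
for d in alerts:
    reason = "FAILED" if not d.healthy else "degrading"
    print(f"{d.box} slot {d.slot}: {reason}")
```

The admin sees two lines instead of 240 status dumps, and the same filter works unchanged whether the pool has 10 boxes or 100.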

Read More

Topics: NVMe, big data, All Flash Array, Data Center, data analytics, cloud storage

Car Wrecks and Crashing Computers

Posted by Jim O'Reilly on Sep 13, 2017 12:05:33 PM

We are just starting the self-driving car era. It’s a logical follow-on to having GPS and always-connected vehicles, but we are still in the early days of evolution. Even so, it’s a fair bet that a decade from now, most, if not all, vehicles will have self-driving capability.

What isn’t clear is what it will look like. Getting from point A to point B is easy enough (GPS), and avoiding hitting anything else seems to be in the bag, too. What isn’t figured out is how to stop those awful traffic jams. I live in Los Angeles, and a 3-hour commute on a Friday afternoon is commonplace. In fact, Angelenos typically spend between 6 and 20 hours a week in their cars, with the engine running, gas being guzzled and tempers being frayed!

It’s particularly true in LA that each car usually has a single occupant, so that’s a lot of gas, metal and pavement space for a small payload. What this leads us to is the idea of:

  1. Automating car control and centralizing routing. This would allow, via a cloud app, load-balancing the roads and routing around slowdowns
  2. Making the vehicles single- or dual-seater electric mini-cars
  3. Using the mini-cars to pack in more effective lanes and move cars closer together

Read More

Topics: big data, data analytics, cloud storage

How Many IOPS Do You Need For Real-World Storage Performance?

Posted by Adam Zagorski on Aug 22, 2017 11:12:17 AM

We hear lots of hype today about millions of IOPS from someone’s latest flash offering. It’s true that these units are very fast, but the devil is in the details: in real use, the products often deliver much weaker performance than the marketing would lead you to expect. That’s because most vendors measure their performance using highly tweaked benchmark software.

A bit extreme, perhaps, but all benchmarks can be tuned for optimal performance, while we never hear about the other, slower, results.

What eats up all of that performance? In the real world, events are not as smoothly sequenced as they are in a benchmark. Data requests are not evenly spread over all the storage drives, nor are they evenly spread in time. In fact, I/O goes where the apps direct, which means some files get much more access, making the drives they are on work hard but leaving other drives nearly idling.
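A back-of-the-envelope model shows how much skew can cost. The sketch below assumes a fixed workload mix in which each drive receives a constant share of all requests, so the pool saturates as soon as its hottest drive does. The per-drive limit of 100,000 IOPS and the access shares are made-up numbers for illustration only.

```python
def achievable_iops(per_drive_limit, access_shares):
    """Aggregate IOPS at which the busiest drive hits its limit.

    Assumes each drive receives a fixed fraction (its share) of all I/O.
    """
    return min(per_drive_limit / share for share in access_shares if share > 0)


PER_DRIVE_LIMIT = 100_000  # hypothetical IOPS ceiling of one drive

# Benchmark-style load: requests spread evenly over 8 drives.
uniform = [0.125] * 8

# Real-world-style load: one hot drive takes half of all requests.
skewed = [0.5, 0.2, 0.1, 0.05, 0.05, 0.04, 0.03, 0.03]

print("uniform:", achievable_iops(PER_DRIVE_LIMIT, uniform))
print("skewed: ", achievable_iops(PER_DRIVE_LIMIT, skewed))
```

Under these assumptions, the evenly loaded pool reaches 800,000 IOPS, while the skewed one tops out at 200,000 because the hot drive is saturated long before the others are busy.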

Read More

Topics: NVMe, big data, Data Center, hyperconverged, storage analytics

Content driven tiering using storage analytics

Posted by Adam Zagorski on Aug 9, 2017 10:05:00 AM

IT has used auto-tiering for years as a way to move data from expensive fast storage to cheaper and slower secondary bulk storage. The approach was at best a crude approximation, being only able to distinguish between objects on the basis of age or lack of use. This meant, for instance, that documents and files stayed much longer in expensive storage than was warranted. There simply was no mechanism for sending such files automatically to cheap storage.

Now, to make life even more complicated, we’ve added a new tier of storage at each end of the food chain. At the fast end, we now have ultra-fast NVDIMM offering an even more expensive and, more importantly, space-limited way to boost access speed, while at the other end of the spectrum the cloud is reducing the need for in-house long-term storage even more. Simple auto-tiering doesn’t do enough to optimize the spectrum of storage in a 4-tier system like this. We need to get much savvier about where we keep things.

The successor to auto-tiering has to take into account traffic patterns for objects and plan their lifecycle accordingly. For example, a Word document may be stored as a fully editable file in today’s solutions, but the reality is that most of these documents, once fully edited, become read-only objects moved in their entirety to be read. If changes occur, a new, renamed, version of the document is created and the old one kept intact.
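One way to picture a traffic-aware placement policy is the sketch below. It is purely illustrative: the `ObjectStats` fields, the tier names and every threshold are assumptions chosen for the example, not a description of how any shipping product decides. The idea is simply that write recency and read rate, rather than age alone, pick one of the four tiers.

```python
from dataclasses import dataclass


@dataclass
class ObjectStats:
    """Hypothetical traffic summary for one stored object."""
    name: str
    reads_per_day: float
    days_since_last_write: int


def choose_tier(obj):
    """Pick a tier from observed traffic (all thresholds are assumptions)."""
    # Actively written and heavily read: worth the scarce fastest tier.
    if obj.days_since_last_write <= 1 and obj.reads_per_day >= 1000:
        return "nvdimm"
    # Still being edited, or read often enough to justify flash.
    if obj.days_since_last_write <= 30 or obj.reads_per_day >= 10:
        return "flash"
    # Effectively read-only, fetched whole now and then: bulk storage.
    if obj.reads_per_day >= 0.1:
        return "bulk"
    # Finished and rarely touched: send it to the cloud.
    return "cloud"


for obj in [
    ObjectStats("index.db", 5000, 0),
    ObjectStats("draft.docx", 5, 0),
    ObjectStats("final.docx", 0.5, 90),
    ObjectStats("archive.docx", 0.0, 400),
]:
    print(f"{obj.name}: {choose_tier(obj)}")
```

A finished Word document that is only read occasionally drops to bulk storage after its editing window closes, instead of lingering on expensive media until an age limit expires.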

Read More

Topics: autotiering, big data, Data Center, NVMe over Fibre, enmotus, data analytics

The Evolution of Hyper-servers

Posted by Jim O'Reilly on Aug 2, 2017 2:51:09 PM

A few months ago, the CTO of one of the companies involved in system design asked me how I would handle and store the literal flood of data expected from the Square-Kilometer Array, the world’s most ambitious radio-astronomy program. The answer was in two parts. First, data needs compression, which isn’t trivial with astronomy data, and this implies a new way to process at upwards of 100 Gigabyte speed. Second, the hardware platforms become very parallel in design, especially in communications.

We are talking designs where each ultra-fast NVMe drive has access to network bandwidth capable of keeping up with the stream. This implies having at least one 50GbE link per drive, since drives with 80 gigabit/second streaming speeds are entering volume production today.

Read More

Topics: NVMe, big data, Intel Optane, Data Center

How To Prevent Over-Provisioning - Dynamically Match Workloads With Storage Resources

Posted by Adam Zagorski on Jun 25, 2017 10:05:00 AM

The Greek philosopher Heraclitus said, “The only thing that is constant is change.” This adage rings true today in most modern datacenters. The demands on workloads tend to be unpredictable, which creates constant change. At any given point in time, an application can have very few demands placed on it, and at a moment’s notice the workload demands can spike. Satisfying the fluctuations in demand is a serious challenge for datacenters. Solving this challenge will translate to significant cost savings amounting to millions of dollars for data centers.

Traditionally, data centers have thrown more hardware at this problem. Ultimately, they over-provision to make sure they have enough performance to satisfy peak periods of demand. This includes scaling out with more and more servers filled with hard drives, quite often short-stroking the hard drives to minimize latency. While hard drive costs are reasonable, this massive scale-out increases power, cooling and management costs. The figure below shows an example of the disparity between capacity requirements and performance requirements. Achieving capacity goals with HDDs is quite easy, but given that individual high-performance HDDs are only able to achieve about 200 random IOPS, it takes quite a few HDDs to meet the performance goals of modern database applications.
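The gap between the two requirements is easy to put into numbers. The sketch below uses the roughly 200 random IOPS per HDD quoted above; the workload target of 100,000 IOPS, the 50 TB capacity goal and the 10 TB drive size are hypothetical figures picked only to illustrate the arithmetic.

```python
import math

HDD_RANDOM_IOPS = 200  # approximate per-drive figure quoted in the text


def hdds_for_iops(target_iops, per_drive_iops=HDD_RANDOM_IOPS):
    """Drives needed to reach a random-IOPS target."""
    return math.ceil(target_iops / per_drive_iops)


def hdds_for_capacity(target_tb, drive_tb):
    """Drives needed to reach a raw capacity target."""
    return math.ceil(target_tb / drive_tb)


# Hypothetical database workload: 50 TB of data, 100,000 random IOPS.
by_capacity = hdds_for_capacity(target_tb=50, drive_tb=10)
by_performance = hdds_for_iops(target_iops=100_000)

print(f"Drives needed for capacity:    {by_capacity}")
print(f"Drives needed for performance: {by_performance}")
```

With these assumed numbers, 5 drives would hold the data but 500 are needed to serve it, which is exactly the kind of over-provisioning the scale-out approach leads to.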

Today, storage companies are pushing all-flash arrays as the solution to this challenge. This addresses the performance issue as well as power and cooling, but now massive amounts of non-active (cold) data are stored on your most expensive storage media. In addition, not all applications need flash performance. Adding all flash is just another form of over-provisioning with a significantly higher cost penalty.

Read More

Topics: NVMe, autotiering, big data, All Flash Array, SSD, Data Center, NVMe over Fibre, data analytics

Storage Automation In Next Generation Data Centers

Posted by Adam Zagorski on Jan 31, 2017 1:04:37 PM

Automation of device management and performance monitoring analytics are necessary to control costs of web scale data centers, especially as most organizations continually ask for their employees to do more with fewer resources.

Big Data and massive data growth are at the forefront of datacenter growth. Imagine what it takes to manage the datacenters that provide us with this information.

 

According to research conducted by Seagate, time-consuming drive management activities represent the largest storage-related pain points for datacenter managers. In addition to trying to manage potential failures of all of the disk drives, managers must monitor the performance of multiple servers as well. As indicated by Seagate, there are tremendous opportunities for cost savings if the timing of retiring disk drives can be optimized. Significant savings can also result from streamlining the management process.

 

While there is no such thing as a typical datacenter, for the purpose of discussion, we will assume that a typical micro-datacenter contains about 10,000 servers while a large-scale data center contains on the order of 100,000 servers. In a webscale hyperconverged environment, if each server housed 15 devices (hard drives and/or flash drives), a datacenter contains anywhere from 150,000 to 1.5 million devices. That is an enormous number of servers and devices to manage. Even if we scaled back by an order of magnitude or two, to 50 servers and 750 drives for example, managing a data center is a daunting task.
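The device counts above follow directly from the stated assumption of 15 devices per server, as this short check shows (the function name is just for illustration):

```python
DEVICES_PER_SERVER = 15  # assumption stated in the text


def device_count(servers, per_server=DEVICES_PER_SERVER):
    """Total drives to manage for a given number of servers."""
    return servers * per_server


for label, servers in [
    ("micro-datacenter", 10_000),
    ("large-scale datacenter", 100_000),
    ("scaled-back example", 50),
]:
    print(f"{label}: {servers:,} servers, {device_count(servers):,} devices")
```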

 

Read More

Topics: NVMe, big data, All Flash Array, hyperconverged, NVMe over Fibre

Flash Tiering: The Future of Hyper-converged Infrastructure

Posted by Adam Zagorski on Jan 12, 2017 1:04:00 PM

The Future of Hyper-converged Infrastructure

Read More

Topics: NVMe, big data, 3D Xpoint, SSD, Intel Optane, Data Center, hyperconverged

Delivering Data Faster

Accelerating cloud, enterprise and high performance computing

Enmotus FuzeDrive accelerates your hot data when you need it, stores it on cost effective media when you don't, and does it all automatically so you don't have to.

 

  • Visual performance monitoring
  • Graphical management interface
  • Best in class performance/capacity
