Faster and Better Virtualized Hadoop Performance with Micron's M510DC

By Floyd Goodrich, March 23, 2017

Thanks to IoT and connected devices, the influx of data is accelerating at an overwhelming pace, and we add to it daily. Every moment, a staggering amount of data is produced from numerous sources: consumer buying trends, traffic flow patterns, software and system logs, aerial and radio-frequency monitoring, remote sensors, and the social media indicators that aid in product branding, advertising, and promotion. The volume, complexity, and diversity of this evolving body of data (known as ‘Big Data’) are astounding.

Many enterprises are building Big Data ecosystems to process this data because of its immense value in delivering relevant, near-real-time results. Big Data projects pose continuous challenges, but they are also an opportunity to gain deeper insights into customer data, assets, or projects.

Primarily for this reason, it is becoming increasingly imperative for enterprises to use Big Data analytics to improve business efficiency and strengthen their engagement with customers.

Big Data, when analyzed with precision and depth, supports informed decision-making in real time. To achieve that proficiency, however, enterprises need faster storage devices that can deliver results quickly while processing and analyzing an ever-expanding pool of data.

Micron’s M510DC enterprise SSD offers the ideal combination of speed and reliability. It delivers virtualized Hadoop performance 2.6X to 15X better than legacy rotating media (HDDs) when deployed in the same cluster platform and configuration.

Process Data Faster with Improved Virtualized Hadoop Performance

The days of traditional deployments are over. Enterprises looking to dig into Big Data insights will have to migrate away from the conventional one-instance-per-node Hadoop deployment to virtualized Hadoop clusters.

Virtualization adds capability by increasing the number of data nodes in a Hadoop cluster, because each physical server is no longer limited to a single Hadoop instance.

With more data nodes available, more Hadoop jobs can run simultaneously. This shortens time to completion and improves overall cluster performance.

Faster processing and improved cluster performance help deliver timely answers to complex queries and support comprehensive analysis of the clustered data.

Modern enterprises are increasingly moving toward a virtualized Hadoop ecosystem. However, its full potential can’t be realized without fast, highly efficient storage in the data nodes.
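The argument above is that more data nodes shorten jobs, but only if storage keeps up. A deliberately simplified model can illustrate it. In this sketch, the function name, its parameters, and every number in the example are hypothetical and do not come from Micron's tests. The model assumes:

- tasks run in equal-length waves across all task slots;
- each added data node brings its own slots;
- the job can finish no faster than the cluster's aggregate storage throughput allows.

Network, shuffle, and scheduling overheads are ignored.

```python
import math


def estimated_job_seconds(
    num_tasks: int,
    data_nodes: int,
    slots_per_node: int,
    task_seconds: float,
    data_gb: float,
    cluster_storage_gb_per_s: float,
) -> float:
    """Toy lower-bound estimate of a Hadoop job's run time (hypothetical model).

    Compute time: tasks run in waves, each wave filling every task slot.
    Storage time: all job data must pass through the cluster's aggregate
    storage throughput, which does not grow just because storage is split
    across more (virtual) data nodes.
    """
    total_slots = data_nodes * slots_per_node
    compute_time = math.ceil(num_tasks / total_slots) * task_seconds
    storage_time = data_gb / cluster_storage_gb_per_s
    # Whichever resource is the bottleneck sets the job time.
    return max(compute_time, storage_time)
```

Consider a hypothetical job of 64 tasks at 30 s each, 2 slots per data node, and 200 GB of data:

- Going from 4 to 16 data nodes on storage with 1 GB/s aggregate throughput only lowers the estimate from 240 s to 200 s, because storage becomes the bottleneck.
- Raising aggregate storage throughput to 4 GB/s at 16 data nodes brings the estimate down to 60 s.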

Experience Micron’s M510DC Enterprise SSD — The Right Storage Solution for Virtualized Hadoop

Once the decision is made to virtualize Hadoop, the storage for the data nodes needs to be optimized. The choice is between legacy HDDs and enterprise SSDs such as the M510DC.

To enable a direct comparison between these two storage options, Micron tested two virtualized clusters: one using the new M510DC and the other using 15,000 RPM SAS HDDs.

The tests used a shell script to run several of the standard example benchmarks built into Hadoop. Each benchmark was run repeatedly, its output data was deleted after every run, and the average run time was recorded.

A three-minute pause was added after every 10 runs to ensure successful Hadoop Distributed File System (HDFS) block deletion before the next test run.
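The test loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Micron's actual script, which was a shell script. The function name `run_benchmark_series` and its parameters are hypothetical. The commands that launch a benchmark and delete its output are passed in by the caller rather than guessed.

```python
import statistics
import time
from typing import Callable, List


def run_benchmark_series(
    run_once: Callable[[], None],
    delete_output: Callable[[], None],
    runs: int,
    pause_every: int = 10,
    pause_seconds: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Run one benchmark `runs` times and return its average run time in seconds.

    Hypothetical harness following the described methodology: time each run,
    delete that run's output afterwards, and pause (three minutes by default)
    after every `pause_every` runs so HDFS can finish deleting blocks.
    """
    durations: List[float] = []
    for i in range(1, runs + 1):
        start = time.perf_counter()
        run_once()  # caller-supplied: launches one benchmark run
        durations.append(time.perf_counter() - start)
        delete_output()  # caller-supplied: removes this run's output data
        if i % pause_every == 0:
            sleep(pause_seconds)  # give HDFS time to delete blocks
    return statistics.mean(durations)
```

In practice, `run_once` might call `subprocess.run(..., check=True)` with a `hadoop jar` command for one of the example benchmarks. Likewise, `delete_output` might run `hdfs dfs -rm -r -skipTrash` on the benchmark's output directory. The exact JAR path, benchmark arguments, and directories depend on the installation and are not given in the source. Injecting `sleep` lets the loop be exercised without actually waiting three minutes.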

Standard Benchmark Tests Display Incredible Results in Favor of Micron’s M510DC Enterprise SSD

The Hadoop distribution includes a set of standardized, built-in benchmarks for measuring performance across a broad range of workloads. The following table uses those benchmarks to compare the performance of an M510DC-based virtualized Hadoop cluster with that of an HDD-based virtualized Hadoop cluster.

Benchmark          M510DC run time   HDD run time   M510DC advantage
TeraGen                     217.36         569.87               2.6X
TestDFSIO-Write             473.95        1281.84               2.7X
Wordcount                   208.18        1829.05               8.8X
TestDFSIO-Read              224.93         3370.1                15X
TeraSort                    611.09        5139.85               8.4X
Sort                        694.31        5313.04               7.7X

Average run time per benchmark (lower is better). Advantage = HDD run time ÷ M510DC run time.

In every benchmark, the M510DC-based platform finished substantially faster than the HDD-based platform.

The TestDFSIO-Read benchmark reveals the greatest performance difference, with the M510DC-based platform completing it 15X faster than the HDD-based virtualized Hadoop cluster.
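The "advantage" figures are the ratio of each HDD run time to the corresponding M510DC run time. The short Python check below recomputes them from the published run times. The names `RESULTS` and `speedup` are illustrative only.

```python
# Published average run times per benchmark: (M510DC, HDD).
RESULTS = {
    "TeraGen": (217.36, 569.87),
    "TestDFSIO-Write": (473.95, 1281.84),
    "Wordcount": (208.18, 1829.05),
    "TestDFSIO-Read": (224.93, 3370.1),
    "TeraSort": (611.09, 5139.85),
    "Sort": (694.31, 5313.04),
}


def speedup(ssd_time: float, hdd_time: float) -> float:
    """How many times faster the SSD cluster finished: HDD time / SSD time."""
    return hdd_time / ssd_time


for name, (ssd, hdd) in RESULTS.items():
    print(f"{name:<16} {speedup(ssd, hdd):.1f}X")
```

Rounded to one decimal place, the ratios come out to 2.6, 2.7, 8.8, 15.0, 8.4, and 7.7, matching the published advantage figures.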

The Bottom Line

Blazing-fast SSDs like the Micron M510DC can dramatically boost virtualized Hadoop performance by delivering enhanced speed, density, and reliability. Enterprises contemplating a migration to virtualized Hadoop should consider the all-new M510DC enterprise SSD to realize the migration’s full potential.

*NOTE: Micron has announced that the M510DC SSD will reach end of life (EOL) by the end of 2017. The Micron 5100 is slated as its replacement.*

For more information on Micron’s M510DC or the 5100 series of enterprise SSD storage solutions, contact a WPGA Memory Specialist.