Based on ESG validation of an Oracle model for a medium-sized Hadoop-oriented big data project, a “buy” infrastructure option such as the Oracle Big Data Appliance will yield approximately 21% lower costs than an equivalent do-it-yourself “build” infrastructure. Using a preconfigured appliance such as Oracle’s also greatly reduces the complexity of engaging staff from many IT disciplines for extensive platform evaluation, testing, development, and integration.
This is the marketing pitch, of course.
The Apache Ambari project is aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive, easy-to-use Hadoop management web UI backed by its RESTful APIs.
Ambari enables System Administrators to:
- Provision a Hadoop Cluster
- Manage a Hadoop Cluster
- Monitor a Hadoop Cluster
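As a taste of the RESTful side of that management story, here is a minimal sketch of building an authenticated call against Ambari's REST API. The host name, cluster name, and `admin/admin` credentials are placeholder assumptions, and the endpoint shown is Ambari's standard cluster-services resource:

```python
import base64
import urllib.request

def ambari_request(host, cluster, user="admin", password="admin"):
    """Build an authenticated request for an Ambari cluster's services resource.

    The host, cluster name, and credentials are illustrative placeholders --
    substitute the values for your own Ambari server.
    """
    url = f"http://{host}:8080/api/v1/clusters/{cluster}/services"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={
        "Authorization": f"Basic {token}",
        "X-Requested-By": "ambari",  # header Ambari expects on API calls
    })

# Actually fetching the service list requires a running Ambari server:
# with urllib.request.urlopen(ambari_request("ambari.example.com", "mycluster")) as resp:
#     print(resp.read().decode())
```

The same `/api/v1/...` resource tree backs the web UI, so anything you can click through can also be scripted.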
Oracle’s main marketing claim is that do-it-yourself Hadoop can be difficult. That’s true, but the difficulty can be mitigated.
The required expertise comes in two forms:
- The Hadoop engineer, who can architect an initial Hadoop infrastructure, feed applicable data in, help the data analyst squeeze useful analytics out of the data repository, and evolve and manage the whole infrastructure over time.
- The data scientist or analyst, who knows how to apply the tools of statistics in the context of big data analytics, and who can also lead the human and business process of discovery and collaboration to yield actionable results.
Even with Oracle’s solution you can’t avoid the second role; as for the first, you can outsource the system architecture at the start and then gradually learn and build, build and learn…
The goal of Hadoop is to use commonly available servers in a very large cluster, where each server has a set of inexpensive internal disk drives rather than proprietary appliances requiring military-level spending. So the lure of Hadoop as a general-purpose computing medium is strong, because it disrupts a pretty expensive status quo in enterprise infrastructure.
Take a moment and think of a 1000-machine cluster, where each machine has three internal disk drives; then consider the failure rate of a cluster composed of 3000 inexpensive drives + 1000 inexpensive servers!
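A back-of-envelope calculation makes the point concrete. The annualized failure rates below are illustrative assumptions, not vendor figures:

```python
# Expected annual component failures for the hypothetical cluster above.
# The failure rates here are rough illustrative assumptions.
drives, servers = 3000, 1000
drive_afr, server_afr = 0.04, 0.02  # assumed annualized failure rates

expected_failures = drives * drive_afr + servers * server_afr
print(expected_failures)  # 140.0 -> roughly one component failure every ~2.6 days
```

With failures arriving every few days as a matter of course, treating them as routine events rather than emergencies is the only sane design stance.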
The cool thing about Hadoop is that the mean time to failure (MTTF) of inexpensive hardware is actually well understood (a design point, if you will), and part of Hadoop’s strength is its built-in fault tolerance and fault compensation. HDFS embodies this: data is divided into blocks, and copies of each block are stored on other servers in the Hadoop cluster. That is, an individual file is actually stored as smaller blocks that are replicated across multiple servers in the cluster.
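The effect of that replication on durability can be sketched with a toy model: if each replica of a block is lost with probability p before re-replication kicks in, and the losses are independent, then the block disappears only when all r replicas fail. The independence assumption and the value of p are simplifications for illustration; the replication factor of 3 is HDFS’s default:

```python
# Toy model: probability of losing a block replicated on r independent servers,
# assuming each replica fails with probability p before re-replication occurs.
def block_loss_probability(p, r=3):
    return p ** r

# With an assumed 1% per-replica loss probability and HDFS's default
# replication factor of 3, the block-loss probability drops to about 1e-06.
print(block_loss_probability(0.01))
```

The real picture is more nuanced (correlated rack failures, re-replication speed), which is exactly why HDFS is also rack-aware when placing replicas, but the cubing of a small probability is the core of the durability argument.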
So, Oracle, recalculate TCO for your Big Data Appliance again, please.