The OpenStack and Hadoop ecosystems have both enjoyed rapid growth and adoption over the last few years. Project Sahara was established in the OpenStack community to drive the overall “Data processing on OpenStack” theme, while Hadoop-focused companies such as Cloudera and Hortonworks have been offering their own visions of management frameworks and deployment models around Hadoop.
At Mirantis, we have noticed that many people quickly become confused by the different options for adopting Hadoop. In this talk, we’ll address part of that confusion and set the stage for deeper conversations about how OpenStack can help a particular organization adopt Hadoop. As always, we’ll keep things as vendor-neutral as possible.
Specifically, we’ll talk about the following:
- An overview and roadmap of the Hadoop ecosystem -- components like YARN, Hive, HBase, and others that constitute a working Big Data solution
- Management frameworks and deployment models offered by different vendors such as Cloudera and Hortonworks
- Typical logical and physical Hadoop deployment architectures
- Architecting OpenStack for Hadoop workloads, including:
  - Picking the right hardware and sizing it properly
  - Doing storage right (HDFS, Ceph, Swift, or direct block-device mapping?)
  - Doing compute right (KVM or bare metal? Making scheduling work)
  - Doing networking right (just Neutron, or do we need a full-featured SDN?)
  - Leveraging the extras that OpenStack has to offer (multi-tenancy, NUMA awareness, CPU pinning)
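To make the storage question above concrete: Hadoop can address Swift object storage directly through the hadoop-openstack filesystem driver, which is one of the options we’ll weigh against HDFS and Ceph. The snippet below is a minimal sketch of the relevant `core-site.xml` entries; the service alias `sahara`, the Keystone URL, and the credentials are placeholder values for illustration, not settings from this talk.

```xml
<!-- core-site.xml: expose Swift to Hadoop as swift://container.sahara/path -->
<configuration>
  <property>
    <!-- Register the Swift native filesystem implementation -->
    <name>fs.swift.impl</name>
    <value>org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem</value>
  </property>
  <property>
    <!-- Keystone auth endpoint for the "sahara" service alias (placeholder) -->
    <name>fs.swift.service.sahara.auth.url</name>
    <value>http://keystone.example.com:5000/v2.0/tokens</value>
  </property>
  <property>
    <name>fs.swift.service.sahara.tenant</name>
    <value>demo</value>
  </property>
  <property>
    <name>fs.swift.service.sahara.username</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>fs.swift.service.sahara.password</name>
    <value>secret</value>
  </property>
</configuration>
```

With entries like these in place, MapReduce jobs can read and write `swift://` paths instead of `hdfs://` ones, decoupling the lifetime of compute clusters from the persistence of the data they process.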
By attending this presentation, you will gain a solid, real-world understanding of the benefits of, and practical approaches to, building out a working Hadoop/Big Data solution on OpenStack.