Resource Sharing Beyond Boundaries

Mohit Soni   Santosh Marella

Agenda

  • What's up with datacenters these days?
  • Apache Mesos vs. Apache Hadoop/YARN?
  • Why would you want/need both?
  • Resource Sharing with Apache Myriad

What's running in your datacenter?

  • Tier 1 services
  • Tier 2 services
  • High Priority Batch
  • Best Effort, backfill

Requirements

  • Programming models based on resources,
    not machines
  • Custom resource types
  • Custom scheduling algorithms:
    Fast vs. careful/slow
  • Lightweight executors, fast task launch time
  • Multi-tenancy, utilization, strong isolation
  • Preemption/oversubscription, fault-tolerance

Hadoop and More

  • Support Hadoop/BigData ecosystem
  • Support arbitrary (legacy) processes/containers
  • Connect Big Data to non-Hadoop apps,
    share data, resources

Mesos from 10,000 feet

Open Source Apache project

Cluster Resource Manager

Scalable to 10,000s of nodes

Fault-tolerant, no single point of failure (SPOF)

Multi-tenancy, Resource Isolation

Improved resource utilization

Mesos is more than Yet Another Resource Negotiator

Long-running services; real-time jobs

Native Docker; cgroups for years;
Isolate CPU/memory/disk/network/other resources

Distributed systems SDK;
~200 lines of code for a new app

Core written in C++ for performance,
Apps in any language

Why two resource managers?

Static Partitioning sucks

  • Hadoop teams fine with isolated clusters,
    but Ops team unhappy; slow to provision
  • Resource silos, no elasticity
  • Want to run Hadoop on the same infrastructure,
    without interrupting Tier-1 services
  • Want multi-tenancy, resource sharing/isolation

Introducing Myriad

Myriad Overview

  • Mesos framework for Apache Hadoop YARN
  • Mesos manages the datacenter, YARN manages Hadoop
  • Coarse and fine grained resource sharing
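As a rough illustration of the two sharing modes, here is a minimal Python sketch under a simplified model. Coarse-grained sharing launches a NodeManager with a fixed resource profile; fine-grained sharing starts it with zero capacity and grows or shrinks it one YARN container at a time out of Mesos offers. The Resources, NodeManager, launch_coarse, grow_fine, and shrink_fine names and the profile sizes are hypothetical, not Myriad's API, and real offer handling is more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpus: float
    mem_mb: int

    def __add__(self, other):
        return Resources(self.cpus + other.cpus, self.mem_mb + other.mem_mb)

    def __sub__(self, other):
        return Resources(self.cpus - other.cpus, self.mem_mb - other.mem_mb)

    def fits_in(self, other):
        return self.cpus <= other.cpus and self.mem_mb <= other.mem_mb


# Hypothetical NodeManager profiles; sizes are illustrative only.
PROFILES = {
    "zero": Resources(0, 0),
    "small": Resources(2, 2048),
    "medium": Resources(4, 4096),
    "large": Resources(10, 12288),
}


@dataclass
class NodeManager:
    host: str
    capacity: Resources


def launch_coarse(host, profile):
    """Coarse-grained: the NM holds its whole profile until it is flexed down."""
    return NodeManager(host, PROFILES[profile])


def grow_fine(nm, offer, container):
    """Fine-grained: grow the NM by one container's worth of an offer.

    Returns the unused remainder of the offer (to be declined back to Mesos),
    or the whole offer unchanged if the container does not fit in it.
    """
    if not container.fits_in(offer):
        return offer
    nm.capacity = nm.capacity + container
    return offer - container


def shrink_fine(nm, container):
    """Fine-grained: give a finished container's resources back to Mesos."""
    nm.capacity = nm.capacity - container
```

For example, a zero-profile NM that accepts a 1-CPU, 1024 MB container out of an 8-CPU, 8192 MB offer ends up with exactly that container's capacity and leaves 7 CPUs and 7168 MB for other frameworks; shrinking it afterwards returns it to zero.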

Resource Sharing

Myriad improves Mesos

Tighter integration with Hadoop frameworks like HBase, Hive, Pig

Borrow resources from Hadoop
when traffic spikes for tier-1 services

Backfill unused resource capacity
with best-effort Hadoop jobs

No Mesos code changes necessary
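The borrow and backfill ideas on this slide can be sketched for a single host and a single resource (CPUs). Tier-1 demand is served first; existing best-effort NodeManagers are kept only while they still fit, and any CPUs still idle are backfilled with new NodeManagers of a fixed size. The rebalance function and its greedy, in-order policy are hypothetical illustrations, not Myriad's scheduler.

```python
from typing import List, Tuple


def rebalance(capacity: float, tier1_demand: float,
              nm_sizes: List[float], nm_profile: float) -> Tuple[float, List[float], int]:
    """Plan one host's CPUs under a 'tier-1 first, Hadoop backfills' policy.

    Returns (CPUs granted to tier-1, sizes of NodeManagers kept,
             number of new NodeManagers of size nm_profile to launch).
    """
    # Tier-1 services are served first, up to the host's capacity.
    tier1 = min(tier1_demand, capacity)
    free = capacity - tier1

    # Keep existing best-effort NodeManagers in launch order while they fit;
    # any that no longer fit are flexed down (their YARN work is preempted).
    kept: List[float] = []
    for size in nm_sizes:
        if size <= free:
            kept.append(size)
            free -= size

    # Backfill whatever is still idle with new best-effort NodeManagers.
    new_nms = int(free // nm_profile) if nm_profile > 0 else 0
    return tier1, kept, new_nms
```

On a 16-CPU host, rebalance(16, 4, [4], 4) keeps the existing NM and backfills two more, while a tier-1 spike to 14 CPUs, rebalance(16, 14, [4, 4, 4], 4), flexes all three NMs down. A real policy might instead pick which NMs to flex down by the priority of the jobs they run.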

Myriad improves Hadoop

Elastic scaling

Fault-tolerant: maintain NodeManager (NM) capacity

Share resources with other workloads,
improve resource utilization

High-SLA Hadoop jobs unaffected

No YARN/Hadoop code changes
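Maintaining NM capacity can be pictured as a reconciliation loop: compare the NodeManager tasks that are still live against a target count and launch replacements for those that Mesos reports in a terminal state. The strings below are Mesos task-state names used as plain values; the reconcile function, the target count, and the nm-<n> task IDs are hypothetical, a sketch rather than Myriad's implementation.

```python
from itertools import count
from typing import Dict, List

# Mesos task states after which a NodeManager task is no longer running.
TERMINAL_STATES = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED",
                   "TASK_LOST", "TASK_ERROR"}


def reconcile(target: int, tasks: Dict[str, str]) -> List[str]:
    """Return new NM task IDs to launch so that `target` NMs are live.

    tasks maps each known NM task ID to its last reported Mesos task state.
    """
    live = sum(1 for state in tasks.values() if state not in TERMINAL_STATES)
    missing = max(0, target - live)
    # Pick fresh IDs that do not collide with any task seen so far.
    fresh_ids = (f"nm-{i}" for i in count() if f"nm-{i}" not in tasks)
    return [next(fresh_ids) for _ in range(missing)]
```

With tasks = {"nm-0": "TASK_RUNNING", "nm-1": "TASK_LOST", "nm-2": "TASK_STAGING"} and a target of 3, only one NM is missing, so one replacement (nm-3) is launched.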

Other Features

  • RM discovery using Marathon/Mesos-DNS
  • Distribution of Hadoop binaries
  • User Interface
  • Ability to launch Job History Server
    (in progress)
  • Myriad scheduler HA, task reconciliation
    (in progress)
  • Your favorite feature here!

Learn More!

https://github.com/mesos/myriad

[email protected]

Myriad JIRA

Apache Myriad Incubator Proposal

Apache Myriad Incubator Status Page