BSc Docs

SCAD Scalable Cloud Apps

5 min read · tagged scad


Scalable Cloud Application

Cloud-Native Application (CNA): Fully exploiting cloud services and platform features

  • elastic scalability
  • resilience
  • confidentiality
  • portability
  • cost-efficiency / pay-per-use

Scalability

Adapt to demand

Manual- vs. auto- vs. prescaling

Properties

  • Elasticity (time): degree of adaptation
  • Transformativity (space/behaviour): self-similarity, bottlenecks
  • Boundedness (space): upper/lower bounds

Mismatches

  • Billing cycle
  • Memory allocation
  • Dynamic profile

Scaling for app packages/functions

  • Indirect through runtime
  • Configurable

Vertical Scaling

  • Hypervisor memory (e.g. KVM/libvirt: virsh setmem <domain> <x>G --live)
  • Hypervisor processor (e.g. cpu_set <n+1> online, DVFS tuning)

Vertical Scaling for Containers

  • Static resource limits (Docker, Kubernetes descriptors)
  • ElasticDocker: Feedback loop for memory + CPU assignment (MAPE-K); see the sketch below
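
The ElasticDocker-style feedback loop can be pictured as a small MAPE-K controller. Below is a minimal sketch assuming the Docker SDK for Python is installed and a container named app is running; the thresholds, step size and loop period are illustrative, not values from the paper.

import time
import docker  # Docker SDK for Python (assumed installed: pip install docker)

client = docker.from_env()
STEP = 256 * 1024 * 1024   # adjust the memory limit in 256 MiB steps (assumption)
HIGH, LOW = 0.9, 0.4       # utilisation thresholds (assumption)

def memory_utilisation(container):
    # Monitor: one-shot stats snapshot from the Docker daemon
    stats = container.stats(stream=False)
    return stats["memory_stats"]["usage"] / stats["memory_stats"]["limit"]

while True:
    c = client.containers.get("app")           # hypothetical container name
    util = memory_utilisation(c)               # Monitor
    limit = c.attrs["HostConfig"]["Memory"]    # current limit in bytes (0 = unlimited)
    if limit and util > HIGH:                  # Analyse + Plan: grow the limit
        c.update(mem_limit=limit + STEP, memswap_limit=-1)   # Execute
    elif limit and util < LOW and limit > 2 * STEP:          # shrink the limit
        c.update(mem_limit=limit - STEP, memswap_limit=-1)
    time.sleep(10)                             # loop period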

Horizontal Scaling

  • Parallel, single host: Multiple processes and threads, limited by # CPU cores
  • Distributed, multiple hosts: Multiple instances over the network, limited by number of nodes.

Horizontal Scaling for Containers

  • Amdahl’s law: diminishing returns from each additional instance (the serial fraction limits the scaling factor)
  • Optimal scale detection: beforehand (prescaling) vs. at runtime (costly)
  • Optimality: determine sweet spots (minimum cost, minimum makespan -> minimum weighted utility; see the sketch below)
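
As a worked example of the sweet-spot idea: Amdahl's law S(n) = 1 / ((1 - p) + p/n) bounds the speedup of n instances, from which makespan and cost follow. The parallel fraction, price and weights below are made-up numbers.

P = 0.95                    # assumed parallelisable fraction of the workload
T1 = 600.0                  # assumed single-instance makespan in seconds
PRICE = 0.0001              # assumed price per instance-second
W_COST, W_TIME = 0.5, 0.5   # weights of the combined utility

def speedup(n):
    # Amdahl's law: diminishing returns as n grows
    return 1.0 / ((1.0 - P) + P / n)

def makespan(n):
    return T1 / speedup(n)

def cost(n):
    return n * makespan(n) * PRICE

def weighted_utility(n):
    # normalise both terms by their n=1 values; lower is better
    return W_COST * cost(n) / cost(1) + W_TIME * makespan(n) / makespan(1)

best = min(range(1, 33), key=weighted_utility)
print(f"sweet spot: n={best}, makespan={makespan(best):.1f}s, cost={cost(best):.4f}")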

Diagonal Scaling

  • Addresses public-cloud constraints: limited granularity x slow scaling speed x flat pricing schemes
  • Trade-off: increased savings vs. application loading time

Autoscaling

  • Realisation of on-demand provisioning
  • More resources/processes on one node for vertical scaling
  • More instances/replication for horizontal scaling

Taxonomy: Autoscaling

  • reactive

    • app-agnostic
    • app-specific
  • proactive

    • app-specific

Reactive

  • chaotic over-/underprovisioning
  • risk of SLA violation

Proactive

  • controlled
  • SLAs maintained

Prediction

AR: Autoregression

MA: Moving Average

ARMA: Combined AR/MA

Polynomial & Spline Interpolation

Time series data -> predictor -> user prediction -> resource planner -> resource demand -> action plan generator -> action plan for scaling
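
A small numpy sketch of the MA and AR predictors on a synthetic users-over-time series; the AR coefficients are fitted naively with least squares rather than a full ARMA estimator, and the series itself is made up.

import numpy as np

rng = np.random.default_rng(0)
load = 100 + 20 * np.sin(np.arange(200) / 10) + rng.normal(0, 3, 200)  # synthetic users/time

def ma_forecast(series, window=5):
    # MA-style baseline: average of the last `window` observations
    return float(np.mean(series[-window:]))

def ar_forecast(series, order=3):
    # AR(p): fit y_t = c + a1*y_(t-1) + ... + ap*y_(t-p) by least squares
    y = series[order:]
    X = np.column_stack([series[order - k:-k] for k in range(1, order + 1)])
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    last = series[-order:][::-1]   # y_(t-1), ..., y_(t-p)
    return float(coef[0] + coef[1:] @ last)

print("MA(5) forecast:", round(ma_forecast(load), 1))
print("AR(3) forecast:", round(ar_forecast(load), 1))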

Predictors (users/time)

  • Baseline: Use current value
  • Nearest neighbor: Look back to the nearest earlier point with the same value and repeat the pattern from there for the given timeframe
  • Lagged neighbor: Look back by a fixed lag (period) and repeat the pattern from there for the given timeframe
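
The three user/time predictors as plain functions, with a made-up series of hourly user counts and an assumed seasonal period for the lagged neighbour.

def baseline(series):
    # Baseline: the next value looks like the current one
    return series[-1]

def nearest_neighbor(series, horizon):
    # Find the most recent earlier point with (almost) the same value
    # and replay what followed it for `horizon` steps.
    current = series[-1]
    idx = min(range(len(series) - horizon - 1),
              key=lambda i: abs(series[i] - current))
    return series[idx + 1: idx + 1 + horizon]

def lagged_neighbor(series, horizon, period=24):
    # Repeat the pattern that started one period (e.g. 24 h) ago
    start = len(series) - period
    return series[start: start + horizon]

history = [5, 7, 9, 12, 15, 14, 10, 8, 6, 5, 7, 10,
           13, 16, 15, 11, 9, 6, 5, 8, 10, 13, 16, 15]   # 24 hourly samples (made up)
print(nearest_neighbor(history, horizon=3))
print(lagged_neighbor(history, horizon=3, period=12))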

Scalable App Design

Multi-Tier

Characteristics

  • Popular for web apps and client-server
  • 3-tier: frontend, backend, DB
  • 4-tier: client/frontend, delivery, aggregation/logic, services

Advantages

  • Simple monolithic blocks
  • Established standards for protocols
  • Sufficient flexibility for complementary blocks

Disadvantages

  • Unequal loads and bottlenecks
  • Only applicable in isolated applications

Service-oriented

Characteristics

  • Service orientation

    • Uniform description
    • Communication
    • Encapsulation
    • Reuse
  • Discovery
  • Flexible dynamic bindings

Advantages

  • Consistent evolution of complex software design
  • Viable basis for rapid prototyping of services

Disadvantages

  • Waning support for classical SOA technologies
  • No guidance for how to design, build, run services

Cloud-native

Characteristics

  • Scalability
  • Resilience
  • Refines service orientation with microservices
  • Exploits capabilities of cloud environments
  • Stateless microservices: self-management, self-healing, autoscaling
  • DBaaS

Advantages

  • Closest to ideal scalability
  • Elastic, hardly bounded/transformative
  • Matched by contemporary technologies
  • Conversion of legacy applications possible

Disadvantages

  • Technological immaturity and volatility
  • Emerging tools and languages
  • Incompatibilities
  • Insufficiencies

Messaging

Diverse types of messaging systems

  • Message queues
  • Message brokers
  • Stream processing
  • Pub/sub
  • Gateways
  • Service buses
  • Microservice mesh
  • with billing by # messages, payload size, mgmt calls

Kafka

  • Sharding (partitions) + threading: throughput scales almost linearly with partitions and consumer threads (the inverse of Amdahl’s law)
  • High availability if partition replicas are placed in different fault and update domains
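
A hedged sketch with the kafka-python client: the broker address and topic name are placeholders, and placing the replicas into different fault/update domains is broker-side configuration (e.g. broker.rack), not something this snippet controls.

from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")   # placeholder broker

# More partitions -> more parallel consumer threads (sharding + threading);
# replication_factor > 1 -> partition replicas on different brokers for HA.
topic = NewTopic(name="scad-demo", num_partitions=6, replication_factor=3)
admin.create_topics([topic])
admin.close()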

Coordination

Consensus, voting, leader election, synchrony

  • Coordination faults
  • Byzantine faults/attacks

Majorities

  • None
  • Relative (n > m for every other option m)
  • Absolute (n > N/2)
  • Byzantine (n > 2N/3)
  • Consensus (n == N)
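
The majority variants written out as predicates, where N is the total number of participants and n the votes for one option.

def absolute_majority(n, N):
    return n > N / 2           # strictly more than half

def byzantine_majority(n, N):
    return n > 2 * N / 3       # withstands up to floor((N - 1) / 3) byzantine nodes

def consensus(n, N):
    return n == N              # unanimity

def relative_majority(votes):
    # votes: option -> count; the winner only needs more than every other option
    top = max(votes, key=votes.get)
    return all(votes[top] > v for option, v in votes.items() if option != top)

print(absolute_majority(3, 5), byzantine_majority(3, 5), consensus(5, 5))
print(relative_majority({"a": 4, "b": 3, "c": 3}))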

Paxos Consensus

  • Requires absolute majority
  • Extension: Byzantine Paxos
  • Roles: Voters/acceptors, learners, proposers, leader
  • Depends on leader election
  • Leader elected by successful proposal
  • Otherwise battling leaders
  • Depends on redundancy
  • Can tolerate (ignore) failures of redundant participants

Swedish Leader Election

  • Based on rounds of elimination of candidates
  • Flip coins: winners proceed into the next round, otherwise the round is played again
  • Last remaining candidate is leader
  • Good characteristics
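
A tiny simulation of the round-based elimination; the exact rule (heads stay in, tails drop out, and a round with no winners is replayed) is an assumption about the variant taught.

import random

def coin_flip_election(candidates):
    remaining = list(candidates)
    round_no = 0
    while len(remaining) > 1:
        round_no += 1
        winners = [c for c in remaining if random.random() < 0.5]  # "heads" wins
        if winners:                    # if everyone flipped tails, replay the round
            remaining = winners
        print(f"round {round_no}: {remaining}")
    return remaining[0]

random.seed(42)
print("leader:", coin_flip_election(["n1", "n2", "n3", "n4", "n5"]))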

Raft

  • Recent, modular algorithm
  • Uses log replication
  • Random timeout for followers -> become candidates
  • Absolute majority of votes -> leader
  • Heartbeats define election term
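
A stripped-down sketch of one Raft election term: randomized timeouts decide who becomes candidate first, and the candidate needs an absolute majority of all nodes. Real Raft additionally checks log up-to-dateness and uses heartbeats, which are omitted here; the cluster size and timeout range are assumptions.

import random

NODES = ["n1", "n2", "n3", "n4", "n5"]

def run_election(term, reachable):
    # Each reachable node picks a random election timeout (here 150-300 ms);
    # the first one to time out becomes candidate for this term.
    timeouts = {n: random.uniform(0.150, 0.300) for n in reachable}
    candidate = min(timeouts, key=timeouts.get)

    # Every reachable node grants its vote to the only candidate of this term.
    votes = len(reachable)

    if votes > len(NODES) // 2:        # absolute majority of ALL nodes required
        print(f"term {term}: {candidate} wins with {votes}/{len(NODES)} votes")
        return candidate
    print(f"term {term}: no leader ({votes}/{len(NODES)} votes), retry next term")
    return None

random.seed(3)
up = random.sample(NODES, k=random.choice([2, 3, 4, 5]))   # simulate a partition
leader = run_election(term=1, reachable=up)
# A winner would now send periodic heartbeats (AppendEntries) so that
# followers do not time out and start another election term.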

Implementations

  • Zookeeper: Zab algorithm (Zookeeper atomic broadcast), similar to Paxos but with stricter ordering guarantees
  • Etcd/Consul: Distributed service discovery and configuration with Raft
  • Microsoft Distributed Mutex (C#): Leader election, custom algorithm
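
For ZooKeeper, a hedged usage sketch with the kazoo client and its Election recipe (built on Zab-replicated znodes); the connection string, path and identifier are placeholders, and etcd/Consul expose comparable primitives on top of Raft.

from kazoo.client import KazooClient   # assumed installed: pip install kazoo

def lead():
    # Runs only while this process holds leadership; returning gives it up.
    print("I am the leader now")

zk = KazooClient(hosts="127.0.0.1:2181")                      # placeholder ensemble
zk.start()
election = zk.Election("/scad/leader", identifier="node-1")   # placeholder path/id
election.run(lead)   # blocks until elected, then calls lead()
zk.stop()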

CAP theorem: focus on CA microservices

BAC theorem: when backing up microservices, it is not possible to have both availability and consistency

Summary

  • Coordination, data distribution, autoscaling
  • Self-* characteristics
  • Cloud-native components

Distributed Microservice Scalability

Performance benchmark

  • Performance must remain high
  • Not entirely linear (Amdahl’s law)
  • FaaS 1:1 event:instance
  • CaaS n:m (many events, few instances)

OpenShift / APPUiO

oc scale --replicas=3 dc app
oc autoscale dc/app --min=1 --max=10 --cpu-percent=80

IBM Cloud

cf aasp app policy.json

GCP

gcloud compute instance-groups managed set-autoscaling app-igroup \
    --max-num-replicas 10 \
    --target-cpu-utilization 0.92 \
    --cool-down-period 30

Fallacies

  • The network is reliable: App needs error handling and retry
  • Latency is zero: App should minimize # of requests
  • Bandwidth is infinite: App should send small payloads
  • The network is secure: App must secure its data/authenticate requests
  • Topology doesn’t change: Changes affect latency, bandwidth & endpoints
  • There is one administrator: Changes affect ability to reach destination
  • Transport cost is zero: Cost must be budgeted
  • The network is homogeneous: Affects reliability, latency & bandwidth
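
As an illustration of coping with the first two fallacies, a retry helper with exponential backoff; the endpoint is a placeholder and the retry policy values are arbitrary.

import time
import urllib.error
import urllib.request

def fetch_with_retry(url, attempts=4, base_delay=0.5, timeout=2.0):
    # Retry transient network failures with exponential backoff.
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()                  # keep payloads small
        except (urllib.error.URLError, TimeoutError) as exc:
            if attempt == attempts - 1:
                raise                               # give up, surface the error
            delay = base_delay * 2 ** attempt       # 0.5 s, 1 s, 2 s, ...
            print(f"attempt {attempt + 1} failed ({exc}); retrying in {delay}s")
            time.sleep(delay)

# Placeholder endpoint; in a real service this would be another microservice.
# data = fetch_with_retry("https://example.org/api/health")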
