Distributed Systems pt.0

This post is a work in progress.

Introduction

Distributed computing systems are a reality that engineers have to wrestle with at many levels of technical exposure. Similar to the concept of a cluster I discussed in this blog post, a distributed system is “a collection of autonomous computing elements that appears to its users as a single coherent system” (van Steen, Tanenbaum). Distributed computing environments introduce new dynamics, such as partial failure and unpredictable network delay, which can affect application development. Enterprise-level corporations (> 500-1000 employees) are the best-known operators of distributed environments, and organizations larger than that almost certainly use distributed computing. To contrast a cluster with a distributed system, I would say that a cluster is a galaxy and a distributed computing environment is a universe.

It is common for someone new to distributed systems to make basic assumptions that cause growing problems over time. These assumptions are known as the Fallacies of Distributed Computing.

Fallacies of Distributed Computing

The fallacies of distributed computing are:

The network is reliable;
Latency is zero;
Bandwidth is infinite;
The network is secure;
Topology doesn’t change;
There is one administrator;
Transport cost is zero;
The network is homogeneous.
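
To make the first two fallacies concrete, here is a minimal sketch of what code looks like once it stops assuming the network is reliable and latency is zero: a remote call is retried a bounded number of times with exponential backoff. Everything here is an illustrative assumption rather than any real library's API: `NetworkError`, `flaky_call`, and `call_with_retries` are hypothetical names, and `flaky_call` only simulates a lossy network with a random number generator.

```python
import random
import time


class NetworkError(Exception):
    """Raised by the simulated remote call when a request is lost."""


def flaky_call(rng, failure_rate=0.5):
    # Hypothetical stand-in for a remote request: it fails at random
    # to mimic a network that drops messages.
    if rng.random() < failure_rate:
        raise NetworkError("request dropped")
    return "ok"


def call_with_retries(call, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Run call(); on NetworkError, wait and retry with exponential backoff.

    Returns (result, attempts_used). Re-raises the last NetworkError once
    max_attempts calls have all failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call(), attempt
        except NetworkError:
            if attempt == max_attempts:
                raise
            # Delay doubles after each failure: base, 2*base, 4*base, ...
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the simulation is repeatable
    result, attempts = call_with_retries(lambda: flaky_call(rng))
    print(result, attempts)
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting. A real client would also need per-request timeouts and would have to make sure the retried operation is safe to repeat, which this sketch leaves out.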

Reading List

The reading list below is a copy of the currently available version written by Christopher Meiklejohn.

Consensus

The problems of establishing consensus in a distributed system.

In Search of an Understandable Consensus Algorithm

Diego Ongaro, John Ousterhout 2013
A Simple Totally Ordered Broadcast Protocol

Benjamin Reed, Flavio P. Junqueira 2008
Paxos Made Live - An Engineering Perspective

Tushar Deepak Chandra, Robert Griesemer, Joshua Redstone 2007
The Chubby Lock Service for Loosely-Coupled Distributed Systems

Mike Burrows 2006
Paxos Made Simple

Leslie Lamport 2001
Impossibility of Distributed Consensus with One Faulty Process

Michael Fischer, Nancy Lynch, Michael Paterson 1985
The Byzantine Generals Problem

Leslie Lamport, Robert Shostak, Marshall Pease 1982

Consistency

Types of consistency, and practical solutions for ensuring atomic operations across a set of replicas.

Highly Available Transactions: Virtues and Limitations

Peter Bailis, Aaron Davidson, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica 2013
Consistency Tradeoffs in Modern Distributed Database System Design

Daniel J. Abadi 2012
CAP Twelve Years Later: How the “Rules” Have Changed

Eric Brewer 2012
Calvin: Fast Distributed Transactions for Partitioned Database Systems

Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, Daniel J. Abadi 2012
Optimistic Replication

Yasushi Saito and Marc Shapiro 2005
Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services

Seth Gilbert, Nancy Lynch 2002
Harvest, Yield, and Scalable Tolerant Systems

Armando Fox, Eric A. Brewer 1999
Linearizability: A Correctness Condition for Concurrent Objects

Maurice P. Herlihy, Jeannette M. Wing 1990
Time, Clocks, and the Ordering of Events in a Distributed System

Leslie Lamport 1978

Conflict-free data structures

Studies on data structures which do not require coordination to ensure convergence to the correct value.

A Comprehensive Study of Convergent and Commutative Replicated Data Types

Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski 2011
A Commutative Replicated Data Type For Cooperative Editing

Nuno Preguiça, Joan Manuel Marques, Marc Shapiro, Mihai Letia 2009
CRDTs: Consistency Without Concurrency Control

Mihai Letia, Nuno Preguiça, Marc Shapiro 2009

Distributed programming

Languages aimed towards disorderly distributed programming as well as case studies on problems in distributed programming.

Logic and Lattices for Distributed Programming

Neil Conway, William Marczak, Peter Alvaro, Joseph M. Hellerstein, David Maier 2012
Dedalus: Datalog in Time and Space

Peter Alvaro, William R. Marczak, Neil Conway, Joseph M. Hellerstein, David Maier, Russell Sears 2011
MapReduce: Simplified Data Processing on Large Clusters

Jeffrey Dean, Sanjay Ghemawat 2004
A Note On Distributed Computing

Samuel C. Kendall, Jim Waldo, Ann Wollrath, Geoff Wyant 1994

Systems

Implemented and theoretical distributed systems.

Spanner: Google’s Globally-Distributed Database

James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, Dale Woodford 2012
ZooKeeper: Wait-free coordination for Internet-scale systems

Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, Benjamin Reed 2010
A History Of The Virtual Synchrony Replication Model

Ken Birman 2010
Cassandra — A Decentralized Structured Storage System

Avinash Lakshman, Prashant Malik 2009
Dynamo: Amazon’s Highly Available Key-Value Store

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels 2007
Stasis: Flexible Transactional Storage

Russell Sears, Eric Brewer 2006
Bigtable: A Distributed Storage System for Structured Data

Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber 2006
The Google File System

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung 2003
Lessons from Giant-Scale Services

Eric A. Brewer 2001
Towards Robust Distributed Systems

Eric A. Brewer 2000
Cluster-Based Scalable Network Services

Armando Fox, Steven D. Gribble, Yatin Chawathe, Eric A. Brewer, Paul Gauthier 1997
The Process Group Approach to Reliable Distributed Computing

Ken Birman 1993

Books

Overviews and details covering many of the above papers and concepts compiled into single resources.

Distributed Systems: for fun and profit

Mikito Takada 2013
Programming Distributed Computing Systems: A Foundational Approach

Carlos A. Varela, Gul Agha 2013
Guide to Reliable Distributed Systems: Building High-Assurance Applications and Cloud-Hosted Services

Ken Birman 2012
Introduction to Reliable and Secure Distributed Programming

Christian Cachin, Rachid Guerraoui, Luís Rodrigues 2011