This post is a work in progress.
Introduction
Distributed computing systems are a reality to wrestle with at many levels of technical exposure. Similar to the concept of a cluster I discussed in this blog post, a distributed system is “a collection of autonomous computing elements that appears to its users as a single coherent system” (van Steen, Tanenbaum). Distributed computing environments introduce new dynamics which can affect application development. Enterprise-level corporations (more than roughly 500-1000 employees) are the best-known operators of distributed environments, and an organization larger than that almost certainly relies on distributed computing. To contrast a cluster with a distributed system, I would say that a cluster is a galaxy and a distributed computing environment is a universe.
It is common for someone new to distributed systems to make some basic assumptions that cause growing problems over time. These assumptions are known as the Fallacies of Distributed Computing.
Fallacies of Distributed Computing
The fallacies of distributed computing are:
- The network is reliable;
- Latency is zero;
- Bandwidth is infinite;
- The network is secure;
- Topology doesn’t change;
- There is one administrator;
- Transport cost is zero;
- The network is homogeneous.
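To make the first two fallacies concrete, here is a minimal Python sketch of code that does not assume a reliable, zero-latency network: it retries a failed call with exponential backoff. The names `call_with_retries` and `make_flaky` are hypothetical helpers invented for this illustration, and the flaky operation only simulates a remote call, so treat this as a sketch under those assumptions and not as a production retry policy.

```python
import random
import time


def call_with_retries(operation, attempts=4, base_delay=0.1, sleep=time.sleep):
    """Run operation(), retrying on connection or timeout errors.

    Waits base_delay * 2**n (plus a little jitter) between tries and
    re-raises the last error once the attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            delay = base_delay * (2 ** attempt)
            # Jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 10))


def make_flaky(failures):
    """Hypothetical stand-in for a remote call: fails `failures` times, then succeeds."""
    remaining = {"count": failures}

    def operation():
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise ConnectionError("simulated dropped connection")
        return "ok"

    return operation


if __name__ == "__main__":
    # Two simulated failures, then success; sleeping is skipped for the demo.
    print(call_with_retries(make_flaky(2), sleep=lambda _: None))
```

Retrying is only safe when the operation is idempotent, which is one reason these fallacies end up shaping application design and not just network code.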
Reading List
The reading list below is a copy of the currently available version written by Christopher Meiklejohn.
Consensus
The problems of establishing consensus in a distributed system.
Consistency
Types of consistency, and practical solutions for ensuring atomic
operations across a set of replicas.
- Highly Available Transactions: Virtues and Limitations
Peter Bailis, Aaron Davidson, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica
2013
- Consistency Tradeoffs in Modern Distributed Database System Design
Daniel J. Abadi
2012
- CAP Twelve Years Later: How the “Rules” Have Changed
Eric Brewer
2012
- Calvin: Fast Distributed Transactions for Partitioned Database Systems
Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, Daniel J. Abadi
2012
- Optimistic Replication
Yasushi Saito and Marc Shapiro
2005
- Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
Seth Gilbert, Nancy Lynch
2002
- Harvest, Yield, and Scalable Tolerant Systems
Armando Fox, Eric A. Brewer
1999
- Linearizability: A Correctness Condition for Concurrent Objects
Maurice P. Herlihy, Jeannette M. Wing
1990
- Time, Clocks, and the Ordering of Events in a Distributed System
Leslie Lamport
1978
Conflict-free data structures
Studies on data structures which do not require coordination to ensure
convergence to the correct value.
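As a small illustration of the idea, here is a sketch of a grow-only counter (often called a G-Counter), one of the simplest conflict-free replicated data types. The class and method names are my own choices for this example and are not taken from any particular paper or library. Each replica increments only its own slot, and merging takes the element-wise maximum, so replicas converge no matter the order or number of merges.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged with element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments observed from that replica

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        # The counter's value is the sum of every replica's slot.
        return sum(self.counts.values())

    def merge(self, other):
        # Max is commutative, associative, and idempotent, so no
        # coordination between replicas is needed to converge.
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)


if __name__ == "__main__":
    a, b = GCounter("a"), GCounter("b")
    a.increment(2)
    b.increment(3)
    a.merge(b)
    b.merge(a)
    b.merge(a)  # merging again changes nothing
    print(a.value(), b.value())  # prints "5 5"
```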
Distributed programming
Languages aimed towards disorderly distributed programming as well as
case studies on problems in distributed programming.
- Logic and Lattices for Distributed Programming
Neil Conway, William Marczak, Peter Alvaro, Joseph M. Hellerstein, David Maier
2012
- Dedalus: Datalog in Time and Space
Peter Alvaro, William R. Marczak, Neil Conway, Joseph M. Hellerstein, David Maier, Russell Sears
2011
- MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean, Sanjay Ghemawat
2004
- A Note On Distributed Computing
Samuel C. Kendall, Jim Waldo, Ann Wollrath, Geoff Wyant
1994
Systems
Implemented and theoretical distributed systems.
- Spanner: Google’s Globally-Distributed Database
James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, Dale Woodford
2012
- ZooKeeper: Wait-free coordination for Internet-scale systems
Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, Benjamin Reed
2010
- A History Of The Virtual Synchrony Replication Model
Ken Birman
2010
- Cassandra — A Decentralized Structured Storage System
Avinash Lakshman, Prashant Malik
2009
- Dynamo: Amazon’s Highly Available Key-Value Store
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels
2007
- Stasis: Flexible Transactional Storage
Russell Sears, Eric Brewer
2006
- Bigtable: A Distributed Storage System for Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber
2006
- The Google File System
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
2003
- Lessons from Giant-Scale Services
Eric A. Brewer
2001
- Towards Robust Distributed Systems
Eric A. Brewer
2000
- Cluster-Based Scalable Network Services
Armando Fox, Steven D. Gribble, Yatin Chawathe, Eric A. Brewer, Paul Gauthier
1997
- The Process Group Approach to Reliable Distributed Computing
Ken Birman
1993
Books
Overviews and details covering many of the above papers and concepts compiled into single resources.