It is impossible for a distributed system to provide more than two of the following three guarantees simultaneously:

  • Consistency: All clients see the same data at the same time, regardless of the node they connect to.
  • Availability: Any client requesting data receives a response, even if some nodes are down.
  • Partition Tolerance: The system continues to operate despite network partitions, which are communication breaks between two nodes.

Key-Value Store are often classified based on the two CAP characteristics they support:

  • CP (Consistency and Partition Tolerance): Support consistency and partition tolerance while sacrificing availability.
  • CA (Consistency and Availability): Support consistency and availability while sacrificing partition tolerance. CA cannot exist in real-world because network failure is unavoidable.
  • AP (Availability and Partition Tolerance): Support availability and partition tolerance while sacrificing consistency.

Implications

  • In a distributed system, data is usually replicated multiple times.
  • Ideally, data written to one node is automatically replicated to others, achieving both consistency and availability.
  • In the real world, network partitions can occur, forcing a choice between consistency and availability.
  • The theorem encouraged database engineers to explore distributed shared-nothing systems, which were more suitable for implementing large-scale web services and led to the explosion of new database technologies.

Limitations

  • Sometimes presented as “Consistency, Availability, Partition tolerance: pick 2 out of 3”, which is misleading because network partitions are a type of fault and will inevitably happen.
  • A system can provide both consistency (Linearizability) and total availability when the network is working correctly. When a network fault occurs, a choice must be made between linearizability and total availability.
  • There are contradictory definitions of availability in discussions of CAP, and the formalization of the theorem does not match its usual meaning.
  • Many “highly available” systems do not meet CAP’s definition of availability.
  • It only considers one consistency model (linearizability) and one kind of fault (network partitions). It doesn’t address network delays, dead nodes, or other trade-offs.
  • CAP has little practical value for designing systems and is mostly of historical interest today.
  • It has been superseded by more precise results, and there are more interesting impossibility results in distributed systems.