Raft Consensus Protocol
The goal of Raft is to ensure that all nodes in a distributed system agree on the same data state, even when some nodes fail, experience delays, or are disconnected from the network.
Raft aims to provide a "Single Source of Truth": a single, consistent view of the data across the entire cluster.
Three Main Components
- Leader Election
  - At most one node is the leader in a given term (period).
  - A follower that does not receive a heartbeat within its election timeout → becomes a candidate → starts an election.
  - A candidate that receives votes from a majority of the nodes becomes the leader.
- Log Replication
  - The leader receives commands from clients (e.g., write new data) and appends them to its log.
  - The leader propagates new log entries to all followers.
  - Once a majority of nodes acknowledge (ACK) an entry, it is considered committed.
  - All nodes end up with the same committed entries in the same order.
- Safety & Fault Tolerance
  - Entries that have been "committed" will not change, even if the leader dies.
  - A new or lagging node can catch up by replicating the leader's log.
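The commit rule from the Log Replication steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a full implementation: the function name `advance_commit_index` and its parameters (`log_terms`, `match_index`, `commit_index`, `current_term`) are hypothetical names chosen for this example. It also applies one rule from the Raft design that the summary above does not spell out: a leader counts replicas only to commit entries from its own current term.

```python
from typing import Dict, List


def advance_commit_index(
    log_terms: List[int],
    match_index: Dict[str, int],
    commit_index: int,
    current_term: int,
) -> int:
    """Return the new commit index as seen by the leader.

    log_terms[i] is the term of the log entry at 1-based index i + 1.
    match_index maps each follower id (hypothetical ids) to the highest
    log index that follower has acknowledged.
    """
    cluster_size = len(match_index) + 1  # followers plus the leader itself
    majority = cluster_size // 2 + 1

    # Scan from the newest entry down to just above the current commit index.
    for n in range(len(log_terms), commit_index, -1):
        # The leader always holds its own entries, so it counts as one ACK.
        acks = 1 + sum(1 for m in match_index.values() if m >= n)
        # Only entries from the leader's current term are committed by counting.
        if acks >= majority and log_terms[n - 1] == current_term:
            return n
    return commit_index


# Five-node cluster: the leader plus followers b, c, d, e.
followers = {"b": 3, "c": 3, "d": 1, "e": 0}
new_commit = advance_commit_index([1, 1, 2], followers, commit_index=0, current_term=2)
```

In this example, entry 3 is held by the leader, `b`, and `c` (3 of 5 nodes) and belongs to the current term, so the commit index advances to 3. Committing entry 3 implicitly commits entries 1 and 2 as well, which is how every node ends up with the same committed prefix in the same order.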
Basic Mechanism
| Role | Responsibility |
|---|---|
| Leader | Receives requests from clients, sends heartbeats, replicates logs |
| Follower | Waits for commands from the leader, votes during elections |
| Candidate | Nominates itself as leader and requests votes after it stops receiving heartbeats |
Simple Raft Cycle
Follower → (timeout) → Candidate → (wins voting) → Leader
Leader → (dies / disconnects) → Followers time out → New election
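The cycle above can be modelled as a small state machine. The sketch below is illustrative only: the `Node` class and its method names (`on_election_timeout`, `on_vote_granted`, `on_higher_term`) are assumptions made for this example, and it leaves out timers, networking, and the log checks a real node performs before granting a vote.

```python
from enum import Enum


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


class Node:
    def __init__(self, node_id: str, cluster_size: int) -> None:
        self.node_id = node_id
        self.cluster_size = cluster_size
        self.role = Role.FOLLOWER
        self.current_term = 0
        self.votes: set = set()

    def on_election_timeout(self) -> None:
        """No heartbeat arrived in time: start an election in a new term."""
        if self.role is Role.LEADER:
            return  # leaders send heartbeats, they do not wait for them
        self.role = Role.CANDIDATE
        self.current_term += 1
        self.votes = {self.node_id}  # a candidate votes for itself

    def on_vote_granted(self, voter_id: str, term: int) -> None:
        """Count a vote; become leader once a majority is reached."""
        if self.role is not Role.CANDIDATE or term != self.current_term:
            return  # stale vote or no election in progress
        self.votes.add(voter_id)
        if len(self.votes) >= self.cluster_size // 2 + 1:
            self.role = Role.LEADER

    def on_higher_term(self, term: int) -> None:
        """Any message carrying a newer term sends the node back to follower."""
        if term > self.current_term:
            self.current_term = term
            self.role = Role.FOLLOWER
            self.votes = set()


node = Node("a", cluster_size=5)
node.on_election_timeout()      # Follower -> Candidate, term 1
node.on_vote_granted("b", 1)    # 2 of 5 votes, still a candidate
node.on_vote_granted("c", 1)    # 3 of 5 votes, becomes leader
```

Using a set for `votes` means a duplicated vote message from the same peer is counted once, which keeps the majority check honest.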
Key Terms
| Term | Meaning |
|---|---|
| Term | A logical time period in Raft, identified by an increasing number; each election starts a new term |
| Heartbeat | A routine message from the leader to followers to confirm it's still alive |
| Majority Vote | More than half of all nodes, i.e. ⌊N/2⌋ + 1 (e.g., 3 of 5), required for consensus |
| Commit Log | Log entries that have been replicated to a majority and are guaranteed not to be lost or changed |
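As a quick check of the majority arithmetic, the following sketch computes the quorum size and the number of node failures a cluster can survive. The helper names `quorum_size` and `tolerated_failures` are hypothetical, chosen only for this example.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that is strictly more than half the cluster."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Nodes that can fail while a majority can still be formed."""
    return cluster_size - quorum_size(cluster_size)


for n in (3, 4, 5):
    print(f"{n} nodes: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

For clusters of 3, 4, and 5 nodes this gives quorums of 2, 3, and 3. A 4-node cluster therefore tolerates no more failures than a 3-node one, which is why Raft clusters are usually sized with an odd number of nodes.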
Systems That Use Raft
- TiDB / TiKV (PingCAP)
- etcd (used by Kubernetes)
- Consul (HashiCorp)
- CockroachDB, rqlite
Simple Analogy
Imagine 5 people in a meeting (nodes).
- One person (leader) leads the decisions.
- If the leader leaves the room → the others elect a new leader.
- Every new decision must be approved by at least 3 people (majority) to be valid.
- Everyone records the same meeting notes (log replication).
"Raft maintains one leader, majority agreement, and identical logs across all nodes."
