
Raft Consensus Algorithm: Overview

Aditya Wardianto · DevOps Engineer @ Cube Asia
Raft - This article is part of a series.
Part 1: This Article

Overview

Why?

This is my first technical post. At the time of writing, I’m two years into my journey working in the cloud. Not too long, yet not too short. I feel there is still a skill gap (or skill issue) keeping me from the next level. Hence I’m starting Reinventing The Wheel, a journey of chain-questioning things surrounding cloud and other technology. I hope I’ll learn a thing or two along the way.

Question Chain

How does Kubernetes ensure cluster state is persistent?

Kubernetes uses an etcd cluster to store all cluster data.1

How do the members of an etcd cluster work together to ensure all data is persisted?

etcd uses a consensus algorithm called Raft.2

Why Raft? Is there any other consensus algorithm?

“There is only one other alternative”3: Paxos4. But Raft is far simpler than its counterpart and has been used in production by multiple cloud technologies such as etcd, Kubernetes, Docker Swarm, Cloud Foundry Diego, CockroachDB, TiDB, Project Calico, Flannel, Hyperledger, and more.5

I don’t want to read the whole series. Any TL;DR?

TL;DR

It’s been 10 years since Kubernetes launched back in 2014, and the trend toward “scalable” and “distributed” systems is still going strong today. This boom didn’t happen without reason. It started with a paper by Diego Ongaro and John Ousterhout titled “In Search of an Understandable Consensus Algorithm”, in which they challenged themselves to find an easier alternative to Paxos, the go-to distributed consensus algorithm at the time. They invented Raft, the algorithm that powers etcd, the backbone of Kubernetes, and many more systems to come.

Raft is designed with understandability in mind and consists of three parts. First is leader election, where the cluster selects one of the servers as its leader. Second is log replication, where the leader receives commands from clients, appends them to its log, and replicates its log to the other servers. Last is safety, which guarantees that only a server with an up-to-date log can become leader.
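To make those three parts concrete, here is a minimal Go sketch of the per-server state they all operate on: the role a server is currently playing, its term, its log, and how far that log is committed. The type and field names are my own, chosen for illustration; they are not taken from any particular Raft library.

```go
package main

import "fmt"

// Role is the state a Raft server can be in. Every server starts as a
// Follower, may become a Candidate when its election timeout fires, and
// becomes the Leader if it wins the election.
type Role int

const (
	Follower Role = iota
	Candidate
	Leader
)

func (r Role) String() string {
	return [...]string{"Follower", "Candidate", "Leader"}[r]
}

// LogEntry is one client command appended to a server's log, tagged with
// the term in which the leader received it.
type LogEntry struct {
	Term    int
	Command string
}

// Server holds the minimal per-server state the three parts work with.
type Server struct {
	ID          int
	Role        Role
	CurrentTerm int
	VotedFor    int // -1 means no vote cast in the current term
	Log         []LogEntry
	CommitIndex int // highest log index known to be committed
}

func main() {
	s := Server{ID: 1, Role: Follower, VotedFor: -1}
	fmt.Printf("server %d starts as %v in term %d\n", s.ID, s.Role, s.CurrentTerm)
}
```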

Leader election starts with an election timeout on one of the servers: that server turns into a “Candidate” and asks the other servers for votes. Once it receives a majority of the votes, it acts as the leader. The election timeout is refreshed every time the leader communicates with a server, preventing new elections from being triggered. In a split-vote situation, the cluster simply waits for another server to time out and start a new round of elections.
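Below is a small, self-contained Go sketch of those election mechanics: a randomized election timeout and a candidate counting votes until it holds a majority. The 150–300 ms range follows the Raft paper’s suggestion; everything else (the function names, the peers-as-functions shortcut) is my own simplification, not how etcd/raft or hashicorp/raft structure it.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// electionTimeout returns a randomized timeout. Randomization is what keeps
// split votes from repeating forever: after a split vote, servers wait
// different amounts of time, so one of them usually times out first and
// wins the next round.
func electionTimeout() time.Duration {
	return time.Duration(150+rand.Intn(150)) * time.Millisecond
}

// startElection simulates a candidate asking its peers for votes. Each peer
// is reduced to a function that returns true if it grants its vote.
func startElection(candidateID int, peers []func() bool) bool {
	votes := 1 // the candidate votes for itself
	for _, requestVote := range peers {
		if requestVote() {
			votes++
		}
	}
	// The candidate becomes leader only with votes from a majority of the
	// whole cluster (peers plus the candidate itself).
	clusterSize := len(peers) + 1
	return votes > clusterSize/2
}

func main() {
	fmt.Println("election timeout:", electionTimeout())

	// Two of the four peers grant their vote, so with the candidate's own
	// vote it reaches 3 out of 5 and wins.
	peers := []func() bool{
		func() bool { return true },
		func() bool { return true },
		func() bool { return false },
		func() bool { return false },
	}
	fmt.Println("won election:", startElection(1, peers))
}
```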

Once a leader is elected, the log replication part of Raft begins. A log entry has two states, appended and committed. An appended entry waits until it has been shared with the majority of servers; once it has, it becomes committed and is applied to the state machine on each of the servers. These states are used to ensure log consistency can be maintained across outages. Safety ties log replication back into leader election: a candidate whose log is less up to date than a voter’s won’t get that voter’s vote.
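Here is a rough Go sketch of the appended-versus-committed distinction: the leader tracks how far each server’s log matches its own, an entry counts as committed once a majority holds it, and only committed entries are applied to the state machine. The names (matchIndex, commitIndex, apply) mirror the paper’s terminology, but the code itself is a toy illustration under those assumptions, not a real implementation.

```go
package main

import "fmt"

// Entry is a log entry: "appended" as soon as it is in the log,
// "committed" once a majority of servers hold it.
type Entry struct {
	Command string
}

// commitIndex returns the highest log index that a majority of servers have
// replicated. matchIndex[i] is the last log index known to be on server i
// (the leader tracks this for each follower, plus itself).
func commitIndex(matchIndex []int) int {
	best := 0
	for n := 1; ; n++ {
		count := 0
		for _, m := range matchIndex {
			if m >= n {
				count++
			}
		}
		if count <= len(matchIndex)/2 {
			return best // no majority holds index n, so best is the answer
		}
		best = n
	}
}

// apply runs every newly committed entry against the state machine, here
// just a slice of applied commands, and returns the new last-applied index.
func apply(log []Entry, lastApplied, commit int, state *[]string) int {
	for i := lastApplied + 1; i <= commit; i++ {
		*state = append(*state, log[i-1].Command) // Raft log indices are 1-based
	}
	return commit
}

func main() {
	log := []Entry{{"set x=1"}, {"set x=2"}, {"set x=3"}}
	// Five servers: the leader has all 3 entries, two followers have 2,
	// two followers have only 1. A majority (3 of 5) holds index 2, so
	// entries 1 and 2 are committed; entry 3 is only appended.
	match := []int{3, 2, 2, 1, 1}
	commit := commitIndex(match)
	var state []string
	apply(log, 0, commit, &state)
	fmt.Println("commit index:", commit) // 2
	fmt.Println("applied:", state)       // [set x=1 set x=2]
}
```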

Since Raft is implemented everywhere, notable Golang-based libraries include etcd/raft and hashicorp/raft, both of which are still maintained by the community to this day. By the end of this series, my goal is to create my own Raft implementation in Golang.


  1. Web - Kubernetes documentation

  2. GitHub - etcd

  3. YouTube - Ben Johnson’s “Raft - The Understandable Distributed Protocol” talk at Strange Loop

  4. Paper - Paxos by L. Lamport

  5. GitHub - Raft implementation used by etcd
