Cassandra and its Accrual Failure Detector
For Friday’s presentation of Cassandra – a distributed storage system – I needed to understand how the system is able to detect node failures. In distributed systems a so called failure detector is sometimes used to simplify an algorithm’s work. And, Cassandra uses a failure detector called the Accrual Failure Detector. Accrual for those of you who don’t know, means accumulation, or the act of accumulating over time.
The basic idea is that a node’s state is not only up or down. It is not true or false. Rather, it is an educated guess which takes multiple factors into account. With approximation we can, for example, take slow messages into consideration and, thus, allow ourselves to be wrong. How weird?
A server (node A) suspects that a node is down because it hasn’t received the two last heartbeats from node B. Node A assigns a Phi value of (let’s say) 1. Phi denominates the suspicion level that another server might be down. This value can be adjusted dynamically according to local conditions such as load.
Phi represents the likelihood that Node A is wrong about Node B’s state. So, when a third heartbeat is considered lost Phi increases, and eventually a threshold is reached. When that happens the application will be notified about the failed node. The threshold is a configured value.
Cassandra approximates Phi using exponential distribution. Thus, the higher the Phi, the bigger the confidence that Node B has failed. I still haven’t found any more detailed explanation than the following as to why exponential is used rather than Gaussian:
Exponential Distribution to be a better approximation, because of the nature of the gossip channel and its impact on latency.
Don’t know if that made sense to anyone else, but I think I get it know.