What is the difference between redundancy and replication in software?

While studying for a recent exam that covered two topics, redundancy and replication, I asked myself this question: "What is the difference between redundancy and replication?" At first I thought they were the same, as both have to do with having more nodes or components, but after a little more thought I found a significant difference. Here are my findings and thoughts on this.


I knew what redundancy was: having one or more duplicates of a node or component in a system. It exists for failover. If one node fails, another takes over, or the remaining nodes simply keep going. There are two variants, active and passive redundancy. In active redundancy, traffic goes to all nodes all the time. This is often achieved by placing a load balancer in front of the duplicated nodes to spread traffic across them. A routing method has to be chosen, which can be round-robin or something more complex (choosing the node with the lowest load, for example). In passive redundancy, you switch over only when the "active" node fails. This is often known as active/passive: there is always one active node, and there can be several passive ones. The passive nodes do nothing except stand ready for when the active node goes down.
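The two modes above can be sketched in a few lines of Python. This is only a toy illustration, not how any particular load balancer works: the names Node, RoundRobinBalancer and ActivePassivePair are made up for this example, and node health is a simple flag rather than a real health check.

```python
from itertools import cycle


class Node:
    """A hypothetical stateless service node (name invented for this sketch)."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class RoundRobinBalancer:
    """Active redundancy: every healthy node receives traffic in turn."""

    def __init__(self, nodes):
        self.nodes = nodes
        self._ring = cycle(nodes)

    def route(self, request):
        # Try each node at most once per request, skipping unhealthy ones.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node.healthy:
                return node.handle(request)
        raise RuntimeError("no healthy nodes left")


class ActivePassivePair:
    """Passive redundancy: standbys stay idle until the active node fails."""

    def __init__(self, active, standbys):
        self.active = active
        self.standbys = list(standbys)

    def route(self, request):
        while not self.active.healthy:
            if not self.standbys:
                raise RuntimeError("no healthy nodes left")
            # Fail over: promote the next standby to active.
            self.active = self.standbys.pop(0)
        return self.active.handle(request)
```

With two nodes behind RoundRobinBalancer, requests alternate between them, and if one is marked unhealthy the other keeps serving. With ActivePassivePair, the standby sees no traffic at all until the active node is marked unhealthy.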

Therefore my definition of redundancy is:

Redundancy is the duplication of nodes so that the system keeps working if one fails

However, active redundancy can also be used to get higher performance out of a system. Instead of having one node serve everyone, you have two or more, which takes load off the first one. Gaining performance by duplicating nodes is better known as scaling, or more precisely horizontal scaling.


So if redundancy is the duplication of nodes, then what is replication? This is the tricky part. Replication is also redundant, since the nodes are duplicated. The big difference is that the nodes copy data between them; in other words, they synchronise state. This is often seen in databases or message queue systems, where the group of nodes is referred to as a cluster. If one node in a cluster goes down, the system lives on because the data has been copied to the other nodes.

Again there are two ways of doing this. In active replication, each message goes to every node, so all nodes are always in sync and ready to serve requests. In passive replication, there is a primary/replica (master/slave) relationship: the primary gets all the requests and updates the replicas behind the scenes. If the primary goes down, a replica is promoted to primary.
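Passive replication and promotion can be illustrated with a small in-memory sketch. Everything here is an assumption made for the example: StoreNode and PrimaryReplicaCluster are invented names, the "alive" flag stands in for real failure detection, and the primary copies each write to its replicas synchronously, which real systems often do not.

```python
class StoreNode:
    """A hypothetical node holding a copy of the data (name invented here)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class PrimaryReplicaCluster:
    """Passive replication: one primary takes requests, replicas follow it."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)

    def _ensure_primary(self):
        # If the primary is down, promote the first live replica.
        if self.primary.alive:
            return
        live = [r for r in self.replicas if r.alive]
        if not live:
            raise RuntimeError("no replica available for promotion")
        self.primary = live[0]
        self.replicas.remove(live[0])

    def write(self, key, value):
        self._ensure_primary()
        self.primary.data[key] = value
        # The primary updates the replicas "behind the scenes".
        for replica in self.replicas:
            if replica.alive:
                replica.data[key] = value

    def read(self, key):
        self._ensure_primary()
        return self.primary.data[key]


cluster = PrimaryReplicaCluster(StoreNode("n1"), [StoreNode("n2"), StoreNode("n3")])
cluster.write("user:1", "Alice")
cluster.primary.alive = False      # simulate the primary going down
value = cluster.read("user:1")     # served by the promoted replica
```

The read after the simulated failure still succeeds because the state was copied to the replicas before the primary went down. That copying is what separates this from plain redundancy.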

Some systems are built to be "eventually consistent". With eventual consistency you could have all writes go to the primary but allow reads on the replicas. A replica may then briefly return stale data until the latest changes from the primary reach it.
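A rough sketch of what that looks like, under heavy simplification: writes land on the primary and are queued, and a replica only catches up when the queue is shipped. The class names and the ship_changes method are hypothetical and exist only for this illustration; real systems ship changes continuously in the background.

```python
from collections import deque


class ReadReplica:
    """A hypothetical replica that serves reads from its own copy."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class WritePrimary:
    """A hypothetical primary that takes all writes and replicates them later."""

    def __init__(self, replica):
        self.data = {}
        self.replica = replica
        self.pending = deque()  # changes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def ship_changes(self):
        # Apply queued changes to the replica in the order they were written.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica.data[key] = value


replica = ReadReplica()
primary = WritePrimary(replica)

primary.write("stock", 5)
stale = replica.read("stock")   # replica has not caught up yet
primary.ship_changes()
fresh = replica.read("stock")   # replica now agrees with the primary
```

Here the first read returns nothing because the change has not been shipped, while the second returns the written value. The nodes disagree for a moment but converge once replication catches up, which is the "eventual" part.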

Therefore my definition of replication is:

Replication is synchronisation of state between redundant nodes

That was my definition of replication and redundancy. I hope you liked it. Whether you did or not, let me know in the comments! If you know a better definition, please share it in the comments as well!