There are several ways to scale software applications. How a system will scale is often decided during its design phase. There are 3 ways of scaling that I - and many others - often refer to. I see these as "scaling patterns", just like software has "design patterns". The 3 patterns can be used to scale data, servers, and services. They are described in the book "The Art of Scalability", a good read for any architect or software developer who is into software and application architecture (I will link it at the bottom).
Scale by duplication
This is a very common way to scale a system - especially in the web world. You scale by duplicating the current system, making exact copies of it. This is often done with websites and web services: the same site is hosted on several servers with a load balancer placed in front. A client can be served by any of the servers; the load balancer decides which.
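To make the load balancer idea concrete, here is a minimal sketch of one possible strategy, round robin, where requests are handed to the copies in turn. The class name, the server names, and the choice of round robin are my own assumptions for illustration; real load balancers offer many other strategies.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands requests to identical server copies in turn (hypothetical sketch)."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the list of servers forever, in order.
        self._servers = cycle(servers)

    def pick(self):
        # Every copy is identical, so any of them can serve the client.
        return next(self._servers)


# Hypothetical server names; each one hosts an exact copy of the site.
balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
picks = [balancer.pick() for _ in range(5)]
print(picks)  # ['web-1', 'web-2', 'web-3', 'web-1', 'web-2']
```

Adding capacity in this model means adding another name to the list, as long as the servers do not keep client-specific state of their own.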
Scale by splitting functionality
Another way of scaling is to split the application into multiple services. This means that each service is an application of its own. This is often seen with microservices or APIs. Where the system is split depends on the system and data model. If you are working with microservices, then you have probably asked yourself how small or large a microservice should be. The same question applies here. In the accompanying image, the functionality to handle different types of data is split into 3 categories. With "scale by duplication" the same functionality runs on every server; when scaling by splitting functionality we do the opposite. These two principles are of course often mixed.
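As a rough sketch of what splitting by functionality can look like, the snippet below routes each request to the service that owns that part of the system. The three categories (users, orders, products), the URL prefixes, and the service names are hypothetical and chosen only for illustration; where you actually split depends on your system and data model.

```python
# Hypothetical mapping from URL prefix to the service that owns that functionality.
SERVICES = {
    "/users": "user-service",
    "/orders": "order-service",
    "/products": "product-service",
}


def route(path):
    """Return the name of the service responsible for the given request path."""
    for prefix, service in SERVICES.items():
        # Match the prefix itself or anything below it, e.g. /users/42.
        if path == prefix or path.startswith(prefix + "/"):
            return service
    raise LookupError(f"no service handles {path}")


print(route("/users/42"))  # user-service
print(route("/orders"))    # order-service
```

Each of these services can then be scaled by duplication on its own, which is one way the two principles end up mixed.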
Scale by creating subsets
Creating subsets, or sharding, is another form of scaling. It is a way of "splitting data that is the same". This can be done with different ranges, commonly alphabetic or numeric, such as postal codes or letters in names. In database architecture design this is called sharding - and one subset is called a shard.
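Here is a minimal sketch of alphabetic range sharding, assuming records are placed by the first letter of a name. The three ranges and the shard names are made up for illustration; a real layout would be chosen so that the data is spread evenly.

```python
# Hypothetical shard layout: each shard owns an inclusive range of first letters.
SHARDS = [
    ("A", "H", "shard-1"),
    ("I", "P", "shard-2"),
    ("Q", "Z", "shard-3"),
]


def shard_for(name):
    """Return the shard that stores records for the given name."""
    first = name.strip()[:1].upper()
    for low, high, shard in SHARDS:
        if low <= first <= high:
            return shard
    # Empty names or letters outside A-Z are not covered by this simple layout.
    raise ValueError(f"no shard covers names starting with {first!r}")


print(shard_for("Alice"))  # shard-1
print(shard_for("peter"))  # shard-2
print(shard_for("Zoe"))    # shard-3
```

The data has the same shape in every shard; only the range of names each one holds differs.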
It is important to keep scaling in mind throughout development. You do not have to choose just one of the above. I believe you should aim at being able to use 2 of them. I have rarely seen applications scaled by all 3, which in most cases would be unnecessary.
As with everything else, you have to ask yourself why you need to do this. What is it you want to achieve? Is it a fallback if one of your services is down (availability)? Maybe your services have grown slow and you want them to perform better (scalability). At the same time you might not want to sacrifice any consistency in your system. Some systems can live with eventual consistency; others need to be consistent at all times.
I hope you liked this post. Let me know what you think in the comments :)
You can find the book "The Art of Scalability" here (affiliate link):