There are several ways to scale software applications. How a system will scale is often decided during its design phase. There are 3 ways of scaling that I, and many others, often refer to. I see these as "scaling patterns", just like software has "design patterns". The 3 patterns can be used to scale data, servers, and services, and they are described in the book "The Art of Scalability". It is a good read for any architect or software developer who is into software architecture (I will link it at the bottom).
Scale by duplication
This is a very common way to scale a system, especially in the web world. It works by making exact copies of the current system. This is often done with websites and web services: the same site is hosted on several servers and a load balancer is placed in front. A client can be served by any of the servers; the load balancer decides which.
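To make the idea concrete, here is a minimal sketch of a load balancer in front of identical copies. It assumes a simple round-robin strategy, which is only one of several ways a load balancer can choose a server. The class name, the function name, and the server names (web-1 and so on) are made up for illustration.

```python
import itertools


class RoundRobinBalancer:
    """Hands out identical servers in turn (hypothetical helper)."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the server list endlessly, one server per call.
        self._cycle = itertools.cycle(servers)

    def pick(self):
        # Every server is an exact copy, so any of them can serve the request.
        return next(self._cycle)


def handle_request(balancer, path):
    """Pretend to serve `path` from whichever server the balancer picks."""
    server = balancer.pick()
    return f"{server} served {path}"


# Assumed server names; in practice these would be real hosts.
balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
responses = [handle_request(balancer, "/home") for _ in range(4)]
```

With three copies, the fourth request goes back to the first server. Adding capacity is then a matter of adding another name to the list, which is what makes this pattern attractive.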
Scale by duplication is also known as horizontal scaling. Some people use the terms horizontal and vertical scaling, others say scale out and scale up. In these terms, vertical scaling means making a single node perform better, for example by adding more RAM or CPU to a server or by optimising the application. Horizontal scaling is often preferred over vertical because you can only improve the performance of a single node so much, whereas adding more duplicates is often cheaper and easier.
Scale by splitting functionality
Another way of scaling is to split the system into multiple applications or services based on functionality. This means that each service is an application of its own. One way to achieve this is through the microservice architectural style. Where the system is split depends on the system and its data model. If you are working with microservices, then you have probably asked yourself how small or large a microservice should be. The same question applies here. In the image below, the functionality to handle different types of data is split into 3 categories. With "scale by duplication" the same functionality is in each of the boxes, but when we scale by splitting functionality we do the opposite: each box has its own functionality. These two principles are often mixed. Splitting by functionality also creates applications with high cohesion.
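As a rough sketch of splitting by functionality, the snippet below routes each request to the one service that owns that piece of functionality. The three service names and URL prefixes are assumptions I picked for the example; your own split will depend on your system and data model.

```python
# Hypothetical services and URL prefixes, chosen only for illustration.
SERVICES = {
    "/customers": "customer-service",
    "/orders": "order-service",
    "/products": "product-service",
}


def route(path):
    """Return the name of the service that owns the functionality behind `path`."""
    for prefix, service in SERVICES.items():
        # Match the prefix itself or anything nested under it.
        if path == prefix or path.startswith(prefix + "/"):
            return service
    raise LookupError(f"no service owns {path}")
```

For example, `route("/orders/42")` returns `"order-service"`. Unlike the duplication pattern, a request cannot go to just any box here, because each box only knows about its own functionality.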
Scale by creating subsets
Creating subsets is another form of scaling. It is the way of "splitting data that is the same". This can be done with different ranges, commonly alphabetic or numeric, such as postal codes or letters in names. In database architecture design this is called sharding, and one set is called a shard. This is only needed in cases where you have an immense amount of data or requests.
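Here is a small sketch of range-based subsets, assuming the data is split by the first letter of a surname. The three ranges and the shard names are made up for the example, and the sketch only handles names that start with A to Z.

```python
import bisect

# Hypothetical shards: each one holds surnames in an alphabetic range.
# Upper bounds are inclusive and must stay sorted.
SHARD_UPPER_BOUNDS = ["H", "P", "Z"]
SHARD_NAMES = ["shard-a-h", "shard-i-p", "shard-q-z"]


def shard_for(surname):
    """Return the shard that stores records for `surname`."""
    first = surname.strip().upper()[:1]
    if not ("A" <= first <= "Z"):
        raise ValueError(f"cannot shard surname {surname!r}")
    # bisect_left finds the first range whose upper bound is >= the letter.
    index = bisect.bisect_left(SHARD_UPPER_BOUNDS, first)
    return SHARD_NAMES[index]
```

For example, `shard_for("Hansen")` returns `"shard-a-h"` and `shard_for("Jensen")` returns `"shard-i-p"`. All shards hold the same kind of data; only the range differs.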
It is important to keep scaling in mind throughout development. You do not have to choose just one of the above. I believe you should aim at being able to use 2 of them. I have rarely seen applications scaled by all 3, as this is unnecessary most of the time.
Scaling is not everything
As with everything else, you have to ask yourself why you need to do this. What is it you want to achieve? Is it a fallback in case one of your services is down (availability)? Maybe your services have become slow and you want them to perform better (scalability). At the same time you might not want to sacrifice any consistency in your system. Some systems can live with eventual consistency, others need to be consistent at all times. Scaling is not the only thing to take into consideration when designing your system or defining your architecture.
That is it!
I hope you liked this post. Let me know what you think in the comments :)
You can find the book "The Art of Scalability" here (affiliate link):
Disclosure: Bear in mind that some of the links in this post are affiliate links and if you go through them to make a purchase I will earn a commission. Keep in mind that I link these companies and their products because of their quality. The decision is yours, and whether or not you decide to buy something is completely up to you.