Scaling Containers Based on Workload Using Docker

Gaurav Raj
8 min read · Oct 18, 2020
Source: docker.com

Docker is a platform for packaging and running applications. It wraps software into a complete, standardized unit that contains everything required to run it — the code, runtime environment, system tools, and libraries — which helps ensure the software behaves the same wherever it runs. This unit is called a container: a lightweight, stand-alone, executable package that runs the application in an isolated environment. Containers provide operating-system-level virtualization, while VMs provide hardware virtualization. Containers let a developer package an application together with all of its dependencies and ship it as a single unit.

VMs v/s Containers (Source: NetApp Blog)

Containers and VMs share the same goal: isolating an application and its dependencies into a self-contained unit that can run anywhere. However, containers have several advantages over virtual machines: they are lightweight, they share the host OS kernel, they start in milliseconds, and they require less memory.

We can run multiple containers simultaneously on a given host. Each container is isolated from the others, which provides a degree of security by default. Docker has a client-server architecture: the Docker daemon (server) receives commands from the Docker client through the CLI or REST APIs, and the client and daemon can run on the same host or on different hosts. I have said a lot about Docker and containers, but you might wonder what scaling is and why containers need to be scaled at all.

Before going into container scaling, we should understand elasticity, because scaling is closely tied to it. Elasticity is the ability of a system to add and remove resources to adapt to load variations in real time. Two terms are associated with elasticity: scalability and efficiency. Scalability is the ability of a system to sustain an increasing workload by making use of additional resources. Efficiency characterizes how well resources are utilized as the system scales up or down.

Elasticity is the ability through which clients can quickly request, receive, and release resources as needed. It implies fluctuation: the number of resources used by a client may change over time. Elasticity can be managed manually or automatically. Under a manual policy, the user is responsible for monitoring the virtual environment and performing all elastic actions. Under an automatic policy, the system itself controls scaling according to the user's requirements, with no human intervention. Service providers therefore focus on auto-scaling: the ability of a system to automatically adjust its resources based on the dynamic workload.

Here, we are going to use Node.js web-server containers that need to be scaled based on the current number of requests, and an HAProxy container to load balance those requests across the server containers. The problem has two stages: the first is to build the infrastructure that load balances web requests across the backend servers using HAProxy, and the second is to apply an algorithm that scales the backend servers based on the dynamic workload.

For the first stage, we should first know what load balancing is and how HAProxy can be used for it. Load balancing is the process of distributing the workload dynamically and uniformly across all available nodes; it improves overall system performance by shifting work between nodes, and its major goals are availability, performance, and flexibility. HAProxy (High Availability Proxy) is a popular open-source TCP/HTTP load balancer and proxying solution that we can use here. It acts as an HTTP reverse proxy: it receives HTTP requests on a listening TCP socket and forwards them to backend servers over separate connections. HAProxy supports several load-balancing algorithms, such as Round Robin, Least Connection, and Source.

HAProxy Configuration (Source: dzone.com)

In the first stage of the implementation, we use a Dockerfile, which is a text document containing the instructions to assemble a Docker image. Docker images are templates used to create Docker containers, and a container is a running instance of an image. So, we create a Docker image by building the Dockerfile, using the Node.js image as the base. Our Dockerfile looks like this:-

Dockerfile
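A minimal sketch of such a Dockerfile, assuming an official Node base image and a simple HTTP server in a file called server.js (both assumptions on my part), could look like this:-

# Node.js web server image (sketch; file names and base tag are assumed)
FROM node:14-alpine
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 8080
CMD ["node", "server.js"]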

To build an image from Dockerfile, we run the following command:-

docker build -t <Image-Name> <Location of the Dockerfile>
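For example, assuming the image is named node-server and the Dockerfile sits in the current directory:-

docker build -t node-server .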

Since we need a single HTTP entry point, we use HAProxy for this purpose: we create an HAProxy container that listens on port 80 and load balances requests to the different Node.js containers listening on port 8080. To create these containers, we have written a docker-compose file. Docker Compose is a tool used to define and run multi-container Docker applications. It uses a YAML file (docker-compose.yml) to configure the application's services, it can start or stop all services with a single command, and it can scale up selected services when required. The docker-compose CLI also lets users run commands on multiple containers at once, for example building images, scaling containers, and stopping running containers. We define two services with it (a sketch of the compose file follows the service descriptions below):-

Docker-compose.yml
  1. The first service is for our Node.js containers, created from the Docker image we built earlier. For this service we expose port 8080 and specify the initial number of replicas we want for this image. All of these containers are placed in a network called web.
  2. The second service is for our HAProxy container, created from an HAProxy Docker image. The load-balancing algorithm is specified in the environment variables section. This service depends on the node servers, so it does not start until all the replicas of the node servers are up and running. For this container we expose port 80, place it in the same web network, and run it on the manager node. To read the load balancer's statistics, port 1936 of HAProxy is also published and mapped to port 1936 on the host.
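A minimal sketch of such a compose file, assuming the services are named web and haproxy, the Node.js image built above is called node-server, and the HAProxy image reads its balancing algorithm from a BALANCE environment variable (all assumptions on my part), could look like this:-

# docker-compose.yml sketch; image names, service names, and replica count are assumed
version: "3"
services:
  web:
    image: node-server          # image built from the Dockerfile above
    ports:
      - "8080:8080"
    deploy:
      replicas: 3               # initial number of Node.js replicas
    networks:
      - web
  haproxy:
    image: dockercloud/haproxy  # assumed HAProxy image configured via environment variables
    environment:
      - BALANCE=roundrobin      # load-balancing algorithm
    depends_on:
      - web
    ports:
      - "80:80"
      - "1936:1936"             # HAProxy statistics port
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - web
    deploy:
      placement:
        constraints: [node.role == manager]
networks:
  web: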

Next, we need to create a Docker swarm, which manages the containers efficiently to increase throughput and gives us simple utilities for scaling up and down as the algorithm decides the required number of containers. We have set the node running the HAProxy container as the manager node.

The following command is used to initialize the swarm:-

docker swarm init
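We can verify that the current node is now a swarm manager with:-

docker node ls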

The network, the services, and all their containers together are called a stack. To create our web stack with the Node.js server containers and the HAProxy container, we use the docker stack command, pointing it at our docker-compose.yml file so that the stack is built according to the configuration defined there.

We can do it with the following command:-

docker stack deploy -c docker-compose.yml <Stack Name>
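Once the stack is deployed, the same swarm utilities can be used to inspect it or to change the replica count by hand (the stack and service names below are placeholders; the service name web matches the compose sketch above):-

docker stack services <Stack Name>        # list the services in the stack and their replica counts
docker service scale <Stack Name>_web=5   # manually set the web service to 5 replicas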

When we hit http://localhost:8080/, we get the container ID in the response, and it differs from request to request because HAProxy sends each request to a different container according to the load-balancing algorithm it uses.
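A quick way to see this, assuming the Node.js server simply returns its container ID, is to send a few requests in a row:-

for i in 1 2 3 4 5; do curl -s http://localhost:8080/; echo; done   # each response should come from a different container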

For the second stage, I have already discussed what container scaling is and why it is important. The main question is how to build an elastic service that can handle a dynamic workload. A lot of prior work addresses dynamic resource provisioning for elastic services such as web servers and big-data clusters, and many algorithms have been proposed for auto-scaling containers in such services.

Broadly, auto-scaling can be implemented in two ways: reactive and proactive. In a reactive approach, one or more thresholds are defined on metrics such as response time, CPU load, or memory usage. At regular intervals we check which resources exceed these thresholds and, based on that, resources are increased or decreased. In a proactive approach, we predict the upcoming workload using a machine learning or deep learning model and then compute the number of resources needed to handle that predicted load.

Here, I have implemented the reactive model with a threshold on the CPU usage of each container serving requests. Because load can increase rapidly, we need to add web-server instances quickly to handle it, so a reactive approach is well suited to absorbing sudden spikes. I have implemented this algorithm in Python using the Docker SDK:

Reactive model implementation
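A minimal sketch of such a reactive loop using the Docker SDK for Python is shown below; the service name, thresholds, scaling factor, and check interval are assumptions of mine, and for simplicity the sketch only inspects containers running on the current node:-

# Reactive auto-scaler sketch using the Docker SDK for Python (pip install docker)
import time
import docker

SERVICE_NAME = "web_stack_web"   # assumed "<stack>_<service>" name of the Node.js service
UPPER_THRESHOLD = 50.0           # scale up when any container's CPU usage exceeds this (percent)
LOWER_THRESHOLD = 10.0           # scale down when every container is below this (percent)
SCALE_FACTOR = 2                 # grow or shrink the replica count by this factor
MIN_REPLICAS, MAX_REPLICAS = 1, 20
INTERVAL = 10                    # seconds between checks

client = docker.from_env()

def cpu_percent(container):
    # Compute the CPU usage (%) of one container from a single stats snapshot
    s = container.stats(stream=False)
    precpu = s["precpu_stats"]
    cpu_delta = (s["cpu_stats"]["cpu_usage"]["total_usage"]
                 - precpu.get("cpu_usage", {}).get("total_usage", 0))
    system_delta = (s["cpu_stats"].get("system_cpu_usage", 0)
                    - precpu.get("system_cpu_usage", 0))
    if system_delta <= 0 or cpu_delta < 0:
        return 0.0
    return (cpu_delta / system_delta) * s["cpu_stats"].get("online_cpus", 1) * 100.0

while True:
    service = client.services.get(SERVICE_NAME)
    replicas = service.attrs["Spec"]["Mode"]["Replicated"]["Replicas"]
    # Containers belonging to this service on the current node
    containers = client.containers.list(
        filters={"label": f"com.docker.swarm.service.name={SERVICE_NAME}"})
    usages = [cpu_percent(c) for c in containers]
    overloaded = sum(1 for u in usages if u > UPPER_THRESHOLD)

    if overloaded > 0 and replicas < MAX_REPLICAS:
        service.scale(min(replicas * SCALE_FACTOR, MAX_REPLICAS))   # scale up by the factor
    elif usages and max(usages) < LOWER_THRESHOLD and replicas > MIN_REPLICAS:
        service.scale(max(replicas // SCALE_FACTOR, MIN_REPLICAS))  # scale down by the factor

    time.sleep(INTERVAL)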

In this case, we increase or decrease the number of containers by a given factor to avoid the oscillations caused by premature scaling. An example illustrates this better: if the CPU usage upper threshold is set to 50% and I send 1,000,000 simultaneous requests using a benchmarking tool, then as soon as some containers' CPU usage crosses this threshold, new instances are created.

Increase in instances when a container's CPU usage exceeds cpu_upper_threshold (50%)
HAProxy showing all the scaled instances are up and responding

In the two figures above, we can see that containers whose CPU usage exceeds the upper threshold are counted as overloaded instances, and scaling then brings the service to the target number of instances required at that point in time. The same can be observed in the HAProxy statistics report.

Similarly, if the workload decreases then the number of container instances decreases accordingly. In this way, this algorithm is capable of handling the dynamic workload. The conclusion is that auto-scaling of containers in an elastic service plays a very important role in handling the dynamic workload.
