Kubernetes in the Public Cloud: Save Some Cost

Look around: almost every company is moving to the public cloud these days, and initially nobody cares about the cost. We just want to get to the cloud at any cost and see its benefits. Eventually, though, we realize that we are paying too much for things we are not even using: we keep the infrastructure running, and we keep paying for it. So if you are starting out with Kubernetes in the cloud, make sure you have the right things in place to save some cost while you are still building.

I'm not an expert, but I have been looking at some real-world applications and their usage, and I found that with a few small changes we can save thousands of dollars in the public cloud while building containerized applications on Kubernetes.

Replicas in Non-Production

As I said before, most of us keep the same configuration for non-production, UAT, and production, which is not the right approach. Your non-production environments are not generating any revenue, so why keep the same number of replicas as production?

I would check the following to save some cost in non-revenue-generating environments:

  • Check whether you can keep a single pod running per deployment in non-production environments.

  • Check that the HPA target utilization is set as high as possible, so each pod is used to its fullest before a new one is created (see the sketch after this list).

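As a rough sketch of both points, a non-production HPA could look something like the following. The name, namespace, and numbers are placeholders, not recommendations; the idea is simply to start from a single replica and set a high target utilization so existing pods are squeezed before new ones are added.

# Hypothetical non-production HPA: start at 1 replica, scale out only
# when pods are genuinely busy. Names and values are examples only.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app                     # placeholder deployment name
  namespace: nonprod               # placeholder namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1                   # one pod is enough when no revenue is at stake
  maxReplicas: 3                   # keep the ceiling low in non-production
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 90     # high target: use each pod fully before adding another
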
Keep your resource requests and limits in check

Most of the blogs we read about Kubernetes say that setting resource requests and limits on your deployments is one of the best practices, and I don't believe otherwise. So what are resource requests and limits? In simple words:

Kubernetes defines a limit as the maximum amount of a resource a container is allowed to use. This means the container will never be able to consume more than the specified amount of memory or CPU.

Requests, on the other hand, are the minimum guaranteed amount of a resource that is reserved for a container.

There are Quality of Service (QoS) classes based on how you set limits and requests, which you can read about in the Kubernetes documentation. For the best QoS, keep the limit and the request the same.

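For reference, "limit equals request" looks roughly like this in a container spec; when every container's requests and limits match for both CPU and memory, Kubernetes puts the pod in the Guaranteed QoS class (the numbers are just examples):

# Sketch of a Guaranteed-QoS container: requests and limits are identical.
resources:
  requests:
    memory: "200Mi"
    cpu: "250m"
  limits:
    memory: "200Mi"   # same as the request
    cpu: "250m"       # same as the request
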
So what is the problem here? Let's calculate. I'll only talk about memory; the same reasoning applies to CPU. The values below are typical of what I see assigned, so I'll use them as the example.

For our deployment, we have the following memory assigned:

resources:
  requests:
    memory: "100Mi" # ~100 MB
  limits:
    memory: "200Mi" # ~200 MB
  • The HPA (Horizontal Pod Autoscaler) is set to 80% for memory, meaning that when we reach 80% of the requested memory, the deployment creates a new pod (see the sketch after this list).

  • Suppose we start with 1 pod.

  • As soon as memory usage reaches 80MB, the HPA will be triggered and will spin up a new pod.

  • In total we now have 2 pods, with 200MB of memory requested and 400MB of memory as the limit.

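For concreteness, an HPA targeting 80% average memory utilization would look roughly like the manifest below (the name and replica bounds are placeholders). Note that for resource metrics the utilization is measured against the request, which is why the 80% threshold corresponds to 80MB of the 100MB request here.

# Illustrative HPA: scale out when average memory usage reaches 80% of the request.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app                     # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80     # 80% of the 100Mi request = ~80MB per pod
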
Continuing the scenario: as soon as we reach 160MB of combined memory usage, another pod will be created. Now let's draw some conclusions.

  1. There is a very high probability that I will never hit my limit of 200MB. [The only scenario where it gets used is when the next pod is not yet ready and all the traffic keeps hitting the existing pod, never giving the new pod a chance to become ready. That is not an online-service scenario; it is a batch scenario.]

  2. With the limit set to twice the request, I can never quantify my resources. Let's see it with an example. If I have a VM with 1000MB of memory, I can schedule at most 10 pods on it, which together request 1000MB of memory. But their limits add up to 2000MB, which the VM does not have. So even though each pod has a limit of 200MB, pods will start dying with OOM as soon as they try to go beyond their 100MB requests and the node runs out of memory. This is where you cannot quantify the memory you need, and your limit does not protect you.

Solution

There is no simple, universal solution; it comes from looking at real-world usage. Based on what I have seen, I would suggest the following:

  • If we keep the request at 100MB, we can keep the limit at around 120MB, which gives your pod some breathing room in case the HPA triggers at 80MB and the next pod takes some time to spin up (see the sketch after this list).

  • Make sure you track your pods against the memory you have requested. Initially we tend to request a lot of memory and then keep using the same values without ever checking whether we actually need them.

  • Use the VPA (Vertical Pod Autoscaler), which will help you understand, over time, how much memory and CPU your pods really need.

  • Goldilocks is an open-source tool, built on top of the VPA, that can help you understand your memory and CPU usage and will suggest the right amounts.

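A sketch of the first suggestion, keeping the limit only slightly above the request; the exact headroom is something you would tune from your own usage data:

# Request stays at 100Mi; the limit adds a small cushion instead of doubling it.
resources:
  requests:
    memory: "100Mi"   # what the scheduler reserves; the 80% HPA fires around 80Mi
  limits:
    memory: "120Mi"   # breathing room while the next pod spins up
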
Some of these small changes will save you a lot of cost in the long run.

Check your PDBs [Pod Disruption Budgets]

Most public clouds come with an autoscaling feature, where nodes are added automatically when demand grows. All these new nodes come at a price. So if your cluster scales up but does not come back down when demand decreases, you will be paying for VMs that are not in use.

One of the biggest culprits I have seen that prevents nodes from scaling back down, even with autoscaling enabled, is the PDB [Pod Disruption Budget].

What is a PDB? A Pod Disruption Budget (PDB) allows you to limit the disruption to your application when its pods need to be rescheduled for some reason, such as upgrades or routine maintenance work on the Kubernetes nodes.

So how can a PDB be one of the causes of your GKE cost going up?

When you define a PDB that leaves no disruptions allowed, it will not let the node be scaled down. Let's see this with an example.

If your deployment runs a minimum of 3 replicas and you have defined a PDB requiring a minimum of 3 replicas to be available at all times, the autoscaler cannot drain the node where those pods are running, because it can never keep fewer than 3 pods. A sketch of such a PDB follows.

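As an illustration, a PDB like the one below, paired with a deployment that runs exactly 3 replicas, leaves zero allowed disruptions, so the cluster autoscaler cannot evict those pods to drain a node (the name and selector are placeholders):

# Problematic PDB: minAvailable equals the replica count, so no pod may ever be evicted.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb      # placeholder name
spec:
  minAvailable: 3       # with only 3 replicas, allowed disruptions = 0
  selector:
    matchLabels:
      app: my-app       # placeholder selector
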
So what is the solution?

  • Leave room in the disruption budget. For example, with 3 replicas in total, the PDB can require a minimum of 2 pods and allow 1 pod disruption. That lets one pod be evicted and does not hold up the scale-down of nodes (see the sketch after this list).

  • Let's not keep PDBs where we don't need them. Most of us keep the same configuration in non-prod, UAT, and prod, which is not the right approach. We usually do not need PDBs in non-prod; those environments are mostly for integration testing, development, QA, and load testing, where a pod disruption has no financial impact.

  • Monitor your environment for any scale-down issues and make sure your PDBs are not the ones causing them.

  • Scale your nodes up and down manually to check whether autoscaling will run into any issues.

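A sketch of the first suggestion above: with 3 replicas, allowing one pod to be disrupted (equivalent to requiring a minimum of 2 available) gives the autoscaler room to drain a node:

# Relaxed PDB: one voluntary disruption is allowed, so nodes can still scale down.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb      # placeholder name
spec:
  maxUnavailable: 1     # with 3 replicas, this is equivalent to minAvailable: 2
  selector:
    matchLabels:
      app: my-app       # placeholder selector
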
Final thoughts

  • For the best quality of service, keep the limit and the request the same.

  • Keep your PDBs in check for any autoscaling issues.

  • Let's not keep PDBs in environments where there is no money at stake.

  • Keep your HPA target high in non-production environments to utilize as much of your resources as possible.

  • Keep the number of pods low in non-production, which will keep your node count low if you have many deployments.

Let me know if any of this helps you keep your cost low in the public cloud. I will update this article with corrections over time.

Thank you for reading. Keep reading and keep learning; I will do the same, as we all learn from each other.