Autoscaling your GKE workload
Autoscaling lets you worry less about capacity planning and keeps your services up during load peaks, all while you only pay for the resources that are needed at any given moment.
Details
With autoscaling configured, GKE automatically adds new nodes to your cluster if the Pods you've created don't have enough capacity to run; conversely, if a node in your cluster is underutilized and its Pods can run on other nodes, GKE can delete that node.
Keep in mind that when resources are deleted or moved in the course of autoscaling your cluster, your services can experience some disruption. For example, if your service consists of a controller with a single replica, that replica's Pod might be restarted on a different node if its current node is deleted. Before enabling autoscaling, ensure that your services can tolerate potential disruption or that they are designed and configured so that downscaling does not disrupt Pods that cannot be interrupted.
Availability
All new Managed GKE clusters come with cluster autoscaling enabled. However, a few things have to be configured in order to automatically scale your workload.
Usage
Scaling your workloads horizontally
- Let us know your maximum node count
By default we won't scale your cluster to an unlimited number of nodes, in order to guard you from unexpected costs. We have defined a minimum count of 3 nodes and a configurable maximum node count. Let us know what your preferred maximum node count is and we will set it for your cluster.
- Set resource requests on your containers
The cluster autoscaler uses resource requests to know how much capacity a node has left. Without CPU requests set, the cluster autoscaler does not function. Setting requests is also good practice regardless of whether you make use of the autoscaler (see the example under 'Scaling your workloads vertically' below).
- Set up a Horizontal Pod Autoscaler
To scale your Pods with the incoming load, you can set up a Horizontal Pod Autoscaler (HPA) that scales the number of Pods based on CPU utilization. As soon as your nodes are full, this in turn triggers the cluster autoscaler to add more nodes. The Kubernetes documentation has a great walkthrough to help you set up an HPA; a minimal example follows this list.
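Below is a minimal sketch of such an HPA. It assumes a Deployment named 'my-app' (a placeholder) and scales it between 2 and 10 replicas to keep average CPU utilization around 80%; adjust the names and thresholds to your workload.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app        # placeholder Deployment name
  minReplicas: 2
  maxReplicas: 10       # should fit within your cluster's maximum node count
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # target average CPU utilization in percent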
Scaling on custom metrics
You can also scale horizontally on custom metrics by leveraging an optional managed installation of KEDA. KEDA retrieves metrics from various backends (called 'scalers' in KEDA terms) and scales based on them. You can find the available scalers in KEDA's documentation. KEDA also allows scaling on your own custom metrics, by providing a self-written external scaler.
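As an illustration, here is a minimal sketch of a KEDA ScaledObject that scales the hypothetical 'my-app' Deployment on a Prometheus query; the server address, query, and threshold are placeholder assumptions, not part of the managed setup.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaledobject
spec:
  scaleTargetRef:
    name: my-app                # Deployment to scale (placeholder)
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # placeholder Prometheus endpoint
        query: sum(rate(http_requests_total{app="my-app"}[2m]))
        threshold: "100"        # scale out above roughly 100 requests per second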
If you are interested in a managed installation of KEDA, or if you just want to know whether your use case is supported by it, please send us a message.
Scaling your workloads vertically
To have Kubernetes schedule your workloads properly, you need to set resource requests on your containers (see here for an explanation). These tell Kubernetes how many resources your container is expected to use. The numbers you set are independent of what your container actually uses at runtime; in practice the requested resources often differ considerably from what the Pod is really using.
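For reference, here is a sketch of resource requests on a container within a Deployment; the name, image, and values are placeholders to adjust to your workload:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.0   # placeholder image
          resources:
            requests:
              cpu: 250m        # a quarter of a CPU core
              memory: 256Mi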
As it is hard to estimate resource requests (and as they might also change over time), the vertical pod autoscaler project aims to provide sane settings for you. It monitors the resource usage of your Pods while they are running and provides recommendations for their resource requests. Depending on the updateMode setting, it will also apply those recommendations to your running Pods, by evicting them and restarting them with updated resource requests. Please make use of Pod Disruption Budgets when using VerticalPodAutoscaler resources, as otherwise the eviction of Pods might lead to disruptions, and always use at least 2 replicas.
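A matching PodDisruptionBudget could look like the following minimal sketch, assuming the Pods of the hypothetical 'my-app' Deployment carry the label app: my-app:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1        # keep at least one Pod running during voluntary evictions
  selector:
    matchLabels:
      app: my-app        # assumed Pod label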
How to create a VerticalPodAutoscaler resource
A VerticalPodAutoscaler resource can be used for Kubernetes resources which control/manage Pods themselves, like Deployments, StatefulSets, ReplicaSets, etc. Here is an example of a VerticalPodAutoscaler which targets a Deployment called 'my-app'. The Deployment needs to be in the same namespace as the VerticalPodAutoscaler resource itself.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
This resource can be created with the usual kubectl create -f <filename> -n <namespace> command.
The vertical pod autoscaler will check the usage metrics of the Pods created by the Deployment. After some time, the VerticalPodAutoscaler resource is updated with the recommendations it found. As the updateMode was set to "Auto", it will also replace the Pods and restart them with the recommended values set. You can set the updateMode to "Off" to only get the recommendations without replacing Pods.
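To inspect the recommendations, you can describe the resource (using the example name from above); the Status section lists the recommended lower bound, target, and upper bound per container:
kubectl describe vpa my-app-vpa -n <namespace>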
Please have a look at this tutorial from Google, which also explains how to disable recommendations for certain containers.
Known Limitations
Please see here for the known limitations of the vertical pod autoscaler. Most notably, you shouldn't use a VerticalPodAutoscaler in combination with a HorizontalPodAutoscaler when scaling on CPU or memory.