Fri, Aug 7, 2020 gcp / gke / kubernetes / load-balancing

Multi-Cluster Load Balancing with GKE

One of the features I like the most about GCP is the external HTTP(S) Load Balancing. This is a global load balancer that gives you a single anycast IP address (no DNS load balancing needed, yeey!). Requests enter Google’s global network at one of the edge points of presence (POPs) close to the user (over 90 locations around the world; see Load Balancing - Locations) and are proxied to the closest region with available capacity. The result is a highly available, globally distributed, scalable and fully managed load balancing setup. It can be further augmented with Cloud Armor (DDoS and WAF protection), Cloud CDN, or Identity-Aware Proxy (IAP) to secure access to your web applications.

With this, multi-cluster load balancing with GKE immediately comes to mind, and it is often a topic of interest for clients. While there’s no native support in GKE/Kubernetes at the moment (leaving Anthos aside for now: Anthos is an application management platform that enables you to run K8s clusters on-prem and in other clouds, and it also extends the functionality of GKE clusters, incl. a multi-cluster ingress controller), GCP provides all the necessary building blocks to set this up yourself.

In the first part, let’s get familiar with the GCP Load Balancing components. We will follow the journey of a request as it enters the system and understand what each of the load balancing building blocks represents. In the second part, we will set up load balancing across two GKE clusters step by step.

GCP Load Balancing Overview

Fig. 1: GCP Load Balancing Overview

Let’s start with a high-level overview of the Load Balancing flow. The client’s HTTP(S) connection is terminated at an edge location by Google Front Ends (GFEs; software-defined, scalable distributed systems located at the edge POPs), based on the HTTP(S) Target Proxy and Forwarding Rule configuration. The Target Proxy consults the associated URL Map and Backend Service definitions to determine how to route traffic. From the GFEs a new connection is established, and traffic flows over the Google network to the closest healthy Backend with available capacity. Within the region, traffic is then distributed across the individual Backend Endpoints according to their capacity.

GCP Load Balancing Components

Fig. 2: GCP Load Balancing Components


We will set up multi-cluster load balancing for two services, Foo and Bar, deployed across two clusters (Fig. 3). We’ll use simple path-based rules and route any request for /foo/* to service Foo and any request for /bar/* to service Bar.

Fig. 3: GKE Multi-Cluster Foo Bar


Deploy Applications and Services to GKE clusters

Let’s start by deploying a simple demo application to each of the clusters. The application displays details about the serving cluster and region, and its source code is available at stepanstipl/k8s-demo-app.

Fig. 4: K8s Demo App

Repeat the following steps for each of your clusters.
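A minimal sketch of the per-cluster step, written as a shell function that pipes a Deployment and a Service into kubectl apply. The Service carries the GKE standalone NEG annotation so that GKE creates a NEG for port 80. The function name deploy_app, the image name (overridable via IMAGE), the replica count and containerPort 8080 are assumptions about the demo app, not taken from its repository.

```shell
# Hypothetical helper: deploy one demo app (foo or bar) and its Service.
# Assumptions: image name and containerPort 8080; adjust to your build.
deploy_app() {
  local name="$1"
  local image="${IMAGE:-stepanstipl/k8s-demo-app:latest}"
  kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      containers:
      - name: ${name}
        image: ${image}
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: ${name}
  annotations:
    # Ask GKE to create a standalone NEG for service port 80.
    cloud.google.com/neg: '{"exposed_ports": {"80":{}}}'
spec:
  type: ClusterIP
  selector:
    app: ${name}
  ports:
  - port: 80
    targetPort: 8080
EOF
}
```

Switch to each cluster with kubectl config use-context, then run deploy_app foo and deploy_app bar.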

You can verify that the services are set up correctly by forwarding a local port:

kubectl port-forward service/foo 8888:80

and then accessing the service at http://localhost:8888/.

Now don’t forget to repeat the above for all your clusters.

Set Up Load Balancing (GCLB) Components
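A minimal sketch of the GCLB components for the Foo/Bar setup, expressed as gcloud commands wrapped in a shell function: a firewall rule for the GFE and health-check ranges, a health check, one backend service per K8s Service plus an empty default one, a URL map with the /foo/* and /bar/* path rules, a Google-managed certificate, the HTTPS target proxy, a static IP and the global forwarding rule. All resource names (foobar-*, foo-bs, bar-bs, default-bs), the function name setup_gclb, the default VPC network and Pod port 8080 are hypothetical choices for illustration.

```shell
# Hypothetical helper: create the GCLB components for foobar.<domain>.
# Resource names, network and Pod port are illustrative assumptions.
setup_gclb() {
  local domain="$1"              # e.g. example.com
  local network="${2:-default}"  # VPC network of the GKE clusters
  local pod_port="${3:-8080}"    # assumed containerPort of the demo app
  local host="foobar.${domain}"

  # Let GFEs and health checkers reach the Pods (NEG endpoints) directly.
  gcloud compute firewall-rules create foobar-allow-gclb \
    --network="$network" --direction=INGRESS --action=ALLOW \
    --source-ranges=130.211.0.0/22,35.191.0.0/16 --rules="tcp:${pod_port}"

  # HTTP health check sent to each endpoint's serving port.
  gcloud compute health-checks create http foobar-hc \
    --use-serving-port --request-path=/

  # One global backend service per K8s Service, plus an empty default one.
  for bs in foo-bs bar-bs default-bs; do
    gcloud compute backend-services create "$bs" \
      --global --protocol=HTTP --health-checks=foobar-hc
  done

  # URL map: /foo/* -> foo-bs, /bar/* -> bar-bs, everything else -> default-bs.
  gcloud compute url-maps create foobar-url-map --default-service=default-bs
  gcloud compute url-maps add-path-matcher foobar-url-map \
    --path-matcher-name=foobar-paths --default-service=default-bs \
    --path-rules="/foo/*=foo-bs,/bar/*=bar-bs" --new-hosts="$host"

  # Managed certificate, HTTPS proxy, static anycast IP and forwarding rule.
  gcloud compute ssl-certificates create foobar-cert --global --domains="$host"
  gcloud compute target-https-proxies create foobar-https-proxy \
    --url-map=foobar-url-map --ssl-certificates=foobar-cert
  gcloud compute addresses create foobar-ip --global
  gcloud compute forwarding-rules create foobar-https-fr --global \
    --address=foobar-ip --target-https-proxy=foobar-https-proxy --ports=443
}
```

Point an A record for foobar.[your-domain] at the reserved address (gcloud compute addresses describe foobar-ip --global --format='value(address)'); the managed certificate is only provisioned once DNS resolves to the load balancer, which can take a while. The URL map does not rewrite paths, so the Pods receive requests for /foo/... and /bar/... as-is.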

Connect K8s Services to the Load Balancer

GKE has provisioned network endpoint groups (NEGs) for each of the K8s Services deployed with the NEG annotation. Now we need to add these NEGs as backends to the corresponding backend services.
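A minimal sketch of this step: list the NEGs GKE created for a Service (across all clusters in the project) and add each one to a backend service with RATE balancing and --max-rate-per-endpoint. It assumes GKE’s default NEG naming, k8s1-<uid>-<namespace>-<service>-<port>-<hash>, the default namespace and service port 80; the function name attach_negs and the backend service names are hypothetical. Check the actual names with gcloud compute network-endpoint-groups list before relying on the filter.

```shell
# Hypothetical helper: attach all NEGs of a K8s Service to a backend service.
# Assumes default NEG naming, namespace "default" and service port 80.
attach_negs() {
  local svc="$1" backend="$2" rate="${3:-100}"
  gcloud compute network-endpoint-groups list \
      --filter="name~-default-${svc}-80-" \
      --format="value(name,zone.basename())" |
    while read -r neg zone; do
      # RATE mode: once a NEG hits its max rate, GCLB overflows elsewhere.
      gcloud compute backend-services add-backend "$backend" --global \
        --network-endpoint-group="$neg" \
        --network-endpoint-group-zone="$zone" \
        --balancing-mode=RATE \
        --max-rate-per-endpoint="$rate" </dev/null
    done
}
```

For example, attach_negs foo foo-bs 100 and attach_negs bar bar-bs 100. The default backend service is deliberately left without backends.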

Test Everything’s Working

Curl your DNS name https://foobar.[your-domain] (or open it in the browser). You should get a 502 for the root path, as we didn’t add any backends to the default backend service.

curl -v "https://foobar.[your-domain]"

Now curl the paths of the individual services, https://foobar.[your-domain]/foo/ or https://foobar.[your-domain]/bar/, and you should receive a 200 and content from the corresponding service.

curl -v "https://foobar.[your-domain]/foo/"
curl -v "https://foobar.[your-domain]/bar/"
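If you want to script these checks instead of reading curl -v output, a small helper can compare status codes. This is a sketch; the function name expect_status is hypothetical, and it relies only on curl’s -s, -o and -w '%{http_code}' options (a failed connection shows up as 000).

```shell
# Hypothetical helper: fetch a URL and compare its HTTP status code.
expect_status() {
  local url="$1" expected="$2" code
  code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  if [ "$code" = "$expected" ]; then
    echo "OK   ${url} -> ${code}"
  else
    echo "FAIL ${url} -> ${code} (expected ${expected})"
    return 1
  fi
}
```

Usage, matching the expectations above:

expect_status "https://foobar.[your-domain]/" 502
expect_status "https://foobar.[your-domain]/foo/" 200
expect_status "https://foobar.[your-domain]/bar/" 200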

If you retry a few times, you should see traffic served by different Pods and clusters. (If you have clusters in different regions, GCLB will prefer to serve the traffic from the one closer to the client, so do not expect traffic to be load-balanced equally between regions.)

If you simulate some traffic, for example using vegeta, one of my favourite CLI tools, you can nicely observe the traffic distribution across backends in the GCP Console. Go to Network services -> Load balancing, select your load balancer, open the Monitoring tab and select the corresponding backend. You should see a dashboard similar to Fig. 5.
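A minimal sketch of such a load test with vegeta: a single GET target piped into vegeta attack at a constant rate, then summarised with vegeta report. The function name load_test and the default rate and duration are illustrative assumptions.

```shell
# Hypothetical helper: constant-rate load against one URL, then a summary.
load_test() {
  local url="$1" rate="${2:-50}" duration="${3:-120s}"
  echo "GET ${url}" |
    vegeta attack -rate="$rate" -duration="$duration" |
    vegeta report
}
```

For example, load_test "https://foobar.[your-domain]/foo/" 100 5m, while watching the Monitoring tab.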

Fig. 5: GCP Console - Load Balancing (both clusters were in the same region, therefore traffic is load-balanced equally across all backends)

Now is a good time to experiment a bit. What happens if your clusters are in the same region, and what if they’re in different regions? Increase the load and watch the traffic overflow to another region (hint: remember the --max-rate-per-endpoint used before?). See what happens if you take one of the clusters down. And can you add a 3rd cluster to the mix?
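One way to drive these experiments from the command line, as a sketch: lower a single NEG’s --max-rate-per-endpoint with gcloud compute backend-services update-backend so that extra load overflows sooner, and scale a Deployment to zero replicas in one kubectl context to approximate losing a cluster. The function names, backend service names and context names are hypothetical.

```shell
# Hypothetical helper: change the capacity of one NEG in a backend service.
set_neg_rate() {
  local backend="$1" neg="$2" zone="$3" rate="$4"
  gcloud compute backend-services update-backend "$backend" --global \
    --network-endpoint-group="$neg" \
    --network-endpoint-group-zone="$zone" \
    --max-rate-per-endpoint="$rate"
}

# Hypothetical helper: approximate a cluster outage by scaling an app to 0 Pods.
scale_app() {
  local context="$1" app="$2" replicas="${3:-0}"
  kubectl --context="$context" scale deployment "$app" --replicas="$replicas"
}
```

Scaling to zero empties the NEG rather than deleting the cluster, which is usually enough to watch health checks fail and traffic move to the remaining backends; scale back up with scale_app <context> foo 2.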

(optional) gke-autoneg-controller

Notice the annotation on the K8s Services. It is not needed for our setup, but optionally you can deploy gke-autoneg-controller to your cluster and use it to automatically associate the NEGs created by GKE with the corresponding backend services. This will save you some tedious manual work. (I’ll not go into the details of how to deploy and use it here, please follow the README; basically, you add an annotation with the name of the target backend service to your Service, e.g. '{"name":"autoneg_test", "max_rate_per_endpoint":1000}'.)

Good Job!

And that is it. We have explained the purpose of the individual GCLB components and demonstrated how to set up multi-cluster load balancing between services deployed in two or more GKE clusters in different regions. For real-life use, I would recommend automating this setup with a configuration management tool such as Terraform.

This setup both increases your service availability, as several independent GKE clusters serve the traffic, and lowers your latency. In the case of HTTPS, the time to first byte is shorter, as the initial TLS negotiation happens at the GFE server close to the user. And with multiple clusters, the request will be served by the cluster closest to the user.

Please let me know if you find this useful, or if you have any other questions, either here or at @stepanstipl. 🚀🚀🚀 Serve fast and prosper!