Tue, Aug 27, 2019 kubernetes / helm / terraform

To Helm or Not?

Helm is becoming a very popular tool for managing deployments on Kubernetes. The proposition is tempting – easily install and manage complex applications. It’s often picked up “by default”, without thinking twice about whether it’s even needed or what the implications of such a choice are. I recommend caution before using it in more serious environments, and in this post we’ll look at the main reasons why.

Let’s go over the basic Helm concepts; for more details on Helm and its architecture, see the documentation. Helm consists of a client, called Helm, and a server component deployed within the cluster, called Tiller. Applications are packaged into charts – versioned bundles of Kubernetes manifests and additional logic (templating, hooks). A deployed instance of a chart is called a release.

Helm helps you manage Kubernetes applications as a whole, even the more complex ones. You can install or upgrade an application with a single command, such as helm install stable/mysql. Without Helm, this would typically involve creating and applying several Kubernetes manifests.
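For a sense of scale, here’s the kind of thing a single helm install hides – a minimal, simplified sketch (names, image and the secret reference are hypothetical placeholders, not the actual contents of the stable/mysql chart):

# Simplified sketch of the manifests you'd otherwise apply by hand.
# Names, image and secret reference are hypothetical placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:5.7
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-credentials   # hypothetical Secret
                  key: root-password
          ports:
            - containerPort: 3306
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  selector:
    app: mysql
  ports:
    - port: 3306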

This is nice, but it comes at a cost. Let’s look at the main drawbacks of using Helm.

Additional complexity

It’s another application you have to install, operate and maintain. Among other things, this means updating both the client and server components, securely configuring Helm (more on that in a bit), and issuing and distributing TLS certificates.

Over 90k lines of code mean a lot of places for bugs to hide – 765 open GitHub issues at the time of writing. Some of them, mentioned later in this article, have been known and unresolved for a long time.

Helm is an additional stateful component in the critical path of operating your cluster. If anything happens to it, you won’t be able to manage your workloads. Think about scenarios like Tiller’s TLS credentials expiring, or its state getting corrupted or lost.

Security

Tiller has its own gRPC-based interface for the client. By default, it is accessible from within the cluster with no authentication or authorization, which in effect gives every user open access to the management of your cluster. Exploring the Security of Helm is a good article that looks more closely at the security side of things. The first thing you should think about is securing Tiller with TLS certificates.

Another issue is that Helm bypasses the Kubernetes authorization and RBAC concepts. All actions are carried out under the Tiller service account, which by default has complete administrative access to the cluster, and Helm itself does not have any concept of authorization. In a typical deployment, anyone with access to Helm has full, unrestricted access to the cluster. Approaches to partially mitigate this exist, e.g. installing multiple instances of Helm/Tiller, each restricted to its own namespace, but that increases the complexity and management overhead even more. Combine these two issues, and you’re giving any cluster user full administrative access just by installing Helm.
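For reference, the setup most guides (and the Helm documentation) lead you to looks roughly like this – a dedicated Tiller service account bound to cluster-admin, which is then effectively what every Helm user acts as:

# Typical Tiller RBAC setup: a dedicated service account
# bound to the cluster-admin role for the whole cluster.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system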

Loss of audit trail

Related to the previous issue, all actions carried out via Helm will look like, and be logged as, being performed under Helm’s service account. And while you can, at least in theory, manage and distribute individual TLS certificates and put Helm behind a proxy to collect your own access logs, in practice you have just lost the audit trail of who is doing what in your cluster.

Sensitive values

If you use charts with any sensitive values, e.g. database credentials, these end up in Tiller’s state. By default, Tiller stores release information in ConfigMaps, and all the sensitive values are stored there too. ConfigMaps are not designed to hold sensitive data; the consequence is that these values might leak into audit logs and will not be subject to the special treatment that true Secrets get, such as encryption at rest. Besides, it’s yet another place where your sensitive values are stored.
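To illustrate, a release record in Tiller’s default ConfigMap backend looks roughly like the sketch below (simplified and from memory – the release field is an encoded blob bundling the chart, rendered manifests and all supplied values, sensitive ones included):

# Rough sketch of a Tiller release record in the ConfigMap backend.
# The release field bundles rendered manifests and all values.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-release.v1
  namespace: kube-system
  labels:
    NAME: my-release
    OWNER: TILLER
    STATUS: DEPLOYED
data:
  release: "<base64-encoded, gzipped release payload>"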

Loss of visibility

By introducing Helm you’re putting another abstraction layer between Kubernetes and the user. Helm does not observe or propagate the actual state of resources, resulting in a loss of direct visibility. Everything can look OK at the Helm layer while the resources are in fact in a failed state.

Let’s take the following example: you deploy a chart and it succeeds. After the initial deployment, one of its required resources is deleted and the pods won’t run any more. Querying the Helm status still shows that everything is in order, but it is not – you have lost visibility of the actual state. Had you worked with the Kubernetes resources directly (for example using kubectl), the problem would be immediately visible.

Terraform

The issue is even more obvious in combination with a configuration management tool such as Terraform. When the actual state diverges from the desired one, you would expect this to be rectified on the next run. With Helm in the middle, the tool has no visibility into the underlying state of the Kubernetes resources and cannot converge the actual state to the desired one. This breaks one of the core concepts of Terraform’s declarative approach, and it would not be an issue when working with Terraform’s Kubernetes resources directly.

In addition, there are several annoying issues with the Helm provider implementation (I’ve contributed to fixing some, such as #161, but others still remain).

This little screencast shows how the combination of Helm & Terraform results in loss of visibility of the actual state.

Charts

You have to create, manage and store charts somewhere, and this is extra work and code to maintain.

One argument is that there are many charts readily available, but there are issues with these as well. Do you really want to use charts owned and managed by someone else? You should not blindly trust 3rd-party charts, as this hands control over your cluster to their authors, especially in combination with constructs like latest tags on container images. And many “official” charts do in fact reference images by the latest tag. Take this scenario: the chart or the Docker image gets compromised, and with it your cluster and data.

Many charts also use old images (with known vulnerabilities) and are configured in an insecure and unreliable manner – definitely not production-ready. And many are over-complicated, including a lot of code and logic that is not relevant to your environment and only adds to the complexity, with the resulting behaviour often not fully understood by its consumers.
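As a hypothetical illustration of the latest-tag problem, defaults like these in a chart’s values mean you don’t actually control what ends up running in your cluster – whoever pushes the next latest image does:

# Hypothetical chart defaults (values.yaml) relying on a mutable tag.
image:
  repository: someorg/someapp
  tag: latest
  pullPolicy: Always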

Closed for extension – charts are still difficult to extend to meet your particular needs, e.g. when you need extra labels for Istio or Prometheus, or want to add manifests such as NetworkPolicies. Usually you’ll end up either keeping those modifications on the side (undesirable) or cloning the chart and maintaining your own fork (extra work).
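For example, even something as simple as a NetworkPolicy for a chart-managed application typically ends up as a manifest maintained outside the chart (a hypothetical sketch, names are placeholders):

# Hypothetical NetworkPolicy kept "on the side" of a chart,
# because the chart itself offers no way to include it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: myapp-allow-frontend
  namespace: myapp
spec:
  podSelector:
    matchLabels:
      app: myapp
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - port: 8080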

Versioning

You have to version charts and increase the version when making changes, as Helm won’t deploy the changes otherwise. Now, unrelated to the version of the application being deployed, you have an additional version to maintain for every chart… why?

# Example of typical Helm chart definition
# borrowed from couchdb stable chart
apiVersion: v1
name: couchdb
# Version of the chart, unrelated to 
# the version of the deployed app
version: 2.0.1
# Version of the deployed app
appVersion: 2.3.1

This, typically in combination with storing the charts in the same repository as the rest of your code, results in having to maintain a set of versions without any practical meaning.

If you decide to keep charts in a dedicated chart repository (which I believe is a bit more in the spirit of the intended use), you’re in for a whole new set of challenges. Again, think about the overhead of setting up, securing and operating not just the repository, but also the pipelines for building and publishing the charts. By the way, Helm only supports basic HTTP authentication for downloading charts.

CRDs

One unsolved problem with Helm is the management of CustomResourceDefinitions (CRDs). Because Helm lacks any dependency mechanism, you can’t have both CRDs and their instances as ordinary resources in one chart – there’s no guarantee that the CRDs will be created and ready by the time Helm creates the respective custom resources. Helm tries to overcome this with the crd-install hook, but it never really worked well, for multiple reasons: resources created by the hook are not cleaned up when the release is deleted, and installing the chart again fails with an ...already exists error. In short, there’s no good way to manage CRDs with Helm, and most charts end up putting the CRDs aside and managing them manually (see cert-manager for example).
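For completeness, this is roughly how charts declare CRDs with the crd-install hook (a generic, hypothetical CRD for illustration) – and it’s exactly these resources that Helm then neither cleans up on delete nor tolerates on reinstall:

# CRD declared via the crd-install hook (hypothetical example).
# Helm creates it before the rest of the chart, but never manages
# it as part of the release afterwards.
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: crontabs.example.com
  annotations:
    "helm.sh/hook": crd-install
spec:
  group: example.com
  version: v1
  names:
    kind: CronTab
    plural: crontabs
  scope: Namespaced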

Namespaces

Another issue I’ve hit on more than one occasion is with namespaces. Helm tries to manage namespaces on its own – if the namespace targeted by a deployment does not exist, it will be created. This happens even if the resource in question is a Namespace itself, so including a namespace manifest within the chart won’t work either. Unfortunately, there is no support for creating namespaces with custom labels or annotations. This is especially painful in combination with Istio, which requires the istio-injection: enabled label on the namespace to enable the sidecar auto-injection webhook.
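What you actually need is a namespace like the one below – but there’s no way to tell Helm to create it with the label, and it will happily create a bare namespace instead:

# The namespace you actually want for Istio auto-injection.
# Namespaces auto-created by Helm carry no such labels.
apiVersion: v1
kind: Namespace
metadata:
  name: myapp
  labels:
    istio-injection: enabled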

Benefits

So far we’ve been through quite a few downsides. So what are the benefits of using Helm?

From my experience, Helm is more often than not used as a simple templating tool, without properly versioning charts or storing them in a dedicated chart repository. Helm is also typically used as a complement to an already existing configuration management tool, like Terraform, Puppet, Ansible or Salt – each of which has robust templating mechanisms built in.

One might argue that Helm provides more functionality via its lifecycle hooks (e.g. a pre-install job). And it does, but this does not fit the cloud-native pattern very well. Usually it means that you’re trying to fix, at the infrastructure level, something that is wrong with the application in the first place. This kind of logic should either be built into the application or, for more complex scenarios, handled via the operator pattern.
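For context, this is the kind of logic in question – a job tied to the release lifecycle through a Helm hook annotation (a simplified, hypothetical example; image and command are placeholders):

# Hypothetical pre-install hook job – "fix it at deploy time" logic
# that arguably belongs in the application itself or in an operator.
apiVersion: batch/v1
kind: Job
metadata:
  name: myapp-db-init
  annotations:
    "helm.sh/hook": pre-install
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: db-init
          image: someorg/myapp-db-init:1.0.0
          command: ["./initialize-database.sh"]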

Note on Future

At the time of writing, the alpha.2 release of Helm 3 has just been published, but I haven’t had time to try it out yet. Perhaps a topic for another article. It seems that the next version of Helm is going to be client-only, which should lift some of the limitations. It also plans to introduce Lua support, which, on the other hand, might just be even more unneeded complexity.

Summary

Let’s recap the main drawbacks we’ve talked about: the additional complexity and the security model built around Tiller, the loss of audit trail and of visibility into actual state, sensitive values sitting in ConfigMaps, the overhead of creating, versioning and hosting charts, and the unresolved handling of CRDs and namespaces.

Based on my experience of working with Helm in several production environments of different nature and size, I believe that Helm brings more problems than it solves. The overhead of implementing and maintaining the whole ecosystem is significant, without any substantial benefits in return.

Kubernetes support in Terraform is far from perfect, mainly lacking coverage and support for custom resources. Perhaps a topic for another post, but I would recommend checking out Eric Chiang’s K8s and Davis Ford’s Kubernetes YAML providers. I prefer to keep things simple and stick to one management tool, in my case usually Terraform, to do the whole job. Also, kubectl itself is a pretty powerful and battle-tested tool.

Where I see a good fit for Helm is as a universal distribution mechanism between software vendors and consumers.

Nevertheless, that’s just my opinion and Helm might be a good fit for your use case. Hopefully this article made you aware of the major pain points and will help you choose wisely. I wish you a good voyage and would be more than interested to hear your experiences.