To Helm or Not?
Helm is becoming a very popular tool for managing deployments on Kubernetes. The proposition is tempting – easily install and manage complex applications. It is often picked up “by default”, without asking whether it is actually needed or what the implications of that choice are. I recommend caution before using it in more serious environments, and in this post we’ll look at the main reasons why.
Let’s go over the basic Helm concepts; for more details on Helm and its architecture, see the documentation. Helm consists of a client, called Helm, and a server part deployed within the cluster, called Tiller. Applications are packaged into charts – versioned bundles of Kubernetes manifests and additional logic (templating, hooks). A deployed instance of a chart is called a release.
Helm helps to manage Kubernetes applications as a whole, even the more complex ones. You can install or upgrade an application with a single command, such as helm install stable/mysql. Without Helm, this would typically involve writing and applying several Kubernetes manifests.
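To make the difference concrete, here is a rough sketch of both approaches (the release name and manifest file names are purely illustrative):

```sh
# With Helm: a single command pulls the chart and creates all of its resources.
helm install stable/mysql --name my-db

# Without Helm: write the manifests yourself and apply them one by one
# (or as a directory); the file names below are just examples.
kubectl apply -f mysql-secret.yaml
kubectl apply -f mysql-pvc.yaml
kubectl apply -f mysql-deployment.yaml
kubectl apply -f mysql-service.yaml
```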
This is nice, but it comes at a cost. Let’s look at the main drawbacks of using Helm.
Additional complexity
It’s another application you have to install, operate and maintain. Among other things, this means keeping both the client and the server components up to date, configuring Helm securely (more on that in a bit), and issuing and distributing TLS certificates.
Over 90k lines of code leave plenty of room for bugs: 765 open GitHub issues at the time of writing. Some of them, mentioned later in this article, have been known and unresolved for a long time.
Helm is an additional stateful component in the critical path of operating your cluster. If anything happens to it, you won’t be able to manage your workloads. Think about scenarios like Tiller’s TLS credentials expiring, or Tiller’s state getting corrupted or lost.
Security
Tiller has its own gRPC-based interface for the client. By default, it is reachable from within the cluster with no authentication or authorization whatsoever, which in effect gives every user open access to the management of your cluster (Exploring the Security of Helm is a good article that looks more closely at the security side of things). The first thing you should think about is securing Tiller with TLS certificates.
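As a sketch of what that involves with Helm 2, assuming you have already issued and distributed the certificate files:

```sh
# Deploy Tiller with TLS enabled and client certificate verification.
helm init \
  --tiller-tls \
  --tiller-tls-verify \
  --tiller-tls-cert tiller.cert.pem \
  --tiller-tls-key tiller.key.pem \
  --tls-ca-cert ca.cert.pem

# Every client call then has to present a matching client certificate.
helm ls --tls \
  --tls-ca-cert ca.cert.pem \
  --tls-cert client.cert.pem \
  --tls-key client.key.pem
```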
Another issue is that Helm bypasses the Kubernetes authorization and RBAC concepts. All actions are carried out under the Tiller service account, which by default has complete administrative access to the cluster, and Helm itself has no concept of authorization. In a typical deployment, anyone with access to Helm therefore has full, unrestricted access to the cluster. Approaches to partially mitigate this exist, e.g. installing multiple instances of Helm/Tiller, each restricted to its own namespace, yet that increases complexity and management overhead even further. Combine these two issues, and by installing Helm you’re handing any cluster user full administrative access.
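A minimal sketch of the namespace-scoped mitigation, assuming a team-a namespace and that the built-in edit ClusterRole is sufficient for the workloads in question:

```sh
# A Tiller instance restricted (via RBAC) to a single namespace.
kubectl create namespace team-a
kubectl create serviceaccount tiller --namespace team-a
kubectl create rolebinding tiller-team-a \
  --clusterrole=edit \
  --serviceaccount=team-a:tiller \
  --namespace team-a

helm init --service-account tiller --tiller-namespace team-a

# Clients now have to target this particular Tiller instance explicitly.
helm install stable/mysql --name team-a-db \
  --tiller-namespace team-a --namespace team-a
```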
Loss of audit trail
Related to the previous issue, all actions carried out via Helm will look like, and be logged as, actions performed under Tiller’s service account. And while you can, at least in theory, manage and distribute individual TLS certificates and put Helm behind a proxy to collect your own access logs, in practice you have just lost the audit trail of who is doing what in your cluster.
Sensitive values
If you use charts with any sensitive values, e.g. database credentials, these will be stored in Tiller’s state. By default, Tiller stores release information in ConfigMaps, and all the sensitive values end up there too. ConfigMaps are not designed to hold sensitive data, so these values might leak into audit logs and will not receive any of the special treatment that true Secrets get, such as encryption at rest. Besides, it’s yet another place where your sensitive values are stored.
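You can check this yourself on any default Helm 2 setup; roughly (the release name my-db.v1 is illustrative):

```sh
# Tiller's default backend keeps release data as ConfigMaps in kube-system,
# so anyone who can read ConfigMaps there can read the release contents too.
kubectl get configmaps --namespace kube-system -l "OWNER=TILLER"

# The payload is base64-encoded, gzip-compressed release data,
# including any values that were passed to the chart.
kubectl get configmap my-db.v1 --namespace kube-system \
  -o jsonpath='{.data.release}' | base64 --decode | gunzip | strings | grep -i password
```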
Loss of visibility
By introducing Helm you’re putting another abstraction layer between Kubernetes and the user. Helm does not observe or propagate the actual state of resources, resulting in a loss of direct visibility: everything can look fine at the Helm layer while the resources are in fact in a failed state.
Take the following example: you deploy a chart and it succeeds. After the initial deployment, one of its required resources is deleted and the pods no longer run. Querying Helm’s status still shows that everything is in order, but it is not – you have lost visibility of the actual state. Had you been working with the Kubernetes resources directly (for example using kubectl), the problem would be immediately visible.
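A quick way to reproduce this, assuming the stable/mysql chart from earlier (the exact resource names depend on the chart):

```sh
helm install stable/mysql --name my-db

# Simulate drift: remove a resource the release depends on.
kubectl delete secret my-db-mysql

# Helm is still perfectly happy...
helm status my-db            # still reports the release as DEPLOYED

# ...while the actual workload is broken.
kubectl get pods | grep my-db    # pods failing to start
```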
Terraform
The issue becomes even more obvious in combination with a configuration management tool, such as Terraform. When the actual state diverges from the desired one, you would expect the next run to rectify it. With Helm in the middle, the tool has no visibility into the underlying state of the Kubernetes resources and cannot converge the actual state to the desired one. In addition, there are several annoying issues with the Helm provider implementation (I’ve contributed to fixing some, such as #161, but others still remain).
This breaks one of the core concepts of Terraform’s declarative approach, and it would not be an issue when working with Terraform’s Kubernetes resources directly.
This little screencast shows how the combination of Helm & Terraform results in loss of visibility of the actual state.
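In rough terms, the flow looks like this when a helm_release is managed by Terraform (resource names are illustrative):

```sh
terraform apply                   # creates the release through Helm/Tiller
kubectl delete deployment my-db   # someone removes a resource Helm created
terraform plan                    # "No changes." -- the drift is invisible to Terraform
```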
Charts
You have to create, manage and store charts somewhere, and this is extra work and code to maintain.
One argument is that there are many charts readily available to you, but there are issues with these as well – do you want to use charts owned and managed by someone else? You should not blindly trust third-party charts, as doing so gives their authors control over your cluster, especially in combination with constructs like latest tags of container images.
And even many “official” charts do in fact reference images by the latest tag. Take this scenario: the chart or the Docker image gets compromised, and so will your cluster and data. Many charts use old images (with known vulnerabilities) and are configured in an insecure and unreliable manner – definitely not production-ready. Many charts are also over-complicated: they include a lot of code and logic that isn’t relevant to your environment and only adds to the complexity, and the resulting behaviour is often not fully understood by its consumers.
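If you do use third-party charts, at the very least pull them, review what they actually render, and pin image tags explicitly. A sketch using stable/mysql – the value names (image, imageTag) are whatever that particular chart exposes and will differ per chart:

```sh
# Download and unpack the chart so you can actually read it.
helm fetch stable/mysql --untar --untardir ./charts

# Render it locally and review the result before anything touches the cluster,
# pinning the image tag instead of relying on whatever the chart defaults to.
helm template ./charts/mysql --name my-db \
  --set image=mysql \
  --set imageTag=5.7.26 | less
```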
Closed for extension – charts are still difficult to extend to meet your particular needs, e.g. when requiring extra labels for Istio or Prometheus, or adding manifests such as Network Policies. Usually, you’ll either end up keeping those modifications on the side (undesirable) or cloning the charts and maintaining your own fork (extra work).
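One common workaround, at the cost of losing Helm’s release tracking entirely, is to render the chart to plain manifests and layer your own additions on top (the extra NetworkPolicy manifest here is illustrative):

```sh
helm template ./charts/mysql --name my-db --namespace team-a > rendered.yaml

# Apply the rendered output together with your own additions,
# e.g. a NetworkPolicy the chart does not provide.
kubectl apply -f rendered.yaml -f my-networkpolicy.yaml
```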
Versioning
You have to version charts and bump the version whenever you make a change, as Helm won’t deploy the change otherwise. Now, unrelated to the version of the application being deployed, you have an additional version to maintain for every chart… why?
This, typically in combination with storing the charts in the same repository as the rest of your code, results in having to maintain a set of versions without any practical meaning.
If you decide to keep charts in a dedicated chart repository (which I believe is a bit closer to the spirit of the intended use), you’re in for a whole new set of challenges. Again, think about the overhead of setting up, securing and operating not just the repository, but also the pipelines for building and publishing the charts. By the way, Helm only supports basic HTTP authentication for downloading charts.
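As a sketch of what that pipeline involves, assuming a ChartMuseum-style repository behind basic auth (URL and credentials are placeholders):

```sh
# Package the chart; the version is taken from its Chart.yaml.
helm package ./charts/mychart

# Publish it to the repository (ChartMuseum's upload API as an example).
curl --user "$CHARTS_USER:$CHARTS_PASS" \
  --data-binary "@mychart-0.1.3.tgz" \
  https://charts.example.com/api/charts

# Consumers add the repository with the same basic-auth credentials.
helm repo add private https://charts.example.com \
  --username "$CHARTS_USER" --password "$CHARTS_PASS"
helm repo update
```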
CRDs
One unsolved problem with Helm is the management of CustomResourceDefinitions (CRDs). Because Helm lacks any dependency mechanism, you can’t have both CRDs and their instances as ordinary resources in one chart – there’s no guarantee that the CRDs will be created and ready by the time Helm creates the respective resources. Helm tries to overcome this by introducing the crd-install hook, but it never really worked, for multiple reasons: resources created via the hook are not cleaned up after the release is deleted, and installing the chart again fails with an ...already exists error. In short, there is no good way to manage CRDs with Helm, and most charts end up putting them aside and managing them manually (see cert-manager for example).
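In practice that means something like the following, with the CRD manifests applied outside of Helm before the chart itself (file and chart names are placeholders):

```sh
# Step 1: create the CRDs outside of Helm and let them become established.
kubectl apply -f crds.yaml

# Step 2: only then install the chart that contains the custom resources.
helm install example/my-operator --name my-operator --namespace kube-system
```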
Namespaces
Another issue I’ve hit on more than one occasion is with Namespaces. Helm tries to manage namespaces on its own – if the namespace for a deployment does not exist, it will be created. This happens even if the resource in question is actually a Namespace itself, so including a namespace manifest within the chart won’t work either. Unfortunately, there is no support for creating namespaces with custom labels or annotations.
This is especially painful in combination with Istio, which requires the istio-injection: enabled namespace label to enable the auto-injection webhook.
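The practical workaround is to create and label the namespace yourself before handing it to Helm (names are illustrative):

```sh
kubectl create namespace my-app
kubectl label namespace my-app istio-injection=enabled

# Helm then deploys into the already existing, correctly labelled namespace.
helm install stable/mysql --name my-db --namespace my-app
```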
Benefits
So far we’ve been through quite a few downsides. So what are the benefits of using Helm?
From my experience, Helm is more often than not used as a simple templating tool, without properly versioning charts or storing them in a dedicated chart repository. Helm is also typically used as a complement to an already existing configuration management tool, like Terraform, Puppet, Ansible or Salt – each of which has robust templating mechanisms built in.
One might argue that Helm provides more functionality via its hooks. And it does, but this does not fit the cloud-native pattern very well: usually it means you’re trying to fix, at the infrastructure level, something that is wrong with the application in the first place. This kind of logic should either be built into the application or, for more complex scenarios, handled with the operator concept.
Note on Future
At the time of writing, the alpha.2 release of Helm 3 has just been published, but I haven’t had time to try it out yet. Perhaps a topic for another article.
It seems that in the next version Helm is going to be client-only, which should lift some of the limitations. It also plans to introduce Lua support, which, on the other hand, might just be even more unneeded complexity.
Summary
Let’s recap the main drawbacks we’ve talked about:
- Additional complexity
- Security
  - Additional entry point to the cluster
  - No authorization nor authentication
  - Loss of audit trail
  - Poor secrets handling
- Charts
  - Lack of quality and added complexity
  - Versioning, management and storing overhead
- Inability to manage CRDs
- Inability to manage Namespaces
Based on the experience of working with Helm in several production environments of different natures and sizes, I believe that Helm introduces more pain points than it removes. The overhead of implementing and maintaining the whole ecosystem is significant, without any substantial benefits.
Kubernetes support in Terraform is far from perfect, mainly lacking coverage and support for custom resources. Perhaps a topic for another post, but I would recommend checking out Eric Chiang’s K8s and Davis Ford’s Kubernetes YAML providers.
I prefer to keep things simple and stick to one management tool, in my case usually Terraform, to do the whole job. Also, kubectl itself is a pretty powerful and battle-tested tool.
Where I see a good fit for Helm is as a universal distribution mechanism between software vendors and consumers.
Nevertheless, that’s just my opinion and Helm might be a good fit for your use case. Hopefully this article made you aware of the major pain points and will help you choose wisely. I wish you a good voyage and would be more than interested to hear your experiences.