Experience feedback: Deploying and managing a Kubernetes infrastructure on-premise

Blog post also available (in French) on Devoteam Revolve Blog

As part of a mission for a client, we supported the team in charge of developing data enrichment services. In this context, where applications are deployed and managed by an operations team on virtual machines, the client had two main needs: to have this team adopt the DevOps methodology in order to gain agility, and to migrate to a resilient, automated architecture able to scale. After a first phase during which we studied the environment and specified the client’s needs, we chose to start with an on-premise Kubernetes-based infrastructure.

Motivations for the on-premise Kubernetes solution

Before settling on Kubernetes, we first studied the solution: what needs does it meet, and what constraints does it pose in an on-premise environment?

Kubernetes is an open source platform initiated by Google. Originally an orchestrator of containerized applications, it has since become a full cloud-native platform. The community is very active, and new versions are released every few months. It provides a set of tools for automation and scaling. A Kubernetes cluster can contain hosts located on-premises or in public, private or hybrid cloud environments.

The set of automation mechanisms provided by Kubernetes, if used well and in accordance with the DevOps approach, makes it possible to take full advantage of this paradigm.

The main benefits that meet the customer’s needs are the following:

  • Accelerate the transition between development and production release of new features on the enrichment services (reduce the famous time to market), thanks in particular to the ability to integrate deployment via CI/CD pipelines
  • Increase responsiveness with the ability to scale services using the HPA (Horizontal Pod Autoscaler) mechanism (a minimal HPA manifest sketch follows this list), and facilitate troubleshooting thanks to the implementation of an observability stack
  • Guarantee the resilience of services through mechanisms such as self-healing, service discovery, and liveness and readiness probes, to name a few.
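
As an illustration of the scaling mechanism mentioned above, here is a minimal HorizontalPodAutoscaler sketch; the deployment name, replica bounds and CPU target are hypothetical values, not the client’s actual configuration.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: enrichment-service        # hypothetical deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: enrichment-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # scale out when average CPU usage exceeds 70%
```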

For a few years now, Kubernetes has been a reference in the container orchestration market. To get an idea of this, you only need to look at the major cloud providers: Elastic Kubernetes Service (EKS) at AWS, Azure Kubernetes Service (AKS) at Microsoft Azure and Google Kubernetes Engine (GKE) at Google. All of them have ended up developing a solution tailored to their ecosystem. The skills available on the market follow the same trend, and the customer needs to be sure they can easily find competent profiles on this solution to ensure its future maintenance.

Deploying and managing a Kubernetes cluster on an on-premises environment is different from using a managed service from a cloud provider. For example, with Amazon Elastic Kubernetes Service (EKS), deployment is quick and easy and the customer does not have control over the nodes and related components that make up the control plane. They are the responsibility of the AWS cloud provider.

On the contrary, in an on-premise environment, the administrators install the cluster from scratch and have a free hand on the control plane. They are therefore responsible for the proper management of its components. As a reminder, here are some components essential to the proper functioning of a Kubernetes cluster (a quick way to inspect them is shown after the list):

  • etcd: key/value store that maintains the cluster state data; it is generally deployed in cluster mode on the control plane nodes.
  • kube-scheduler: component whose role is to assign pods to data plane nodes, based on the data contained in etcd.
  • kube-controller-manager: daemon that continuously observes the state of the cluster; it is the component at the heart of the famous control loop mechanism.
  • kube-apiserver: server exposing a REST API, validating and applying changes to cluster resources.
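
On a self-managed cluster these components typically run as static pods on the control plane nodes. A simple way to check that they are healthy is sketched below; the exact output and namespace layout may vary with the installation method.

```bash
# List the control plane pods (etcd, kube-apiserver, kube-scheduler, kube-controller-manager)
kubectl get pods -n kube-system -o wide

# Legacy health summary of scheduler, controller-manager and etcd (deprecated but still informative)
kubectl get componentstatuses
```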

We also studied the issue of disk space on each node. When we deploy a pod, we download and cache the associated container image. It is therefore important to ensure that the cache does not take up too much storage space, otherwise there is a risk of the cluster malfunctioning. For some applications, there may be a need to ensure data persistence, for example a MySQL database deployed as a pod. The management of distributed storage for this type of application also had to be provided for in the target architecture.
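
To keep the image cache under control, the kubelet’s image garbage collection thresholds can be tuned. Below is a hedged excerpt of a KubeletConfiguration; the percentages are illustrative, not the values used on the client’s cluster.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Start removing unused images when disk usage exceeds 80%...
imageGCHighThresholdPercent: 80
# ...and keep deleting until usage falls back under 60%
imageGCLowThresholdPercent: 60
```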

On the question of installation, several solutions exist to automate this phase. Automating it has two major advantages:

  • Drastically reduce the time needed to reinstall a cluster in the event of a major breakdown in the infrastructure.
  • To be able to deploy a new cluster very quickly, ready for use, if the need arises (and it did!).

Installation automation, the Kubespray choice

Installing a Kubernetes cluster is considered a tedious task by operators and DevOps engineers, which is why, as previously mentioned, we sought to automate it as much as possible. Several solutions currently exist to deploy Kubernetes clusters on an on-premise environment; below are the ones we studied in particular:

  • Kubeadm: the official Kubernetes installer
  • RKE (Rancher Kubernetes Engine): installer developed by Rancher, written in Golang
  • Kubespray: a tool combining Kubeadm and Ansible

After study, we chose to go with Kubespray because this tool met several of our needs.

First, it is an open-source tool that is maintained by a very active community. It allows us to easily manage Kubernetes version upgrades. The disadvantage is that we now depend on the development cycle of the tool to upgrade to newer versions.

It is compatible with the Linux distribution provided by the customer. The tool also offers good modularity in the type of Kubernetes architecture to deploy.

Finally, it uses a combination of Kubeadm and the configuration management tool Ansible, on which the customer’s Ops team has expertise.
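
In practice, a Kubespray deployment boils down to preparing an Ansible inventory and running a playbook. The commands below are a sketch based on the Kubespray documentation; inventory paths and options depend on the Kubespray version used.

```bash
# Fetch Kubespray and its Ansible dependencies
git clone https://github.com/kubernetes-sigs/kubespray.git
cd kubespray
pip install -r requirements.txt

# Start from the sample inventory and declare the cluster nodes in it
cp -rfp inventory/sample inventory/mycluster

# Deploy the cluster from the bastion host
ansible-playbook -i inventory/mycluster/hosts.yaml --become cluster.yml
```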

Before installing a Kubernetes cluster dedicated to the test and certification environments of the client’s enrichment services, we did some preparatory work on the VMs that would make it up.

First, we had to determine the number of VMs and their respective characteristics to dedicate to the cluster. We started with eight machines.

For the control plane, which, as mentioned earlier, contains components that are essential to the smooth running of the cluster, we chose to dedicate three VMs to guarantee a minimum of high availability.

For the data plane, two VMs with more memory resources were provisioned at our request because the services to be hosted are particularly demanding.

Another provisioned machine plays the role of a bastion host. It is used to run the Ansible playbooks that preconfigure the future nodes of the cluster, and then to run Kubespray. The last two are dedicated to load balancing the Kubernetes API servers present on the three nodes making up the control plane, with HAProxy installed beforehand.
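
Transposed into a Kubespray inventory, this topology could look like the sketch below; host names and addresses are hypothetical, and the inventory group names (kube_control_plane, kube_node, etcd) vary slightly between Kubespray versions.

```yaml
all:
  hosts:
    master-1: {ansible_host: 10.0.0.11}
    master-2: {ansible_host: 10.0.0.12}
    master-3: {ansible_host: 10.0.0.13}
    worker-1: {ansible_host: 10.0.0.21}
    worker-2: {ansible_host: 10.0.0.22}
  children:
    kube_control_plane:       # the three control plane nodes
      hosts:
        master-1:
        master-2:
        master-3:
    etcd:                     # etcd co-located on the control plane
      hosts:
        master-1:
        master-2:
        master-3:
    kube_node:                # the two data plane nodes
      hosts:
        worker-1:
        worker-2:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
```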

For the first five machines making up the cluster, we automated their pre-configuration with Ansible (a minimal playbook sketch follows this list), including:

  • Copying the public key of a user created for the occasion, to authorize SSH connections from the bastion host to the nodes and therefore the execution of the Ansible playbooks from there
  • Mounting an additional 100 GB volume on the /var/lib/docker path to store the container images
  • Installing the various packages necessary for the proper execution of Kubespray
  • Opening the required network ports
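
Here is a hedged sketch of what such a pre-configuration playbook could look like; the user name, block device, package list and port are illustrative, and module behavior depends on the distribution in use.

```yaml
- hosts: k8s_nodes
  become: true
  tasks:
    - name: Authorize the bastion user's SSH key on the node
      ansible.posix.authorized_key:
        user: deploy                          # hypothetical dedicated user
        key: "{{ lookup('file', '~/.ssh/id_rsa.pub') }}"

    - name: Mount the additional 100 GB volume on /var/lib/docker
      ansible.posix.mount:
        path: /var/lib/docker
        src: /dev/vdb1                        # hypothetical block device
        fstype: ext4
        state: mounted

    - name: Install packages required by Kubespray
      ansible.builtin.package:
        name: [python3, curl, socat, conntrack]
        state: present

    - name: Open the Kubernetes API port (one among those required)
      ansible.posix.firewalld:
        port: 6443/tcp
        permanent: true
        state: enabled
```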

Once this was complete, we made adjustments to how Kubespray would deploy the cluster on the VMs. A major choice for the final architecture was to dedicate load balancing to HAProxy servers installed on VMs provisioned for the occasion.

Cluster high availability with HAProxy and KeepAlived

HAProxy is a well-known open source software tool for load balancing. We use it here to distribute the load both on the Kubernetes API (hosted on the control plane nodes) and on the services deployed on the cluster (hosted on the data plane nodes).
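
As an illustration of the first role, here is a hedged haproxy.cfg excerpt load balancing the Kubernetes API in TCP mode across the three control plane nodes; addresses are hypothetical.

```
# TCP load balancing of the Kubernetes API servers
frontend kubernetes_api
    bind *:6443
    mode tcp
    default_backend control_plane_nodes

backend control_plane_nodes
    mode tcp
    balance roundrobin
    option tcp-check
    server master-1 10.0.0.11:6443 check
    server master-2 10.0.0.12:6443 check
    server master-3 10.0.0.13:6443 check
```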

In order to prevent the VM on which HAProxy is installed from becoming a single point of failure (SPOF), we chose, after discussion with the customer, to add a second HAProxy instance. The goal is to have a backup instance and thus benefit from high availability on the load balancing service. This choice brings two constraints: how do we ensure that the configuration remains consistent across the two HAProxy instances? And how do we switch transparently to the backup instance if the first one goes down?

Configuration synchronization

For the first issue, we looked for a way to synchronize the configuration on each machine. When HAProxy is installed, the configuration is defined by default in a haproxy.cfg file. The solution we found was to create an NFS mount point served by the bastion host, which gives us one and the same configuration shared between the two servers.
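
A hedged sketch of this sharing, with hypothetical paths and subnet: an export on the bastion host and the corresponding mount on each HAProxy VM.

```
# On the bastion host (NFS server), /etc/exports
/srv/haproxy  10.0.0.0/24(rw,sync,no_subtree_check)

# On each HAProxy VM, /etc/fstab entry mounting the shared configuration directory
bastion:/srv/haproxy  /etc/haproxy  nfs  defaults,_netdev  0  0
```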

For the second, we used a failover mechanism and chose the KeepAlived service for this. It allows us to associate a virtual IP address, i.e. an address not tied to any particular server, with several real server IPs. In our case, the configuration is quite simple, with only two servers.

We mainly define three elements in the configuration (a minimal keepalived.conf sketch follows this list):

  • A virtual IP address associated with the two HAProxy instances
  • The IP address of the main instance
  • And the IP address of the backup instance
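
A hedged keepalived.conf sketch for the primary instance; the interface name, router ID, priority and virtual IP are illustrative. The backup instance uses state BACKUP and a lower priority.

```
vrrp_instance haproxy_vip {
    state MASTER
    interface eth0               # assumed network interface
    virtual_router_id 51
    priority 101                 # the backup instance gets a lower value, e.g. 100
    advert_int 1
    virtual_ipaddress {
        10.0.0.100               # the virtual IP used to reach the cluster
    }
}
```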

To summarize, to reach our cluster (whether to query the Kubernetes API or the client’s exposed services) we only ever use this virtual address. If the main HAProxy instance fails, the virtual IP is brought up on the backup instance until the main instance is back; the failover is thus completely transparent.

Moreover, it is not common practice to reach a service by IP address; a host name is generally preferred. To do this, we created a record on the client’s DNS server, specific to the cluster, so that all hostnames associated with the hosted services resolve to the virtual address associated with our HAProxy instances.
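
In a BIND-style zone file, such a record could look like the hedged line below, with a hypothetical cluster domain and the virtual IP carried by KeepAlived:

```
; every service hostname under the cluster domain resolves to the virtual IP
*.k8s.example.internal.   IN A   10.0.0.100
```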

A last feature we implemented to secure exchanges with the cluster services from the outside was the activation of TLS (Transport Layer Security). To do this, we first had to generate a certificate signed by the client’s internal PKI (Public Key Infrastructure). We chose a wildcard certificate so that it covers the DNS record associated with the cluster. We then only had to configure HAProxy to accept HTTPS connections using this certificate.
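
A hedged HAProxy excerpt for this HTTPS entry point; the certificate path and the backend (here assumed to be an ingress controller exposed via an HTTP NodePort on the data plane nodes) are illustrative.

```
frontend services_https
    bind *:443 ssl crt /etc/haproxy/certs/wildcard.pem   # wildcard certificate signed by the internal PKI
    mode http
    default_backend ingress_nodes

backend ingress_nodes
    mode http
    balance roundrobin
    server worker-1 10.0.0.21:30080 check   # hypothetical HTTP NodePort of the ingress controller
    server worker-2 10.0.0.22:30080 check
```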

To finish, here is a global diagram of this infrastructure: /images/k8s-schema.png

Persistent storage management for pods with nfs-provisioner

When deploying applications on a Kubernetes cluster, data is by default stored locally on the node that hosts the pod. This data is ephemeral: as soon as the pod disappears, so does its data. However, there may be cases where you want the data to persist.

To meet this need, Kubernetes provides a set of resources to manage data persistence: Persistent Volume (PV), Persistent Volume Claim (PVC) and Storage Class. To put it simply, a storage class describes how a persistent volume is provisioned. A persistent volume claim is a resource linked to a namespace (a virtual cluster) that defines how a persistent volume will be consumed. If we want a pod to have access to a persistent volume, we reference the corresponding persistent volume claim in its namespace.
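
The hedged manifests below illustrate this chain for the MySQL example mentioned earlier: a PVC requesting storage in a namespace and a pod mounting it. Names, sizes and the storage class are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
  namespace: demo                      # hypothetical namespace
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: nfs-provisioner    # hypothetical class, see below
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
  namespace: demo
spec:
  containers:
    - name: mysql
      image: mysql:8.0
      volumeMounts:
        - name: data
          mountPath: /var/lib/mysql    # the database files survive pod restarts
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: mysql-data
```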

This need for persistence materialized when the customer expressed their need for a distributed key/value configuration store within the Kubernetes cluster. Indeed, the services to be migrated to the cluster are natively coded to retrieve their configuration from a Consul server. Deploying a Consul cluster on Kubernetes therefore requires configuring the persistence of its key/value store.

To keep consistency with what we had previously set up, we wanted our PVs to be hosted on the bastion host and thus connected to the cluster via NFS. We therefore looked for a storage class able to provision and deprovision PVs dynamically over NFS, and finally found a solution named nfs-provisioner. Once it is configured, as soon as a pod has a persistence need, we specify the nfs-provisioner storage class; at pod initialization, a PV and a PVC are then generated.
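
A hedged sketch of the corresponding StorageClass and a PVC using it; the provisioner identifier depends on how nfs-provisioner was deployed, and the names and size are illustrative.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-provisioner
provisioner: example.com/nfs       # assumed identifier registered by the NFS provisioner pod
reclaimPolicy: Delete
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: consul-data                # e.g. for the Consul key/value store
spec:
  storageClassName: nfs-provisioner
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
```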

/images/pv-diagram.png

Conclusion

To conclude, we can see that installing a Kubernetes cluster in an on-premise environment hides a multitude of issues that it is important to raise as early as possible. To do this, a study phase of the customer’s ecosystem is necessary to identify the main constraints. They can be diverse; here we identified some related to the network, storage, systems, etc.