Terraform EKS Nodegroups with custom Launch Templates

Wang Poh Peng
4 min read · Nov 5, 2020

Following on from the previous article on using custom AMIs for Amazon EKS, the next goal is to manage that customisation with Infrastructure as Code (IaC).

This article assumes that you have already created the EKS Cluster with Terraform and understand the resources required to make the default setup work.

For a fully detailed setup, please take a look at the guide from Ravi. He provides a comprehensive step-by-step walkthrough of provisioning an EKS cluster with Terraform.

To use custom AMIs in the first place, we need to configure each EKS Nodegroup with a custom launch template. Let's get into details of what exactly happens when you create a default nodegroup with no customizations.

During the nodegroup creation:

  1. An Auto Scaling Group (ASG) gets created. This makes sure nodes are created in the subnets that you have indicated.
  2. The ASG attaches a generated Launch Template managed by EKS, which always points to the latest EKS Optimized AMI ID. The instance size field is then propagated to the launch template’s configuration.
  3. This launch template inherits the EKS Cluster’s cluster security group by default and attaches this security group to each of the EC2 Worker Nodes created.

Understanding the above points is critical to implementing the custom configuration and plugging the gaps that open up once the defaults are removed during customization.
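To make the default behaviour concrete, here is a minimal sketch of a nodegroup with no customization. All resource names, the variable, the instance type, and the scaling numbers are hypothetical, and the cluster, IAM role, and subnets are assumed to be defined elsewhere.

```hcl
# Hypothetical names throughout; the cluster, node role, and subnet variable
# are assumed to exist elsewhere in the configuration.
resource "aws_eks_node_group" "default_ng" {
  cluster_name    = aws_eks_cluster.your_eks_cluster.name
  node_group_name = "default-ng"
  node_role_arn   = aws_iam_role.eks_node_role.arn
  subnet_ids      = var.private_subnet_ids

  # No launch_template block: EKS generates and manages its own launch
  # template and propagates the instance type into it.
  instance_types = ["t3.medium"]

  scaling_config {
    desired_size = 2
    max_size     = 3
    min_size     = 1
  }
}
```

With this form, the AMI and security group come from the EKS defaults described in the list above.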

If you read the article linked above, you will know there are certain caveats one must understand when using a custom launch template. In our current implementation, we make the minimum change required for the customization to work.

EKS Nodegroup Terraform Code

At line 6, you can see that the instance types are commented out. According to the AWS restrictions mentioned in the link above, the instance types field must be moved to the launch template’s configuration for the customization to work.
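As a rough illustration of that change (not the original code, and with different line numbering), a nodegroup that hands the instance type over to a launch template might look like the sketch below. The resource names and the subnet variable are hypothetical, and the launch template, cluster, and IAM role are assumed to be defined elsewhere.

```hcl
# Hypothetical sketch; names are placeholders, not the author's original code.
resource "aws_eks_node_group" "your_eks_cluster_ng" {
  cluster_name    = aws_eks_cluster.your_eks_cluster.name
  node_group_name = "your-eks-cluster-ng"
  node_role_arn   = aws_iam_role.eks_node_role.arn
  subnet_ids      = var.private_subnet_ids

  # instance_types = ["t3.medium"]  # moved into the launch template

  # Reference the custom launch template instead of the EKS-managed one.
  launch_template {
    name    = aws_launch_template.your_eks_launch_template.name
    version = aws_launch_template.your_eks_launch_template.latest_version
  }

  scaling_config {
    desired_size = 2
    max_size     = 3
    min_size     = 1
  }
}
```

Pointing `version` at `latest_version` is one possible choice; pinning an explicit version number works too if you prefer to roll out template changes deliberately.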

Launch Template Terraform Code

There are some configurations here we need to watch out for:

  1. block_device_mappings: This config block must follow the configuration shown above, specifically device_name and volume_type. It mirrors the defaults generated by the EKS Nodegroup’s default launch template.
  2. user_data: This config must be set exactly as shown, so that the node connects to the EKS control plane during startup. The bootstrap script is provided as part of the EKS AMI maker mentioned in the previous article.
  3. vpc_security_group_ids: This is the most important config. If you leave it blank or unset, the default EKS Cluster’s cluster security group will be assigned to all the nodes. However, if you want to use your own custom security groups, you have to include both the custom security group that you want AND the default EKS Cluster’s security group in order to allow the necessary network connections.
  4. image_id: At last, the field that you spent so much effort just to change.
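The four settings above can be sketched together as follows. This is an illustration under assumptions rather than the original code: the AMI variable, the custom security group, the instance type, and the volume size are hypothetical, and the device name, volume type, and bootstrap script path are the values I would expect for an Amazon Linux 2 based EKS AMI, so check them against your own image.

```hcl
# Hypothetical sketch; var.custom_ami_id and aws_security_group.custom_node_sg
# are placeholders assumed to be defined elsewhere.
resource "aws_launch_template" "your_eks_launch_template" {
  name          = "your_eks_launch_template"
  image_id      = var.custom_ami_id # the custom AMI built earlier
  instance_type = "t3.medium"       # moved here from the nodegroup

  # Custom security group AND the cluster security group created by EKS.
  vpc_security_group_ids = [
    aws_security_group.custom_node_sg.id,
    aws_eks_cluster.your_eks_cluster.vpc_config[0].cluster_security_group_id,
  ]

  # Assumed defaults for an Amazon Linux 2 based EKS AMI.
  block_device_mappings {
    device_name = "/dev/xvda"

    ebs {
      volume_size = 20
      volume_type = "gp2"
    }
  }

  # Runs the bootstrap script shipped with the EKS AMI so the node joins
  # the cluster's control plane on startup.
  user_data = base64encode(<<-EOT
    #!/bin/bash
    set -o xtrace
    /etc/eks/bootstrap.sh ${aws_eks_cluster.your_eks_cluster.name}
  EOT
  )
}
```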

Bonus

Since we want to set up the nodegroups completely, here are the IAM permissions you need to set up with Terraform for Node Groups.
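A minimal sketch of such a node role is shown below, assuming the three AWS managed policies commonly attached to EKS worker nodes. The role and resource names are hypothetical.

```hcl
# Hypothetical names; the role is assumed by the EC2 worker nodes.
resource "aws_iam_role" "eks_node_role" {
  name = "eks-node-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

# Lets the node register with and talk to the EKS cluster.
resource "aws_iam_role_policy_attachment" "worker_node" {
  role       = aws_iam_role.eks_node_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
}

# Lets the VPC CNI plugin manage pod networking on the node.
resource "aws_iam_role_policy_attachment" "cni" {
  role       = aws_iam_role.eks_node_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
}

# Lets the node pull container images from ECR.
resource "aws_iam_role_policy_attachment" "ecr_read_only" {
  role       = aws_iam_role.eks_node_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
}
```

The role's ARN is what a nodegroup's `node_role_arn` argument would reference.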

Recommendations

It is best to use a customised launch template with customised AMIs right from the beginning, when the EKS Cluster is first provisioned.

Even though we reference the launch template in the nodegroup configuration, what EKS does behind the scenes is create a clone of the referenced launch template and bind it to the EKS nodegroup.

Here’s why:

Scenario 1:

The current EKS nodegroups are already using the default configuration, and a manual change is made to the launch template. To pick up this change, one has to manually edit the ASG to point to the latest version of the launch template instead of the default version 1. This is highly discouraged: doing so introduces manual steps, and the nodegroup status will become degraded.

Scenario 2:

Through the use of IaC, we change the referenced launch template directly after the nodegroup was initially provisioned with a default template. This causes a forced replacement: the existing nodegroup is terminated and a new one is rolled out. The result is downtime, unwanted but necessary in order to switch to a custom AMI.

# aws_eks_node_group.your-eks-cluster-ng must be replaced
  ~ launch_template {
      ~ id      = "lt-12345" -> (known after apply)
      ~ name    = "default" -> "your_eks_launch_template" # forces replacement
        version = "1"
    }

Scenario 3:

Because we use the customized AMI and launch template right from the start, any change, even a rewrite of the template’s configuration in its entirety, only increments the version of the launch template. This causes zero downtime: new nodes are spun up first and start communicating with the control plane, pods are drained from the old nodes, and only when the migration is complete are the old nodes killed. The entire process can take a long time depending on the number of pods and the instance size of the nodes.

# aws_eks_node_group.your-eks-cluster-ng will be updated in-place
  ~ launch_template {
        id      = "lt-12345"
        name    = "your_eks_launch_template"
      ~ version = "5" -> "6"
    }

From the above, we can understand the different scenarios that may happen in one’s journey or attempt to change the default AMI for the EKS Nodegroup.

You can also find out more specific details here:

Hope this article helps those who are sifting through the Internet for resources on modifying their existing EKS configuration to suit their organizational needs!
