Terraform - Tips and Tricks for Using It with AWS

Having worked quite extensively with AWS CloudFormation, I was quite apprehensive about using Terraform, mainly, I guess, because I had to move out of my comfort zone! Nonetheless, I set out to start using Terraform and learn as I went along. Needless to say, I did end up discovering the pros and cons of the Terraform provisioning environment.

Now, after spending a few months using Terraform, I think it’s a good time to share what I learned from the experience.

REMEMBER:

Do not assume anything!

No major version released yet

Let’s start with the good things

  1. terraform plan gives the ability to see what changes are going to be applied and, more importantly, how the existing resources would be affected. This is almost like a pre-deployment QA on the live setup, similar to CloudFormation Change Sets.
  2. The create_before_destroy flag can be set for pretty much all resources, and Terraform honours it even if AWS doesn’t provide such an attribute natively for the resource.
  3. Terraform gives the ability to take existing resources created by some other means and bring them under Terraform management using terraform import.
  4. Terraform syntax is very easy to use.
  5. Terraform supports a large number of providers, which makes it easy to use one tool across the board rather than a bunch of them.
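
As a minimal sketch of the create_before_destroy flag from point 2, the lifecycle block below asks Terraform to create the replacement resource before removing the old one. The resource name "example" and all attribute values are placeholders chosen for illustration, not taken from a real setup.

    resource "aws_launch_configuration" "example" {
        # Placeholder values, for illustration only
        name_prefix   = "example-"
        image_id      = "ami-XXXXXXXX"
        instance_type = "t2.micro"

        lifecycle {
            # Create the new launch configuration first, then destroy the old one
            create_before_destroy = true
        }
    }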

Moving on to the outliers

  1. Terraform stores the state in a file with the .tfstate extension. If that file is accidentally deleted or gets corrupted, the stack state is lost! Without a backup, it can only be rebuilt by importing each and every resource with terraform import; re-running terraform plan against an empty state just proposes creating everything from scratch. However, it should be noted that

    Each resource in Terraform must implement some basic logic to become importable. As a result, not all Terraform resources are currently importable. Source

  2. State locking is a big concern with Terraform since the state is stored physically on disk. Each Terraform configuration needs to specify the backend (read: state) that it initializes before provisioning or updating managed infrastructure. If two Terraform runs use the same state file and are triggered at the same time, they’ll both read and alter the same state, potentially corrupting the state file.

  3. Specific to autoscaling groups and launch configurations, terraform doesn’t associate an updated launch configuration with an existing autoscaling group.

  4. If a Terraform run fails midway, it doesn’t roll back to the earlier stable version. Instead, the stack might end up in a state of limbo where certain resources that should have been replaced during the run end up deleted.

  5. Unlike CloudFormation, the resources within a terraform plan are not automatically tagged with the stack id. All tags need to be configured in the terraform module.
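
A rough sketch of what explicit tagging from point 5 might look like is shown here. The bucket resource, its "artifacts" name, and the Stack tag key are assumptions made for this example; the idea is simply that every resource carries a tag derived from one stack-level variable.

    variable "stack_name" {}

    resource "aws_s3_bucket" "artifacts" {
        # Hypothetical bucket, used only to demonstrate tagging
        bucket = "${var.stack_name}-artifacts"

        tags = {
            Name  = "${var.stack_name}-artifacts"
            # Stands in for the stack id that CloudFormation would add automatically
            Stack = "${var.stack_name}"
        }
    }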

But wait! There are workarounds

  1. Sadly, nothing can be done about state corruption itself, but state storage can be fixed. Ideally the state file should be kept remote and versioned. GitHub may not be the ideal location, since the state file can contain secrets in plain text. A secured S3 bucket with versioning enabled would be a better option.

  2. To address state locking, the Terraform website provides a decent solution using S3 and DynamoDB. Making the backend key unique is still a manual step, though.

    terraform {
        backend "s3" {
            role_arn   = "arn:aws:iam::XXXXXXXX:role/terraform-state"
            region     = "ap-southeast-1"
            # DynamoDB table used for locking (renamed to dynamodb_table in later Terraform releases)
            lock_table = "terraform"
            bucket     = "bucket-name"
            key        = "make-sure-this-is-unique-per-terraform-plan-per-environment"
        }
    }

    provider "aws" {
        region              = "ap-southeast-1"
        allowed_account_ids = ["XXXXXXXX"]
    }
    
  3. To fix the autoscaling group and launch configuration syncing issue, name the autoscaling group after the launch configuration. A changed launch configuration then gets a new name, which forces Terraform to replace the autoscaling group along with it.

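    resource "aws_launch_configuration" "LC" {
        # name_prefix lets Terraform generate a unique name for every new launch configuration
        name_prefix   = "${var.stack_name}-"
        ...
        ...
        ...
    }

    resource "aws_autoscaling_group" "ASG" {
        # Reusing the launch configuration name means a new launch configuration forces a new ASG
        name                      = "${aws_launch_configuration.LC.name}"
        launch_configuration      = "${aws_launch_configuration.LC.name}"
        ...
        ...
        ...
    }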
    
  4. To avoid resources being killed in case of a failed Terraform run, make sure the create_before_destroy flag is set to true. Note, however, that this flag works for most resources, not all.
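
For workarounds 1 and 2, the state bucket and the lock table can themselves be described in Terraform. The sketch below is one possible way to do it, not a definitive setup: the resource names are made up, the bucket and table names simply mirror the backend block above, and the inline versioning block is the older AWS provider syntax (newer provider versions move it into a separate aws_s3_bucket_versioning resource). The S3 backend expects the lock table's primary key to be a string attribute called LockID.

    resource "aws_s3_bucket" "terraform_state" {
        bucket = "bucket-name"

        # Keep old versions of the state file so a bad write can be rolled back
        versioning {
            enabled = true
        }
    }

    resource "aws_dynamodb_table" "terraform_lock" {
        name           = "terraform"
        hash_key       = "LockID"
        # Capacity values are placeholders; locking needs very little throughput
        read_capacity  = 1
        write_capacity = 1

        attribute {
            name = "LockID"
            type = "S"
        }
    }

Since the backend needs both of these to exist before it can be initialized, they would probably live in a small separate configuration that is applied first.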

These are a few lessons learned from the Terraform ecosystem. Enough to get started with, at least!