Infrastructure as Code (IaC) is at the heart of modern SRE and DevOps practices. As an SRE, using Terraform empowers you to provision, manage, and scale infrastructure predictably across cloud environments. In this guide, I walk through the key reasons and examples showing why and how I use Terraform, particularly for multi-region and multi-cloud deployments, provisioning automation, and secure secret management.
Why Use Terraform?
Terraform is a declarative IaC tool that helps you manage infrastructure consistently, reliably, and at scale. Here are a few key benefits:
- Cloud Agnostic: Supports AWS, Azure, GCP, and many others.
- Multi-region Deployment: Define and manage resources across regions from a single codebase.
- Modular and Reusable: Write once, reuse with variables and modules.
- Version Controlled: Keep infrastructure definitions in Git to track changes and enable collaboration.
- Automation-Ready: Integrates seamlessly with CI/CD pipelines and tools like GitHub Actions.
Multi-Region and Multi-Cloud Deployments
In a globally distributed system, SREs often need to deploy services across multiple regions or even different cloud providers. Terraform makes this easy by allowing the use of multiple provider blocks. This approach enables:
- High availability and disaster recovery
- Latency optimization by placing resources closer to users
- Vendor independence and resiliency across cloud providers
This addresses typical SRE goals such as minimizing downtime, reducing failure domains, and simplifying cross-cloud scaling. Using Terraform, you can define multiple provider blocks with aliases to manage different regions or cloud providers:
Multi-Region Example (AWS)
provider "aws" {
alias = "us-east-1"
region = "us-east-1"
}
provider "aws" {
alias = "us-west-2"
region = "us-west-2"
}
resource "aws_instance" "east" {
ami = "ami-0123456789abcdef0"
instance_type = "t2.micro"
provider = aws.us-east-1
}
resource "aws_instance" "west" {
ami = "ami-0123456789abcdef0"
instance_type = "t2.micro"
provider = aws.us-west-2
}
Multi-Cloud Example (AWS + Azure)
provider "aws" {
region = "us-east-1"
}
provider "azurerm" {
features = {}
subscription_id = "<subscription_id>"
client_id = "<client_id>"
client_secret = "<client_secret>"
tenant_id = "<tenant_id>"
}
resource "aws_instance" "example" {
ami = "ami-0123456789abcdef0"
instance_type = "t2.micro"
}
resource "azurerm_virtual_machine" "example" {
name = "example-vm"
location = "eastus"
size = "Standard_A1"
# other required fields
}
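Hardcoding Azure credentials is shown here only for illustration. The azurerm provider can also read them from environment variables, which keeps secrets out of version control:
# Set these in the shell instead of the provider block
export ARM_SUBSCRIPTION_ID="<subscription_id>"
export ARM_CLIENT_ID="<client_id>"
export ARM_CLIENT_SECRET="<client_secret>"
export ARM_TENANT_ID="<tenant_id>"
terraform plan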
Terraform Variables: Inputs, Outputs, and Tfvars
Variables in Terraform promote reusability, parameterization, and separation of configuration from logic. Instead of hardcoding values like instance types, region names, or AMI IDs, we can define them as variables and supply different values per environment, workspace, or CI/CD pipeline.
This makes it easier to:
- Deploy to multiple environments (e.g., dev, stage, prod)
- Reduce code duplication
- Handle sensitive values securely
- Support team collaboration and maintain clean, scalable infrastructure definitions
Input Variables
Variables can be defined in their own file (e.g., variables.tf) or directly in your main.tf; they are the parameters your module or configuration expects.
variable "instance_type" {
description = "EC2 instance type"
type = string
default = "t2.micro"
}
resource "aws_instance" "example_instance" {
ami = var.ami_id
instance_type = var.instance_type
}
You can override input variable values in several ways:
- From the CLI: terraform apply -var="instance_type=t3.micro"
- From a .tfvars file
- From environment variables prefixed with TF_VAR_ (see the example below)
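For example, the environment-variable form maps TF_VAR_instance_type onto var.instance_type automatically:
# Terraform reads this as var.instance_type
export TF_VAR_instance_type=t3.micro
terraform plan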
Output Variables
Output variables expose computed information after a Terraform run. They are useful for referencing values in other modules or for displaying information. For example, the following output prints the public IP address after an EC2 instance is created.
output "public_ip" {
  description = "Public IP address of the EC2 instance"
  value       = aws_instance.example_instance.public_ip
}
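You can also read outputs back from the state at any time after an apply:
terraform output public_ip   # print a single output value
terraform output -json       # print all outputs as JSON, handy in scripts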
Tfvars File
The default terraform.tfvars file supplies values for the input variables defined in your configuration:
cidr = "10.0.0.0/16"
instance_id = "ami-014e30c8a36252ae5"
instance_type = "t2.micro"
bucket_name = "terraform-s3-bucket"
region = "us-west-1"
availability_zones = ["us-west-1a", "us-west-1b"]
To use a different tfvars file during resource creation:
terraform apply -var-file=dev.tfvars
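A hypothetical dev.tfvars might override only the values that differ in development:
# dev.tfvars (illustrative values)
instance_type = "t3.micro"
region        = "us-west-2"
bucket_name   = "terraform-s3-bucket-dev"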
Using Modules for Reusability
Terraform modules are reusable containers for multiple resources used together. They help you:
- Encapsulate logic for deploying standardized components (e.g., EC2 instance, VPC)
- Promote DRY principles by avoiding duplicated code
- Standardize deployments across teams and environments
In SRE practice, this supports automation, consistency, and faster onboarding, while reducing the risk of human error when provisioning infrastructure repeatedly.
For example, in ./module/ec2_instance/main.tf, these blocks define a module that creates an EC2 instance:
variable "ami_value" {
description = "AMI ID for the EC2 instance"
type = string
}
variable "instance_type_value" {
description = "Type of the EC2 instance"
type = string
}
resource "aws_instance" "example" {
ami = var.ami_value
instance_type = var.instance_type_value
}
In a multi-user environment, when a team wants to create an EC2 instance, they simply write a main.tf that passes the desired values (i.e., ami_value and instance_type_value) to the module. This speeds up provisioning and reduces human error.
provider "aws" {
region = "us-west-1"
}
module "ec2_instance" {
source = "./module/ec2_instance"
ami_value = "ami-014e30c8a36252ae5"
instance_type_value = "t2.micro"
}
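Modules can also expose outputs to their callers. A minimal sketch, assuming an output named instance_id is added to the module:
# In module/ec2_instance/main.tf (assumed addition)
output "instance_id" {
  value = aws_instance.example.id
}
# In the caller's main.tf
output "ec2_id" {
  value = module.ec2_instance.instance_id
}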
Terraform State Management and Remote Backend
Terraform uses a state file to record the current state of your infrastructure. This file is essential because Terraform relies on it to determine what actions are required to bring the infrastructure in sync with your code. It keeps track of created resources, attributes, and dependencies.
By default, this state file is stored locally on your machine, which works fine for solo projects — but becomes problematic in multi-user or team environments.
Drawbacks of Local State in Team Settings
- No shared visibility — Only the person with the local file knows the current state.
- High risk of overwrites — Two users applying changes simultaneously may corrupt or overwrite each other’s work.
- No locking — There’s no mechanism to prevent multiple Terraform runs from happening at once.
- Configuration drift — Without a consistent, central state, environments can fall out of sync.
- Security risks — Local state may contain sensitive data (e.g. passwords, tokens) and is vulnerable to leaks if not properly secured.
Remote Backend: The Solution
To solve these issues, Terraform supports remote backends — storage systems where the state file is saved and accessed centrally. Popular options include:
- AWS S3 (with DynamoDB for locking)
- Terraform Cloud
- Azure Blob Storage
Using a remote backend:
- Provides a central source of truth
- Enables team collaboration
- Supports state locking and history tracking
- Helps enforce CI/CD workflows safely
What is Locking and Why Use DynamoDB?
When multiple people or automation pipelines interact with the same Terraform state, locking prevents conflicts. It ensures that only one process can update the state at a time.
Amazon DynamoDB is often used with S3 backends to manage this lock:
- Atomic operations to prevent race conditions
- Blocking and waiting if a lock is already held
- Scalable and highly available — no single point of failure
In SRE practices, this setup ensures reliable and concurrent-safe infrastructure changes — even with multiple developers or automation workflows.
Remote Backend with Locking Example
terraform {
  backend "s3" {
    bucket         = "wallace-s3-terraform-state-files"
    key            = "wallace/terraform.tfstate"
    region         = "us-west-1"
    dynamodb_table = "terraform-lock" # Enables locking
  }
}
This block tells Terraform to store the state file centrally in the S3 bucket and to manage the lock in DynamoDB.
The following is a sample block that provisions the DynamoDB table used for maintaining the lock:
resource "aws_dynamodb_table" "terraform_lock" {
name = "terraform-lock"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}
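The state bucket itself can be managed with Terraform as well. A minimal sketch, reusing the bucket name from the backend block above, with versioning enabled so earlier state revisions remain recoverable (these bootstrap resources are usually created with local state first, because the backend requires them to exist before terraform init):
resource "aws_s3_bucket" "terraform_state" {
  bucket = "wallace-s3-terraform-state-files"
}
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled" # keeps a history of state file revisions
  }
}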
Terraform Provisioners
Provisioners in Terraform allow you to run scripts or commands after a resource is created. This is useful for bootstrapping, installing dependencies, or configuring systems on first boot.
In an SRE context, provisioners enable:
- Rapid test environment setup for integration/QA
- On-demand debugging or patching in dynamic environments
- Simplified deployment pipelines for quick validation of infrastructure-as-code
Use them cautiously — if a provisioner fails, Terraform may mark the resource as tainted.
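If a bootstrap step is non-critical, the on_failure argument lets the apply proceed without tainting the resource. A sketch, assuming a hypothetical warm-up script:
resource "aws_instance" "example" {
  # ... instance arguments ...
  provisioner "remote-exec" {
    inline     = ["/usr/local/bin/warmup.sh"] # hypothetical, non-critical step
    on_failure = continue # keep the resource even if this step fails
  }
}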
Full Provisioner-Based Infrastructure Example
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "6.8.0"
    }
  }
}
provider "aws" {
  region = "us-west-1"
}
variable "cidr" {
  default = "10.0.0.0/16"
}
resource "aws_key_pair" "example" {
  key_name   = "terraform-demo-wallace"
  public_key = file("~/.ssh/id_rsa.pub")
}
resource "aws_vpc" "main" {
  cidr_block = var.cidr
}
resource "aws_subnet" "sub1" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.0.0/24"
  availability_zone       = "us-west-1a"
  map_public_ip_on_launch = true
}
resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id
}
resource "aws_route_table" "main" {
  vpc_id = aws_vpc.main.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }
}
resource "aws_route_table_association" "rta1" {
  subnet_id      = aws_subnet.sub1.id
  route_table_id = aws_route_table.main.id
}
resource "aws_security_group" "web_sg" {
name = "web"
vpc_id = aws_vpc.main.id
ingress {
description = "HTTP from VPC"
from_port = 8000
to_port = 8000
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "SSH"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
description = "Allow all outbound traffic"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_instance" "web" {
ami = "ami-014e30c8a36252ae5"
instance_type = "t2.micro"
key_name = aws_key_pair.example.key_name
subnet_id = aws_subnet.sub1.id
vpc_security_group_ids = [aws_security_group.web_sg.id]
associate_public_ip_address = true
tags = {
Name = "WebServer"
}
connection {
type = "ssh"
user = "ubuntu"
private_key = file("~/.ssh/id_rsa")
host = self.public_ip
}
provisioner "file" {
source = "app.py"
destination = "/home/ubuntu/app.py"
}
provisioner "remote-exec" {
inline = [
"echo 'Hello from the remote instance'",
"sudo apt update -y",
"sudo apt-get install -y python3-venv",
"cd /home/ubuntu",
"python3 -m venv appenv",
"/home/ubuntu/appenv/bin/pip install --upgrade pip",
"/home/ubuntu/appenv/bin/pip install flask",
"chmod +x /home/ubuntu/app.py",
"/home/ubuntu/appenv/bin/python /home/ubuntu/app.py"
]
}
}
This example provisions an entire testing environment automatically:
- Creates a VPC, public subnet, internet gateway, and route table
- Sets up a security group that allows SSH and app traffic (port 8000)
- Provisions an EC2 instance using a key pair and public subnet
- Uses the file provisioner to copy a local app.py script to the instance
- Uses the remote-exec provisioner to install dependencies and start a Flask app
This automation enables rapid deployment of test or staging environments during development or CI/CD pipeline runs — a core goal of modern SRE teams.
GitHub Actions Integration Example
In a DevOps workflow, it’s common to automate infrastructure provisioning when code changes occur. Here’s how you can use GitHub Actions to automatically trigger Terraform when app.py is updated. (The workflow below assumes AWS credentials are available as repository secrets so Terraform can authenticate.)
Create a GitHub Actions workflow in .github/workflows/deploy.yml:
name: Auto Provision EC2 on App Change
on:
  push:
    paths:
      - app.py
jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      # Assumes AWS credentials are stored as GitHub repository secrets
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.5.0
      - name: Terraform Init
        run: terraform init
      - name: Terraform Plan
        run: terraform plan -out=tfplan
      - name: Terraform Apply
        run: terraform apply -auto-approve tfplan
      - name: Cleanup Plan File
        run: rm tfplan
This pipeline listens for changes to app.py and automatically runs terraform init, plan, and apply. It ensures that any changes to the application are immediately provisioned onto the EC2 instance via the configured provisioners.
Workspaces for Multi-Environment Support
Workspaces in Terraform allow you to maintain isolated state files for different environments (like dev, staging, prod) using the same configuration.
This solves several problems in SRE practice:
- Avoids overwriting infrastructure across environments
- Enables safe parallel deployments
- Simplifies CI/CD automation with clearly separated states
Workspace Commands
terraform workspace new dev
terraform workspace select dev
terraform workspace show
This example creates an AWS EC2 instance whose instance type depends on the current workspace, using a map variable and the lookup() function:
provider "aws" {
region = "us-west-1"
}
variable "ami_value" {
description = "AMI ID for the EC2 instance"
type = string
}
variable "instance_type_value" {
description = "Type of the EC2 instance"
type = map(string) #use map to allow different instance types for different environments
default = {
"dev" = "t2.micro"
"stage" = "t2.small"
"prod" = "t2.large"
}
}
module "ec2_instance" {
source = "./modules/ec2_instance"
ami_value = var.ami_value
instance_type_value = lookup(var.instance_type_value, terraform.workspace, "t2.micro") # Use lookup to get the instance type based on the workspace
}
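Pairing workspaces with per-environment tfvars files keeps everything in one codebase. For example, using the stage.tfvars file from the layout below:
terraform workspace select stage
terraform apply -var-file=stage.tfvars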
By using different workspaces for dev/stage/prod environments, the state files (terraform.tfstate) are maintained separately, as seen in the following folder structure:
wallacelee@imac % tree
.
├── main.tf
├── modules
│ └── ec2_instance
│ └── main.tf
├── stage.tfvars
├── terraform.tfstate.d
│ ├── dev
│ │ ├── terraform.tfstate
│ │ └── terraform.tfstate.backup
│ ├── prod
│ │ ├── terraform.tfstate
│ │ └── terraform.tfstate.backup
│ └── stage
│ ├── terraform.tfstate
│ └── terraform.tfstate.backup
└── terraform.tfvars
7 directories, 10 files
Secret Management with Vault + Terraform
HashiCorp Vault provides a secure way to store and access secrets. Terraform can authenticate to Vault using AppRole and retrieve secrets like API keys, passwords, or bucket names at runtime.
This addresses several critical SRE concerns:
- Avoid hardcoding sensitive data in Terraform code or state files
- Centralized secret lifecycle management
- Enforce access control and auditability for secrets usage
Integrating Vault with Terraform boosts your infrastructure security posture while maintaining automation.
Use Case: Securely retrieve a secret value (e.g., S3 bucket name) from Vault and use it in a resource:
resource "aws_s3_bucket" "example" {
bucket = data.vault_kv_secret_v2.example.data["s3-bucket-name"]
}
How to Set Up HashiCorp Vault:
- Install Vault on an EC2 instance.
sudo apt update && sudo apt install gpg
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
gpg --no-default-keyring --keyring /usr/share/keyrings/hashicorp-archive-keyring.gpg --fingerprint
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update
sudo apt install vault
# Start Vault in dev mode on the EC2 instance (dev mode is for testing only):
vault server -dev -dev-listen-address="0.0.0.0:8200"
- Enable the KV (key-value) secrets engine:
vault secrets enable -path=kv kv-v2
- Create a secret:
vault kv put kv/myapp s3-bucket-name=wallace-prod-bucket
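You can verify the secret was written before wiring it into Terraform:
vault kv get kv/myapp   # shows s3-bucket-name under the data section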
- Create a policy and role:
vault policy write terraform - <<EOF
path "kv/data/*" {
capabilities = ["create", "read", "update", "delete", "list"]
}
EOF
# The AppRole auth method must be enabled before creating the role
vault auth enable approle
vault write auth/approle/role/terraform \
secret_id_ttl=10m \
token_num_uses=10 \
token_ttl=20m \
token_max_ttl=30m \
secret_id_num_uses=40 \
token_policies=terraform
- Get the role_id and secret_id:
vault read auth/approle/role/terraform/role-id
vault write -f auth/approle/role/terraform/secret-id
The following example creates an S3 bucket whose name is securely retrieved from the secret key s3-bucket-name.
provider "aws" {
region = "us-west-1"
}
provider "vault" {
address = "http://x.x.x.x:8200"
skip_child_token = true
auth_login {
path = "auth/approle/login" # Use the AppRole auth method
parameters = {
role_id = "42fb0f72-2c2a-abb9-7b8e-b2f73ac75e83"
secret_id = "ba45c19c-206e-c201-f454-83f772ba5f40"
}
}
}
data "vault_kv_secret_v2" "example" {
mount = "kv" # Change it according to your mount
name = "test-secret" #name of the secret in Vault
}
resource "aws_s3_bucket" "example" {
bucket = data.vault_kv_secret_v2.example.data["s3-bucket-name"]}
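Hardcoding role_id and secret_id in the configuration defeats part of the purpose. A sketch of passing them as sensitive input variables instead, supplied at run time (e.g., via TF_VAR_vault_role_id and TF_VAR_vault_secret_id):
variable "vault_role_id" {
  type      = string
  sensitive = true # keep the value out of plan output
}
variable "vault_secret_id" {
  type      = string
  sensitive = true
}
provider "vault" {
  address = "http://x.x.x.x:8200"
  auth_login {
    path = "auth/approle/login"
    parameters = {
      role_id   = var.vault_role_id
      secret_id = var.vault_secret_id
    }
  }
}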
Final Thoughts
Terraform is a must-have tool in any SRE or DevOps engineer’s toolkit. Whether you’re managing complex multi-cloud infrastructure, isolating environments with workspaces, or automating test environments with provisioners, Terraform brings structure, safety, and scalability to your infrastructure operations.
You can find all my Terraform sample scripts here. Thank you for reading, and I hope you found it useful.