Infrastructure Management with Terraform

Last week, my friend working in the field of infrastructure management gave me an overview of Infrastructure as Code (IaC).

He came across a tool called Terraform which can automate the deployment and management of cloud resources. Together, we researched ways to build a simple demo to demonstrate how Terraform can help with cloud infrastructure management.

We decided to start from a simple AWS cloud architecture as demonstrated below.

As illustrated in the diagram, we have a bastion server and an admin server.

A bastion server, also known as a jump host, is a server that sits between the internal network of a company and an external network, such as the Internet. It provides an additional layer of security by limiting the number of entry points to the internal network and allowing for strict access controls and monitoring.

An admin server, on the other hand, is a server used by system admins to manage the cloud resources. Hence the admin server typically includes tools for managing cloud resources, monitoring system performance, deploying apps, and configuring security settings. It’s generally recommended to place an admin server in a private subnet to enhance security and reduce the attack surface of our cloud infrastructure.

In combination, the two servers help to ensure that the cloud infrastructure is secure, well-managed, and highly available.

Show Me the Code!

The complete source code of this project can be found at https://github.com/goh-chunlin/terraform-bastion-and-admin-servers-on-aws.

Infrastructure as Code (IaC)

As we can see in the architecture diagram above, the cloud resources are all on AWS. We could set them up by creating the resources one by one through the AWS Console. However, doing it manually is inefficient and hard to repeat. In fact, other problems arise from setting things up manually with the AWS Console:

  • Manual cloud resource setup is more prone to human error and takes relatively longer;
  • It is difficult to identify which cloud resources are in use;
  • It is difficult to track modifications to the infrastructure;
  • Infrastructure setup and configuration become a burden;
  • Redundant work is inevitable across different development environments;
  • Only the infrastructure PIC (person in charge) can set up the infrastructure.

A concept known as IaC is thus introduced to solve these problems.

IaC is a way to manage our infrastructure through code in configuration files instead of through manual processes. It is thus a key DevOps practice and a component of continuous delivery.

Based on the architecture diagram, the services and resources necessary for configuring with IaC can be categorised into three parts, i.e. Virtual Private Cloud (VPC), Key Pair, and Elastic Compute Cloud (EC2).

The resources that need to be created.

There are currently many IaC tools available. The tools fall into two major groups, i.e. those using a declarative language and those using an imperative language. Terraform is one of them, and it uses the HashiCorp Configuration Language (HCL), a declarative language.

The workflow for infrastructure provisioning using Terraform can be summarised as shown in the following diagram.

The basic process of Terraform. (Credit: HashiCorp Developer)

We first write the HCL code. Terraform then verifies the configuration and applies it to the infrastructure if verification finds no issue. Since Terraform uses a declarative language, it can usually infer resource dependencies itself without us having to specify them manually.

After the command terraform apply is executed successfully, we can check the list of applied infrastructure through the command terraform state list. We can also check the output variables we defined through the command terraform output.

When the command terraform apply is executed, a status information file called terraform.tfstate will be automatically created.
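For reference, this basic workflow maps to just a handful of commands (a minimal sketch; flags and backend settings depend on your setup):

terraform init          # download provider plug-ins and configure the backend
terraform plan          # preview the changes Terraform intends to make
terraform apply         # apply the changes to the infrastructure
terraform state list    # list the resources tracked in the state file
terraform output        # show the values of the defined output variables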

After understanding the basic process of Terraform, we proceed to write the HCL for different modules of the infrastructure.

Terraform

The components of Terraform code written in HCL are as follows.

Terraform code.

In Terraform, there are three files, i.e. main.tf, variables.tf, and outputs.tf, that are recommended for a minimal module, even if they are empty. The file main.tf should be the primary entry point. The other two files, variables.tf and outputs.tf, should contain the declarations for variables and outputs, respectively.

For variables, in this project we have a vars.tf file which defines the necessary variables, and a terraform.tfvars file which assigns values to the defined variables.
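Putting these files together, the layout of this project looks roughly like this (a sketch; the module directory names follow the main.tf shown later):

.
├── main.tf            # root module: wires the modules together
├── vars.tf            # variable declarations
├── terraform.tfvars   # values assigned to the variables
├── outputs.tf         # output declarations
├── vpc_module/
├── keypair_module/
└── ec2_module/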

In the diagram above, we also see that there is a terraform block. It declares Terraform settings such as state information, the required version, and the backend. For example, we use the following HCL code to set the Terraform version to use and to specify the location for storing the state file generated by Terraform.

terraform {
  backend "s3" {
    bucket  = "my-terraform-01"
    key     = "test/terraform.tfstate"
    region  = "ap-southeast-1"
  }
  required_version = ">=1.1.3"
}

Terraform uses a state file to map real world resources to our configuration, keep track of metadata, and to improve performance for large infrastructures. The state is stored by default in a file named “terraform.tfstate”.

This is the S3 bucket we use for storing our Terraform state file.

The reason why we keep our terraform.tfstate file on the cloud, i.e. in the S3 bucket, is that state is a necessary requirement for Terraform to function, so we must make sure that it is stored in a centralised repository which cannot be easily deleted. Doing this is also good for everyone in the team, because they will be working with the same state, so operations will be applied to the same remote objects.

Finally, we have a provider block which declares the cloud environment or provider to be used with Terraform, as shown below. Here, we will be creating our resources in the AWS Singapore region.

provider "aws" {
  region = "ap-southeast-1"
}

Module 1: VPC

Firstly, in Terraform, we will have a VPC module created with resources listed below.

1.1 VPC

resource "aws_vpc" "my_simple_vpc" {
  cidr_block = "10.2.0.0/16"

  tags = {
    Name = "${var.resource_prefix}-my-vpc",
  }
}

The resource_prefix is a string to make sure that all the resources created with Terraform get the same prefix. If your organisation has different naming rules, feel free to change the format accordingly.
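For illustration, the declaration in vars.tf could look like the following (a sketch; the description and default value here are assumptions):

variable "resource_prefix" {
  description = "Prefix applied to the Name tag of every resource"
  type        = string
  default     = "demo"
}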

1.2 Subnets

The public subnet for the bastion server is defined as follows. The private IP of the bastion server will be in the format of 10.2.10.X. We also set map_public_ip_on_launch to true so that instances launched into the subnet are assigned a public IP address.

resource "aws_subnet" "public" {
  count                   = 1
  vpc_id                  = aws_vpc.my_simple_vpc.id
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  cidr_block              = "10.2.1${count.index}.0/24"
  map_public_ip_on_launch = true

  tags = tomap({
    Name = "${var.resource_prefix}-public-subnet${count.index + 1}",
  })
}

The private subnet for the admin server is defined as follows. The admin server will then have a private IP in the format of 10.2.20.X.

resource "aws_subnet" "private" {
  count                   = 1
  vpc_id                  = aws_vpc.my_simple_vpc.id
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  cidr_block              = "10.2.2${count.index}.0/24"
  map_public_ip_on_launch = false

  tags = tomap({
    Name = "${var.resource_prefix}-private-subnet${count.index + 1}",
  })
}

The aws_availability_zones data source is part of the AWS provider and retrieves a list of availability zones based on the arguments supplied. Here, we place both the public subnet and the private subnet in the same (first) availability zone.
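For completeness, the data source referenced by the two subnets above is declared as follows (a standard AWS provider data source; the state filter keeps only zones that are currently available):

data "aws_availability_zones" "available" {
  state = "available"
}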

1.3 Internet Gateway

Normally, when we create an internet gateway via the AWS Console, we may forget to associate it with the VPC. With Terraform, we can do the association in the code and thus reduce the chance of setting up the internet gateway wrongly.

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.my_simple_vpc.id

  tags = {
    Name = "${var.resource_prefix}-igw"
  }
}

1.4 NAT Gateway

Even though Terraform uses a declarative language, i.e. a language describing an intended goal rather than the steps to reach that goal, we can use the depends_on meta-argument to handle hidden resource or module dependencies that Terraform cannot automatically infer.

resource "aws_nat_gateway" "nat_gateway" {
  allocation_id = aws_eip.nat.id
  subnet_id     = element(aws_subnet.public.*.id, 0)
  depends_on    = [aws_internet_gateway.igw]

  tags = {
    Name = "${var.resource_prefix}-nat-gw"
  }
}

1.5 Elastic IP (EIP)

As you may have noticed, in the NAT gateway definition above, we assigned a public IP to it using an EIP. Since Terraform is declarative, the ordering of blocks is generally not significant, so we can define the EIP after the NAT gateway.

resource "aws_eip" "nat" {
  vpc        = true
  depends_on = [aws_internet_gateway.igw]

  tags = {
    Name = "${var.resource_prefix}-NAT"
  }
}

1.6 Route Tables

Finally, we just need to link the resources above with both public and private route tables, as defined below.

resource "aws_route_table" "public_route" {
  vpc_id = aws_vpc.my_simple_vpc.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }

  tags = {
    Name = "${var.resource_prefix}-public-route"
  }
}

resource "aws_route_table_association" "public_route" {
  count          = 1
  subnet_id      = aws_subnet.public.*.id[count.index]
  route_table_id = aws_route_table.public_route.id
}

resource "aws_route_table" "private_route" {
  vpc_id = aws_vpc.my_simple_vpc.id
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat_gateway.id
  }

  tags = {
    Name = "${var.resource_prefix}-private-route",
  }
}

resource "aws_route_table_association" "private_route" {
  count          = 1
  subnet_id      = aws_subnet.private.*.id[count.index]
  route_table_id = aws_route_table.private_route.id
}

That’s all we need to set up the VPC on AWS as illustrated in the diagram.
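Since the root main.tf shown later consumes module.vpc.vpc_id, module.vpc.public_subnet_ids, and module.vpc.private_subnet_ids, the outputs.tf of the VPC module presumably exposes values along these lines (a sketch inferred from that usage, not copied from the repository):

output "vpc_id" {
  value = aws_vpc.my_simple_vpc.id
}

output "public_subnet_ids" {
  value = aws_subnet.public.*.id
}

output "private_subnet_ids" {
  value = aws_subnet.private.*.id
}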

Module 2: Key Pair

Before we move on to create the two instances, we need to define a key pair. A key pair is a set of security credentials that we use to prove our identity when connecting to an EC2 instance. Hence, we need to ensure that we have access to the selected key pair before we launch the instances.

If we were doing this on the AWS Console, we would see the screen shown below.

The GUI on the AWS Console to create a new key pair.

So, we can use the same info to define the key pair.

resource "tls_private_key" "instance_key" {
  algorithm = "RSA"
}

resource "aws_key_pair" "generated_key" {
  key_name   = var.keypair_name
  public_key = tls_private_key.instance_key.public_key_openssh

  depends_on = [
    tls_private_key.instance_key
  ]
}

resource "local_file" "key" {
  content         = tls_private_key.instance_key.private_key_pem
  filename        = "${var.keypair_name}.pem"
  file_permission = "0400"

  depends_on = [
    tls_private_key.instance_key
  ]
}

The tls_private_key resource creates a PEM (and OpenSSH) formatted private key. This is not recommended for production because it generates the private key file and keeps it unencrypted in the directory where we run the Terraform commands. Instead, we should generate the private key file outside of Terraform and distribute it securely to the system where Terraform will be run.
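Likewise, because main.tf later references module.keypair.key_name, the key pair module presumably exposes an output like this (again a sketch inferred from that usage):

output "key_name" {
  value = aws_key_pair.generated_key.key_name
}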

Module 3: EC2

Once we have the key pair, we can finally move on to define how the bastion and admin servers are created. We can define a module for the servers as follows.

resource "aws_instance" "linux_server" {
  ami                         = var.ami
  instance_type               = var.instance_type
  subnet_id                   = var.subnet_id
  associate_public_ip_address = var.is_in_public_subnet
  key_name                    = var.key_name
  vpc_security_group_ids      = [ aws_security_group.linux_server_security_group.id ]
  tags = {
    Name = var.server_name
  }
  user_data = var.user_data
}

resource "aws_security_group" "linux_server_security_group" {
  name        = var.security_group.name
  description = var.security_group.description
  vpc_id      = var.vpc_id

  ingress {
    description = "SSH inbound"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    description = "Allow All egress rule"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = var.security_group.name
  }
}

By default, AWS creates an ALLOW ALL egress rule when creating a new security group inside a VPC. However, Terraform removes this default rule and requires us to specifically re-create it if we want it. That is why we need to include the protocol = "-1" egress block above.
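Because the module receives security_group as a single value with name and description fields, its variable declaration is presumably an object type along these lines (a sketch; the other module variables follow the usual pattern):

variable "security_group" {
  description = "Name and description of the server security group"
  type = object({
    name        = string
    description = string
  })
}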

The output.tf of the EC2 instance module is defined as follows.

output "instance_public_ip" {
  description = "Public IP address of the EC2 instance."
  value       = aws_instance.linux_server.public_ip
}

output "instance_private_ip" {
  description = "Private IP address of the EC2 instance in the VPC."
  value       = aws_instance.linux_server.private_ip
}

With this definition, once the Terraform workflow is completed, the public IP of our bastion server and the private IP of our admin server will be displayed. We can then easily use these two IPs to connect to the servers.

Main Configuration

With all the above modules, we can finally define our AWS infrastructure using the following main.tf.

module "vpc" {
  source          = "./vpc_module"
  resource_prefix = var.resource_prefix
}
  
module "keypair" {
  source              = "./keypair_module"
  keypair_name        = "my_keypair"
}
    
module "ec2_bastion" {
  source              = "./ec2_module"
  ami                 = "ami-062550af7b9fa7d05"       # Ubuntu 20.04 LTS (HVM), SSD Volume Type
  instance_type       = "t2.micro"
  server_name         = "bastion_server"
  subnet_id           = module.vpc.public_subnet_ids[0]
  is_in_public_subnet = true
  key_name            = module.keypair.key_name
  security_group      = {
    name        = "bastion_sg"
    description = "This firewall allows SSH"
  }
  vpc_id              = module.vpc.vpc_id
}
    
module "ec2_admin" {
  source              = "./ec2_module"
  ami                 = "ami-062550af7b9fa7d05"       # Ubuntu 20.04 LTS (HVM), SSD Volume Type
  instance_type       = "t2.micro"
  server_name         = "admin_server"
  subnet_id           = module.vpc.private_subnet_ids[0]
  is_in_public_subnet = false
  key_name            = module.keypair.key_name
  security_group      = {
    name        = "admin_sg"
    description = "This firewall allows SSH"
  }
  user_data           = "${file("admin_server_init.sh")}"
  vpc_id              = module.vpc.vpc_id
  depends_on          = [module.vpc.aws_route_table_association]
}

Here, we will pre-install the AWS Command Line Interface (AWS CLI) on the admin server. Hence, we have the following script in the admin_server_init.sh file. The script will be run when the admin server is launched.

#!/bin/bash
sudo apt-get update
sudo apt-get install -y awscli

However, since the script above downloads the AWS CLI from the Internet, we need to make sure that the routing from the private network to the Internet via the NAT gateway is already in place. Instead of using the depends_on meta-argument directly on the module, which has side effects, we choose a recommended alternative, i.e. expression references.

Expression references let Terraform understand which value the reference derives from and avoid planning changes if that particular value hasn’t changed, even if other parts of the upstream object have planned changes.

Thus, I made the change accordingly with expression references. In the change, I forced the description of the security group which the admin server depends on to use the private route table association ID returned from the VPC module. Doing so makes sure that the admin server is created only after the private route table is set up properly.

With expression references, we force the admin server to be created later than the bastion server.
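In other words, the ec2_admin module call ends up looking something like the following (a sketch of the idea; the output name private_route_table_association_id is an assumption about what the VPC module exposes):

module "ec2_admin" {
  source = "./ec2_module"
  # ...other arguments as before...

  security_group = {
    name        = "admin_sg"
    # Referencing the association ID forces Terraform to create this
    # module only after the private route table association exists.
    description = "This firewall allows SSH (${module.vpc.private_route_table_association_id})"
  }
}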

If we don’t force the admin server to be created after the private route table is completed, the script may fail, and we can find the error logs at /var/log/cloud-init-output.log on the admin server. In addition, remember that even if terraform apply runs without any error, it does not mean the user_data script ran successfully as well. This is because Terraform knows nothing about the status of the user_data script.

We can find the error in the log file cloud-init-output.log in the admin server.

Demo

With the Terraform files ready, we can now go through the Terraform workflow using the commands.

Before we begin, besides installing Terraform, since we will deploy the infrastructure on AWS, we also need to configure the AWS CLI using the following command on the machine where we will run the Terraform commands.

aws configure

Only once that is done can we move on to the following steps.

Firstly, we need to download the plug-ins necessary for the defined provider, backend, etc.

Initialising Terraform with the command terraform init.

After it is successful, there will be a message saying “Terraform has been successfully initialized!” A hidden .terraform directory, which Terraform uses to manage cached provider plugins and modules, will be automatically created.

Only after initialisation is complete can we execute other commands, such as terraform plan.

Result of executing the command terraform plan.

After running the command terraform plan, as shown in the screenshot above, we know that in total 17 resources will be added and two outputs, i.e. the IPs of the two servers, will be generated.

Apply is successfully completed. All 17 resources added to AWS.

We can also run the command terraform output to get the two IPs. Meanwhile, we can also find the my_keypair.pem file which is generated by the tls_private_key we defined earlier.

The PEM file is generated by Terraform.

Now, if we check the resources, such as the two EC2 instances, on AWS Console, we should see they are all there up and running.

The bastion server and admin server are created automatically with Terraform.

Now, let’s see if we can access the admin server via the bastion server using the private key. Indeed, we can access it without any problem, and we can also see that the AWS CLI is installed properly, as shown in the screenshot below.

With the success of user_data script, we can use AWS CLI on the admin server.

Deleting the Cloud Resources

To delete what we have just set up using the Terraform code, we simply run the command terraform destroy. The complete deletion of the cloud resources is done within 3 minutes. This is definitely far more efficient than doing it manually on the AWS Console.

All 17 resources have been deleted successfully, and the private key file is deleted too.

Conclusion

That is all for what my friend and I researched together.

If you are interested, feel free to check out our Terraform source code at https://github.com/goh-chunlin/terraform-bastion-and-admin-servers-on-aws.

Getting Certified as Kubernetes App Developer: My Kaizen Journey

The high concentration of talented individuals in Samsung SDS is remarkable. I have worked alongside amazing colleagues who are not only friendly but also intelligent and dedicated to their work.

In July 2022, I had numerous discussions with my visionary and supportive seniors about the future of cloud computing. They eventually encouraged me to continue my cloud certification journey by taking the Certified Kubernetes Application Developer (CKAD) certification exam.

Before attempting the CKAD exam, I received advice on how demanding and challenging the assessment could be. Self-learning can also be daunting, particularly in a stressful work environment. However, I seized the opportunity to embark on my journey towards getting certified and committed myself to the process of kaizen, continuous improvement. It was a journey that required a lot of effort and dedication, but it was worth it.

I took the CKAD certification exam while I was working in Seoul in March 2023. The lovely weather had a soothing impact on my stress levels.

August 2022: Learning Docker Fundamentals

To embark on a successful Kubernetes learning journey, I acknowledged the significance of first mastering the fundamentals of Docker.

Docker is a tool that helps developers build, package, and run applications in a consistent way across different environments. Docker allows us to package our app and its dependencies into a Docker container, and then run it on any computer that has Docker installed.

Docker serves as the foundation for many container-based technologies, including Kubernetes. Hence, understanding Docker fundamentals provides a solid groundwork for comprehending Kubernetes.

There is a learning path on Pluralsight specially designed for app developers who are new to Docker so that they can learn more about developing apps with Docker.

I borrowed the free Pluralsight account from my friend, Marvin Heng.

The learning path helped me gain essential knowledge and skills that are directly applicable to Kubernetes. For example, it showed me the best practices for optimising Docker images by carefully ordering the Docker instructions and making use of the caching mechanism.
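To illustrate the caching idea, here is a minimal Dockerfile sketch (hypothetical, not from the learning path): the rarely-changing dependency step is placed before the frequently-changing source copy, so Docker can reuse the cached restore layer.

FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
WORKDIR /src

# Copy only the project file first; this layer and the restore below
# stay cached until MyApp.csproj itself changes.
COPY ["MyApp.csproj", "."]
RUN dotnet restore "MyApp.csproj"

# Source code changes frequently, so it is copied last and does not
# invalidate the cached restore layer above.
COPY . .
RUN dotnet build "MyApp.csproj" -c Release -o /app/build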

In the learning path, we learnt about Docker Swarm. Docker Swarm is a tool that helps us manage and orchestrate multiple Docker containers across multiple machines or servers, making it easier to deploy and scale our apps.

A simple architecture diagram of a system using Kubernetes. (Source: Pluralsight)

After getting a basic understanding of Docker Swarm, we moved on to learning Kubernetes. Kubernetes is similar to Docker Swarm in that both are tools for managing and orchestrating containerised apps. However, Kubernetes has a larger and more mature ecosystem, with more third-party tools and plugins available for tasks like monitoring, logging, and service discovery.

December 2022: Attending LXF Kubernetes Course

Kubernetes is a project that was originally developed by Google, but it is now maintained by the Cloud Native Computing Foundation (CNCF), which is a sub-foundation of the Linux Foundation.

The Linux Foundation provides a neutral and collaborative environment for open-source projects like Kubernetes to thrive, and the CNCF is able to leverage this environment to build a strong community of contributors and users around Kubernetes.

In addition, the Linux Foundation offers a variety of certification exams that allow individuals to demonstrate their knowledge and skills in various areas of open-source technology. CKAD is one of them.

The CKAD exam costs USD 395.00.

The Linux Foundation also offers Kubernetes-related training courses.

The CKAD course is self-paced and can be completed online, making it accessible to learners around the world. It is designed for developers who have some experience with Kubernetes and want to deepen their knowledge and skills in preparation for the CKAD certification exam.

The CKAD course includes a combination of lectures, hands-on exercises, and quizzes to reinforce the concepts covered. It covers a wide range of topics related to Kubernetes, including:

  • Kubernetes architecture;
  • Build and design;
  • Deployment configuration;
  • App exposing;
  • App troubleshooting;
  • Security in Kubernetes;
  • Helm.

Kubectl, the command-line client used to interact with Kubernetes clusters. (Image Credit: The Linux Foundation Training)

January 2023: Going through CKAD Exercises and Killer Shell

Following approximately one month of dedicated effort, I successfully completed the online course and proudly received my course completion certificate on the 7th of January 2023. Throughout the remainder of January, I directed my attention towards exam preparation by diligently working through various online exercises.

The initial series of exercises that I went through was the CKAD exercises thoughtfully curated by a skilled software developer, dgkanatsios, and made available on GitHub. The exercises cover the following areas:

  • Core concepts;
  • Multi-container pods;
  • Pod design;
  • Configuration;
  • Observability;
  • Services and networking;
  • State persistence;
  • Helm;
  • Custom Resource Definitions.

The exercises comprise numerous questions; therefore, my suggestion would be to devote one week to them, allocating an hour each day to tackle a subset of the questions.

During my 10-day Chinese New Year holiday, I dedicated my time towards preparing for the exam. (Image Credit: Global Times)

Furthermore, upon purchasing the CKAD exam, we are entitled to two complimentary simulator sessions on Killer Shell (killer.sh), both containing the same set of questions. Therefore, it is advisable to plan our approach so as to make optimal use of them.

After going through all the questions in the CKAD exercises mentioned above, I proceeded to take the first killer.sh exam session. The simulator features an interface that closely resembles the new remote desktop Exam UI, providing me with invaluable insights into how the actual exam would be conducted.

The killer.sh session is allocated a total of 2 hours, encompassing a set of 22 questions. Similar to the actual exam, the session tests our hands-on experience and practical knowledge of Kubernetes. Thus, we are expected to demonstrate our proficiency by completing a series of tasks in a given Kubernetes environment.

The simulator questions are comparatively more challenging than those in the actual exam. In my initial session, I was able to score only 50%. After analysing and rectifying my errors, I resolved to invest an additional month to study and prepare more comprehensively.

Scenario-based questions like this are expected in the CKAD exam.

February 2023: Working on Cloud Migration Project

Upon my return from the Chinese New Year holiday, to my dismay, I discovered that I had been assigned to a cloud migration project at work.

The project presented me with an exceptional chance to deploy an ASP .NET solution on Kubernetes on Google Cloud Platform, allowing me to put into practice what I had learned and thereby fortify my knowledge of Kubernetes-related topics.

Furthermore, I was lucky to have had the opportunity to engage in fruitful discussions with my colleagues, through which I was able to learn more from them about Kubernetes by presenting my work.

March 2023: The Exam

In early March, I was assigned to visit Samsung SDS in Seoul until the end of the month. I decided to seize the opportunity to complete my second killer.sh simulation session. This time, I managed to score more than the passing score, which is 66%.

After that, I dedicated an extra week to reviewing the questions in the CKAD exercises on GitHub before proceeding to take the actual CKAD exam.

The actual CKAD exam consists of 16 questions that need to be completed within 2 hours. Even though the exam is online and open book, we are not allowed to refer to any resources other than the Kubernetes documentation and the Helm documentation during the exam.

In addition, the exam has been updated to use PSI Bridge, where we get access to a remote desktop instead of just a remote terminal. There is an article about it. This should not be unfamiliar to you if you have gone through the killer.sh exams.

Unlike the previous exam UI, the new exam UI provides us access to a full remote XFCE desktop, enabling us to run the terminal application and Firefox to open the approved online documentation. Thus, having multiple monitors and bookmarking the documentation pages in our personal browser before the exam are no longer helpful.

The PSI Bridge™ (Image Credit: YouTube)

Before taking the exam, there are many more key points mentioned in the Candidate Handbook, the Important Instructions, and the PSI Bridge System Requirements that can help ensure success. Please make sure you have gone through them and have your machine and environment ready for the exam.

Even though I was 30 minutes early for the exam, I faced a technical issue with Chrome on my laptop that caused me to be 5 minutes late for the online exam. Fortunately, my exam time was not reduced due to the delay.

The issue was related to the need to end the "remoting_host.exe" application used by Chrome Remote Desktop in order to use a specific browser for the exam. Despite trying to locate it in Task Manager, I was unable to do so. After searching on Google, I found a solution for Windows users: executing the command "net stop chromoting" stops "remoting_host.exe".

During my stay in Seoul, my room at Shilla Stay Seocho served as my exam location.

The CKAD certification exam is an online proctored exam. This means that it can be taken remotely but is monitored by a proctor via webcam and microphone to ensure the integrity of the exam. Hence, to ensure a smooth online proctored exam experience, it is crucial to verify that our webcam is capable of capturing the text on our ID, such as our passport, and that we are using a stable, high-speed Internet connection.

During the exam, the first thing I did was to create a few aliases, as listed below.

# Shorthand for kubectl and for switching the current namespace
alias k="kubectl "
alias kn="kubectl config set-context --current --namespace"

# Generate YAML without creating the resource; delete without waiting
export dry="--dry-run=client -o yaml"
export now="--force --grace-period 0"

These aliases helped me complete the commands more quickly. In addition, whenever possible, I always used an imperative command to create a YAML file with kubectl.

By working on the solution based on the generated YAML file, I was able to save a significant amount of time as opposed to writing the entire YAML file from scratch.
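For example, combining the k alias and the $dry variable above, we can generate a manifest and then refine it by hand (a typical pattern; nginx is just a placeholder image):

# Generate a Pod manifest without creating the resource
k run nginx --image=nginx $dry > pod.yaml

# Generate a Deployment manifest the same way
k create deployment web --image=nginx --replicas=2 $dry > deploy.yaml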

I completed only 15 questions, leaving 1 unanswered. I chose to forgo a 9-mark question that I was not confident of answering correctly, in order to have more time to focus on the other questions. In the end, I still managed to score 78%.

The passing score for CKAD is 66% out of 100%.

Moving Forward: Beyond the Certification

In conclusion, obtaining certification in one’s chosen field can be a valuable asset for personal and professional development. In my experience, it has helped me feel more confident in my abilities and given me a sense of purpose in my career.

However, it is essential to remember that it is crucial to continue learning and growing, both through practical experience and ongoing education, in order to stay up-to-date with the latest developments in the field. The combination of certification, practical experience, and ongoing learning can help us to achieve our career goals and excel in our role as a software engineer.

Together, we learn better.

Setting up SFTP on Kubernetes for EDI

EDI, or Electronic Data Interchange, is the computer-to-computer exchange of business documents between companies in a standard electronic format. Even though EDI has been around since the 1960s and there are alternatives such as APIs, EDI still remains a crucial business component in industries such as supply chain and logistics.

With EDI, there is no longer a need to manually enter data into the system or send paper documents through the mail. EDI replaces the traditional paper-based communication, which is often slow, error-prone, and inefficient.

EDI works by converting business documents into a standard electronic format that can be exchanged between different computer systems. The documents can then be transmitted securely over a network using a variety of protocols, such as SFTP. Upon receipt, the documents are automatically processed and integrated into the recipient's computer system.

EDI and SFTP

Many EDI software applications use SFTP as one of the methods for transmitting EDI files between two systems. This allows the EDI software to create EDI files, translate them into the appropriate format, and then transmit them securely via SFTP.

SFTP provides a secure and reliable method for transmitting EDI files between trading partners. It runs over SSH (Secure Shell), which encrypts the file transfer. This helps to ensure that EDI files are transmitted securely and that sensitive business data is protected from unauthorised access.

In this blog post, we will take a look at how to set up a simple SFTP server on Kubernetes.

Setup SFTP Server Locally with Docker

On our local machine, we can set up an SFTP server on Docker with an image known as "atmoz/sftp", which offers an easy-to-use SFTP server with OpenSSH.

When we run it on Docker locally, we can also mount a local directory into it with the command below.

docker run \
    --name my_sftp_server \
    -v C:\Users\gclin\Documents\atmoz-sftp:/home/user1/ftp-file-storage \
    -p 2224:22 \
    -d atmoz/sftp:alpine \
    user1:$6$Zax4...Me3Px/:e:::ftp-file-storage

This allows the user "user1" to log in to this SFTP server with the encrypted password "$6$Zax4...Me3Px" (the ":e" means the password is encrypted; here we are using SHA-512).
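If you need to generate such a hash yourself, one option (an assumption on my side; any SHA-512 crypt generator works) is the mkpasswd utility from the whois package on Debian/Ubuntu:

# Prompts for a password and prints its SHA-512 crypt hash
mkpasswd -m sha-512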

The directory name at the end of the command, i.e. "ftp-file-storage", will be created under the user's home directory with write permission. This allows the files uploaded by the user to be stored in the "ftp-file-storage" folder. Hence, we choose to mount it to our local Documents sub-directory "atmoz-sftp".

The OpenSSH server runs on port 22 by default. Here, we are forwarding port 22 of the container to host port 2224.
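Once the container is running, we can verify the setup with any SFTP client, for example the OpenSSH sftp command, by connecting to the forwarded host port:

# Connect to the container through the forwarded host port 2224
sftp -P 2224 user1@localhost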

Currently, this Docker image provides two versions, i.e. Debian and Alpine. According to the official documentation, the Alpine image is about 10 times smaller than the Debian one, but Debian is generally considered more stable, with only bug fixes and security fixes added after each Debian release, which happens roughly every 2 years.

Yay, we have successfully created an SFTP server on Docker.

Moving to Kubernetes

Setting up SFTP on Kubernetes can be done using a Deployment object with a container running the SFTP server.

Firstly, we will reuse the same image "atmoz/sftp" for our container. We will then create a Kubernetes Deployment object using a YAML file which defines a container specification including the SFTP server image, environment variables, and volume mounts for data storage:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sftp-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sftp
  template:
    metadata:
      labels:
        app: sftp
    spec:
      volumes:
      - name: sftp-data
        emptyDir: {}
      containers:
      - name: sftp
        image: atmoz/sftp
        ports:
        - containerPort: 22
        env:
        - name: SFTP_USERS
          value: "user1:$6$NsJ2.N...DTb1:e:::ftp-file-storage"
        volumeMounts:
        - name: sftp-data
          mountPath: /home/user1/ftp-file-storage
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"

To keep things simple, I mounted an emptyDir volume to the container. An emptyDir volume is first created, initially empty, when a Pod is assigned to a Node, and it exists as long as that Pod is running on that Node.

In addition, to prevent our container from starving other processes, we add resource limits to it. In the YAML definition above, our container requests 0.25 CPU and 64 MiB of memory, and it is limited to 0.5 CPU and 128 MiB of memory.

Next, since I will be testing locally on my machine, I will use the kubectl port-forward command.

kubectl port-forward deploy/sftp-deployment 22:22

After doing all this, we can access the SFTP server in our Kubernetes cluster, as shown in the following screenshot.

We can also upload files to the SFTP server on Kubernetes.

Conclusion

Yup, that’s all for getting an SFTP server running on Kubernetes easily.

After we have configured our SFTP server to use secure authentication and encryption, we can proceed to create SFTP user accounts on the server and give each of the parties who are going to communicate with us using EDI the necessary permissions to access the directories where the EDI files will be stored.

Finally, we also need to monitor file transfers for errors or issues and troubleshoot as needed. This may involve checking SFTP server logs, EDI software logs, or network connections.

Overall, this is the quick start of using SFTP for EDI to ensure the secure and reliable transfer of electronic business documents between different parties.

Kubernetes CronJob to Send Email via Azure Communication Services

In March 2021, Azure Communication Services was made generally available after being showcased at Microsoft Ignite. In the beginning, it only provided services such as SMS as well as voice and video calling. One year later, in May 2022, it added a way to facilitate high-volume transactional emails. However, this email function is currently still in public preview. Hence, the email-related APIs and SDKs are provided without an SLA and are thus not recommended for production workloads.

Currently, our Azure account has a set of limitations on the number of email messages that we can send. For all developers, email sending is limited to 10 emails per minute, 25 emails per hour, and 100 emails per day.

Setup Azure Communication Services

To begin, we need to create a new Email Communication Services resource from the marketplace, as shown in the screenshot below.

US is the only option for the Data Location now in Email Communication Services.

Take note that currently we can only choose United States as the Data Location, which determines where the data will be stored at rest. This cannot be changed after the resource has been created. It also means that the Azure Communication Services resource, which we need to configure next, must store its data in the United States as well. We will talk about this later.

Once the Email Communication Services resource is created, we can begin by adding a free Azure subdomain. With the "1-click add" function, as shown in the following screenshot, Azure will automatically configure the required email authentication protocols based on email authentication best practices.

Click “1-click add” to provision a free Azure managed domain for sending emails.

We will then have a MailFrom address in the format donotreply@xxxx.azurecomm.net which we can use to send emails. We are allowed to modify the MailFrom address and the From display name to more user-friendly values.

After getting the domain, we need to connect Azure Communication Services to it in order to send emails.

As we mentioned earlier, we need to make sure that the Azure Communication Services resource also has United States as its Data Location. Otherwise, we will not be able to link the email domain for email sending.

Successfully connected our email domain. =)

A Simple Console App for Sending Email

Now, we need to create the console app which will be used in our Kubernetes CronJob later to send emails with the Azure Communication Services Email client library.

Before we begin, we have to get the connection string of the Azure Communication Services resource.

Getting connection string of the Azure Communication Service.

Here I have the following code to send a sample email to myself.

using Azure.Communication.Email.Models;
using Azure.Communication.Email;

string connectionString = Environment.GetEnvironmentVariable("COMMUNICATION_SERVICES_CONNECTION_STRING") ?? string.Empty;
string emailFrom = Environment.GetEnvironmentVariable("EMAIL_FROM") ?? string.Empty;

if (connectionString != string.Empty)
{
    EmailClient emailClient = new EmailClient(connectionString);

    EmailContent emailContent = new EmailContent("Welcome to Azure Communication Service Email APIs.");
    emailContent.PlainText = "This email message is sent from Azure Communication Service Email using .NET SDK.";
    List<EmailAddress> emailAddresses = new List<EmailAddress> {
            new EmailAddress("gclin009@hotmail.com") { DisplayName = "Goh Chun Lin" }
        };
    EmailRecipients emailRecipients = new EmailRecipients(emailAddresses);
    EmailMessage emailMessage = new EmailMessage(emailFrom, emailContent, emailRecipients);
    SendEmailResult emailResult = emailClient.Send(emailMessage, CancellationToken.None);
}
Setting environment variables for local debugging purpose.

Tada, there should be an email successfully sent out as instructed.

Email is successfully sent and received. =)

Containerise the Console App

Next, what we need to do is containerise our console app above.

Assuming that our console app is called MyConsoleApp, we prepare a Dockerfile as follows.

FROM mcr.microsoft.com/dotnet/runtime:6.0 AS base
WORKDIR /app

FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
WORKDIR /src
COPY ["MyConsoleApp.csproj", "."]
RUN dotnet restore "./MyConsoleApp.csproj"
COPY . .
WORKDIR "/src/."
RUN dotnet build "MyConsoleApp.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "MyConsoleApp.csproj" -c Release -o /app/publish /p:UseAppHost=false

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "MyConsoleApp.dll"]

We can then publish the image to Docker Hub for consumption later.
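Building and pushing can be done with the usual commands (shown with the image tag that appears in the CronJob YAML later; replace the repository name with your own):

docker build -t chunlindocker/emailsender:v2023-01-25-1600 .
docker push chunlindocker/emailsender:v2023-01-25-1600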

If you prefer to use Azure Container Registry, you can refer to the documentation on how to do it on Microsoft Learn.

Create the CronJob

In Kubernetes, Pods are the smallest deployable units of computing that we can create and manage. A Pod can have one or more related containers, with shared storage and network resources. Here, we will schedule a job that creates Pods containing the container image we built above; the job of these Pods is, in our case, to send emails.

The schedule of the CronJob is defined as follows, according to the Kubernetes documentation on the schedule syntax.

# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of the month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12)
# │ │ │ │ ┌───────────── day of the week (sun, mon, tue, wed, thu, fri, sat)
# │ │ │ │ │
# * * * * *

Hence, if we would like the email scheduler to be triggered at 8 AM every Friday, we can create a CronJob in the namespace my-namespace with the following YAML file.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: email-scheduler
  namespace: my-namespace
spec:
  jobTemplate:
    metadata:
      name: email-scheduler
    spec:
      template:
        spec:
          containers:
          - image: chunlindocker/emailsender:v2023-01-25-1600
            name: email-scheduler
          restartPolicy: OnFailure
  schedule: 0 8 * * fri

After the CronJob is created, we can proceed to annotate it with the command below.

kubectl annotate cj email-scheduler jobtype=scheduler frequency=weekly

This helps us to query the CronJobs with jsonpath easily in the future. For example, we can list all the annotated CronJobs, together with their job type and frequency, with the following command.

kubectl get cj -A -o=jsonpath="{range .items[?(@.metadata.annotations.jobtype)]}{.metadata.namespace},{.metadata.name},{.metadata.annotations.jobtype},{.metadata.annotations.frequency}{'\n'}{end}"

Create ConfigMap

In our email sending programme, we have two environment variables. Hence, we can create a ConfigMap to store the non-sensitive data as key-value pairs.

apiVersion: v1
kind: ConfigMap
metadata:
  name: email-sending
  namespace: my-namespace
data:
  EMAIL_FROM: DoNotReply@xxxxxx.azurecomm.net

As for the connection string of the Azure Communication Services resource, since it is sensitive data, we will store it in a Secret. Secrets are similar to ConfigMaps but are specifically intended to hold confidential data. We will generate the Secret with the command below.

kubectl create secret generic azure-communication-service --from-literal=CONNECTION_STRING=xxxxxx --dry-run=client --namespace=my-namespace -o yaml

It should generate a YAML which is similar to the following.

apiVersion: v1
kind: Secret
metadata:
  name: azure-communication-service
  namespace: my-namespace
data:
  CONNECTION_STRING: yyyyyyyyyy

Then, the Pods created by the CronJob can consume the ConfigMap and Secret above as environment variables. So, we need to update the CronJob YAML file as follows.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: email-scheduler
  namespace: my-namespace
spec:
  jobTemplate:
    metadata:
      name: email-scheduler
    spec:
      template:
        spec:
          containers:
          - image: chunlindocker/emailsender:v2023-01-25-1600
            name: email-scheduler
            env:
              - name: EMAIL_FROM
                valueFrom:
                  configMapKeyRef:
                    name: email-sending
                    key: EMAIL_FROM
              - name: COMMUNICATION_SERVICES_CONNECTION_STRING
                valueFrom:
                  secretKeyRef:
                    name: azure-communication-service
                    key: CONNECTION_STRING
          restartPolicy: OnFailure
  schedule: 0 8 * * fri

Using SealedSecret

The problem with using Secrets is that we cannot really commit them to our code repository, because the data is only encoded, not encrypted. Hence, in order to store our Secrets safely, we can use SealedSecret, which helps us encrypt our Secrets. A SealedSecret can only be decrypted by the controller running in the target cluster.

Currently, the SealedSecret Helm Chart is officially supported and hosted on GitHub.

Helm is the package manager for Kubernetes. Helm uses a packaging format called a Chart, a collection of files describing a related set of Kubernetes resources. Each Chart comprises one or more Kubernetes manifests. With Charts, developers are able to configure, package, version, and share apps with their dependencies and sensible defaults.

To install Helm on a Windows 11 machine, we can execute the following commands in an Ubuntu on Windows console.

  1. Download the desired version of the Helm release, for example, version 3.11.0:
    wget https://get.helm.sh/helm-v3.11.0-linux-amd64.tar.gz
  2. Unpack it:
    tar -zxvf helm-v3.11.0-linux-amd64.tar.gz
  3. Move the Helm binary to the desired location:
    sudo mv linux-amd64/helm /usr/local/bin/helm

Once we have Helm ready, we can add a Chart repository. In our case, we need to add the repository of the SealedSecret Helm Chart.

helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets

We should be able to locate the SealedSecret Chart with the following command.

helm search repo bitnami

The Chart bitnami/sealed-secret is one of the Charts we can install.

To install the SealedSecret Helm Chart, we use the following command.

helm install sealed-secrets -n kube-system --set-string fullnameOverride=sealed-secrets-controller sealed-secrets/sealed-secrets

Once we have done this, we should be able to locate a new service called sealed-secrets-controller under the Kubernetes services.

The sealed-secrets-controller service is under the kube-system namespace.

Before we can proceed to use kubeseal to create an encrypted secret, for me at least, there was a need to edit the sealed-secrets-controller service. Otherwise, there will be an error message saying "cannot fetch certificate: no endpoints available for service". If you also encounter the same issue, simply follow the steps mentioned by ghostsquad to edit the service YAML accordingly.

My final edit of the sealed-secrets-controller service YAML.

Next, we can proceed to encrypt our secret, as instructed in the SealedSecret GitHub readme.

kubectl create secret generic azure-communication-service --from-literal=CONNECTION_STRING=xxxxxx --dry-run=client --namespace=my-namespace -o json > mysecret-acs.json

kubeseal < mysecret-acs.json > mysealedsecret-acs.json

The generated file mysealedsecret-acs.json should look something like what is shown below.

The connection string is now encrypted.

To create the Secret resource, we will simply create it based on the file mysealedsecret-acs.json.

This generated file mysealedsecret-acs.json is thus safe to be committed to our code repository.

Going Zero-Trust: Using Kamus and InitContainer

Besides SealedSecret, there is another open-source solution known as Kamus, a zero-trust secrets encryption and decryption solution for Kubernetes apps. We can use Kamus to encrypt our secrets and make sure that the secrets can only be decrypted by the desired Kubernetes apps.

Similarly, we can install Kamus using its Helm Chart with the commands below.

helm repo add soluto https://charts.soluto.io
helm upgrade --install kamus soluto/kamus

Kamus will encrypt secrets for only a specific application represented by a ServiceAccount. A service account provides an identity for processes that run in a Pod, and maps to a ServiceAccount object. Hence, we need to create a Service Account with the YAML below.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-kamus-sa

After creating the ServiceAccount, we can update our CronJob YAML to mount it on the Pods, as shown below.
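Mounting it is a one-line addition to the Pod template in the CronJob spec (a sketch showing only the relevant fields):

spec:
  jobTemplate:
    spec:
      template:
        spec:
          # Run the job's Pods under the Kamus-aware service account
          serviceAccountName: my-kamus-sa
          containers:
          - image: chunlindocker/emailsender:v2023-01-25-1600
            name: email-scheduler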

Next, we can proceed to download and install the Kamus CLI, which we can use to encrypt our secret with the following command.

kamus-cli encrypt \
  --secret xxxxxxxx \
  --service-account my-kamus-sa \
  --namespace my-namespace \
  --kamus-url <Kamus URL>

The Kamus URL can be found after we have installed Kamus, as shown in the screenshot below.

Kamus URL in localhost

We need to follow the instructions printed on the screen to get the Kamus URL. To do so, we need to forward a local port to the pod, as shown in the following screenshot.

Successfully forward the port and thus can use the URL as the Kamus URL.

Hence, let’s say we want to encrypt a secret "alamak"; we can do so as follows.

Since our localhost Kamus URL is using HTTP, we have to specify --allow-insecure-url.

After we have encrypted our secret successfully, we need to configure our Pod accordingly so that it can decrypt the value with the Kamus Decrypt API. The simplest way is to store the encrypted secret in a ConfigMap; since it is already encrypted, it is safe to keep it there.

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-encrypted-secret
  namespace: my-namespace
data:
  data: rADEn4o8pdN8Zcw40vFS/g==:zCPnDs8AzcTwqkvuu+k8iQ==

Then we can include an init container in our Pod, because an init container must run and exit successfully before the subsequent containers start. So we can make use of the Kamus init container to decrypt the secret using the Kamus Decryptor API and output it to a file to be consumed by our app. There is an official demo from the Kamus team on how to do that on GitHub. Please take note that one of their YAML files is outdated, and there is a need to update their deployment.yaml to use "apiVersion: apps/v1" with a proper selector.

Updated deployment.yaml.

After the deployment is successful, we can forward port 8081 to the pod in the deployment, as shown below.

kubectl port-forward deployment/kamus-example 8081:80

If the deployment is successful, we should see the following when we visit localhost:8081 in our browser, as shown in the screenshot below.

Yay, the original text “alamak” is successfully decrypted and displayed.

Deploy Our CronJob

Now that we have everything set up, we can create our Kubernetes CronJob with the YAML file we prepared earlier. For local testing, I edited the schedule to be "*/2 * * * *", which means an email will be sent to me every 2 minutes.

After waiting for a couple of minutes, I received a few emails sent via Azure Communication Services, as shown below.

Now the emails are received every 2 minutes. =)

Hooray, this is how we build a simple Kubernetes CronJob and send emails with the Azure Email Communication Services.