Kubernetes – cuteprogramming

A Kubernetes Lab for Massively Parallel .NET Parameter Sweeps

November 23, 2025November 24, 2025 by Chun Lin, posted in C#, Discrete Event Simulation, Event, Experience, Kubernetes, Product

Let’s start with a problem that many of us in the systems engineering world have faced. You have a computationally intensive application such as a financial model, a scientific process, or in my case, a Discrete Event Simulation (DES). The code is correct, but it is slow.

In some DES problems, to get a statistically reliable answer, you cannot just run it once. You need to run it 5,000 times with different inputs, which is a massive parameter sweep combined with a Monte Carlo experiment to average out the randomness.

If you run this on your developer machine, it will finish in 2026. If you rent a single massive VM on cloud, you are burning money while one CPU core works and the others idle.

This is a brute-force computation problem. How do you solve it without rewriting your entire app? You build a simulation lab on Kubernetes. Here is the blueprint.

About Time

My specific app is a DES built with a C# library called SNA. In DES, the integrity of the entire system depends on a single, unified virtual clock and a centralised Future Event List (FEL). The core promise of the simulation engine is to process events one by one, in strict chronological order.

The FEL is a core component of a DES, which manages and schedules all future events that will occur in the simulation.

This creates an architectural barrier. You cannot simply chop a single simulation into pieces and run them on different pods on Kubernetes. Each pod has its own system clock, and network latency would destroy the causal chain of events. A single simulation run is, by its nature, an inherently single-threaded process.

We cannot parallelise the simulation, but we can parallelise the experiment.

This is what is known as an Embarrassingly Parallel problem. Since the multiple simulation runs do not need to talk to each other, we do not need a complex distributed system. We need an army of independent workers.

The Blueprint: The Simulation Lab

To solve this, I moved away from the idea of a “server” and toward the idea of a “lab”.

Our architecture has three components:

The Engine: A containerised .NET app that can run one full simulation and write its results as structured logs;
The Orchestrator: A system to manage the parameter sweep, scheduling thousands of simulation pods and ensuring they all run with unique inputs;
The Observatory: A centralised place to collect and analyse the structured results from the entire army of pods.

The Engine: Headless .NET

The foundation is a .NET console programme.

We use System.CommandLine to create a strict contract between the container and the orchestrator. We expose key variables of the simulation as CLI arguments, for example, arrival rates, resource counts, service times.

using System.CommandLine;

var rootCommand = new RootCommand
{
    Description = "Discrete Event Simulation Demo CLI\n\n" +
                  "Use 'demo <subcommand> --help' to view options for a specific demo.\n\n" +
                  "Examples:\n" +
                  "  dotnet DemoApp.dll demo simple-generator\n" +
                  "  dotnet DemoApp.dll demo mmck --servers 3 --capacity 10 --arrival-secs 2.5"
};

// Show help when run with no arguments
if (args.Length == 0)
{
    Console.WriteLine("No command provided. Showing help:\n");
    rootCommand.Invoke("-h"); // Show help
    return 1;
}

// ---- Demo: simple-server ----
var meanArrivalSecondsOption = new Option<double>(
    name: "--arrival-secs",
    description: "Mean arrival time in seconds.",
    getDefaultValue: () => 5.0
);

var simpleServerCommand = new Command("simple-server", "Run the SimpleServerAndGenerator demo");
simpleServerCommand.AddOption(meanArrivalSecondsOption);

simpleServerCommand.SetHandler((double meanArrivalSeconds) =>
{
    Console.WriteLine($"====== Running SimpleServerAndGenerator (Mean Arrival (Unit: second)={meanArrivalSeconds}) ======");
    SimpleServerAndGenerator.RunDemo(loggerFactory, meanArrivalSeconds);
}, meanArrivalSecondsOption);

var demoCommand = new Command("demo", "Run a simulation demo");
demoCommand.AddCommand(simpleServerCommand);

rootCommand.AddCommand(demoCommand);

return await rootCommand.InvokeAsync(args);

This console programme is then packaged into a Docker container. That’s it. The engine is complete.

The Orchestrator: Unleashing an Army with Argo Workflows

How do you manage a great number of pods without losing your mind?

My first attempt was using standard Kubernetes Jobs. Kubernetes Jobs are primitive, so they are hard to visualise, and managing retries or dependencies requires writing a lot of fragile bash scripts.

The solution is Argo Workflows.

Argo allows us to define the entire parameter sweep as a single workflow object. The killer feature here is the withItems. Alternative, using withParam loop, we can feed Argo a JSON list of parameter combinations, and it handles the rest: Fan-out, throttling, concurrency control, and retries.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: sna-simple-server-job-
spec:
  entrypoint: sna-demo
  serviceAccountName: argo-workflow
  templates:
  - name: sna-demo
    steps:
    - - name: run-simulation
        template: simulation-job
        arguments:
          parameters:
          - name: arrival-secs
            value: "{{item}}"
        withItems: ["5", "10", "20"]

  - name: simulation-job
    inputs:
      parameters:
      - name: arrival-secs
    container:
      image: chunlindocker/sna-demo:latest
      command: ["dotnet", "SimNextgenApp.Demo.dll"]
      args: ["demo", "simple-server", "--arrival-secs", "{{inputs.parameters.arrival-secs}}"]

This YAML file is our lab manager. It can also be extended to support scheduling, retries, and parallelism, transforming a complex manual task into a single declarative manifest.

The Argo Workflow UI with the fan-out/parallel nodes using the YAML above.

Instead of managing pods, we are now managing a definition of an experiment.

The Observatory: Finding the Needle in a Thousand Haystacks

With a thousand pods running simultaneously, kubectl logs is useless. You are generating gigabytes of text per minute. If one simulation produces an anomaly, finding it in a text stream is impossible.

We solve this with Structured Logging.

By using Serilog, our .NET Engine does not just write text. Instead, it emits machine-readable events with key-value pairs for our parameters and results. Every log entry contains the input parameters (for example, { "WorkerCount": 5, "ServiceTime": 10 }) attached to the result.

These structured logs are sent directly to a centralised platform like Seq. Now, instead of a thousand messy log streams, we have a single, queryable database of our entire experiment results.

Viewing the structured log on Seq generated with Serilog.

Wrap-Up: A Reusable Pattern

This architecture allows us to treat the Kubernetes not just as a place to host websites, but as a massive, on-demand supercomputer.

By separating the Engine from the Orchestrator and the Observatory, we have taken a problem that was too slow for a single machine and solved it using the native strengths of the Kubernetes. We did not need to rewrite the core C# logic. Instead, we just needed to wrap it in a clean interface and unleash a container army to do the work.

The full source code for the SNA library and the Argo workflow examples can be found on GitHub: https://github.com/gcl-team/SNA

*The turnout for my DES session in Taipei confirmed a growing hunger in our industry for proactive, simulation-driven approaches to engineering.*

P.S. I presented an early version of this blueprint at the Hello World Developer Conference 2025 in Taipei. The discussions with other engineers there were invaluable in refining these ideas.

The Blueprint Fallacy: A Case for Discrete Event Simulation in Modern Systems Architecture

October 18, 2025October 18, 2025 by Chun Lin, posted in Artificial Intelligence, C#, Data, Discrete Event Simulation, Event, Experience, Kubernetes, Observability

Greetings from Taipei!

I just spent two days at the Hello World Dev Conference 2025 in Taipei, and beneath the hype around cloud and AI, I observed a single, unifying theme: The industry is desperately building tools to cope with a complexity crisis of its own making.

The agenda was a catalog of modern systems engineering challenges. The most valuable sessions were the “踩雷經驗” (landmine-stepping experiences), which offered hard-won lessons from the front lines.

A 2-day technical conference on AI, Kubernetes, and more!

However, these talks raised a more fundamental question for me. We are getting exceptionally good at building tools to detect and recover from failure but are we getting any better at preventing it?

This post is not a simple translation of a Mandarin-language Taiwan conference. It is my analysis of the patterns I observed. I have grouped the key talks I attended into three areas:

Cloud Native Infrastructure;
Reshaping Product Management and Engineering Productivity with AI;
Deep Dives into Advanced AI Engineering.

Feel free to choose to dive into the section that interests you most.

Session: Smart Pizza and Data Observability

This session was led by Shuhsi (林樹熙), a Data Engineering Manager at Micron. Micron needs no introduction, they are a massive player in the semiconductor industry, and their smart manufacturing facilities are a prime example of where data engineering is mission-critical.

Shuhsi’s talk, “Data Observability by OpenLineage,” started with a simple story he called the “Smart Pizza” anomaly.

He presented a scenario familiar to anyone in a data-intensive environment: A critical dashboard flatlines, and the next three hours are a chaotic hunt to find out why. In his “Smart Pizza” example, the culprit was a silent, upstream schema change.

His solution, OpenLineage, is a powerful framework for what we would call digital forensics. It is about building a perfect, queryable map of the crime scene after the crime has been committed. By creating a clear data lineage, it reduces the “Mean Time to Discovery” from hours of panic to minutes of analysis.

Let’s be clear: This is critical, valuable work. Like OpenTelemetry for applications, OpenLineage brings desperately needed order to the chaos of modern data pipelines.

It is a fundamentally reactive posture. It helps us find the bullet path through the body with incredible speed and precision. However, my main point is that our ultimate goal must be to predict the bullet trajectory before the trigger is pulled. Data lineage minimises downtime. My work with simulation, which will be explained in the next session, aims to prevent it entirely by modelling these complex systems to find the breaking points before they break.

Session: Automating a .NET Discrete Event Simulation on Kubernetes

My talk, “Simulation Lab on Kubernetes: Automating .NET Parameter Sweeps,” addressed the wall that every complex systems analysis eventually hits: Combinatorial explosion.

While the industry is focused on understanding past failures, my session is about building the Discrete Event Simulation (DES) engine that can calculate and prevent future ones.

A restaurant simulation game in Honkai Impact 3rd. (Source:
西琳 – YouTube)

To make this concrete, I used the analogy of a restaurant owner asking, “Should I add another table or hire another waiter?” The only way to answer this rigorously is to simulate thousands of possible futures. The math becomes brutal, fast: testing 50 different configurations with 100 statistical runs each requires 5,000 independent simulations. This is not a task for a single machine; it requires a computational army.

My solution is to treat Kubernetes not as a service host, but as a temporary, on-demand supercomputer. The strategy I presented had three core pillars:

Declarative Orchestration: The entire 5,000-run DES experiment is defined in a single, clean Argo Workflows manifest, transforming a potential scripting nightmare into a manageable, observable process.
Radical Isolation: Each DES run is containerised in its own pod, creating a perfectly clean and reproducible experimental environment.
Controlled Randomness: A robust seeding strategy is implemented to ensure that “random” events in our DES are statistically valid and comparable across the entire distributed system.

*The turnout for my DES session confirmed a growing hunger in our industry for proactive, simulation-driven approaches to engineering.*

The final takeaway was a strategic re-framing of a tool many of us already use. Kubernetes is more than a platform for web apps. It can also be a general-purpose compute engine capable of solving massive scientific and financial modelling problems. It is time we started using it as such.

Session: AI for BI

Denny’s (監舜儀) session on “AI for BI” illustrated a classic pain point: The bottleneck between business users who need data and the IT teams who provide it. The proposed solution was a natural language interface, the FineChatBI, a tool designed to sit on top of existing BI platforms to make querying existing data easier.

His core insight was that the tool is the easy part. The real work is in building the “underground root system” which includes the immense challenge of defining metrics, managing permissions, and untangling data semantics. Without this foundation, any AI is doomed to fail.

Getting the underground root system right is important for building AI projects.

This is a crucial step forward in making our organisations more data-driven. However, we must also be clear about what problem is being solved.

This is a system designed to provide perfect, instantaneous answers to the question, “What happened?”

My work, and the next category of even more complex AI, begins where this leaves off. It seeks to answer the far harder question: “What will happen if…?” Sharpening our view of the past is essential, but the ultimate strategic advantage lies in the ability to accurately simulate the future.

Session: The Impossibility of Modeling Human Productivity

The presented Jugg (劉兆恭) is a well-known agile coach and the organiser of Agile Tour Taiwan 2020. His talk, “An AI-Driven Journey of Agile Product Development – From Inspiration to Delivery,” was a masterclass in moving beyond vanity metrics to understand and truly improve engineering performance.

Jugg started with a graph that every engineering lead knows in their gut. As a company grows over time:

Business grow (purple line, up);
Software architecture and complexity grow (first blue line, up);
The number of developers increases (second blue line, up);
Expected R&D productivity should grow (green line, up);
But paradoxically, the actual R&D productivity often stagnates or even declines (red line, down).

Jugg provided a perfect analogue for the work I do. He tackled the classic productivity paradox: Why does output stagnate even as teams grow? He correctly diagnosed the problem as a failure of measurement and proposed the SPACE framework as a more holistic model for this incredibly complex human system.

He was, in essence, trying to answer the same class of question I do: “If we change an input variable (team process), how can we predict the output (productivity)?”

This is where the analogy becomes a powerful contrast. Jugg’s world of human systems is filled with messy, unpredictable variables. His solutions are frameworks and dashboards. They are the best tools we have for a system that resists precise calculation.

This session reinforced my conviction that simulation is the most powerful tool we have for predicting performance in the systems we can actually control: Our code and our infrastructure. We do not have to settle for dashboards that show us the past because we can build models that calculate the future.

Session: Building a Map of “What Is” with GraphRAG

The most technically demanding session came from Nils (劉岦崱), a Senior Data Scientist at Cathay Financial Holdings. He presented GraphRAG, a significant evolution beyond the “Naive RAG” most of us use today.

He argued compellingly that simple vector search fails because it ignores relationships. By chunking documents, we destroy the contextual links between concepts. GraphRAG solves this by transforming unstructured data into a structured knowledge graph: a web of nodes (entities) and edges (their relationships).

Enhancing RAG-based application accuracy by constructing and leveraging knowledge graphs (Image Credit: LangChain)

In essence, GraphRAG is a sophisticated tool for building a static map of a known world. It answers the question, “How are all the pieces in our universe connected right now?” For AI customer service, this is a game-changer, as it provides a rich, interconnected context for every query.

This means our data now has an explicit, queryable structure. So, the LLM gets a much richer, more coherent picture of the situation, allowing it to maintain context over long conversations and answer complex, multi-faceted questions.

This session was a brilliant reminder that all advanced AI is built on a foundation of rigorous data modelling.

However, a map, no matter how detailed, is still just a snapshot. It shows us the layout of the city, but it cannot tell us how the traffic will flow at 5 PM.

This is the critical distinction. GraphRAG creates a model of a system at rest and DES creates a model of a system in motion. One shows us the relationships while the other lets us press watch how those relationships evolve and interact over time under stress. GraphRAG is the anatomy chart and simulation is the stress test.

Session: Securing the AI Magic Pocket with LLM Guardrails

Nils from Cathay Financial Holdings returned to the stage for Day 2, and this time he tackled one of the most pressing issues in enterprise AI: Security. His talk “Enterprise-Grade LLM Guardrails and Prompt Hardening” was a masterclass in defensive design for AI systems.

What made the session truly brilliant was his central analogy. As he put it, an LLM is a lot like Doraemon: a super-intelligent, incredibly powerful assistant with a “magic pocket” of capabilities. It can solve almost any problem you give it. But, just like in the cartoon, if you give it vague, malicious, or poorly thought-out instructions, it can cause absolute chaos. For a bank, preventing that chaos is non-negotiable.

Nils grounded the problem in the official OWASP Top 10 for LLM Applications.

There are two lines of defence: Guardrails and Prompt Hardening. The core of the strategy lies in understanding two distinct but complementary approaches:

Guardrails (The Fortress): An external firewall of input filters and output validators;
Prompt Hardening (The Armour): Internal defences built into the prompt to resist manipulation.

This is an essential framework for any enterprise deploying LLMs. It represents the state-of-the-art in building static defences.

While necessary, this defensive posture raises another important question for a developers: How does the fortress behave under a full-scale siege?

A static set of rules can defend against known attack patterns. But what about the unknown unknowns? What about the second-order effects? Specifically:

Performance Under Attack: What is the latency cost of these five layers of validation when we are hit with 10,000 malicious requests per second? At what point does the defence itself become a denial-of-service vector?
Emergent Failures: When the system is under load and memory is constrained, does one of these guardrails fail in an unexpected way that creates a new vulnerability?

These are not questions a security checklist can answer. They can only be answered by a dynamic stress test. The X-Teaming Nils mentioned is a step in this direction, but a full-scale DES is the ultimate laboratory.

Neil’s techniques are a static set of rules designed to prevent failure. Simulation is a dynamic engine designed to induce failure in a controlled environment to understand a system true breaking points. He is building the armour while my work with DES is in building the testing grounds to see where that armour will break.

Session: Driving Multi-Task AI with a Flowchart in a Single Prompt

The final and most thought-provoking session was delivered by 尹相志, who presented a brilliant hack: Embedding a Mermaid flowchart directly into a prompt to force an LLM to execute a deterministic, multi-step process.

He provided a new way beyond the chaos of autonomous agents and the rigidity of external orchestrators like LangGraph. By teaching the LLM to read a flowchart, he effectively turns it into a reliable state machine executor. It is a masterful piece of engineering that imposes order on a probabilistic system.

Action Grounding Principles proposed by 相志.

What he has created is the perfect blueprint. It is a model of a process as it should run in a world with no friction, no delays, and no resource contention.

And in that, he revealed the final, critical gap in our industry thinking.

A blueprint is not a stress test. A flowchart cannot answer the questions that actually determine the success or failure of a system at scale:

What happens when 10,000 users try to execute this flowchart at once and they all hit the same database lock?
What is the cascading delay if one step in the flowchart has a 5% chance of timing out?
Where are the hidden queues and bottlenecks in this process?

His flowchart is the architect’s beautiful drawing of an airplane. A DES is the wind tunnel. It is the necessary, brutal encounter with reality that shows us where the blueprint will fail under stress.

The ability to define a process is the beginning. The ability to simulate that process under the chaotic conditions of the real world is the final, necessary step to building systems that don’t just look good on paper, but actually work.

Final Thoughts and Key Takeaways from Taipei

My two days at the Hello World Dev Conference were not a tour of technologies. In fact, they were a confirmation of a dangerous blind spot in our industry.

From what I observe, they build tools for digital forensics to map past failures. They sharpen their tools with AI to perfectly understand what just happened. They create knowledge graphs to model the systems at rest. They design perfect, deterministic blueprints for how AI processes should work.

These are all necessary and brilliant advancements in the art of mapmaking.

However, the critical, missing discipline is the one that asks not “What is the map?”, but “What will happen to the city during the hurricane?” The hard questions of latency under load, failures, and bottlenecks are not found on any of their map.

Our industry is full of brilliant mapmakers. The next frontier belongs to people who can model, simulate, and predict the behaviour of complex systems under stress, before the hurricane reaches.

That is why I am building SNA, my .NET-based Discrete Event Simulation engine.

Hello, Taipei. Taken from the window of the conference venue.

I am leaving Taipei with a notebook full of ideas, a deeper understanding of the challenges and solutions being pioneered by my peers in the Mandarin-speaking tech community, and a renewed sense of excitement for the future we are all building.

Using Docker and Kubernetes without Docker Desktop on Windows 11

May 21, 2023May 21, 2023 by Chun Lin, posted in Docker, Experience, Kubernetes, Linux, Microservices and Containers, Ubuntu

Last week, my friend who is working on a microservice project at work suddenly messaged me saying that he realised Docker Desktop is no longer free.

Docker Desktop is basically an app that can be installed on our Windows machine to build and share containerised apps and microservices. It provides a straightforward GUI to manage our containers and images directly from our local machine.

Docker Desktop also includes a standalone Kubernetes server running locally within our Docker instance. It is thus very convenient for the developers to perform local testing easily using Docker Desktop.

Despite Docker Desktop remaining free for small businesses, personal use, education, and non-commercial open source projects, it now requires a paid subscription for professional use in larger businesses. Consequently, my friend expressed a desire for me to suggest a fast and free alternative for development without relying on Docker Desktop.

Install Docker Engine on WSL

Before we continue, we need to understand that Docker Engine is the fundamental runtime that powers Docker containers, while Docker Desktop is a higher-level application that includes Docker Engine. Hence, Docker Engine can also be used independently without Docker Desktop on local machine.

Fortunately, Docker Engine is licensed under the Apache License, Version 2.0. Thus, we are allowed to use it in our commercial products for free.

In order to install Docker Engine on Windows without using Docker Desktop, we need to utilise the WSL (Windows Subsystem for Linux) to run it.

Step 1: Enable WSL

We have to enable WSL from the Windows Features by checking the option “Windows Subsystem for Linux”, as shown in the screenshot below.

After that, we can press “OK” and wait for the operation to be completed. We will then be asked to restart our computer.

If we already have WSL installed earlier, we can update the built-in WSL to the Microsoft latest version of WSL using the “wsl –update” command in Command Prompt.

Later, if we want to shutdown WSL, we can run the command “wsl –shutdown”.

Step 2: Install Linux Distribution

After we restarted our machine, we can use the Microsoft Store app and look for the Linux distribution we want to use, for example Ubuntu 20.04 LTS, as shown below.

We then can launch Ubuntu 20.04 LTS from our Start Menu. To find out the version of Linux you are using, you can run the command “wslfetch”, as shown below.

For the first timer, we need to set the Linux username and password.

Step 3: Install Docker

Firstly, we need to update the Ubuntu APT repository using the “sudo apt update” command.

After we see the message saying that we have successfully updated the apt repository, we can proceed to install Docker. Here, the “-y” option is used to grant the permission to install required packages automatically.

When Docker is installed, we need to make a new user group with the name “docker” by utilising the below-mentioned command.

Docker Engine acts as a client-server application with a server that has a long-running daemon process dockerd. dockerd is the command used to start the Docker daemon on Linux systems. The Docker daemon is a background process that manages the Docker environment and is responsible for creating, starting, stopping, and managing Docker containers.

Before we can build images using Docker, we need to use dockerd, as shown in the screenshot below.

Step 4: Using Docker on WSL

Now, we simply need to open another WSL terminal and execute docker commands, such as docker ps, docker build, etc.

With this, we can now push our image to Docker Hub from our local Windows machine.

Configure a local Kubernetes

Now if we try to run the command line tool, kubectl, we will find out that the command is still not yet available.

We can use the following commands to install kubectl.

$ curl -LO https://storage.googleapis.com/kubernetes-release/release/`curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt`/bin/linux/amd64/kubectl


$ chmod +x ./kubectl


$ sudo mv ./kubectl /usr/local/bin/kubectl


$ kubectl version --client

The following screenshot shows what we can see after running the commands above.

After we have kubectl, we need to make Kubernetes available on our local machine. To do so, we need to install minikube, a local Kubernetes. minikube can setup a local Kubernetes cluster on macOS, Linux, and Windows.

To install the latest minikube stable release on x86-64 Linux using binary download:

$ curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64

$ sudo install minikube-linux-amd64 /usr/local/bin/minikube

The following is the results of running the installation of minicube. We also run the minicube by executing the command “minikube start”.

We can now run some basic kubectl commands, as shown below.

References

Getting Certified as Kubernetes App Developer: My Kaizen Journey

April 22, 2023April 22, 2023 by Chun Lin, posted in Docker, Experience, Kubernetes, Linux, Microservices and Containers

The high concentration of talented individuals in Samsung SDS is remarkable. I have worked alongside amazing colleagues who are not only friendly but also intelligent and dedicated to their work.

In July 2022, I had numerous discussions with my visionary and supportive seniors about the future of cloud computing. They eventually encouraged me to continue my cloud certification journey by taking the Certified Kubernetes Application Developer (CKAD) certification exam.

Before attempting the CKAD exam, I received advice on how demanding and challenging the assessment could be. Self-learning can also be daunting, particularly in a stressful work environment. However, I seized the opportunity to embark on my journey towards getting certified and committed myself to the process of kaizen, continuous improvement. It was a journey that required a lot of effort and dedication, but it was worth it.

I took the CKAD certification exam while I was working in Seoul in March 2023. The lovely weather has a soothing impact on my stress levels.

August 2022: Learning Docker Fundamentals

To embark on a successful Kubernetes learning journey, I acknowledge the significance of first mastering the fundamentals of Docker.

Docker is a tool that helps developers build, package, and run applications in a consistent way across different environments. Docker allows us to package our app and its dependencies into a Docker container, and then run it on any computer that has Docker installed.

Docker serves as the foundation for many container-based technologies, including Kubernetes. Hence, understanding Docker fundamentals provides a solid groundwork for comprehending Kubernetes.

There is a learning path on Pluralsight specially designed for app developers who are new to Docker so that they can learn more about developing apps with Docker.

I borrowed the free Pluralsight account from my friend, Marvin Heng.

The learning path helps me gain essential knowledge and skills that are directly applicable to Kubernetes. For example, it shows me how the best practices of optimising Docker images by carefully placing the Docker instructions and making use of its caching mechanism.

In the learning path, we learnt about Docker Swarm. Docker Swarm is a tool that helps us manage and orchestrate multiple Docker containers across multiple machines or servers, making it easier to deploy and scale our apps.

A simple architecture diagram of a system using Kubernetes. (Source: Pluralsight)

After getting the basic understanding of Docker Swarm, we move on to learning Kubernetes. Kubernetes is similar to Docker Swarm because they are both tools for managing and orchestrating containerised apps. However, Kubernetes has a larger and more mature ecosystem, with more third-party tools and plugins available for tasks like monitoring, logging, and service discovery.

December 2022: Attending LXF Kubernetes Course

Kubernetes is a project that was originally developed by Google, but it is now maintained by the Cloud Native Computing Foundation (CNCF), which is a sub-foundation of the Linux Foundation.

Samsung SDS is a member of the CNCF and Kubernetes Certified Service Provider. (Image Credit: NewsWorld)

The Linux Foundation provides a neutral and collaborative environment for open-source projects like Kubernetes to thrive, and the CNCF is able to leverage this environment to build a strong community of contributors and users around Kubernetes.

In addition, the Linux Foundation offers a variety of certification exams that allow individuals to demonstrate their knowledge and skills in various areas of open-source technology. CKAD is one of them.

The Linux Foundation also offers Kubernetes-related training courses.

The CKAD course is self-paced and can be completed online, making it accessible to learners around the world. It is designed for developers who have some experience with Kubernetes and want to deepen their knowledge and skills in preparation for the CKAD certification exam.

The CKAD course includes a combination of lectures, hands-on exercises, and quizzes to reinforce the concepts covered. It covers a wide range of topics related to Kubernetes, including:

Kubernetes architecture;
Build and design;
Deployment configuration;
App exposing;
App troubleshooting;
Security in Kubernetes;
Helm.

Kubectl, the command-line client used to interact with Kubernetes clusters. (Image Credit: The Linux Foundation Training)

January 2023: Going through CKAD Exercises and Killer Shell

Following approximately one month of dedicated effort, I successfully completed the online course and proudly received my course completion certificate on 7th of January 2023. So, throughout the remainder of January, I directed my attention towards exam preparation by diligently working through the various online exercises.

The initial series of exercises that I went through is the CKAD exercise thoughtfully curated by a skilled software developer, dgkanatsios, and made available on GitHub. The exercise covers the following areas:

Core concepts;
Multi-container pods;
Pod design;
Configuration;
Observability;
Services and networking;
State persistence;
Helm;
Custom Resource Definitions.

The exercise comprises numerous questions, therefore, my suggestion would be to devote one week to thoroughly delve into them, by allocating an hour each day to tackle a subset of the questions.

During my 10-day Chinese New Year holiday, I dedicated my time towards preparing for the exam. (Image Credit: Global Times)

Furthermore, upon purchasing the CKAD exam, we are entitled to receive two complementary simulator sessions for the exam on Killer Shell (killer.sh), both containing the same set of questions. Therefore, it is advisable to strategise and plan our approach towards making optimal utilisation of them.

After going through all the questions in the CKAD exercise mentioned above, I proceeded to undertake the initial killer.sh exam. The simulator features an interface that closely resembles the new remote desktop Exam UI, thereby providing me with invaluable insights on how the actual exam will be conducted.

The killer.sh session is allocated a total of 2 hours for the exam, encompassing a set of 22 questions. Similar to the actual exam, the session is to test our hands-on experience and practical knowledge of Kubernetes. Thus, we are expected to demonstrate our proficiency by completing a series of tasks in a given Kubernetes environment.

The simulator questions are comparatively more challenging than the actual exam. In my initial session, I was able to score only 50% out of 100%. Upon analysing and rectifying my errors, I resolved to invest an additional month’s time to study and prepare more comprehensively.

Scenario-based questions like this are expected in the CKAD exam.

February 2023: Working on Cloud Migration Project

Upon my return from the Chinese New Year holiday, to my dismay, I discovered that I had been assigned to a cloud migration project at work.

The project presented me with an exceptional chance to deploy an ASP .NET solution on Kubernetes on Google Cloud Platform, allowing me to put into practice what I have learned and thereby fortify my knowledge of Kubernetes-related topics.

Furthermore, I am lucky to have had the opportunity to engage in a fruitful discussion with my fellow colleagues, through which I was able to learn more from them about Kubernetes by presenting my work.

March 2023: The Exam

In the early of March, I was assigned to visit Samsung SDS in Seoul until the end of the month. Therefore, I decided to seize the opportunity to complete my second kill.sh simulation session. This time, I managed to score more than the passing score, which is 66%.

After that, I dedicated an extra week to reviewing the questions in the CKAD exercises on GitHub before proceeding to take the actual CKAD exam.

The actual CKAD exam consists of 16 questions that need to be completed within 2 hours. Even though the exam is online and open book, we are not allowed to refer any resources other than the Kubernetes documentation and the Helm documentaion during the exam.

In addition, the exam has been updated to use the PSI Bridge where we get access to a remote desktop instead of just a remote terminal. There is an an article about it. This should not be unfamiliar to you if you have gone through the killer.sh exams.

The new exam UI now provides us access to a full remote XFCE desktop, enabling us to run the terminal application and Firefox to open the approved online documentations, unlike the previous exam UI. Thus, having multiple monitors and bookmarking the documentation pages on our personal Internet browser before the exam are no longer helpful.

Before taking the exam, there are a lot more key points mentioned in the Candidate Handbook, the Important Instructions, and the PSI Bridge System Requirements that can help ensure success. Please make sure you have gone through them and get your machine and environment ready for the exam.

Even though I am 30-minute early to the exam, I faced a technical issue with Chrome on my laptop that caused me to be 5 minutes late for the online exam. Fortunately, my exam time was not reduced due to the delay.

The issue was related to the need to end the “remoting_host.exe” application used by Chrome Remote Desktop in order to use a specific browser for the exam. Despite trying to locate it in task manager, I was unable to do so. After searching on Google, I found a solution for Windows users. We need to execute the command “net stop chromoting” to the “remoting_host.exe”.

During my stay in Seoul, my room at Shilla Stay Seocho served as my exam location.

CKAD certification exam is an online proctored exam. This means that it can be taken remotely but monitored by a proctor via webcam and microphone to ensure the integrity of the exam. Hence, to ensure a smooth online proctored exam experience, it is crucial to verify that our webcam is capable of capturing the text on our ID, such as our passport, and that we are using a stable, high-speed Internet connection.

During the exam, the first thing I did is to create a few aliases as listed below.

alias k="kubectl "
alias kn="kubectl config set-context --current --namespace"
export dry="--dry-run=client -o yaml"
export now="--force --grace-period 0"

These aliases helped me to complete the commands quickier. In addition, if it’s possible, I also always use an imperative command to create a YAML file using kubectl.

By working on the solution based on the generated YAML file, I am able to save a significant amount of time as opposed to writing the entire YAML file from scratch.

I completed only 15 questions with 1 not answered. I chose to forgo a 9-mark question that I was not confident in answering correctly, in order to have more time to focus on other questions. In the end, I still managed to score 78% out of 100%.

The passing score for CKAD is 66% out of 100%.

Moving Forward: Beyond the Certification

In conclusion, obtaining certification in one’s chosen field can be a valuable asset for personal and professional development. In my experience, it has helped me feel more confident in my abilities and given me a sense of purpose in my career.

However, it is essential to remember that it is crucial to continue learning and growing, both through practical experience and ongoing education, in order to stay up-to-date with the latest developments in the field. The combination of certification, practical experience, and ongoing learning can help us to achieve our career goals and excel in our role as a software engineer.

Together, we learn better.