Automate Orchard Core Deployment on AWS ECS with CloudFormation

For .NET developers looking for a Content Management System (CMS) solution, Orchard Core presents a compelling, open-source option. Orchard Core is a CMS built on ASP.NET Core. When deploying Orchard Core on AWS, Amazon Elastic Container Service (ECS) provides a hosting platform that can handle high traffic, keep costs down, and remain stable.

However, finding clear end-to-end instructions for deploying Orchard Core to ECS can be difficult. Without such guidance, we may spend more time on testing and troubleshooting, and potentially end up with a less efficient or less secure setup. The lack of a standard deployment process can also complicate infrastructure management and hinder the implementation of CI/CD. This is where Infrastructure as Code (IaC) comes in.

Source Code

The complete CloudFormation template we built in this article is available on GitHub: https://github.com/gcl-team/Experiment.OrchardCore.Main/blob/main/Infrastructure.yml

CloudFormation

IaC provides a solution for automating infrastructure management. With IaC, we define the entire infrastructure that hosts Orchard Core as code. This code can then be version-controlled, tested, and deployed just like application code.

CloudFormation is the AWS service that implements IaC. When we use CloudFormation, AWS automatically provisions and configures all the necessary resources for our Orchard Core hosting, ensuring consistent and repeatable deployments across different environments.

This article is for .NET developers who know a bit about AWS concepts such as ECS or CloudFormation. We will demonstrate how CloudFormation can help us set up the infrastructure for hosting Orchard Core on AWS.

The desired infrastructure of our CloudFormation setup.
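Throughout the article we will build up Infrastructure.yml piece by piece. As a preview of how the finished template could be rolled out, here is a minimal sketch of a deploy command; the stack name is a placeholder, and CAPABILITY_NAMED_IAM is needed because the template names its IAM role.

# Deploy (or update) the stack; stack name and region are placeholders.
aws cloudformation deploy \
  --template-file Infrastructure.yml \
  --stack-name orchard-core-infra \
  --capabilities CAPABILITY_NAMED_IAM \
  --region ap-southeast-5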

Now let’s start writing our CloudFormation template. We begin by defining some useful parameters that we will use later; some of them will be discussed in the relevant sections below.

AWSTemplateFormatVersion: '2010-09-09'
Description: "Infrastructure for Orchard Core CMS"

Parameters:
  VpcCIDR:
    Type: String
    Description: "VPC CIDR Block"
    Default: 10.0.0.0/16
    AllowedPattern: '((\d{1,3})\.){3}\d{1,3}/\d{1,2}'
  ApiGatewayStageName:
    Type: String
    Default: "production"
    AllowedValues:
      - production
      - staging
      - development
  ServiceName:
    Type: String
    Default: cld-orchard-core
    Description: "The service name"
  CmsDBName:
    Type: String
    Default: orchardcorecmsdb
    Description: "The name of the database to create"
  CmsDbMasterUsername:
    Type: String
    Default: orchardcoreroot
  HostedZoneId:
    Type: String
    Default: <your Route 53 hosted zone id>
  HostedZoneName:
    Type: String
    Default: <your custom domain>
  CmsHostname:
    Type: String
    Default: orchardcms
  OrchardCoreImage:
    Type: String
    Default: <your ECR link>/orchard-core-cms:latest
  EcsAmi:
    Description: The Amazon Machine Image ID used for the cluster
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ecs/optimized-ami/amazon-linux-2023/recommended/image_id

Dockerfile

The Dockerfile is quite straightforward.

# Global Arguments
ARG DCR_URL=mcr.microsoft.com
ARG BUILD_IMAGE=${DCR_URL}/dotnet/sdk:8.0-alpine
ARG RUNTIME_IMAGE=${DCR_URL}/dotnet/aspnet:8.0-alpine

# Build Container
FROM ${BUILD_IMAGE} AS builder
WORKDIR /app

COPY . .

RUN dotnet restore
RUN dotnet publish ./OCBC.HeadlessCMS/OCBC.HeadlessCMS.csproj -c Release -o /app/src/out

# Runtime Container
FROM ${RUNTIME_IMAGE}

## Install cultures
RUN apk add --no-cache \
icu-data-full \
icu-libs

ENV ASPNETCORE_URLS http://*:5000

WORKDIR /app

COPY --from=builder /app/src/out .

EXPOSE 5000

ENTRYPOINT ["dotnet", "OCBC.HeadlessCMS.dll"]

With the Dockerfile, we then can build the Orchard Core project locally with the command below.

docker build --platform=linux/amd64 -t orchard-core-cms:v1 .

The --platform flag specifies the target OS and architecture for the image being built. Even though it is optional, it is particularly useful when we build images on one machine (such as an ARM-based macOS or Windows laptop) and deploy them to a platform with a different architecture (such as Amazon Linux on x86_64).

ARM-based Apple Silicon was announced in 2020. (Image Credit: The Verge)

I am using macOS with ARM-based Apple Silicon, whereas the Amazon Linux AMI uses the amd64 (x86_64) architecture. Hence, if I do not specify the platform, the image I build on my MacBook will be incompatible with the EC2 instances.

Once the image is built, we will push it to the Elastic Container Registry (ECR).
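The exact push commands are not part of the CloudFormation template, but a typical sequence looks roughly like the following; the account ID, region, and repository name are placeholders and should match the OrchardCoreImage parameter.

# Placeholders: replace the account ID and region with your own values.
ECR=123456789012.dkr.ecr.ap-southeast-5.amazonaws.com

# Authenticate Docker against ECR
aws ecr get-login-password --region ap-southeast-5 | \
  docker login --username AWS --password-stdin "$ECR"

# Create the repository once (ignore the error if it already exists)
aws ecr create-repository --repository-name orchard-core-cms || true

# Tag the locally built image and push it
docker tag orchard-core-cms:v1 "$ECR/orchard-core-cms:latest"
docker push "$ECR/orchard-core-cms:latest"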

We choose ECR because it is directly integrated with ECS, which means deploying images from ECR to ECS is smooth. When ECS needs to pull an image from ECR, it automatically uses the IAM role to authenticate and authorise the request to ECR. The execution role of our ECS is associated with the AmazonECSTaskExecutionRolePolicy IAM policy, which allows ECS to pull images from ECR.

ECR also comes with built-in support for image scanning, which automatically scans our images for vulnerabilities.

Image scanning in ECR helps ensure our images are secure before we deploy them.

Unit 01: IAM Role

Technically, we can run Orchard Core on ECS without any ECS task role, but only if our Orchard Core app does not need to interact with AWS services. In practice, most modern web apps, including ours, need to integrate with AWS services such as S3 and CloudWatch. Hence, the first thing we need to work on is setting up an ECS task role.

iamRole:
  Type: AWS::IAM::Role
  Properties:
    RoleName: !Sub "${AWS::StackName}-ecs"
    Path: !Sub "/${AWS::StackName}/"
    AssumeRolePolicyDocument:
      Version: 2012-10-17
      Statement:
        - Effect: Allow
          Principal:
            Service:
              - ecs-tasks.amazonaws.com
          Action:
            - sts:AssumeRole

In AWS IAM, permissions are assigned to roles, not directly to the services that need them. Thus, we cannot directly assign IAM policies to ECS tasks. Instead, we assign those policies to a role, and then the ECS task temporarily assumes that role to gain those permissions, as shown in the configuration above.

The permissions are temporary because the role is only assumed for as long as the ECS task needs to interact with AWS resources. Once the ECS task stops, the temporary credentials expire and the task loses access to those resources.

Hence, by using roles and AssumeRole, we follow the principle of least privilege. The ECS task is granted only the permissions it needs and can only use them temporarily.
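If we ever want to see these temporary credentials in action, one way (assuming we can exec into a running container of the task, for example with ECS Exec) is to query the credentials endpoint that the ECS agent exposes to the task:

# Run inside a container of the task; the ECS agent injects this environment variable.
curl -s "http://169.254.170.2${AWS_CONTAINER_CREDENTIALS_RELATIVE_URI}"
# The JSON response contains AccessKeyId, SecretAccessKey, Token and an Expiration
# timestamp, confirming the task runs on short-lived credentials from the assumed role.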

Unit 02: CloudWatch Log Group

ECS tasks, by default, do not have logging enabled.

Hence, granting our ECS task the permission to write to CloudWatch Logs is one of the first things we should do when setting up ECS tasks. Setting logging up early helps avoid surprises later on when our ECS tasks are running.

To set up the logging, we first need to define a Log Group, the place in CloudWatch Logs where log events are stored. While ECS can create the log group automatically when the ECS task starts (if it does not already exist), it is good practice to define the log group in CloudFormation so that it exists ahead of time and can be managed within our IaC.

ecsLogGroup:
  Type: AWS::Logs::LogGroup
  Properties:
    LogGroupName: !Sub "/ecs/${ServiceName}-log-group"
    RetentionInDays: 3
    Tags:
      - Key: Stack
        Value: !Ref AWS::StackName

The following policy will grant the necessary permissions to write logs to CloudWatch.

ecsLoggingPolicy:
  Type: AWS::IAM::Policy
  Properties:
    PolicyName: !Sub "${AWS::StackName}-cloudwatch-logs-policy"
    Roles:
      - !Ref iamRole
    PolicyDocument:
      Version: 2012-10-17
      Statement:
        - Effect: Allow
          Action:
            - logs:CreateLogStream
            - logs:PutLogEvents
          Resource:
            - !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/ecs/${ServiceName}-log-group/*"

By separating the logging policy into its own resource, we make it easier to manage and update policies independently of the ECS task role. After defining the policy, we attach it to the ECS task role by referencing it in the Roles section.

The logging setup helps us consolidate log events from the container into a centralised log group in CloudWatch.
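As a quick way to check that the wiring works once tasks are running, we can stream the log group from a terminal; this assumes AWS CLI v2 and the default ServiceName of cld-orchard-core.

# Stream the Orchard Core container logs from CloudWatch Logs.
aws logs tail "/ecs/cld-orchard-core-log-group" --follow --since 15m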

Unit 03: S3 Bucket

We will be storing the files uploaded to Orchard Core through its Media module on Amazon S3. So, we need to configure our S3 bucket as follows.

mediaContentBucket:
  Type: AWS::S3::Bucket
  Properties:
    BucketName: !Join
      - '-'
      - - !Ref ServiceName
        - !Ref AWS::Region
        - !Ref AWS::AccountId
    OwnershipControls:
      Rules:
        - ObjectOwnership: BucketOwnerPreferred
    Tags:
      - Key: Stack
        Value: !Ref AWS::StackName

Since bucket names must be globally unique, we derive the name dynamically from the service name, AWS Region, and AWS Account ID.

Since our Orchard Core can be running in multiple ECS tasks that upload media files to a shared S3 bucket, the BucketOwnerPreferred setting ensures that even if media files are uploaded by different ECS tasks, the owner of the S3 bucket can still access, delete, or modify any of those media files without needing additional permissions for each uploaded object.

The bucket owner having full control is a security necessity in many cases because it allows the owner to apply policies, access controls, and auditing in a centralised way, maintaining the security posture of the bucket.

However, even if the bucket owner has control, the principle of least privilege should still apply. For example, only the ECS task responsible for Orchard Core should be allowed to interact with the media objects.

mediaContentBucketPolicy:
  Type: AWS::IAM::Policy
  Properties:
    PolicyName: !Sub "${mediaContentBucket}-s3-policy"
    Roles:
      - !Ref iamRole
    PolicyDocument:
      Version: 2012-10-17
      Statement:
        - Effect: Allow
          Action:
            - s3:ListBucket
          Resource: !GetAtt mediaContentBucket.Arn
        - Effect: Allow
          Action:
            - s3:PutObject
            - s3:GetObject
          Resource: !Join ["/", [!GetAtt mediaContentBucket.Arn, "*"]]

The s3:ListBucket permission is necessary for the Orchard Core Media module to work properly, while s3:PutObject and s3:GetObject are used for uploading and downloading media files.
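A rough way to verify the permissions from a shell session that has assumed the task role is sketched below; the bucket name follows the template's naming convention, with a placeholder region and account ID.

# Bucket name pattern: <ServiceName>-<Region>-<AccountId>
BUCKET=cld-orchard-core-ap-southeast-5-123456789012

# These two calls exercise s3:PutObject and s3:ListBucket respectively.
aws s3 cp ./sample.png "s3://$BUCKET/media/sample.png"
aws s3 ls "s3://$BUCKET/media/"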

IAM Policy

Now, let’s pause a while to talk about the policies that we have added above for the log group and S3.

In AWS, we mostly deal with managed policies and inline policies depending on whether the policy needs to be reused or tightly scoped to one role.

We use AWS::IAM::ManagedPolicy when a set of permissions needs to be reused by multiple roles or services, which is why it is frequently used for company-wide security policies. That makes it unsuitable for our Orchard Core examples above. Instead, we use AWS::IAM::Policy, because the permissions are tightly coupled to a single role and will not be reused elsewhere.

In addition, since AWS::IAM::Policy is tightly tied to its entities, it is deleted when the corresponding entities are deleted. This is a key difference from AWS::IAM::ManagedPolicy, which remains even if the entities that use it are deleted. This is also why managed policies are used for company-wide policies: they provide better long-term management for permissions that may be reused across multiple roles.

We can summarise the differences between the two in the following table.

| Feature | Managed Policy | Policy |
| --- | --- | --- |
| Scope | Company-wide. | Tight coupling to a single entity. |
| Deletion Behaviour | Persists even if attached entities are deleted. | Deleted along with the associated entity. |
| Versioning Support | Supports versioning (can roll back). | No. |
| Limit per Entity | Up to 20 managed policies can be attached per entity (quota-based). | Limited by total inline policy size rather than a count. |
| Best Use Case | Long-term, reusable permissions (e.g., company-wide security policies). | One-off, tightly scoped permissions (e.g., role-specific needs). |

Unit 04: Aurora Database Cluster

Orchard Core supports multiple Relational Database Management Systems (RDBMS). Unlike traditional CMS platforms that rely on a single database engine, Orchard Core offers flexibility by supporting several options, including:

  • Microsoft SQL Server;
  • PostgreSQL;
  • MySQL;
  • SQLite.

While SQLite is lightweight and easy to use, it is not suitable for production deployments on AWS; it is designed for local storage, not multi-user concurrent access. AWS instead provides fully managed relational databases, namely RDS and Aurora.

The database engines supported by Amazon RDS and Amazon Aurora.

While Amazon RDS is a well-known choice for relational databases, we can also consider Amazon Aurora, which was launched in 2014. Unlike traditional RDS, Aurora storage grows and shrinks automatically, and with Aurora Serverless even the compute capacity can scale up and down, so we only pay for what we use.

High performance and scalability of Amazon Aurora. (Image Source: Amazon Aurora MySQL PostgreSQL Features)

In addition, Aurora is faster than standard PostgreSQL and MySQL, as shown in the screenshot above. It also offers built-in high availability with Multi-AZ replication. This is critical for a CMS like Orchard Core, which relies on fast queries and efficient data handling.

It is important to note that, while Aurora is optimised for AWS, it does not lock us in, as we retain full control over our data and schema. Hence, if we ever need to switch, we can export data and move to standard MySQL/PostgreSQL on another cloud or on-premises.

Instead of manually setting up Aurora, we will be using CloudFormation to ensure that the correct database instance, networking, security settings, and additional configurations are managed consistently.

Aurora is cluster-based, rather than being a standalone DB instance like traditional RDS. Thus, instead of a single instance, we deploy a DB cluster, which consists of a primary writer node and, optionally, multiple reader nodes for scalability and high availability.

Because of this cluster-based architecture, Aurora does not rely on the instance-level DBParameterGroup alone like standalone RDS instances do. Instead, it requires a DBClusterParameterGroup to apply settings at the cluster level, ensuring all instances in the cluster inherit the same configuration, as shown in the following CloudFormation template.

cmsDBClusterParameterGroup:
  Type: AWS::RDS::DBClusterParameterGroup
  Properties:
    Description: "Aurora Provisioned Postgres DB Cluster Parameter Group"
    Family: aurora-postgresql16
    Parameters:
      timezone: UTC # Ensures consistent timestamps
      rds.force_ssl: 1 # Enforce SSL for security

The first parameter we configure is the timezone, which we set to UTC for consistency. When we store date-time values in the database, we should use TIMESTAMPTZ for timestamps and keep the original time zone in a separate TEXT column. Because the database time zone is UTC, TIMESTAMPTZ values are stored and returned in UTC, so keeping the original time zone lets us convert back to the desired local time with the AT TIME ZONE feature in PostgreSQL whenever we need to display it, as shown in the query below.

SELECT event_time_utc AT TIME ZONE timezone AS event_local_time
FROM events;

After that, we enable rds.force_ssl so that all connections to Aurora are encrypted with SSL. This prevents data from being sent in plaintext. Even if our Aurora database sits behind a bastion host, enforcing SSL is still recommended because it encrypts all data in transit, adding an extra layer of security. Enabling SSL has little impact on performance while adding a significant security benefit.
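To double-check from a machine that can reach the database (for example through the bastion host), we can require SSL explicitly when connecting with psql; the cluster endpoint below is a placeholder, while the database and user names come from the template defaults.

# sslmode=require makes the client refuse unencrypted connections.
psql "host=<your-cluster-endpoint> port=5432 dbname=orchardcorecmsdb user=orchardcoreroot sslmode=require"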

Once the DBClusterParameterGroup is configured, the next step is to configure the AWS::RDS::DBCluster resource, where we will define the cluster main configuration with the parameter group defined above.

cmsDatabaseCluster:
  Type: AWS::RDS::DBCluster
  Properties:
    BackupRetentionPeriod: 7
    DatabaseName: !Ref CmsDBName
    DBClusterIdentifier: !Ref AWS::StackName
    DBClusterParameterGroupName: !Ref cmsDBClusterParameterGroup
    DeletionProtection: true
    Engine: aurora-postgresql
    EngineMode: provisioned
    EngineVersion: 16.1
    MasterUsername: !Ref CmsDbMasterUsername
    MasterUserPassword: !Sub "{{resolve:ssm-secure:/OrchardCoreCms/DbPassword:1}}"
    DBSubnetGroupName: !Ref cmsDBSubnetGroup
    VpcSecurityGroupIds:
      - !GetAtt cmsDBSecurityGroup.GroupId
    Tags:
      - Key: Stack
        Value: !Ref AWS::StackName

Let’s go through the Properties.

About BackupRetentionPeriod

The BackupRetentionPeriod parameter in the Aurora DB cluster determines how many days automated backups are retained by AWS. It can be from a minimum of 1 day to a maximum of 35 days for Aurora databases. For most business applications, 7 days of backups is often enough to handle common recovery scenarios unless we are required by law or regulation to keep backups for a certain period.

Aurora backups are continuous and incremental, which means it does not back up the entire database each time; it only stores the changes since the previous backup. This makes the backup process very efficient, especially for databases with little or no change over time. If our CMS database remains relatively static, backup storage will be very cheap or even free, as backup storage up to the size of the cluster volume incurs no additional charge.

So the total billed usage for backups depends on how much data changes each day and whether the total backup size exceeds the cluster volume size. If our database does not experience massive daily changes, the backup storage will likely stay within that limit and be free.

About DBClusterIdentifier

For the DBClusterIdentifier, we set it to the stack name, which makes it unique to the specific CloudFormation stack. This can be useful for differentiating clusters.

About DeletionProtection

In production environments, data loss or downtime is a critical concern. DeletionProtection ensures that our CMS DB cluster cannot be deleted unless the protection is explicitly disabled first. There is no shortcut to bypass it: if DeletionProtection is enabled on the DB cluster, even CloudFormation will fail to delete it. The only way to delete the DB cluster is to disable DeletionProtection first via the AWS Console, CLI, or SDK.
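As a sketch, assuming the identifiers used in this template, disabling the protection from the CLI before a stack deletion could look like this:

# The cluster identifier is the stack name, as defined in the template.
aws rds modify-db-cluster \
  --db-cluster-identifier <your-stack-name> \
  --no-deletion-protection \
  --apply-immediately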

About EngineMode

In Aurora, EngineMode refers to the database operational mode. There are two primary modes, i.e. Provisioned and Serverless. For Orchard Core, Provisioned mode is typically the better choice because the mode ensures high availability, automatic recovery, and read scaling. Hence, if the CMS is going to have a consistent level of traffic, Provisioned mode will be able to handle that load. Serverless is useful if our CMS workload has unpredictable traffic patterns or usage spikes.

About MasterUserPassword

Storing database passwords directly in the CloudFormation template is a security risk.

There are a few other ways to handle sensitive data like passwords in CloudFormation, for example using AWS Secrets Manager and AWS Systems Manager (SSM) Parameter Store.

AWS Secrets Manager is a more advanced solution that offers automatic password rotation, which is useful for situations where we need to regularly rotate credentials. However, it may incur additional costs.

On the other hand, SSM Parameter Store provides a simpler and cost-effective solution for securely storing and referencing secrets, including database passwords. We can store up to 10,000 parameters (standard type) without any cost.

Hence, we use SSM Parameter Store to securely store the database password and reference it in CloudFormation without exposing it directly in our template, reducing the security risk and simplifying secret management.
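The parameter referenced by the template ({{resolve:ssm-secure:/OrchardCoreCms/DbPassword:1}}) has to exist before the stack is created. One way to create it, with the password value as a placeholder:

# Creates version 1 of the SecureString parameter used by the template.
aws ssm put-parameter \
  --name "/OrchardCoreCms/DbPassword" \
  --type SecureString \
  --value "<a strong password>"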

Database password is stored as a SecureString in Parameter Store.

About DBSubnetGroupName and VpcSecurityGroupIds

These two properties involve networking considerations, which we will discuss when we dive into the networking setup later.

Unit 05: Aurora Database Instance

Now that we have covered the Aurora DB cluster, which is the overall container for the database, let’s move on to the DB instance.

Think of the cluster as the foundation, and the DB instances are where the actual database operations take place. The DB instances are the ones that handle the read and write operations, replication, and scaling for the workload. So, in order for our CMS to work correctly, we need to define the DB instance configuration, which runs on top of the DB cluster.

cmsDBInstance:
  Type: 'AWS::RDS::DBInstance'
  DeletionPolicy: Retain
  Properties:
    DBInstanceIdentifier: !Sub "${AWS::StackName}-db-instance"
    DBInstanceClass: db.t4g.medium
    DBClusterIdentifier: !Ref cmsDatabaseCluster
    DBSubnetGroupName: !Ref cmsDBSubnetGroup
    Engine: aurora-postgresql
    Tags:
      - Key: Stack
        Value: !Ref AWS::StackName

For our Orchard Core CMS, we do not expect very high traffic or intensive database operations. Hence, we choose db.t4g. T4g database instances are AWS Graviton2-based, so they are more cost-efficient than traditional instance types, especially for workloads like a CMS that do not require continuous high performance. However, there are a few things we may need to look into when using T instance classes, mainly because they are burstable: sustained load draws down CPU credits.
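One hedged way to keep an eye on the credit balance is the CPUCreditBalance metric in CloudWatch; the instance identifier follows the template's naming, and the time range below is a placeholder.

aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name CPUCreditBalance \
  --dimensions Name=DBInstanceIdentifier,Value=<your-stack-name>-db-instance \
  --statistics Average \
  --period 300 \
  --start-time 2025-01-01T00:00:00Z \
  --end-time 2025-01-01T06:00:00Z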

Unit 06: Virtual Private Cloud (VPC)

Now that we have covered how the Aurora cluster and instance work, the next important thing is ensuring they are deployed in a secure and well-structured network. This is where the Virtual Private Cloud (VPC) comes in.

VPC is a virtual network in AWS where we define the infrastructure networking. It is like a private network inside AWS where we can control IP ranges, subnets, routing, and security.

The default VPC in Malaysia region.

By the way, you might have noticed that AWS automatically provides a default VPC in every region. It is a ready-to-use network setup that allows us to launch resources without configuring networking manually.

While it is convenient, it is recommended not to use the default VPC. This is because the default VPC is automatically created with predefined settings, which means we do not have full control over its configuration, such as subnet sizes, routing, security groups, etc. It also has public subnets by default which can accidentally expose internal resources to the Internet.

Since we are setting up our own VPC, one key decision we need to make is the CIDR block, i.e. the range of private IPs we allocate to our network. This is important because it determines how many subnets and IP addresses we can have within our VPC.

To future-proof our infrastructure, we will use a /16 CIDR block, as shown in the VpcCIDR parameter in our CloudFormation template. This gives us 65,536 IP addresses, which we can break into 64 subnets of /22 (each with 1,024 IPs). That is usually more than enough for a well-structured VPC; most companies do not need so many subnets in a single VPC unless they have very complex workloads. If one service needs more IPs, we can allocate a larger subnet, for example /21 instead of /22.

In the VPC setup, we are also trying to avoid creating too many VPCs unnecessarily. Managing multiple VPCs means handling VPC peering which increases operational overhead.

vpc:
  Type: AWS::EC2::VPC
  Properties:
    CidrBlock: !Ref VpcCIDR
    InstanceTenancy: default
    EnableDnsSupport: true
    EnableDnsHostnames: true
    Tags:
      - Key: Name
        Value: !Sub "${AWS::AccountId}-${AWS::Region}-vpc"

Since our ECS workloads and Orchard Core CMS are public-facing, we need EnableDnsHostnames: true so that public-facing instances get a public DNS name. We also need EnableDnsSupport: true to allow ECS tasks, internal services, and AWS resources like S3 and Aurora to resolve domain names internally.

For InstanceTenancy, which determines whether instances in our VPC run on shared (default) or dedicated hardware, it is recommended to keep the default: AWS places instances on shared hardware, which is cost-effective and scalable. We only need to change it if we are required to use dedicated instances with full hardware isolation.

Now that we have defined our VPC, the next step is planning its subnet structure. We need both public and private subnets for our workloads.

Unit 07: Subnets and Subnet Groups

For our VPC with a /16 CIDR block, we will be breaking it into /24 subnets for better scalability:

  • Public Subnet 1: 10.0.0.0/24
  • Public Subnet 2: 10.0.1.0/24
  • Private Subnet 1: 10.0.2.0/24
  • Private Subnet 2: 10.0.3.0/24

Instead of manually specifying each CIDR, we let CloudFormation calculate the CIDR blocks for the public and private subnets from the VpcCIDR parameter using !Select and !Cidr, as shown below.

# Public Subnets
publicSubnet1:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref vpc
    CidrBlock: !Select [0, !Cidr [!Ref VpcCIDR, 4, 8]] # 10.0.0.0/24
    AvailabilityZone: !Select [0, !GetAZs '']
    Tags:
      - Key: Name
        Value: !Sub "${AWS::AccountId}-${AWS::Region}-public-subnet-1"

publicSubnet2:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref vpc
    CidrBlock: !Select [1, !Cidr [!Ref VpcCIDR, 4, 8]] # 10.0.1.0/24
    AvailabilityZone: !Select [1, !GetAZs '']
    Tags:
      - Key: Name
        Value: !Sub "${AWS::AccountId}-${AWS::Region}-public-subnet-2"

# Private Subnets
privateSubnet1:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref vpc
    CidrBlock: !Select [2, !Cidr [!Ref VpcCIDR, 4, 8]] # 10.0.2.0/24
    AvailabilityZone: !Select [0, !GetAZs '']
    Tags:
      - Key: Name
        Value: !Sub "${AWS::AccountId}-${AWS::Region}-private-subnet-1"

privateSubnet2:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref vpc
    CidrBlock: !Select [3, !Cidr [!Ref VpcCIDR, 4, 8]] # 10.0.3.0/24
    AvailabilityZone: !Select [1, !GetAZs '']
    Tags:
      - Key: Name
        Value: !Sub "${AWS::AccountId}-${AWS::Region}-private-subnet-2"

For Availability Zones (AZs), all commercial AWS regions have at least two AZs, with most having three or more. Hence, we do not need to worry that !Select [1, !GetAZs ''] in the template above will fail.
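If in doubt, we can always list the AZs that !GetAZs will see for a given region, for example the Malaysia region used in this article:

aws ec2 describe-availability-zones \
  --region ap-southeast-5 \
  --query "AvailabilityZones[].ZoneName" \
  --output table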

Now with our subnets set up, we can revisit the DBSubnetGroupName in the Aurora cluster and instance. Aurora clusters are highly available, and AWS recommends placing Aurora DB instances across multiple AZs to ensure redundancy and better fault tolerance. The Subnet Group defines the subnets where Aurora will deploy its instances, which enables the multi-AZ deployment for high availability.

cmsDBSubnetGroup:
  Type: AWS::RDS::DBSubnetGroup
  Properties:
    DBSubnetGroupDescription: "Orchard Core CMS Postgres DB Subnet Group"
    SubnetIds:
      - !Ref privateSubnet1
      - !Ref privateSubnet2
    Tags:
      - Key: Stack
        Value: !Ref AWS::StackName

Unit 08: Security Groups

Earlier, we configured the Subnet Group for Aurora, which defines which subnets the Aurora instances will reside in. Now, we need to ensure that only authorised systems or services can access our database. That is where the Security Group cmsDBSecurityGroup comes into play.

A Security Group acts like a virtual firewall that controls inbound and outbound traffic to our resources, such as our Aurora instances. It is like setting permissions to determine which IP addresses and which ports can communicate with the database.

For Aurora, we will configure the security group to only allow traffic from our private subnets, so that only trusted services within our VPC can reach the database.

cmsDBSecurityGroup:
  Type: AWS::EC2::SecurityGroup
  Properties:
    GroupName: !Sub "${CmsDBName}-security-group"
    GroupDescription: "Permits Access To CMS Aurora Database"
    VpcId: !Ref vpc
    SecurityGroupIngress:
      - CidrIp: !GetAtt privateSubnet1.CidrBlock
        IpProtocol: tcp
        FromPort: 5432
        ToPort: 5432
      - CidrIp: !GetAtt privateSubnet2.CidrBlock
        IpProtocol: tcp
        FromPort: 5432
        ToPort: 5432
    Tags:
      - Key: Name
        Value: !Sub "${CmsDBName}-security-group"
      - Key: Stack
        Value: !Ref AWS::StackName

Here we only set up ingress rules and no egress rules, because AWS security groups allow all outbound traffic by default.

Unit 09: Elastic Load Balancing (ELB)

Before diving into how we host Orchard Core on ECS, let’s first figure out how traffic will reach our ECS service. In modern cloud web app development and hosting, three key factors matter: reliability, scalability, and performance. And that is why a load balancer is essential.

  • Reliability – If we only have one container and it crashes, the whole app goes down. A load balancer allows us to run multiple containers so that even if one fails, the others keep running.
  • Scalability – As traffic increases, a single container will not be enough. A load balancer lets us add more containers dynamically when needed, ensuring smooth performance.
  • Performance – Handling many requests in parallel prevents slowdowns. A load balancer efficiently distributes traffic to multiple containers, improving response times.

For that, we need an Elastic Load Balancing (ELB) to distribute requests properly.

AWS originally launched ELB with only Classic Load Balancers (CLB). Later, AWS completely redesigned its load balancing services and introduced the following in ElasticLoadBalancingV2:

  • Network Load Balancer (NLB);
  • Application Load Balancer (ALB);
  • Gateway Load Balancer (GLB).

Summary of differences: ALB vs. NLB vs. GLB (Image Source: AWS)

NLB is designed for high performance, low latency, and TCP/UDP traffic. It is optimised for handling millions of requests per second and is ideal for routing traffic to ECS containers. It is also the load balancer type that API Gateway VPC Links for REST APIs integrate with, which we will rely on later.

ALB is usually better suited for HTTP/HTTPS traffic and offers more advanced Layer 7 routing features. Since we only need to forward general traffic to ECS and will terminate HTTPS at API Gateway, NLB is simpler and more efficient for our case.

GLB is designed for deploying and scaling third-party virtual network appliances such as firewalls and intrusion detection systems, which does not apply to our use case here.

Configure NLB

Setting up an NLB in AWS always involves these three key components:

  • AWS::ElasticLoadBalancingV2::LoadBalancer;
  • AWS::ElasticLoadBalancingV2::TargetGroup;
  • AWS::ElasticLoadBalancingV2::Listener.

Firstly, the LoadBalancer resource distributes traffic across multiple targets, such as ECS tasks.

internalNlb:
  Type: AWS::ElasticLoadBalancingV2::LoadBalancer
  Properties:
    Name: !Sub "${ServiceName}-private-nlb"
    Scheme: internal
    Type: network
    Subnets:
      - !Ref privateSubnet1
      - !Ref privateSubnet2
    LoadBalancerAttributes:
      - Key: deletion_protection.enabled
        Value: "true"
    Tags:
      - Key: Stack
        Value: !Ref AWS::StackName

In the template above, we create an NLB (Type: network) that is not exposed to the public internet (Scheme: internal). It is deployed across two private subnets for high availability. Finally, to prevent accidental deletion, we enable deletion protection; in the future, we must disable it before we can delete the NLB.

Please note that we do not enable cross-zone load balancing here, because AWS charges for inter-AZ data transfer. Also, since we plan to have the same number of targets in each AZ, keeping cross-zone off preserves AZ-local routing.
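If we later decide the trade-off is worth it, cross-zone load balancing can be switched on without redeploying the stack; the load balancer ARN below is a placeholder.

aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn <your-nlb-arn> \
  --attributes Key=load_balancing.cross_zone.enabled,Value=true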

Secondly, we need to set up the TargetGroup, which tells the NLB where to send traffic: the ECS tasks running our Orchard Core CMS.

nlbTargetGroup:
  Type: AWS::ElasticLoadBalancingV2::TargetGroup
  DependsOn:
    - internalNlb
  Properties:
    Name: !Sub "${ServiceName}-target-group"
    Port: 80
    Protocol: TCP
    TargetType: instance
    VpcId: !Ref vpc
    HealthCheckProtocol: HTTP
    HealthCheckPort: 80
    HealthCheckPath: /health
    TargetGroupAttributes:
      - Key: deregistration_delay.timeout_seconds
        Value: 10
    Tags:
      - Key: Stack
        Value: !Ref AWS::StackName

Here, we indicate that the TargetGroup is listening on port 80 and expects TCP traffic. TargetType: instance means NLB will send traffic directly to EC2 instances that are hosting our ECS tasks. We also link it to our VPC to ensure traffic stays within our network.

Even though the NLB uses TCP at the transport layer, it performs health checks at the application layer (HTTP). This ensures that the NLB can intelligently route traffic only to instances that are responding correctly to the application-level health check endpoint. Our choice of HTTP for the health check protocol instead of TCP is because the Orchard Core running on ECS is listening on port 80 and exposing an HTTP health check endpoint /health. By using HTTP for health checks, we can ensure that the NLB can detect not only if the server is up but also if the Orchard Core is functioning correctly.

We also set the Deregistration Delay to 10 seconds. Thus, when an ECS task is stopped or removed, the NLB waits 10 seconds before fully removing it, which helps prevent dropped connections by allowing in-progress requests to finish. We can keep 10 for now if the CMS does not serve long-running requests; however, if we start to notice dropped connections or 5xx errors during deployments, we should increase it to 30 seconds or more.

Normally, a Target Group only routes traffic to targets that pass its health check. We therefore need to make sure Orchard Core actually serves the /health endpoint configured above (for example via the ASP.NET Core health checks middleware); otherwise every target would be reported unhealthy and receive no traffic.

Thirdly, we need to configure the Listener. This Listener is responsible for handling incoming traffic on our NLB. When a request comes in, the Listener forwards the traffic to the Target Group, which then routes it to our ECS instances running Orchard Core CMS.

internalNlbListener:
  Type: AWS::ElasticLoadBalancingV2::Listener
  Properties:
    LoadBalancerArn: !Ref internalNlb
    Port: 80
    Protocol: TCP
    DefaultActions:
      - Type: forward
        TargetGroupArn: !Ref nlbTargetGroup

The Listener port is the entry point where the NLB receives traffic. It is different from the TargetGroup port, which is the port on the ECS instances where the Orchard Core app is actually exposed. The Listener forwards traffic from its port to the TargetGroup port; in most cases they are the same for simplicity.

The DefaultActions section ensures that all incoming requests are automatically directed to the correct target without any additional processing. This setup allows our NLB to efficiently distribute traffic to the ECS tasks while keeping the configuration simple and scalable.

In the NLB setup above, have you noticed that we do not handle port 443 (HTTPS)? Right now, our setup only works with HTTP on port 80.

So, if users visit our Orchard Core over HTTPS, the request stays encrypted as it passes through the NLB. The problem is that our ECS tasks would then have to handle HTTPS themselves; if they only listen on port 80, they will receive encrypted HTTPS traffic that they cannot process.

So why not configure Orchard Core to accept HTTPS directly by having it listen on port 443 in Program.cs? Sure! However, that would require our ECS tasks to handle SSL termination themselves, and we would need to manage SSL certificates ourselves, which adds complexity to our setup.

Hence, we need a way to properly handle HTTPS before it reaches ECS. Now, let’s see how we can solve this with API Gateway!

Unit 10: API Gateway

As we discussed earlier, it is generally good practice to offload SSL termination to API Gateway; our NLB uses a plain TCP listener, so it simply passes traffic through without decrypting it. SSL termination happens automatically at API Gateway for HTTPS traffic. It is a built-in feature, so we do not have to manage SSL certificates on our backend.

In addition, API Gateway brings extra benefits, such as blocking unwanted traffic and ensuring only the right users can access our services. It can also cache frequent requests, reducing the load on our backend. Finally, it can log all requests, making troubleshooting faster.

By using API Gateway, we keep our infrastructure secure, efficient, and easy to manage.

Let’s start with a basic setup of API Gateway with NLB by setting up the following required components:

  • AWS::ApiGateway::RestApi: The root API that ties everything together. It defines the API itself before adding resources and methods.
  • AWS::ApiGateway::VpcLink: Connects API Gateway to the NLB.
  • AWS::ApiGateway::Resource: Defines the API endpoint path.
  • AWS::ApiGateway::Method: Specifies how the API handles requests (e.g. GET, POST).
  • AWS::ApiGateway::Deployment: Deploys the API configuration.
  • AWS::ApiGateway::Stage: Assigns a stage (e.g. dev, prod) to the deployment.

Setup Rest API

API Gateway is like a front door to our backend services. Before we define any resources, methods, or integrations, we need to create this front door first, i.e. the AWS::ApiGateway::RestApi resource.

apiGatewayRestApi:
  Type: AWS::ApiGateway::RestApi
  Properties:
    Name: !Sub "${ServiceName}-api-gateway"
    DisableExecuteApiEndpoint: True
    EndpointConfiguration:
      Types:
        - REGIONAL
    Policy: ''

Here we disable the default execute-api endpoint because we do not want AWS to expose the auto-generated URL; we want to enforce access through our own custom domain, which we will set up later.

REGIONAL ensures that the API is available only within our AWS region. Setting it to REGIONAL is generally the recommended option for most apps, especially for our Orchard Core CMS, because both the ECS instances and the API Gateway are in the same region. This setup allows requests to be handled locally, which minimises latency. In the future, if our CMS user base grows and is distributed globally, we may need to consider switching to EDGE to serve our CMS to a larger global audience with better performance and lower latency across regions.

Finally, since this API mainly acts as a reverse proxy to our Orchard Core homepage on ECS, CORS is not needed. Leaving Policy: '' empty means anyone can reach the public-facing Orchard Core; security should instead be handled by Orchard Core's authentication.

Now that we have our root API, the next step is to connect it to our VPC using VpcLink!

Setup VPC Link

The VPC Link allows API Gateway to access private resources in our VPC, such as our ECS services via the NLB. This connection ensures that requests from the API Gateway can securely reach the Orchard Core CMS hosted in ECS, even though those resources are not publicly exposed.

In simple terms, VPC Link acts as a bridge between the public-facing API Gateway and the internal resources within our VPC.

So in our template, we define the VPC Link and specify the NLB as the target, which means that all API requests coming into the Gateway will be forwarded to the NLB, which will then route them to our ECS tasks securely.

apiGatewayVpcLink:
  Type: AWS::ApiGateway::VpcLink
  Properties:
    Name: !Sub "${ServiceName}-vpc-link"
    Description: "VPC link for API Gateway of Orchard Core"
    TargetArns:
      - !Ref internalNlb

Now that we have set up the VpcLink, which connects our API Gateway to our ECS, the next step is to define how requests will actually reach our ECS. That is where the API Gateway Resource comes into play.

Setup API Gateway Resource

For the API Gateway to know what to do with the incoming requests once they cross that VPC Link bridge, we need to define specific resources, i.e. the URL paths our users will use to access the Orchard Core CMS.

In our case, we use a proxy resource to catch all requests and send them to the backend ECS service. This lets us handle dynamic requests with minimal configuration, as any path requested will be forwarded to ECS.

Using proxy resource is particularly useful for web apps like Orchard Core CMS, where the routes could be dynamic and vary widely, such as /home, /content-item/{id}, /admin/{section}. With the proxy resource, we do not need to define each individual route or API endpoint in the API Gateway. As the CMS grows and new routes are added, we also will not need to constantly update the API Gateway configuration.

apiGatewayRootProxyResource:
  Type: AWS::ApiGateway::Resource
  Properties:
    RestApiId: !Ref apiGatewayRestApi
    ParentId: !GetAtt apiGatewayRestApi.RootResourceId
    PathPart: '{proxy+}'
  DependsOn:
    - apiGatewayRestApi

After setting up the resource and establishing the VPC Link to connect API Gateway to our ECS instances, the next step is to define how incoming requests to that resource are handled.

Setup Method

The Resource component above is used to define where the requests will go. However, just defining the path alone is not enough to handle incoming requests. We need to tell API Gateway how to handle requests that come to those paths. This is where the AWS::ApiGateway::Method component comes into play.

For a use case like hosting Orchard Core CMS, the following configuration can be a good starting point.

apiGatewayRootMethod:
  Type: AWS::ApiGateway::Method
  Properties:
    HttpMethod: ANY
    AuthorizationType: NONE
    ApiKeyRequired: False
    RestApiId: !Ref apiGatewayRestApi
    ResourceId: !GetAtt apiGatewayRestApi.RootResourceId
    Integration:
      ConnectionId: !Ref apiGatewayVpcLink
      ConnectionType: VPC_LINK
      Type: HTTP_PROXY
      IntegrationHttpMethod: ANY
      Uri: !Sub "http://${internalNlb.DNSName}"
  DependsOn:
    - apiGatewayRootProxyResource

apiGatewayRootProxyMethod:
  Type: AWS::ApiGateway::Method
  Properties:
    ApiKeyRequired: False
    RestApiId: !Ref apiGatewayRestApi
    ResourceId: !Ref apiGatewayRootProxyResource
    HttpMethod: ANY
    AuthorizationType: NONE
    RequestParameters:
      method.request.path.proxy: True
    Integration:
      ConnectionId: !Ref apiGatewayVpcLink
      ConnectionType: VPC_LINK
      Type: HTTP_PROXY
      RequestParameters:
        integration.request.path.proxy: method.request.path.proxy
      CacheKeyParameters:
        - method.request.path.proxy
      IntegrationHttpMethod: ANY
      IntegrationResponses:
        - StatusCode: 200
          SelectionPattern: 200
      Uri: !Sub "http://${internalNlb.DNSName}/{proxy}"
  DependsOn:
    - apiGatewayRootProxyResource
    - apiGatewayVpcLink

By setting up both the root method and the proxy method, the API Gateway can handle both general traffic via the root method and dynamic path-based traffic via the proxy method in a flexible way. This reduces the need for additional methods and resources to manage various paths.

Handling dynamic path-based traffic for Orchard Core via the proxy method.

Since Orchard Core is designed for browsing, updating, and deleting content, as a start, we may need support for multiple HTTP methods. By using ANY, we are ensuring that all these HTTP methods are supported without having to define separate methods for each one.

Setting AuthorizationType to NONE is a good starting point, especially in cases where we are not expecting to implement authentication directly at the API Gateway level. Instead, we are relying on Orchard Core built-in authentication module, which already provides user login, membership, and access control. Later, if needed, we can enhance security by adding authentication layers at the API Gateway level, such as AWS IAM, Cognito, or Lambda authorisers.

Similar to the authorisation, setting ApiKeyRequired to False is also a good choice for a starting point, especially since we are not yet exposing a public API; the setup above is primarily for routing requests to Orchard Core CMS. We can change this later if we need to secure the CMS API endpoints, for example when 3rd-party integrations or external apps need access to the CMS API.

Up to this point, API Gateway has a Resource and a Method, but it still does not know where to send the request. That is where Integration comes in. In our setup above, it tells API Gateway to use VPC Link to talk to the ECS. It also makes API Gateway act as a reverse proxy by setting Type to HTTP_PROXY. It will simply forward all types of HTTP requests to Orchard Core without modifying them.

Even though API Gateway enforces HTTPS for external traffic, it decrypts (aka terminates SSL), validates the request, and then forwards it over HTTP to NLB within the AWS private network. Since this internal communication happens securely inside AWS, the Uri is using HTTP.

After setting up the resources and methods in API Gateway, we are essentially defining the blueprint for our API. However, these configurations are only in a draft state so they are not yet live and accessible to our end-users. We need a step called Deployment to publish the configuration.

Setup Deployment

Without a deployment, the changes we discussed above are just a blueprint. We can define them in CloudFormation, but they will not be live in API Gateway until they are deployed.

An important thing to note is that API Gateway does not automatically publish configuration changes. If we do not create a new deployment, our changes will not take effect in the live environment, so we must force a new deployment through AWS::ApiGateway::Deployment.

Another thing to note is that a new AWS::ApiGateway::Deployment is not automatically triggered when we update our API Gateway configuration unless the logical ID of the deployment resource itself changes. This means that every time we change the API Gateway configuration, we need to change the logical ID of the AWS::ApiGateway::Deployment. CloudFormation does not redeploy automatically in order to avoid unnecessary changes or disruptions.

apiGatewayDeployment202501011048:
  Type: AWS::ApiGateway::Deployment
  Properties:
    RestApiId: !Ref apiGatewayRestApi
  DependsOn:
    - apiGatewayRootMethod

In the template above, we append a timestamp 202501011048 to the logical ID of the Deployment. This way, even if we make multiple deployments on the same day, each will have a unique logical ID due to the timestamp.
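As a side note, outside CloudFormation a redeployment can also be pushed manually with the CLI, which is handy for quick experiments; the REST API ID below is a placeholder.

aws apigateway create-deployment \
  --rest-api-id <your-rest-api-id> \
  --stage-name production \
  --description "Redeploy after configuration change"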

Deployment alone does not make our API available to the users. We still need to assign it to a specific Stage to ensure it has a versioned endpoint with all configurations applied.

Setup Stage

A Stage in API Gateway is a deployment environment that allows us to manage and control different versions of our API. It acts as a live endpoint for clients to interact with our API. Without a Stage, the API exists but is not publicly available. We can create stages like dev, test, and prod to separate development and production traffic.

apiGatewayStage:
  Type: AWS::ApiGateway::Stage
  Properties:
    StageName: !Ref ApiGatewayStageName
    RestApiId: !Ref apiGatewayRestApi
    DeploymentId: !Ref apiGatewayDeployment202501011048
    MethodSettings:
      - ResourcePath: '/*'
        HttpMethod: '*'
        ThrottlingBurstLimit: 100
        ThrottlingRateLimit: 50
    Tags:
      - Key: Stack
        Value: !Ref AWS::StackName

For now, we will use production as the default stage name to keep things simple. This will help us get everything set up and running quickly. Once we are ready for more environments, we can easily update the ApiGatewayStageName in the Parameters based on our environment setup.

MethodSettings are configurations defining how requests are handled in terms of performance, logging, and throttling. Using /* and * is perfectly fine at the start as our goal is to apply global throttling and logging settings for all our Orchard Core routes in one go. However, in the future we might want to adjust the settings as follows:

  • Content Modification (POST, PUT, DELETE): Stricter throttling and more detailed logging.
  • Content Retrieval (GET): More relaxed throttling for GET requests since they are usually read-only and have lower impact.

Having a burst and rate limit is useful for protecting our Orchard Core backend from excessive traffic. Even if we have a CMS with predictable traffic patterns, having rate limiting helps to prevent abuse and ensure fair usage.

The production stage in our API Gateway.

Unit 11: Route53 for API Gateway

Now that we have successfully set up API Gateway, it is accessible through an AWS-generated URL, something like https://xxxxxx.execute-api.ap-southeast-5.amazonaws.com/production, which is functional but not user-friendly. Hence, we set up a custom domain so that the URL is easier to remember, more professional, and consistent with our branding.

AWS provides a straightforward way to implement this using two key configurations:

  • AWS::ApiGateway::DomainName – Links our custom domain to API Gateway.
  • AWS::ApiGateway::BasePathMapping – Organises API versions and routes under the same domain.

Setup Hosted Zone and DNS

Since I have my domain on GoDaddy, I will need to migrate DNS management to AWS Route 53 by creating a Hosted Zone.

My personal hosted zone: chunlinprojects.com.

After creating a hosted zone in AWS, we need to manually copy the NS records to GoDaddy. Since this step is manual anyway, we will not automate this part of the setup in CloudFormation. In addition, hosted zones are sensitive resources and should be managed carefully; we do not want the hosted zone to be removed when our CloudFormation stack is deleted.
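To copy the values over, one way to print the name servers of the hosted zone is shown below; the hosted zone ID is a placeholder.

aws route53 get-hosted-zone \
  --id <your-hosted-zone-id> \
  --query "DelegationSet.NameServers"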

Once the switch is done, we can go back to our CloudFormation template to setup the custom domain name for our API Gateway.

Setup Custom Domain Name for API Gateway

API Gateway requires an SSL/TLS certificate to use a custom domain.

apiGatewayCustomDomainCert:
  Type: AWS::CertificateManager::Certificate
  Properties:
    DomainName: !Sub "${CmsHostname}.${HostedZoneName}"
    ValidationMethod: 'DNS'
    DomainValidationOptions:
      - DomainName: !Sub "${CmsHostname}.${HostedZoneName}"
        HostedZoneId: !Ref HostedZoneId

Please update the domain names in the template above to use your own domain. Also, the HostedZoneId can be retrieved from the AWS Console under "Hosted zone details", as in the screenshot above.

In the resource, DomainValidationOptions tells CloudFormation to use DNS validation. When we use the AWS::CertificateManager::Certificate resource in a CloudFormation stack, domain validation is handled automatically if all three of the following are true:

  • We are using DNS validation;
  • The certificate domain is hosted in Amazon Route 53;
  • The domain resides in our AWS account.

However, if the certificate uses email validation, or if the domain is not hosted in Route 53, the stack will remain in the CREATE_IN_PROGRESS state until validation completes. In that case, we can log in to the AWS Console to set up the DNS validation manually, as shown below.

Remember to log in to AWS Console to check for ACM Certificate Status.

After that, we choose the Create records in Route 53 button to create the records. The Certificate status page should then show a banner reporting Successfully created DNS records. According to the documentation, the new certificate might continue to display a status of Pending validation for up to 30 minutes.

Successfully created DNS records.
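Instead of refreshing the console, we can also poll the certificate status from the CLI; the certificate ARN is a placeholder.

# The status should move from PENDING_VALIDATION to ISSUED.
aws acm describe-certificate \
  --certificate-arn <your-certificate-arn> \
  --query "Certificate.Status" \
  --output text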

Now that the SSL certificate is ready and the DNS validation is done, we will need to link the SSL certificate to our API Gateway using a custom domain. We are using RegionalCertificateArn, which is intended for a regional API Gateway.

apiGatewayCustomDomainName:
  Type: AWS::ApiGateway::DomainName
  Properties:
    RegionalCertificateArn: !Ref apiGatewayCustomDomainCert
    DomainName: !Sub "${CmsHostname}.${HostedZoneName}"
    EndpointConfiguration:
      Types:
        - REGIONAL
    SecurityPolicy: TLS_1_2

This allows our API to be securely accessed using our custom domain. We also set up a SecurityPolicy to use the latest TLS version (TLS 1.2), ensuring that the connection is secure and follows modern standards.

Even though it is optional, it is a good practice to specify the TLS version for both security and consistency, especially for production environments. Enforcing a TLS version helps avoid any potential vulnerabilities from outdated protocols.

Setup Custom Domain Routing

Next, we need to create a base path mapping to map the custom domain to our specific API stage in API Gateway.

The BasePathMapping is the crucial bridge between our custom domain and API Gateway: when users visit the custom domain, it tells API Gateway which specific API and stage should handle the incoming requests.

apiGatewayCustomDomainBasePathMapping:
  Type: AWS::ApiGateway::BasePathMapping
  Properties:
    DomainName: !Ref apiGatewayCustomDomainName
    RestApiId: !Ref apiGatewayRestApi
    Stage: !Ref apiGatewayStage

While the BasePathMapping connects our custom domain to a specific stage inside our API Gateway, we need to setup DNS routing outside AWS which handles the DNS resolution.

The RecordSet creates a DNS record (typically an A or CNAME record) that points to the API Gateway endpoint. Without this record, DNS systems outside AWS will not know where to direct traffic for our custom domain.

apiGatewayCustomDomainARecord:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneName: !Sub "${HostedZoneName}."
    Name: !Sub "${CmsHostname}.${HostedZoneName}"
    Type: A
    AliasTarget:
      DNSName: !GetAtt apiGatewayCustomDomainName.RegionalDomainName
      HostedZoneId: !GetAtt apiGatewayCustomDomainName.RegionalHostedZoneId

One interesting thing to note here is that when we use an AWS::Route53::RecordSet that specifies HostedZoneName, we must include a trailing dot (for example, chunlinprojects.com.) as part of the HostedZoneName. Alternatively, we can specify HostedZoneId instead, but never both.

For API Gateway with a custom domain, AWS recommends using an Alias Record (which is similar to an A record) instead of a CNAME because the endpoint for API Gateway changes based on region and the nature of the service.

Alias records are a special feature in AWS Route 53 designed for pointing domain names directly to AWS resources like API Gateway, ELB, and so on. While CNAME records are often used in DNS to point to another domain, Alias records are unique to AWS and allow us to avoid extra DNS lookup costs.

For the HostedZoneId of the AliasTarget, note that it is the Route 53 hosted zone ID of the API Gateway regional endpoint, not to be confused with the ID of our own hosted zone in Route 53.

Finally, please take note that when we are creating an alias resource record set, we need to omit TTL.
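Once the record has propagated, a rough end-to-end check could look like this; the domain below is a placeholder built from the CmsHostname and HostedZoneName parameters.

# The alias record should resolve to the regional API Gateway endpoint.
dig +short orchardcms.<your-domain> A

# And the HTTPS endpoint should now serve the Orchard Core site through API Gateway.
curl -I https://orchardcms.<your-domain>/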

Reference 01: ECS Cluster

As we move forward with hosting Orchard Core CMS, let’s go through a few hosting options available within AWS, as listed below.

  • EC2 (Elastic Compute Cloud): A traditional option for running virtual machines. We can fully control the environment but need to manage everything, from scaling to OS patching;
  • Elastic Beanstalk: PaaS optimised for traditional .NET apps on Windows/IIS, not really suitable for Orchard Core which runs best on Linux containers with Kestrel;
  • Lightsail: A traditional VPS (Virtual Private Server), where we manage the server and applications ourselves. It is a good fit for simple, low-traffic websites but not ideal for scalable workloads like Orchard Core CMS.
  • EKS (Elastic Kubernetes Service): A managed Kubernetes offering from AWS. It allows us to run Kubernetes clusters, which are great for large-scale apps with complex micro-services. However, managing Kubernetes adds complexity.
  • ECS (Elastic Container Service): A service designed for running containerised apps. We can run containers on serverless Fargate or EC2-backed clusters.

The reason why we choose ECS is because it offers a scalable, reliable, and cost-effective way to deploy Orchard Core in a containerised environment. ECS allows us to take advantage of containerisation benefits such as isolated, consistent deployments and easy portability across environments. With built-in support for auto-scaling and seamless integration with AWS services like RDS for databases, S3 for media storage, and CloudWatch for monitoring, ECS ensures high availability and performance.

In ECS, we can choose either Fargate or EC2-backed ECS for hosting Orchard Core, depending on our specific needs and use case. For a highly customised, predictable, or resource-intensive CMS workload, EC2-backed ECS might be more appropriate because it gives fine-grained control over resources and configuration.

Official documentation with CloudFormation template on how to setup an ECS cluster.

There is official documentation on how to set up an ECS cluster, so we will not discuss the setup in depth. Instead, we will focus on some of the key points that we need to take note of.

Official ECS-optimised AMIs from AWS.

While we can technically use any Linux AMI for running ECS tasks, the Amazon ECS-optimised AMI offers several benefits that make it a better choice for ECS workloads. It is built and maintained by AWS to run ECS tasks efficiently on EC2 instances, and it comes with the ECS agent and Docker pre-installed, along with an ECS-tuned configuration. These AMIs look for agent configuration data in the /etc/ecs/ecs.config file when the container agent starts, which is why we can specify this configuration at launch through Amazon EC2 user data, as shown below.

containerInstances:
  Type: AWS::EC2::LaunchTemplate
  Properties:
    LaunchTemplateName: "asg-launch-template"
    LaunchTemplateData:
      ImageId: !Ref EcsAmi
      InstanceType: "t3.large"
      IamInstanceProfile:
        Name: !Ref ec2InstanceProfile
      SecurityGroupIds:
        - !Ref ecsContainerHostSecurityGroup
      # This injected configuration file is how the EC2 instance
      # knows which ECS cluster it should be joining
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash -xe
          echo "ECS_CLUSTER=core-cluster" >> /etc/ecs/ecs.config
      # Disable IMDSv1, and require IMDSv2
      MetadataOptions:
        HttpEndpoint: enabled
        HttpTokens: required

As shown in the above CloudFormation template, instead of hardcoding an AMI ID which will become outdated over time, we have a parameter to ensure that the cluster always provisions instances using the most recent Amazon Linux 2023 ECS-optimised AMI.

EcsAmi:
  Description: The Amazon Machine Image ID used for the cluster
  Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
  Default: /aws/service/ecs/optimized-ami/amazon-linux-2023/recommended/image_id
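If we want to see which AMI that public SSM parameter currently resolves to, a small boto3 sketch like the following can help. The region below is an assumption and should match where the cluster runs.

import boto3

ssm = boto3.client("ssm", region_name="ap-southeast-1")  # assumed region, adjust as needed

# Resolve the same public SSM parameter the template references for EcsAmi
response = ssm.get_parameter(
    Name="/aws/service/ecs/optimized-ami/amazon-linux-2023/recommended/image_id"
)

print(response["Parameter"]["Value"])  # the currently recommended AMI ID, e.g. ami-xxxxxxxx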

Also, the EC2 instances need to communicate with the ECS service endpoint. This can be done through an interface VPC endpoint or by giving the EC2 instances public IP addresses. In our case, we are placing our EC2 instances in private subnets, so we use a Network Address Translation (NAT) gateway to provide this access.

ecsNatGateway:
  Type: AWS::EC2::NatGateway
  Properties:
    AllocationId: !GetAtt ecsEip.AllocationId
    SubnetId: !Ref publicSubnet1

Unit 12: ECS Task Definition and Service

This ECS cluster definition is just the starting point. Next, we will define how the containers run and interact through AWS::ECS::TaskDefinition.

ecsTaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: !Ref ServiceName
    TaskRoleArn: !GetAtt iamRole.Arn
    ContainerDefinitions:
      - Name: !Ref ServiceName
        Image: !Ref OrchardCoreImage
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: !Sub "/ecs/${ServiceName}-log-group"
            awslogs-region: !Ref AWS::Region
            awslogs-stream-prefix: ecs
        PortMappings:
          - ContainerPort: 5000
            HostPort: 80
            Protocol: tcp
        Cpu: 256
        Memory: 1024
        MemoryReservation: 512
        Environment:
          - Name: DatabaseEndpoint
            Value: !GetAtt cmsDBInstance.Endpoint.Address
        Essential: true
        HealthCheck:
          Command:
            - CMD-SHELL
            - "wget -q --spider http://localhost:5000/health || exit 1"
          Interval: 30
          Timeout: 5
          Retries: 3
          StartPeriod: 30

In the setup above, we are sending logs to CloudWatch Logs so that we can centralise logs from all ECS tasks, making it easier to monitor and troubleshoot our containers.

By default, ECS uses bridge network mode. In bridge mode, containers do not get their own network interfaces. Instead, the container port (5000) must be mapped to a port on the host EC2 instance (80). Without this mapping, Orchard Core on EC2 would not be reachable from outside. We set ContainerPort: 5000 to match the port our Orchard Core app exposes within the Docker container.

As CMS platforms like Orchard Core generally require more memory for smooth operation, especially in production environments with more traffic, it is reasonable to start with a CPU allocation of 256 (0.25 vCPU) and 1024 MB of memory, then adjust based on the expected load.

For MemoryReservation, which is the guaranteed amount of memory for our container, we set 512 MB. By reserving memory, we ensure that the container has enough memory to run reliably. Orchard Core, being a modular CMS, can consume more memory depending on the number of features and modules enabled. If we later realise Orchard Core does not need that much guaranteed memory, we can lower MemoryReservation. The key idea is to reserve enough memory for stable operation without overcommitting.

Next, we have Essential, which we set to true. This property specifies whether the container is essential to the ECS task. Setting it to true tells ECS to treat the Orchard Core container as vital: if the container stops or fails, ECS stops the entire task. Otherwise, ECS would not automatically stop the task if the Orchard Core container failed, which could lead to issues, especially in a production environment.

Finally, we must not forget about HealthCheck. In most web apps like Orchard Core, a simple HTTP endpoint such as /health is used as the health check. The check runs inside the container, and many minimal container images, such as the Alpine-based image we use, do not include curl by default in order to stay lightweight. However, wget is often available, making it a good alternative for checking whether an HTTP endpoint is reachable. Hence, in the template above, ECS uses wget to check the /health endpoint on port 5000. If the call fails, the container is considered unhealthy.

We can test locally to check if curl or wget is available in the image.
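For example, a quick local check could look like the following sketch, which runs a throwaway container via the docker CLI; the image tag is a placeholder for the image built from our Dockerfile.

import subprocess

# Placeholder image tag; replace with the image built from our Dockerfile
IMAGE = "orchard-core-cms:latest"

# Start a disposable container and ask the shell which of wget/curl exist
result = subprocess.run(
    ["docker", "run", "--rm", "--entrypoint", "sh", IMAGE, "-c",
     "command -v wget || echo 'wget missing'; command -v curl || echo 'curl missing'"],
    capture_output=True,
    text=True,
)

print(result.stdout)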

Once the TaskDefinition is set up, it defines the container specs. However, an ECS service is needed to manage how and where the task runs within the ECS cluster. The ECS service tells ECS how to run the task, manage it, and keep it running smoothly.

ecsService:
  Type: AWS::ECS::Service
  DependsOn:
    - iamRole
    - internalNlb
    - nlbTargetGroup
    - internalNlbListener
  Properties:
    Cluster: !Ref ecsCluster
    DesiredCount: 2
    DeploymentConfiguration:
      MaximumPercent: 200
      MinimumHealthyPercent: 50
    LoadBalancers:
      - ContainerName: !Ref ServiceName
        ContainerPort: 5000
        TargetGroupArn: !Ref nlbTargetGroup
    PlacementStrategies:
      - Type: spread
        Field: attribute:ecs.availability-zone
      - Type: spread
        Field: instanceId
    TaskDefinition: !Ref ecsTaskDefinition
    ServiceName: !Ref ServiceName
    Role: !Sub "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS"
    HealthCheckGracePeriodSeconds: 60

The DesiredCount is the number of tasks (or containers) we want ECS to run at all times for the Orchard Core app. In this case, we set it to 2, which means ECS will try to keep exactly 2 tasks running for our service. Setting it to 2 gives us redundancy: if one task goes down, the other can continue serving, keeping our CMS available and resilient.

Relative to the DesiredCount, we indicate that during a deployment ECS can temporarily run up to 4 tasks (MaximumPercent: 200), and at least 1 task (MinimumHealthyPercent: 50) must remain healthy during updates to ensure a smooth rollout.

The LoadBalancers section in the ECS service definition is where we link our service to the NLB that we set up earlier, ensuring that the NLB will distribute the traffic to the correct tasks running within the ECS service. Also, since our container is configured to run on port 5000 as per our Dockerfile, this is the port we use.

Next, we have PlacementStrategies to help us control how our tasks are distributed across different instances and availability zones, making sure our CMS is resilient and well-distributed. Here, attribute:ecs.availability-zone ensures the tasks are spread evenly across different availability zones within the same region. At the same time, Field: instanceId ensures that our tasks are spread across different EC2 instances within the cluster.

Finally, it is a good practice to set a HealthCheckGracePeriodSeconds to give our containers some time to start and become healthy before ECS considers them unhealthy during scaling or deployments.

Unit 13: CloudWatch Alarm

To ensure we effectively monitor the performance of Orchard Core on our ECS service, we also need to set up CloudWatch alarms to track metrics like CPU utilisation, memory utilisation, health check, running task count, etc.

We set up the following CloudWatch alarm to monitor CPU utilisation for our ECS service. This alarm triggers if the average CPU usage stays at or above 75% for 5 consecutive one-minute periods. By doing this, we can quickly identify when our service is under heavy load and take action before performance degrades.

highCpuUtilizationAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: !Sub "${AWS::StackName}-high-cpu"
    AlarmDescription: !Sub "ECS service ${AWS::StackName}: Cpu utilization above 75%"
    Namespace: AWS/ECS
    MetricName: CPUUtilization
    Dimensions:
      - Name: ClusterName
        Value: !Ref ecsCluster
      - Name: ServiceName
        Value: !Ref ServiceName
    Statistic: Average
    Period: 60
    EvaluationPeriods: 5
    Threshold: 75
    ComparisonOperator: GreaterThanOrEqualToThreshold
    TreatMissingData: notBreaching
    ActionsEnabled: true
    AlarmActions: []
    OKActions: []

Even if we leave AlarmActions and OKActions as empty arrays, the alarm state will still be visible in the AWS CloudWatch Console. We can monitor the alarm state directly on the CloudWatch dashboard.
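If we prefer to check the alarm states from a script rather than the console, a small boto3 sketch like the one below works. The alarm names assume a stack named my-orchard-stack and should be adjusted to match the deployed CloudFormation stack.

import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm names follow the ${AWS::StackName}-high-cpu / ${AWS::StackName}-no-task pattern from the template
response = cloudwatch.describe_alarms(
    AlarmNames=["my-orchard-stack-high-cpu", "my-orchard-stack-no-task"]
)

for alarm in response["MetricAlarms"]:
    print(alarm["AlarmName"], alarm["StateValue"])  # OK, ALARM, or INSUFFICIENT_DATA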

Similar to the CPU utilisation alarm above, we have another alarm to trigger when the count of running tasks is 0 (less than 1) for 5 consecutive periods, indicating that there have been no running tasks for a full 5 minutes.

noRunningTasksAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: !Sub "${AWS::StackName}-no-task"
    AlarmDescription: !Sub "ECS service ${AWS::StackName}: No running ECS tasks for more than 5 mins"
    Namespace: AWS/ECS
    MetricName: RunningTaskCount
    Dimensions:
      - Name: ClusterName
        Value: !Ref ecsCluster
      - Name: ServiceName
        Value: !Ref ServiceName
    Statistic: Average
    Period: 60
    EvaluationPeriods: 5
    Threshold: 1
    ComparisonOperator: LessThanThreshold
    TreatMissingData: notBreaching
    ActionsEnabled: true
    AlarmActions: []
    OKActions: []

The two alarms are available on CloudWatch dashboard.

By monitoring these key metrics, we can proactively address any performance or availability issues, ensuring our Orchard Core CMS runs smoothly and efficiently.

Wrap-Up

Setting up Orchard Core on ECS with CloudFormation does have its complexities, especially with the different moving parts like API Gateway, load balancers, and domain configurations. However, once we have the infrastructure defined in CloudFormation, it becomes much easier to deploy, update, and manage our AWS environment. This is one of the key benefits of using CloudFormation, as it gives us consistency, repeatability, and automation in our deployments.

Orchard Core website is up and accessible via our custom domain!

The heavy lifting is done up front, and after that, it is mostly about making updates to our CloudFormation stack and redeploying without having to worry about manually reconfiguring everything.

When Pinecone Wasn’t Enough: My Journey to pgvector

If you work with machine learning or natural language processing, you have probably dealt with storing and searching through vector embeddings.

When I created the Honkai: Star Rail (HSR) relic recommendation system using Gemini, I started with Pinecone. Pinecone is a managed vector database that made it easy to index relic descriptions and character data as embeddings. It helped me find the best recommendations based on how similar they were.

Pinecone worked well, but as the project grew, I wanted more control, something open-source, and a cheaper option. That is when I found pgvector, a tool that adds vector search to PostgreSQL and gives the flexibility of an open-source database.

About HSR and Relic Recommendation System

Honkai: Star Rail (HSR) is a popular RPG that has captured the attention of players worldwide. One of the key features of the game is its relic system, where players equip their characters with relics like hats, gloves, or boots to boost stats and unlock special abilities. Each relic has unique attributes, and selecting the right sets of relics for a character can make a huge difference in gameplay.

An HSR streamer, Unreal Dreamer, learning the new relic feature. (Image Source: Unreal Dreamer YouTube)

As a casual player, I often found myself overwhelmed by the number of options and the subtle synergies between different relic sets. Finding a good relic combination for each character was time-consuming.

This is where LLMs like Gemini come into play. With the ability to process and analyse complex data, Gemini can help players make smarter decisions.

In November 2024, I started a project to develop a Gemini-powered HSR relic recommendation system which can analyse a player’s current characters to suggest the best options for them. In the project, I have been storing embeddings in Pinecone.

Embeddings and Vector Database

An embedding is a way to turn data, like text or images, into a list of numbers called a vector. These vectors make it easier for a computer to compare and understand the relationships between different pieces of data.

For example, in the HSR relic recommendation system, we use embeddings to represent descriptions of relic sets. The numbers in the vector capture the meaning behind the words, so similar relics and characters have embeddings that are closer together in a mathematical sense.

This is where vector databases like Pinecone or pgvector come in. Vector databases are designed for performing fast similarity searches on large collections of embeddings. This is essential for building systems that need to recommend, match, or classify data.

pgvector is an open-source extension for PostgreSQL that allows us to store and search for vectors directly in our database. It adds specialised functionality for handling vector data, like embeddings in our HSR project, making it easier to perform similarity searches without needing a separate system.

Unlike managed services like Pinecone, pgvector is open source. This means we can use it freely and avoid vendor lock-in, which is a huge advantage for developers.

Finally, since pgvector runs on PostgreSQL, there is no need for additional managed service fees. This makes it a budget-friendly option, especially for projects that need to scale without breaking the bank.

Choosing the Right Model

While the choice of vector database is important, it is not the key factor in achieving great results. The quality of our embeddings is actually determined by the model we choose.

For my HSR relic recommendation system, when the embeddings were stored in Pinecone, I started with the multilingual-e5-large model from Microsoft Research, which is offered through Pinecone.

When I migrated to pgvector, I had the freedom to explore other options. For this migration, I chose the all-MiniLM-L6-v2 model hosted on Hugging Face, which is a lightweight sentence-transformer designed for semantic similarity tasks. Switching to this model allowed me to quickly generate embeddings for relic sets and integrate them into pgvector, giving me a solid starting point while leaving room for future experimentation.

The all-MiniLM-L6-v2 model hosted on Hugging Face.

Using all-MiniLM-L6-v2 Model

Once we have decided to use the all-MiniLM-L6-v2 model, the next step is to generate vector embeddings for the relic descriptions. This model is from the sentence-transformers library, so we first need to install the library.

pip install sentence-transformers

The library offers the SentenceTransformer class to load pre-trained models.

from sentence_transformers import SentenceTransformer

model_name = 'all-MiniLM-L6-v2'
model = SentenceTransformer(model_name)

At this point, the model is ready to encode text into embeddings.

The SentenceTransformer model takes care of tokenisation and other preprocessing steps internally, so we can directly pass text to it.

# Function to generate embedding for a single text
def generate_embedding(text):
    # No need to tokenise separately, it's done internally
    # No need to average the token embeddings
    embeddings = model.encode(text)

    return embeddings

In this function, when we call model.encode(text), the model processes the text through its transformer layers, generating an embedding that captures its semantic meaning. The output is already optimised for tasks like similarity search.
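As a quick sanity check, we can encode a sample description and confirm the embedding has the expected 384 dimensions. The sample text below is made up for illustration.

# Quick sanity check: encode one description and inspect the resulting embedding
sample_text = "Increases SPD by 6% and boosts outgoing healing."  # made-up description for illustration

embedding = generate_embedding(sample_text)

print(type(embedding))  # numpy.ndarray
print(embedding.shape)  # (384,) for all-MiniLM-L6-v2
print(embedding[:5])    # the first few of the 384 values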

Setting up the Database

After generating embeddings for each relic set using the all-MiniLM-L6-v2 model, the next step is to store them in the PostgreSQL database with the pgvector extension.

For developers using AWS, there is good news. In May 2023, AWS announced pgvector support in Amazon Relational Database Service (RDS) for PostgreSQL. In November 2024, Amazon RDS added support for pgvector 0.8.0.

pgvector is now supported on Amazon RDS for PostgreSQL.

To install the extension, we will run the following command in our database. This will introduce a new datatype called VECTOR.

CREATE EXTENSION vector;

After this, we can define our table as follows.

CREATE TABLE IF NOT EXISTS embeddings (
    id TEXT PRIMARY KEY,
    vector VECTOR(384),
    text TEXT
);

Besides the id column which is for the unique identifier, there are two other columns that are important.

The text column stores the original text for each relic (the two-piece and four-piece bonus descriptions).

The vector column stores the embeddings. The VECTOR(384) type is used to store embeddings, and 384 here refers to the number of dimensions in the vector. In our case, the embeddings generated by the all-MiniLM-L6-v2 model are 384-dimensional, meaning each embedding will have 384 numbers.

Here, a dimension refers to one of the “features” that helps describe something. When we talk about vectors and embeddings, each dimension is just one of the many characteristics used to represent a piece of text. These features could be things like the type of words used, their relationships, and even the overall meaning of the text.

Updating the Database

After the table is created, we can proceed to create INSERT INTO SQL statements to insert the embeddings and their associated text into the database.

In this step, I load the relic information from a JSON file and process it.

import json

# Load your relic set data from a JSON file
with open('/content/hsr-relics.json', 'r') as f:
    relic_data = json.load(f)

# Prepare data
relic_info_data = [
    {"id": relic['name'], "text": relic['two_piece'] + " " + relic['four_piece']}  # Combine descriptions
    for relic in relic_data
]

The relic_info_data will then be passed to the following function to generate the INSERT INTO statements.

# Function to generate INSERT INTO statements with vectors
def generate_insert_statements(data):
    # Initialise list to store SQL statements
    insert_statements = []

    for record in data:
        # Extracting text and id from the record
        id = record.get('id')
        text = record.get('text')

        # Generate the embedding for the text
        embedding = generate_embedding(text)

        # Convert the embedding to a list
        embedding_list = embedding.tolist()

        # Create the SQL INSERT INTO statement
        sql_statement = f"""
        INSERT INTO embeddings (id, vector, text)
        VALUES (
            '{id.replace("'", "''")}',
            ARRAY{embedding_list},
            '{text.replace("'", "''")}')
        ON CONFLICT (id) DO UPDATE
        SET vector = EXCLUDED.vector, text = EXCLUDED.text;
        """

        # Append the statement to the list
        insert_statements.append(sql_statement)

    return insert_statements

The embeddings of the relic sets are successfully inserted into the database.
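To actually run these statements against the database, a minimal sketch using the psycopg2 driver could look like the following. The connection details are placeholders, and psycopg2 is only one of several possible PostgreSQL drivers.

import psycopg2

# Connection details are placeholders; point them at your RDS for PostgreSQL instance
conn = psycopg2.connect(
    host="your-rds-endpoint",
    dbname="your-database",
    user="your-username",
    password="your-password",
)

# Execute each generated statement; the transaction commits when the block exits cleanly
with conn:
    with conn.cursor() as cur:
        for statement in generate_insert_statements(relic_info_data):
            cur.execute(statement)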

How It All Fits Together: Query the Database

Once we have stored the vector embeddings of all the relic sets in our PostgreSQL database, the next step is to find the relic sets that are most similar to a given character’s relic needs.

Just like what we have done for storing relic set embeddings, we need to generate an embedding for the query describing the character’s relic needs. This is done by passing the query through the model as demonstrated in the following code.

def query_similar_embeddings(query_text):
    query_embedding = generate_embedding(query_text)

    return query_embedding.tolist()

The generated embedding is an array of 384 numbers. We simply use this array in our SQL query below.

SELECT id, text, vector <=> '[<embedding here>]' AS distance
FROM embeddings
ORDER BY distance
LIMIT 3;

The key part of the query is the <=> operator. This operator calculates the “distance” between two vectors based on cosine similarity. In our case, it measures how similar the query embedding is to each stored embedding. The smaller the distance, the more similar the embeddings are.

We use LIMIT 3 to get the top 3 most similar relic sets.
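Putting the pieces together, a hedged sketch of running this query from Python with psycopg2 might look like the code below. The helper name find_similar_relic_sets is hypothetical, and the embedding is serialised into the '[...]' literal format that pgvector accepts before being cast to a vector.

def find_similar_relic_sets(cur, query_text, top_k=3):
    # Build a pgvector literal like '[0.1,0.2,...]' from the query embedding
    embedding = query_similar_embeddings(query_text)
    vector_literal = "[" + ",".join(str(v) for v in embedding) + "]"

    # <=> computes cosine distance; the smallest distances come first
    cur.execute(
        """
        SELECT id, text, vector <=> %s::vector AS distance
        FROM embeddings
        ORDER BY distance
        LIMIT %s;
        """,
        (vector_literal, top_k),
    )
    return cur.fetchall()

We could then call this helper with the Gallagher query used in the next section and inspect the three rows it returns.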

Test Case: Finding Relic Sets for Gallagher

Gallagher is a Fire and Abundance character in HSR. He is a sustain unit that can heal allies by inflicting a debuff on the enemy.

According to the official announcement, Gallagher is a healer. (Image Source: Honkai: Star Rail YouTube)

The following screenshot shows the top 3 relic sets which are closely related to an HSR character called Gallagher, using the query “Suggest the best relic sets for this character: Gallagher is a Fire and Abundance character in Honkai: Star Rail. He can heal allies.”

The returned top 3 relic sets are indeed recommended for Gallagher.

One of the returned relic sets is called the “Thief of Shooting Meteor”. It is the official recommended relic set in-game, as shown in the screenshot below.

Gallagher’s official recommended relic set.

Future Work

In our project, we will not be implementing indexing because currently in HSR, there are only a small number of relic sets. Without an index, PostgreSQL will still perform vector similarity searches efficiently because the dataset is small enough that searching through it directly will not take much time. For small-scale apps like ours, querying the vector data directly is both simple and fast.

However, when our dataset grows larger in the future, it is a good idea to explore indexing options, such as the ivfflat index, to speed up similarity searches.
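As a rough sketch of what that could look like later (assuming the same psycopg2 connection as before), an ivfflat index with the cosine operator class matches our <=> queries. The index name and lists value below are illustrative.

# Hypothetical index name; vector_cosine_ops matches the <=> (cosine distance) queries we run
with conn:
    with conn.cursor() as cur:
        cur.execute(
            """
            CREATE INDEX IF NOT EXISTS embeddings_vector_idx
            ON embeddings
            USING ivfflat (vector vector_cosine_ops)
            WITH (lists = 100);
            """
        )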


Configure Portable Object: Localisation in .NET 8 Web API

Localisation is an important feature when building apps that cater to users from different countries, allowing them to interact with our app in their native language. In this article, we will walk you through how to set up and configure Portable Object (PO) Localisation in an ASP.NET Core Web API project.

Localisation is about adapting the app for a specific culture or language by translating text and customising resources. It involves translating user-facing text and content into the target language.

While .NET localisation normally uses resource files (.resx) to store localised texts for different cultures, Portable Object files (.po) are another popular choice, especially in apps that use open-source tools or frameworks.

About Portable Object (PO)

PO files are a standard format used for storing localised text. They are part of the gettext localisation framework, which is widely used across different programming ecosystems.

A PO file contains translations in the form of key-value pairs, where:

  • Key: The original text in the source language.
  • Value: The translated text in the target language.

Because PO files are simple, human-readable text files, they are easily accessible and editable by translators. This flexibility makes PO files a popular choice for many open-source projects and apps across various platforms.

You might wonder why we should use PO files instead of the traditional .resx files for localisation. Here are some advantages of PO files over .resx files:

  • Unlike .resx files, PO files have built-in support for plural forms. This makes it much easier to handle situations where the translation changes based on the quantity, like “1 item” vs. “2 items.”
  • While .resx files require compilation, PO files are plain text files. Hence, we do not need any special tooling or complex build steps to use PO files.
  • PO files work great with collaborative translation tools. For those who are working with crowdsourcing translations, they will find that PO files are much easier to manage in these settings.

SHOW ME THE CODE!

The complete source code of this project can be found at https://github.com/goh-chunlin/Experiment.PO.

Project Setup

Let’s begin by creating a simple ASP.NET Web API project. We can start by generating a basic template with the following command.

dotnet new webapi

This will set up a minimal API with a weather forecast endpoint.

The default /weatherforecast endpoint generated by .NET Web API boilerplate.

The default endpoint in the boilerplate returns a JSON object that includes a summary field. This field describes the weather using terms like freezing, bracing, warm, or hot. Here’s the array of possible summary values:

var summaries = new[]
{
    "Freezing", "Bracing", "Chilly", "Cool",
    "Mild", "Warm", "Balmy", "Hot", "Sweltering", "Scorching"
};

As you can see, currently, it only supports English. To extend support for multiple languages, we will introduce localisation.

Prepare PO Files

Let’s start by adding a translation for the weather summary in Chinese. Below is a sample PO file that contains the Chinese translation for the weather summaries.

#: Weather summary (Chinese)
msgid "weather_Freezing"
msgstr "寒冷"

msgid "weather_Bracing"
msgstr "冷冽"

msgid "weather_Chilly"
msgstr "凉爽"

msgid "weather_Cool"
msgstr "清爽"

msgid "weather_Mild"
msgstr "温和"

msgid "weather_Warm"
msgstr "暖和"

msgid "weather_Balmy"
msgstr "温暖"

msgid "weather_Hot"
msgstr "炎热"

msgid "weather_Sweltering"
msgstr "闷热"

msgid "weather_Scorching"
msgstr "灼热"

In most cases, PO file names are tied to locales, as they represent translations for specific languages and regions. The naming convention typically includes both the language and the region, so the system can easily identify and use the correct file. For example, the PO file above should be named zh-CN.po, which represents the Chinese translation for the China region.

In some cases, if our app supports a language without being region-specific, we could have a PO file named only with the language, such as ms.po for Malay. This serves as a fallback for all Malay speakers, regardless of their region.

We have prepared three Malay PO files: one for Malaysia (ms-MY.po), one for Singapore (ms-SG.po), and one fallback file (ms.po) for all Malay speakers, regardless of region.

After that, since our PO files are placed in the Localisation folder, please do not forget to include them in the .csproj file, as shown below.

<Project Sdk="Microsoft.NET.Sdk.Web">

  ...

  <ItemGroup>
    <Folder Include="Localisation\" />
    <Content Include="Localisation\**">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </Content>
  </ItemGroup>

</Project>

Adding this <ItemGroup> ensures that the localisation files from the Localisation folder are included in our app output. This helps the application find and use the proper localisation resources when running.

Configure Localisation Option in .NET

In an ASP.NET Core Web API project, we have to install a NuGet library from Orchard Core called OrchardCore.Localization.Core (Version 2.1.3).

Once the package is installed, we need to tell the application where to find the PO files. This is done by configuring the localisation options in the Program.cs file.

builder.Services.AddMemoryCache();
builder.Services.AddPortableObjectLocalization(options =>
    options.ResourcesPath = "Localisation");

The AddMemoryCache method is necessary here because LocalizationManager of Orchard Core uses the IMemoryCache service. This caching mechanism helps avoid repeatedly parsing and loading the PO files, improving performance by keeping the localised resources in memory.

Supported Cultures and Default Culture

Now, we need to configure how the application will select the appropriate culture for incoming requests.

In .NET, we need to specify which cultures our app supports. While .NET is capable of supporting multiple cultures out of the box, it still needs to know which specific cultures we are willing to support. By defining only the cultures we actually support, we can avoid unnecessary overhead and ensure that our app is optimised.

We have two separate things to manage when making an app available in different languages and regions in .NET:

  • SupportedCultures: This is about how the app displays numbers, dates, and currencies. For example, how a date is shown (like MM/dd/yyyy in the US);
  • SupportedUICultures: This is where we specify the languages our app supports for displaying text (the content inside the PO files).

To keep things consistent and handle both text translations and regional formatting properly, it is a good practice to configure both SupportedCultures and SupportedUICultures.

We also need to set up the DefaultRequestCulture. It is the fallback culture that our app uses when the request does not carry any explicit culture information.

The following code shows how we configure all of these. To keep the demo simple, we assume the locale that the user wants is passed via a query string.

builder.Services.Configure<RequestLocalizationOptions>(options =>
{
    var supportedCultures = LocaleConstants.SupportedAppLocale
        .Select(cul => new CultureInfo(cul))
        .ToArray();

    options.DefaultRequestCulture = new RequestCulture(
        culture: "en", uiCulture: "en");
    options.SupportedCultures = supportedCultures;
    options.SupportedUICultures = supportedCultures;
    options.AddInitialRequestCultureProvider(
        new CustomRequestCultureProvider(async httpContext =>
        {
            var currentCulture =
                CultureInfo.InvariantCulture.Name;
            var requestUrlPath =
                httpContext.Request.Path.Value;

            if (httpContext.Request.Query.ContainsKey("locale"))
            {
                currentCulture =
                    httpContext.Request.Query["locale"].ToString();
            }

            return await Task.FromResult(
                new ProviderCultureResult(currentCulture));
        })
    );
});

Next, we need to add the RequestLocalizationMiddleware in Program.cs to automatically set culture information for requests based on information provided by the client.

app.UseRequestLocalization();

After setting up the RequestLocalizationMiddleware, we can now move on to localising the API endpoint by using IStringLocalizer to retrieve translated text based on the culture information set for the current request.

About IStringLocalizer

IStringLocalizer is a service in ASP.NET Core used for retrieving localised resources, such as strings, based on the current culture of our app. In essence, IStringLocalizer acts as a bridge between our code and the language resources (like PO files) that contain translations. If the localised value of a key is not found, then the indexer key is returned.

We first need to inject IStringLocalizer into our API controllers or any services where we want to retrieve localised text.

app.MapGet("/weatherforecast", (IStringLocalizer<WeatherForecast> stringLocalizer) =>
{
    var forecast = Enumerable.Range(1, 5).Select(index =>
        new WeatherForecast
        (
            DateOnly.FromDateTime(DateTime.Now.AddDays(index)),
            Random.Shared.Next(-20, 55),
            stringLocalizer["weather_" + summaries[Random.Shared.Next(summaries.Length)]]
        ))
        .ToArray();
    return forecast;
})
.WithName("GetWeatherForecast")
.WithOpenApi();

The reason we use IStringLocalizer<WeatherForecast> instead of just IStringLocalizer is that we rely on the Orchard Core package to handle the PO files. According to Sébastien Ros, the Orchard Core maintainer, we cannot resolve IStringLocalizer directly; we need IStringLocalizer<T>. Using IStringLocalizer<T> is also related to how localisation is typically scoped in .NET applications.

Running on Localhost

Now, if we run the project using dotnet run, the Web API should compile successfully. Once the API is running on localhost, visiting the endpoint with zh-CN as the locale should return the weather summary in Chinese, as shown in the screenshot below.

The summary is getting the translated text from zh-CN.po now.

Dockerisation

Since the Web API is working as expected, we can proceed to dockerise it.

We will first create a Dockerfile as shown below to define the environment our Web API will run in. Then we will build the Docker image, using the Dockerfile. After building the image, we will run it in a container, making our Web API available for use.

## Build Container
FROM mcr.microsoft.com/dotnet/sdk:8.0-alpine AS builder
WORKDIR /app

# Copy the project file and restore any dependencies (use .csproj for the project name)
COPY *.csproj ./
RUN dotnet restore

# Copy the rest of the application code
COPY . .

# Publish the application
RUN dotnet publish -c Release -o out

## Runtime Container
FROM mcr.microsoft.com/dotnet/aspnet:8.0-alpine AS runtime

ENV ASPNETCORE_URLS=http://*:80

WORKDIR /app
COPY --from=builder /app/out ./

# Expose the port your application will run on
EXPOSE 80

ENTRYPOINT ["dotnet", "Experiment.PO.dll"]

As shown in the Dockerfile, we are using .NET Alpine images. Alpine is a lightweight Linux distribution often used in Docker images because it is much smaller than other base images. It is a best practice when we want a minimal image with fewer security vulnerabilities and faster performance.

Globalisation Invariant Mode in .NET

When we run our Web API as a Docker container on our local machine, we will soon realise that the container has stopped because the Web API inside it crashed with a System.Globalization.CultureNotFoundException.

Our Web API crashes due to System.Globalization.CultureNotFoundException, as shown in docker logs.

As pointed out in the error message, only the invariant culture is supported in globalization-invariant mode.

The globalization-invariant mode was introduced in .NET Core 2.0 in 2017. It allows our apps to run without the full globalisation data, which can significantly reduce the runtime size and improve the performance of our application, especially in environments like Docker or microservices.

In globalization-invariant mode, only the invariant culture is used. This culture is based on English (United States) but it is not specifically tied to en-US. It is just a neutral culture used to ensure consistent behaviour across environments.

Before .NET 6, globalization-invariant mode allowed us to create any custom culture, as long as its name conformed to the BCP-47 standard. BCP-47 stands for Best Current Practice 47, and it defines a way to represent language tags that include the language, region, and other relevant cultural data. A BCP-47 language tag typically follows this pattern: language-region, for example zh-CN and zh-Hans.

Thus, before .NET 6, if an app creates a culture that is not the invariant culture, the operation succeeds.

However, starting from .NET 6, an exception is thrown if we create any culture other than the invariant culture in globalization-invariant mode. This explains why our app throws System.Globalization.CultureNotFoundException.

We thus need to disable the globalization-invariant mode in the .csproj file, as shown below, so that we can use the full globalization data, which will allow .NET to properly handle localisation.

<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    ...
    <InvariantGlobalization>false</InvariantGlobalization>
  </PropertyGroup>

  ...

</Project>

Missing ICU in Alpine

Since Alpine is a very minimal Linux distribution, it does not include many libraries, tools, or system components that are present in more standard distributions like Ubuntu.

In terms of globalisation, Alpine does not come pre-installed with ICU (International Components for Unicode), which .NET uses for localisation in our case.

Hence, after we turned off the globalization-invariant mode, we will encounter another issue, which is our Web API not being able to locate a valid ICU package.

Our Web API crashes due to the missing ICU package, as shown in docker logs.

As suggested in the error message, we need to install the ICU libraries (icu-libs).

In .NET, icu-libs provides the necessary ICU libraries that allow our Web API to handle globalisation. However, the ICU libraries rely on culture-specific data to function correctly. This culture-specific data is provided by icu-data-full, which includes the full set of localisation and globalisation data for different languages and regions. Therefore, we need to install both icu-libs and icu-data-full, as shown below.

...

## Runtime Container
FROM mcr.microsoft.com/dotnet/aspnet:8.0-alpine AS runtime

# Install cultures
RUN apk add --no-cache \
icu-data-full \
icu-libs

...

After installing the ICU libraries, our weather forecast Web API container should run successfully. When we visit the endpoint, we can see that it retrieves the correct value from the PO files, as shown in the following screenshot.

Yay, we can get the translated texts now!

One last thing I would like to share is that, as shown in the screenshot above, since we do not have a PO file for ms-BN (Malay for Brunei), the fallback mechanism automatically uses the ms.po file instead.

Additional Configuration

If you still could not get the translation with PO files to work, perhaps you can try out some of the suggestions from my teammates below.

Firstly, you may need to set up AppLocalIcu in the .csproj file. This setting specifies whether the app should use a local copy of ICU or rely on the system-installed ICU libraries. This is particularly useful in containerised environments like Docker.

<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    ...
    <AppLocalIcu>true</AppLocalIcu>
  </PropertyGroup>

</Project>

Secondly, even though we have installed icu-libs and icu-data-full in our Alpine container, some .NET apps rely on data beyond just having the libraries available. In such case, we need to turn on the IncludeNativeLibrariesForSelfExtract setting as well in .csproj.

<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    ...
    <IncludeNativeLibrariesForSelfExtract>true</IncludeNativeLibrariesForSelfExtract>
  </PropertyGroup>

</Project>

Thirdly, please check whether you need to configure DOTNET_SYSTEM_GLOBALIZATION_PREDEFINED_CULTURES_ONLY as well. However, please take note that this setting only makes sense when globalization-invariant mode is enabled.

Finally, you may also need to include the runtime ICU libraries with the Microsoft.ICU.ICU4C.Runtime NuGet package (Version 72.1.0.3), enabling your app to use culture-specific data for globalisation features.


From Zero to Gemini: Building an AI-Powered Game Helper

On a chilly November morning, I attended the Google DevFest 2024 in Singapore. Together with my friends, we attended a workshop titled “Gemini Masterclass: How to Unlock Its Power with Prompting, Functions, and Agents.” The session was led by two incredible speakers, Martin Andrews and Sam Witteveen.

Martin holds a PhD in Machine Learning and has been an Open Source advocate since 1999. Sam is a Google Developer Expert in Machine Learning. Both of them are also organisers of the Machine Learning Singapore Meetup group. Together, they delivered an engaging and hands-on workshop about Gemini, the advanced LLM from Google.

Thanks to their engaging Gemini Masterclass, I have taken my first steps into the world of LLMs. This blog post captures what I learned and my journey into the fascinating world of Gemini.

Martin Andrews presenting in Google DevFest 2024 in Singapore.

About LLM and Gemini

LLM stands for Large Language Model. To most people, an LLM is like a smart friend who can answer almost all our questions with responses that are often accurate and helpful.

As an LLM, Gemini is trained on a large amount of text data and can perform a wide range of tasks: answering questions, writing stories, summarising long documents, or even helping to debug code. What makes LLMs special is their ability to “understand” and generate language in a way that feels natural to us.

Many of my developer friends have started using Gemini as a coding assistant in their IDEs. While it is good at that, Gemini is much more than just a coding tool.

Gemini is designed to not only respond to prompts but also act as an assistant with an extra set of tools. To make the most of Gemini, it is important to understand how it works and what it can (and cannot) do. With the knowledge gained from the DevFest workshop, I decided to explore how Gemini could assist with optimising relic choices in a game called Honkai: Star Rail.

Honkai: Star Rail and Gemini for Its Relic Recommendations

Honkai: Star Rail (HSR) is a popular RPG that has captured the attention of players worldwide. One of the key features of the game is its relic system, where players equip their characters with relics like hats, gloves, or boots to boost stats and unlock special abilities. Each relic has unique attributes, and selecting the right sets of relics for a character can make a huge difference in gameplay.

An HSR streamer, MurderofBirds, browsing through thousands of relics. (Image Source: MurderofBirds Twitch)

As a casual player, I often found myself overwhelmed by the number of options and the subtle synergies between different relic sets. Finding a good relic combination for each character was time-consuming.

This is where LLMs like Gemini come into play. With the ability to process and analyse complex data, Gemini can help players make smarter decisions.

In this blog post, I will briefly show how this Gemini-powered relic recommendation system can analyse a player’s current characters to suggest the best options for them. Then it will also explain the logic behind its recommendations, helping us to understand why certain relics are ideal.

Setup the Project

To make my project code available to everyone, I used Google Colab, a hosted Jupyter Notebook service that requires no setup to use and provides free access to computing resources, including GPUs and TPUs. You can access my code by clicking on the button below.

Open In Colab

In my project, I used the google-generativeai Python library, which is pre-installed in Colab. This library serves as a user-friendly API for interacting with Google LLMs, including Gemini. It makes it easy for us to integrate Gemini capabilities directly into our code.

Next, we will need to import the necessary libraries.

Importing the libraries and setup Gemini client.

The first library to import is google.generativeai. Without it, we cannot interact with Gemini easily. Then we have google.colab.userdata, which securely retrieves sensitive data, like our API key, directly from the Colab notebook environment.

We will also use IPython.display for displaying results in a readable format, such as Markdown.

In the Secrets section, we will have two records:

  • HONKAI_STAR_RAIL_PLAYER_ID: Your HSR player UID. It is used later to personalise relic recommendations.
  • GOOGLE_API_KEY: The API key that we can get from Google AI Studio to authenticate with Gemini.

Creating and retrieving our API keys in Google AI Studio.

Once we have initialised the google.generativeai library with the GOOGLE_API_KEY, we can proceed to specify the Gemini model we will be using.

The choice of model is crucial in LLM projects. Google AI Studio offers several options, each representing a trade-off between accuracy and cost. For my project, I chose models/gemini-1.5-flash-8b-001, which provided a good balance for this experiment. Larger models might offer slightly better accuracy but at a significantly higher cost.

Google AI Studio offers a range of models, from smaller, faster models suitable for quick tasks to larger, more powerful models capable of more complex processing.

Hallucination and Knowledge Limitation

We often think of LLMs like Gemini as our smart friends who can answer any question. But just like even our smartest friend can sometimes make mistakes, LLMs have their limits too.

Gemini's knowledge is based on the data it was trained on, which means it does not actually know everything. Sometimes, it might hallucinate, i.e. the model invents information that sounds plausible but is not actually true.

Kiana is not a character from Honkai: Star Rail but she is from another game called Honkai Impact 3rd.

While Gemini is trained on a massive dataset, its knowledge is not unlimited. As a responsible AI, it acknowledges its limitations. So, when it cannot find the answer, it will tell us that it lacks the necessary information rather than fabricating a response. This is how Google builds safer AI systems, as part of its Secure AI Framework (SAIF).

Knowledge cutoff in action.

To overcome these constraints, we need to employ strategies to augment the capabilities of LLMs. Techniques such as integrating Retrieval-Augmented Generation (RAG) and leveraging external APIs can help bridge the gap between what the model knows and what it needs to know to perform effectively.

System Instructions

Leveraging System Instructions is a way to improve the accuracy and reliability of Gemini responses.

System instructions are prompts given before the main query in order to guide Gemini. These instructions provide crucial context and constraints, significantly enhancing the accuracy and reliability of the generated output.

System Instruction with contextual information about HSR characters ensures Gemini has the necessary background knowledge.

The specific design and phrasing of the system instructions provided to the Gemini is crucial. Effective system instructions provide Gemini with the necessary context and constraints to generate accurate and relevant responses. Without carefully crafted system instructions, even the most well-designed prompt can yield poor results.

Context Framing

As we can see from the example above, writing clear and effective system instructions requires careful thought and a lot of testing.

This is just one part of a much bigger picture called Context Framing, which includes preparing data, creating embeddings, and deciding how the system retrieves and uses that data. Each of these steps needs expertise and planning to make sure the solution works well in real-world scenarios.

You might have heard the term “Prompt Engineering”. It sounds technical, but it is really about figuring out how to ask the LLM the right questions in the right way to get the best answers.

While context framing and prompt engineering are closely related and often overlap, they emphasise different aspects of the interaction with the LLM.

Stochasticity

While experimenting with Gemini, I noticed that even if I use the exact same prompt, the output can vary slightly each time. This happens because LLMs like Gemini have a built-in element of randomness, known as stochasticity.

Lingsha, an HSR character released in 2024. (Image Credit: Game8)

For example, when querying for DPS characters, Lingsha was inconsistently included in the results. While this might seem like a minor variation, it underscores the probabilistic nature of LLM outputs and suggests that running multiple queries might be needed to obtain a more reliable consensus.

Lingsha was inconsistently included in the response to the query about multi-target DPS characters.

According to the official announcement, even though Lingsha is a healer, she can cause significant damage to all enemies too. (Image Source: Honkai: Star Rail YouTube)

Hence, it is important to treat writing efficient system instructions and prompts as an iterative process, so that we can experiment with different phrasings to find what works best and yields the most consistent results.

Temperature Tuning

We can also reduce the stochasticity of Gemini response through adjusting parameters like temperature. Lower temperatures typically reduce randomness, leading to more consistent outputs, but also may reduce creativity and diversity.

Temperature is an important parameter for balancing predictability and diversity in the output. Temperature, a number in the range of 0.0 to 2.0 with a default of 1.0 in the gemini-1.5-flash model, controls how sharply the model samples from its probability distribution over the vocabulary when generating text. A lower temperature makes the model more likely to select words with higher probabilities, resulting in more predictable and focused text.

Having Temperature=0 means that the model will always select the most likely word at each step. The output will be highly deterministic and repetitive.
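As a minimal sketch with the google-generativeai library, the temperature can be set on the model or overridden per request; the prompt and variable names below are just examples.

import google.generativeai as genai

# Lower temperature for more deterministic outputs across repeated runs
deterministic_model = genai.GenerativeModel(
    "models/gemini-1.5-flash-8b-001",
    generation_config=genai.GenerationConfig(temperature=0.2),
)

# The temperature can also be overridden for a single request
response = deterministic_model.generate_content(
    "List the multi-target DPS characters in Honkai: Star Rail.",
    generation_config=genai.GenerationConfig(temperature=0.0),
)
print(response.text)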

Function Calls

A major limitation of using system instructions alone is their static nature.

For example, my initial system instructions included a list of HSR characters, but this list is static. The list does not include newly released characters or characters specific to the player’s account. In order to dynamically access a player’s character database and provide personalised recommendations, I integrated Function Calls to retrieve real-time data.

For fetching the player’s HSR character data, I leveraged the open-source Python library mihomo. This library provides an interface for accessing game data, enabling dynamic retrieval of a player’s characters and their attributes. This dynamic data retrieval is crucial for generating truly personalised relic recommendations.

Using the mihomo library, I retrieve five of my Starfaring Companions.

Defining the functions in my Python code was only the first step. To use function calls, Gemini needed to know which functions were available. We can provide this information to Gemini as shown below.

model = genai.GenerativeModel('models/gemini-1.5-flash-8b-001', tools=[get_player_name, get_player_starfaring_companions])

After we pass a query to a Gemini, the model returns a structured object that includes the names of relevant functions and their arguments based on the prompt, as shown in the screenshot below.

The correct function call is picked up by Gemini based on the prompt.

Using descriptive function names is essential for successful function calling with LLMs because the accuracy of function calls depends heavily on well-designed function names in our Python code. Inaccurate naming can directly impact the reliability of the entire system.

If our Python function is named incorrectly, for example, calling a function get_age but it returns the name of the person, Gemini might select that function wrongly when the prompt is asking for age.

As shown in the screenshot above, the prompt requested information about all the characters of the player. Gemini simply determines which function to call and provides the necessary arguments. Gemini does not directly execute the functions. The actual execution of the function needs to be handled by us, as demonstrated in the screenshot below.

After Gemini tells us which function to call, our code needs to call the function to get the result.
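A minimal sketch of that hand-off could look like the following, assuming the two functions registered earlier are plain synchronous callables. We look up the function Gemini picked, call it ourselves, and pass in the arguments it suggested.

# Map the function names Gemini knows about to the actual Python callables
available_functions = {
    "get_player_name": get_player_name,
    "get_player_starfaring_companions": get_player_starfaring_companions,
}

response = model.generate_content("List all the characters of this player.")

# Inspect the structured response for the function call chosen by Gemini
for part in response.candidates[0].content.parts:
    if part.function_call.name:
        fn = available_functions[part.function_call.name]
        args = dict(part.function_call.args)  # arguments suggested by Gemini
        result = fn(**args)                   # we execute the function ourselves
        print(part.function_call.name, "->", result)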

Grounding with Google Search

Function calls are a powerful way to access external data, but they require pre-defined functions and APIs.

To go beyond these limits and gather information from many online sources, we can use the Gemini grounding feature with Google Search. This feature allows Gemini to perform a Google search and include what it finds in its answers. This makes it easier to get up-to-date information and handle questions that need real-time data.

If you are getting HTTP 429 errors when using the Google Search feature, please make sure you have set up a billing account with enough quota.

With this feature enabled, we thus can ask Gemini to get some real-time data from the Internet, as shown below.

The upcoming v2.7 patch of HSR is indeed scheduled to be released on 4th December.

Building a Semantic Knowledge Base with Pinecone

System instructions and Google Search grounding provide valuable context, but a structured knowledge base is needed to handle the extensive data about HSR relics. We need a way to store and quickly retrieve this information so that the system can generate timely and accurate relic recommendations. For this, a vector database is ideally suited to managing the relic dataset.

Vector databases, unlike traditional databases that rely on keyword matching, store information as vectors enabling efficient similarity searches. This allows for retrieving relevant relic sets based on the semantic meaning of a query, rather than relying solely on keywords.

There are many options for vector databases, but I chose Pinecone. Pinecone, a managed service, offered the scalability needed to handle the HSR relic dataset and a robust API essential for reliable data access. The availability of a free tier was also a significant factor because it allowed me to keep costs low during development.

API keys in Pinecone dashboard.

Pinecone’s well-documented API and straightforward SDK make integration surprisingly easy. To get started, simply follow the Pinecone documentation to install the SDK in our code and retrieve the API key.

# Import the Pinecone library
from pinecone.grpc import PineconeGRPC as Pinecone
from pinecone import ServerlessSpec
import time

# Initialize a Pinecone client with your API key
pc = Pinecone(api_key=userdata.get('PINECONE_API_KEY'))

I prepare my Honkai: Star Rail relic data, which I have previously organised into a JSON structure. This data includes information on each relic set’s two-piece and four-piece effects. Here’s a snippet to illustrate the format:

[
    {
        "name": "Sacerdos' Relived Ordeal",
        "two_piece": "Increases SPD by 6%",
        "four_piece": "When using Skill or Ultimate on one ally target, increases the ability-using target's CRIT DMG by 18%, lasting for 2 turn(s). This effect can stack up to 2 time(s)."
    },
    {
        "name": "Scholar Lost in Erudition",
        "two_piece": "Increases CRIT Rate by 8%",
        "four_piece": "Increases DMG dealt by Ultimate and Skill by 20%. After using Ultimate, additionally increases the DMG dealt by the next Skill by 25%."
    },
    ...
]

With the relic data organised, the next challenge is to enable similarity searches with vector embeddings. Vector embeddings capture the semantic meaning of the text, allowing Pinecone to identify similar relic sets based on their inherent properties and characteristics.

Vector embedding representations (Image Credit: Pinecone)

Now, we can generate vector embeddings for the HSR relic data using Pinecone. The following code snippet illustrates this process which is to convert textual descriptions of relic sets into numerical vector embeddings. These embeddings capture the semantic meaning of the relic set descriptions, enabling efficient similarity searches later.

# Load relic set data from the JSON file
with open('/content/hsr-relics.json', 'r') as f:
    relic_data = json.load(f)

# Prepare data for Pinecone
relic_info_data = [
    {"id": relic['name'], "text": relic['two_piece'] + " " + relic['four_piece']}  # Combine relic set descriptions
    for relic in relic_data
]

# Generate embeddings using Pinecone
embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[d['text'] for d in relic_info_data],
    parameters={"input_type": "passage", "truncate": "END"}
)

print(embeddings)

print(embeddings)

As shown in the code above, we use the multilingual-e5-large model, a text embedding model from Microsoft Research, to generate a vector embedding for each relic set. The multilingual-e5-large model works well on messy data and is good for short queries.

Pinecone's ability to perform fast similarity searches relies on its indexing mechanism. Without an index, searching for similar relic sets would require comparing each relic set's embedding vector to every other one, which would be extremely slow, especially with a large dataset. I chose a Pinecone serverless index hosted on AWS for its automatic scaling and reduced infrastructure management.

# Create a serverless index
index_name = "hsr-relics-index"

if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        dimension=1024,
        metric="cosine",
        spec=ServerlessSpec(
            cloud='aws',
            region='us-east-1'
        )
    )

# Wait for the index to be ready
while not pc.describe_index(index_name).status['ready']:
    time.sleep(1)

The dimension parameter specifies the dimensionality of the vector embeddings. Higher dimensionality generally allows for capturing more nuanced relationships between data points. For example, two relic sets might both increase ATK, but one might also increase SPD while the other increases Crit DMG. A higher-dimensional embedding allows the system to capture these subtle distinctions, leading to more relevant recommendations.

For the metric parameter which measures the similarity between two vectors (representing relic sets), we use the cosine metric which is suitable for measuring the similarity between vector embeddings generated from text. This is crucial for understanding how similar two relic descriptions are.
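To illustrate what the cosine metric measures, here is a small standalone sketch with toy 3-dimensional vectors (the real embeddings here have 1024 dimensions); smaller distances mean more similar descriptions.

import numpy as np

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity; smaller means more similar
    a, b = np.asarray(a), np.asarray(b)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors purely for illustration
attack_relic = [0.9, 0.1, 0.0]
attack_speed_relic = [0.8, 0.3, 0.1]
healing_relic = [0.0, 0.2, 0.9]

print(cosine_distance(attack_relic, attack_speed_relic))  # small distance: similar effects
print(cosine_distance(attack_relic, healing_relic))       # larger distance: unrelated effects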

With the vector embeddings generated, the next step was to upload them into my Pinecone index. Pinecone uses the upsert function to add or update vectors in the index. The following code snippet shows how we can upsert the generated embeddings into the Pinecone index.

# Target the index where you'll store the vector embeddings
index = pc.Index("hsr-relics-index")

# Prepare the records for upsert
# Each contains an 'id', the embedding 'values', and the original text as 'metadata'
records = []
for r, e in zip(relic_info_data, embeddings):
    records.append({
        "id": r['id'],
        "values": e['values'],
        "metadata": {'text': r['text']}
    })

# Upsert the records into the index
index.upsert(
    vectors=records,
    namespace="hsr-relics-namespace"
)

The code uses the zip function to iterate through the list of prepared relic data and the list of generated embeddings simultaneously. For each pair, it creates a record for Pinecone with the following attributes (an illustrative record is shown after the list).

  • id: Name of the relic set to ensure uniqueness;
  • values: The vector representing the semantic meaning of the relic set effects;
  • metadata: The original description of the relic effects, which will be used later for providing context to the user’s recommendations. 
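
For illustration, a single upsert record is shaped roughly like the example below. The id and text are illustrative, and the values list is truncated here because a real embedding contains 1024 floats.

# Illustrative shape of a single upsert record; the real 'values' list has 1024 floats
example_record = {
    "id": "Hunter of Glacial Forest",          # relic set name used as the unique id
    "values": [0.0123, -0.0456, 0.0789],       # truncated embedding for readability
    "metadata": {
        "text": "Increases Ice DMG by 10%. ..."  # combined two-piece and four-piece description
    }
}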

Implementing Similarity Search in Pinecone

With the relic data now stored in Pinecone, we can implement the similarity search functionality.

def query_pinecone(query: str) -> dict:
    # Convert the query into a numerical vector that Pinecone can search with
    query_embedding = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=[query],
        parameters={
            "input_type": "query"
        }
    )

    # Search the index for the three most similar vectors
    results = index.query(
        namespace="hsr-relics-namespace",
        vector=query_embedding[0].values,
        top_k=3,
        include_values=False,
        include_metadata=True
    )

    return results

The function above takes a user’s query as input, converts it into a vector embedding using Pinecone’s inference endpoint, and then uses that embedding to search the index, returning the top three most similar relic sets along with their metadata.
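
As a quick sanity check, we can call the function with an ad-hoc query and print the matched relic set names together with their similarity scores. The query string below is just an example.

# Example query; the text is illustrative
results = query_pinecone("relic sets that boost Ice DMG and CRIT DMG")

for match in results['matches']:
    print(match['id'], match['score'])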

Relic Recommendations with Pinecone and Gemini

With Pinecone integrated, we design an initial query to pick relevant relic sets from Pinecone. After that, we combine the results from Pinecone with the initial query to create a richer, more informative prompt for Gemini, as shown in the following code.

from google.generativeai.generative_models import GenerativeModel

async def format_pinecone_results_for_prompt(model: GenerativeModel, player_id: int) -> dict:
    character_relics_mapping = await get_player_character_relic_mapping(player_id)

    result = {}

    for character_name, (character_avatar_image_url, character_description) in character_relics_mapping.items():
        print(f"Processing Character: {character_name}")

        additional_character_data = character_profile.get(character_name, "")

        character_query = f"Suggest some good relic sets for this character: {character_description} {additional_character_data}"

        pinecone_response = query_pinecone(character_query)

        prompt = f"User Query: {character_query}\n\nRelevant Relic Sets:\n"
        for match in pinecone_response['matches']:
            prompt += f"* {match['id']}: {match['metadata']['text']}\n"  # Extract relevant data
        prompt += "\nBased on the above information, recommend two best relic sets and explain your reasoning. Each character can only equip with either one 4-piece relic or one 2-piece relic with another 2-piece relic. You cannot recommend a combination of 4-piece and 2-piece together. Consider the user's query and the characteristics of each relic set."

        response = model.generate_content(prompt)

        result[character_avatar_image_url] = response.text

    return result

The code shows that we are doing both prompt engineering (designing the initial query to get relevant relics) and context framing (combining the initial query with the retrieved relic information to get a better overall recommendation from Gemini).

First, the code retrieves data about the player’s characters, including their descriptions, images, and the relics they are currently wearing. It then gathers additional data about each character from a separate data source, character_profile, which holds more information, such as the gameplay mechanics of the characters that we got from the Game8 Character List. With this character data, the query finds similar relic sets in the Pinecone database.

After Pinecone returns matches, the code constructs a detailed prompt for the Gemini model. This prompt includes the character’s description, the relevant relic sets found by Pinecone, and crucial instructions for the model. The instructions emphasise the constraints of choosing relic sets: either a 4-piece set, or two 2-piece sets, not a mix. Importantly, they also tell Gemini to consider the character’s profile and the characteristics of each relic set when making its recommendations.

Finally, the code sends this detailed prompt to Gemini, receiving back the recommended relic sets.
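
For completeness, here is a minimal sketch of how this function might be invoked in the Colab notebook. The model name and the player UID are assumptions; use the values from your own setup, and note that top-level await works in a Colab cell.

import google.generativeai as genai
from google.generativeai.generative_models import GenerativeModel

genai.configure(api_key="<your Gemini API key>")  # assumes the key is available
model = GenerativeModel("gemini-1.5-flash")       # model name is an assumption

# The player UID below is a placeholder, not a real account
recommendations = await format_pinecone_results_for_prompt(model, 800000000)

for avatar_url, recommendation_text in recommendations.items():
    print(avatar_url)
    print(recommendation_text)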

Knight of Purity Palace is indeed a great option for Gepard!
Enviosity, a popular YouTuber known for his in-depth Honkai: Star Rail strategy guides, also recommends Knight of Purity Palace for Gepard. (Source: Enviosity YouTube)

Langtrace

Using LLMs like Gemini is certainly exciting, but figuring out what is happening “under the hood” can be tricky.

If you are a web developer, you are probably familiar with Grafana dashboards. They show you how your web app is performing, highlighting areas that need improvement.

Langtrace is like Grafana, but specifically for LLMs. It gives us a similar visual overview, tracking our LLM calls, showing us where they are slow or failing, and helping us optimise the performance of our AI app.
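
Getting started is lightweight. The snippet below is a minimal sketch based on the Langtrace Python SDK; the package name, the init call, and the environment variable holding the API key are assumptions to verify against the official documentation. The key point is to initialise Langtrace before making any Gemini calls so that they are captured.

import os

from langtrace_python_sdk import langtrace  # assumed package: pip install langtrace-python-sdk

# Initialise Langtrace before creating the Gemini model so the LLM calls get traced;
# the environment variable name here is an assumption
langtrace.init(api_key=os.environ["LANGTRACE_API_KEY"])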

Traces for the Gemini calls are displayed individually.

Langtrace is not only useful for tracing our LLM calls; it also offers metrics on token counts and costs, as shown in the following screenshot.

Beyond tracing calls, Langtrace collects metrics too.

Wrap-Up

Building this Honkai: Star Rail (HSR) relic recommendation system has been a rewarding journey into the world of Gemini and LLMs.

I am incredibly grateful to Martin Andrews and Sam Witteveen for their inspiring Gemini Masterclass at Google DevFest in Singapore. Their guidance helped me navigate the complexities of LLM development, and I learned firsthand the importance of careful prompt engineering, the power of system instructions, and the need for dynamic data access through function calls. These lessons underscore the complexities of developing robust LLM apps and will undoubtedly inform my future AI projects.

This project has been an enjoyable journey of learning and discovery. I encountered many challenges along the way, but overcoming them deepened my understanding of Gemini. If you’re interested in exploring the code and learning from my experiences, you can access my Colab notebook through the button below. I welcome any feedback you might have!

Open In Colab

References