Beyond the Cert: In the Age of AI

For the fourth consecutive year, I have renewed my Azure Developer Associate certification. It is a valuable discipline that keeps my knowledge of the Azure ecosystem current and sharp. The performance report I received this year was particularly insightful, highlighting both my strengths in security fundamentals and the expected gaps in platform-specific nuances, given my recent work in AWS.

Objectives

Renewing an Azure certification is a hallmark of a professional craftsman: it sharpens our tools and keeps our knowledge of the trade current. For a junior or mid-level engineer, this path of structured learning and certification is the non-negotiable foundation of a solid career. It is the path I walked myself. It builds the grammar of our trade.

However, for a senior engineer, for an architect, the game has changed. The world is now saturated with competent craftsmen who know the grammar. In the age of AI-assisted coding and brutal corporate “flattening,” simply knowing the tools is no longer a defensible position. It has become table stakes.

The paradox of the senior cloud software engineer is that the very map that got us here, i.e. the structured curriculum and the certification path, is insufficient to guide us to the next level. The renewal assessment results I received for Microsoft Certified: Azure Developer Associate were a perfect map of the existing territory. However, an architect’s job is not to be a master of the known world. It is to be a cartographer of the unknown. The report correctly identified that I need to master Azure-specific trade-offs, like choosing ‘Session’ consistency over ‘Strong’ for low-latency scenarios in Cosmos DB. The senior engineer learns that rule. The architect must ask a deeper question: “How can I build a model that predicts the precise cost and P99 latency impact of that trade-off for my specific workload, before I write a single line of code?”

Attending AWS Singapore User Group monthly meetup.

About the Results

Let’s make this concrete by looking at the renewal assessment report itself. It was a gift, not because of the score, but because it is a perfect case study in the difference between the Senior Engineer’s path and the Architect’s.

Where the report suggests mastering Azure Cosmos DB’s five consistency levels, it is prescribing an act of knowledge consumption. The architect’s impulse is to ask a different question entirely: “How can I quantify the trade-off?” I do not just want to know that Session is faster than Strong. I want to know, for a given workload, how much faster, at what dollar cost per million requests, and with what measurable impact on data integrity. The architect’s response is to build a model that turns the vendor’s qualitative best practice into a quantitative, predictive economic decision.
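
To illustrate what I mean by a model, here is a deliberately tiny sketch in Python: it turns assumed request-unit charges and assumed P99 latencies (every number below is made up for illustration and is not Azure pricing or a benchmark) into a monthly cost comparison for one workload.

# Toy model: compare Cosmos DB consistency levels for a single workload.
# Every figure here is an illustrative assumption, not real Azure pricing.
MONTHLY_READS = 500_000_000
DOLLARS_PER_MILLION_RU = 0.25  # assumed unit price

consistency_profiles = {
    "Session": {"ru_per_read": 1.0, "assumed_p99_ms": 6.0},
    "Strong": {"ru_per_read": 2.0, "assumed_p99_ms": 12.0},
}

for level, profile in consistency_profiles.items():
    monthly_cost = MONTHLY_READS * profile["ru_per_read"] / 1_000_000 * DOLLARS_PER_MILLION_RU
    print(f"{level}: ~${monthly_cost:,.0f}/month at an assumed P99 of {profile['assumed_p99_ms']} ms")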

This pattern continues with managed services. The report correctly noted my failure to memorise the specific implementation of Azure Container Apps. The path it offers is to better learn the abstraction. The architect’s path is to become professionally paranoid about abstractions. The question is not “What is Container Apps?” but “Why does this abstraction exist, and what are its hidden costs and failure modes?” The architect’s response is to design experiments or simulations to stress-test the abstraction and discover its true operational boundaries, not just to read its documentation.

DHH has just slain the dragon of Cloud Dependency, the largest, most fearsome dragon in our entire cloud industry. (Twitter Source: DHH)

This is the new mandate for senior engineers in a world where we keep hearing about senior engineers being out of work: We must evolve from being consumers of complexity to being creators of clarity. We must move beyond mastering the vendor’s pre-defined solutions and begin forging our own instruments to see the future.

From Cert to Personal Project

This is why, in parallel with maintaining my certifications, I have embarked on a different kind of professional development. It is a path of deep, first-principles creation. I am building a discrete event simulation engine, not as a personal hobby project, but as a way to understand the most expensive and unpredictable problems in our industry. My certification proves I can solve problems the “Azure way.” This new work is about discovering the fundamental truths that govern all cloud platforms.

Certifications are the foundation. They are the bedrock of our shared knowledge. However, they are not the lighthouse. In this new era, we must be both.

AWS + Azure.

Certifications are an essential foundation. They represent the bedrock of our shared professional knowledge and a commitment to maintaining a common standard of excellence. However, they are not, by themselves, the final destination.

Therefore, my next major “proof-of-work” will not be another certificate. It will be the first in a series of public, data-driven case studies derived from my personal project.

Ultimately, a certificate proves that we are qualified and contributing members of our professional ecosystem. This next body of work is intended to prove something more: that we can actively solve the complex, high-impact problems that challenge our industry. In this new era, demonstrating both our foundational knowledge and our capacity to create new value is no longer an aspiration. It is the new standard.

Together, we learn better.

The Blueprint Fallacy: A Case for Discrete Event Simulation in Modern Systems Architecture

Greetings from Taipei!

I just spent two days at the Hello World Dev Conference 2025 in Taipei, and beneath the hype around cloud and AI, I observed a single, unifying theme: The industry is desperately building tools to cope with a complexity crisis of its own making.

The agenda was a catalog of modern systems engineering challenges. The most valuable sessions were the “踩雷經驗” (landmine-stepping experiences), which offered hard-won lessons from the front lines.

A 2-day technical conference on AI, Kubernetes, and more!

However, these talks raised a more fundamental question for me. We are getting exceptionally good at building tools to detect and recover from failure, but are we getting any better at preventing it?

This post is not a simple translation of a Mandarin-language Taiwan conference. It is my analysis of the patterns I observed. I have grouped the key talks I attended into three areas:

  • Cloud Native Infrastructure;
  • Reshaping Product Management and Engineering Productivity with AI;
  • Deep Dives into Advanced AI Engineering.

Feel free to dive into the section that interests you most.

Session: Smart Pizza and Data Observability

This session was led by Shuhsi (林樹熙), a Data Engineering Manager at Micron. Micron needs no introduction: they are a massive player in the semiconductor industry, and their smart manufacturing facilities are a prime example of where data engineering is mission-critical.

Micron in Singapore (Credit: Forbes)

Shuhsi’s talk, “Data Observability by OpenLineage,” started with a simple story he called the “Smart Pizza” anomaly.

He presented a scenario familiar to anyone in a data-intensive environment: A critical dashboard flatlines, and the next three hours are a chaotic hunt to find out why. In his “Smart Pizza” example, the culprit was a silent, upstream schema change.

Smart pizza dashboard anomaly.

His solution, OpenLineage, is a powerful framework for what we would call digital forensics. It is about building a perfect, queryable map of the crime scene after the crime has been committed. By creating a clear data lineage, it reduces the “Mean Time to Discovery” from hours of panic to minutes of analysis.

Let’s be clear: This is critical, valuable work. Like OpenTelemetry for applications, OpenLineage brings desperately needed order to the chaos of modern data pipelines.

However, it is a fundamentally reactive posture. It helps us trace the bullet’s path through the body with incredible speed and precision. My main point is that our ultimate goal must be to predict the bullet’s trajectory before the trigger is pulled. Data lineage minimises downtime. My work with simulation, which I explain in the next section, aims to prevent downtime entirely by modelling these complex systems to find the breaking points before they break.

Session: Automating a .NET Discrete Event Simulation on Kubernetes

My talk, “Simulation Lab on Kubernetes: Automating .NET Parameter Sweeps,” addressed the wall that every complex systems analysis eventually hits: Combinatorial explosion.

While the industry is focused on understanding past failures, my session was about building a Discrete Event Simulation (DES) engine that can predict and prevent future ones.

A restaurant simulation game in Honkai Impact 3rd. (Source: 西琳 – YouTube)

To make this concrete, I used the analogy of a restaurant owner asking, “Should I add another table or hire another waiter?” The only way to answer this rigorously is to simulate thousands of possible futures. The math becomes brutal, fast: testing 50 different configurations with 100 statistical runs each requires 5,000 independent simulations. This is not a task for a single machine; it requires a computational army.

My solution is to treat Kubernetes not as a service host, but as a temporary, on-demand supercomputer. The strategy I presented had three core pillars:

  • Declarative Orchestration: The entire 5,000-run DES experiment is defined in a single, clean Argo Workflows manifest, transforming a potential scripting nightmare into a manageable, observable process.
  • Radical Isolation: Each DES run is containerised in its own pod, creating a perfectly clean and reproducible experimental environment.
  • Controlled Randomness: A robust seeding strategy is implemented to ensure that “random” events in our DES are statistically valid and comparable across the entire distributed system; one possible scheme is sketched below.
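
To illustrate the third pillar, here is one possible seeding scheme, shown in Python for brevity even though the actual engine is written in .NET: every run derives a reproducible seed from the experiment name, the configuration index, and the replication index.

import hashlib

def run_seed(experiment_id: str, config_index: int, replication: int) -> int:
    """Derive a deterministic 32-bit seed for one simulation run."""
    key = f"{experiment_id}:{config_index}:{replication}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")

# 50 configurations x 100 replications = 5,000 independent, reproducible seeds
seeds = [run_seed("restaurant-sweep", c, r) for c in range(50) for r in range(100)]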

The turnout for my DES session confirmed a growing hunger in our industry for proactive, simulation-driven approaches to engineering.

The final takeaway was a strategic re-framing of a tool many of us already use. Kubernetes is more than a platform for web apps. It can also be a general-purpose compute engine capable of solving massive scientific and financial modelling problems. It is time we started using it as such.

Session: AI for BI

Denny’s (監舜儀) session on “AI for BI” illustrated a classic pain point: The bottleneck between business users who need data and the IT teams who provide it. The proposed solution was a natural language interface, FineChatBI, a tool designed to sit on top of existing BI platforms to make querying existing data easier.

Denny is introducing AI for BI.

His core insight was that the tool is the easy part. The real work is in building the “underground root system” which includes the immense challenge of defining metrics, managing permissions, and untangling data semantics. Without this foundation, any AI is doomed to fail.

Getting the underground root system right is important for building AI projects.

This is a crucial step forward in making our organisations more data-driven. However, we must also be clear about what problem is being solved.

This is a system designed to provide perfect, instantaneous answers to the question, “What happened?”

My work, and the next category of even more complex AI, begins where this leaves off. It seeks to answer the far harder question: “What will happen if…?” Sharpening our view of the past is essential, but the ultimate strategic advantage lies in the ability to accurately simulate the future.

Session: The Impossibility of Modeling Human Productivity

The presenter, Jugg (劉兆恭), is a well-known agile coach and the organiser of Agile Tour Taiwan 2020. His talk, “An AI-Driven Journey of Agile Product Development – From Inspiration to Delivery,” was a masterclass in moving beyond vanity metrics to understand and truly improve engineering performance.

Jugg started with a graph that every engineering lead knows in their gut. As a company grows over time:

  • The business grows (purple line, up);
  • Software architecture and complexity grow (first blue line, up);
  • The number of developers increases (second blue line, up);
  • Expected R&D productivity should grow (green line, up);
  • But paradoxically, actual R&D productivity often stagnates or even declines (red line, down).

Jugg provided a perfect analogue for the work I do. He tackled the classic productivity paradox: Why does output stagnate even as teams grow? He correctly diagnosed the problem as a failure of measurement and proposed the SPACE framework as a more holistic model for this incredibly complex human system.

He was, in essence, trying to answer the same class of question I do: “If we change an input variable (team process), how can we predict the output (productivity)?”

This is where the analogy becomes a powerful contrast. Jugg’s world of human systems is filled with messy, unpredictable variables. His solutions are frameworks and dashboards. They are the best tools we have for a system that resists precise calculation.

This session reinforced my conviction that simulation is the most powerful tool we have for predicting performance in the systems we can actually control: Our code and our infrastructure. We do not have to settle for dashboards that show us the past because we can build models that calculate the future.

Session: Building a Map of “What Is” with GraphRAG

The most technically demanding session came from Nils (劉岦崱), a Senior Data Scientist at Cathay Financial Holdings. He presented GraphRAG, a significant evolution beyond the “Naive RAG” most of us use today.

Nils is explaining what a Naive RAG is.

He argued compellingly that simple vector search fails because it ignores relationships. By chunking documents, we destroy the contextual links between concepts. GraphRAG solves this by transforming unstructured data into a structured knowledge graph: a web of nodes (entities) and edges (their relationships).

Enhancing RAG-based application accuracy by constructing and leveraging knowledge graphs (Image Credit: LangChain)

In essence, GraphRAG is a sophisticated tool for building a static map of a known world. It answers the question, “How are all the pieces in our universe connected right now?” For AI customer service, this is a game-changer, as it provides a rich, interconnected context for every query.

This means our data now has an explicit, queryable structure. So, the LLM gets a much richer, more coherent picture of the situation, allowing it to maintain context over long conversations and answer complex, multi-faceted questions.

This session was a brilliant reminder that all advanced AI is built on a foundation of rigorous data modelling.

However, a map, no matter how detailed, is still just a snapshot. It shows us the layout of the city, but it cannot tell us how the traffic will flow at 5 PM.

This is the critical distinction. GraphRAG creates a model of a system at rest, while DES creates a model of a system in motion. One shows us the relationships; the other lets us press play and watch how those relationships evolve and interact over time under stress. GraphRAG is the anatomy chart; simulation is the stress test.

Session: Securing the AI Magic Pocket with LLM Guardrails

Nils from Cathay Financial Holdings returned to the stage for Day 2, and this time he tackled one of the most pressing issues in enterprise AI: Security. His talk “Enterprise-Grade LLM Guardrails and Prompt Hardening” was a masterclass in defensive design for AI systems.

What made the session truly brilliant was his central analogy. As he put it, an LLM is a lot like Doraemon: a super-intelligent, incredibly powerful assistant with a “magic pocket” of capabilities. It can solve almost any problem you give it. But, just like in the cartoon, if you give it vague, malicious, or poorly thought-out instructions, it can cause absolute chaos. For a bank, preventing that chaos is non-negotiable.

Nils grounded the problem in the official OWASP Top 10 for LLM Applications.

The core of the strategy lies in understanding two distinct but complementary lines of defence:

  • Guardrails (The Fortress): An external firewall of input filters and output validators;
  • Prompt Hardening (The Armour): Internal defences built into the prompt to resist manipulation.

This is an essential framework for any enterprise deploying LLMs. It represents the state-of-the-art in building static defences.

While necessary, this defensive posture raises another important question for developers: How does the fortress behave under a full-scale siege?

A static set of rules can defend against known attack patterns. But what about the unknown unknowns? What about the second-order effects? Specifically:

  • Performance Under Attack: What is the latency cost of these five layers of validation when we are hit with 10,000 malicious requests per second? At what point does the defence itself become a denial-of-service vector?
  • Emergent Failures: When the system is under load and memory is constrained, does one of these guardrails fail in an unexpected way that creates a new vulnerability?

These are not questions a security checklist can answer. They can only be answered by a dynamic stress test. The X-Teaming Nils mentioned is a step in this direction, but a full-scale DES is the ultimate laboratory.

Nils’ techniques are a static set of rules designed to prevent failure. Simulation is a dynamic engine designed to induce failure in a controlled environment to understand a system’s true breaking points. He is building the armour, while my work with DES is about building the testing grounds to see where that armour will break.

Session: Driving Multi-Task AI with a Flowchart in a Single Prompt

The final and most thought-provoking session was delivered by 尹相志, who presented a brilliant hack: Embedding a Mermaid flowchart directly into a prompt to force an LLM to execute a deterministic, multi-step process.

尹相志, CTO of 數據決策股份有限公司 (Data Decision Co., Ltd.).

He offered a path between the chaos of autonomous agents and the rigidity of external orchestrators like LangGraph. By teaching the LLM to read a flowchart, he effectively turns it into a reliable state machine executor. It is a masterful piece of engineering that imposes order on a probabilistic system.

Action Grounding Principles proposed by 相志.

What he has created is the perfect blueprint. It is a model of a process as it should run in a world with no friction, no delays, and no resource contention.

And in that, he revealed the final, critical gap in our industry’s thinking.

A blueprint is not a stress test. A flowchart cannot answer the questions that actually determine the success or failure of a system at scale:

  • What happens when 10,000 users try to execute this flowchart at once and they all hit the same database lock?
  • What is the cascading delay if one step in the flowchart has a 5% chance of timing out?
  • Where are the hidden queues and bottlenecks in this process?

His flowchart is the architect’s beautiful drawing of an airplane. A DES is the wind tunnel. It is the necessary, brutal encounter with reality that shows us where the blueprint will fail under stress.

The ability to define a process is the beginning. The ability to simulate that process under the chaotic conditions of the real world is the final, necessary step to building systems that don’t just look good on paper, but actually work.

Final Thoughts and Key Takeaways from Taipei

My two days at the Hello World Dev Conference were not just a tour of technologies; they were confirmation of a dangerous blind spot in our industry.

From what I observed, the industry builds tools for digital forensics to map past failures. It sharpens those tools with AI to perfectly understand what just happened. It creates knowledge graphs to model systems at rest. It designs perfect, deterministic blueprints for how AI processes should work.

These are all necessary and brilliant advancements in the art of mapmaking.

However, the critical, missing discipline is the one that asks not “What is the map?”, but “What will happen to the city during the hurricane?” The hard questions of latency under load, failures, and bottlenecks are not found on any of these maps.

Our industry is full of brilliant mapmakers. The next frontier belongs to people who can model, simulate, and predict the behaviour of complex systems under stress, before the hurricane arrives.

That is why I am building SNA, my .NET-based Discrete Event Simulation engine.

Hello, Taipei. Taken from the window of the conference venue.

I am leaving Taipei with a notebook full of ideas, a deeper understanding of the challenges and solutions being pioneered by my peers in the Mandarin-speaking tech community, and a renewed sense of excitement for the future we are all building.

When Pinecone Wasn’t Enough: My Journey to pgvector

If you work with machine learning or natural language processing, you have probably dealt with storing and searching through vector embeddings.

When I created the Honkai: Star Rail (HSR) relic recommendation system using Gemini, I started with Pinecone. Pinecone is a managed vector database that made it easy to index relic descriptions and character data as embeddings. It helped me find the best recommendations based on how similar they were.

Pinecone worked well, but as the project grew, I wanted more control, something open-source, and a cheaper option. That is when I found pgvector, a tool that adds vector search to PostgreSQL and gives the flexibility of an open-source database.

About HSR and Relic Recommendation System

Honkai: Star Rail (HSR) is a popular RPG that has captured the attention of players worldwide. One of the key features of the game is its relic system, where players equip their characters with relics like hats, gloves, or boots to boost stats and unlock special abilities. Each relic has unique attributes, and selecting the right sets of relics for a character can make a huge difference in gameplay.

An HSR streamer, Unreal Dreamer, learning the new relic feature. (Image Source: Unreal Dreamer YouTube)

As a casual player, I often found myself overwhelmed by the number of options and the subtle synergies between different relic sets. Finding a good relic combination for each character was time-consuming.

This is where LLMs like Gemini come into play. With the ability to process and analyse complex data, Gemini can help players make smarter decisions.

In November 2024, I started a project to develop a Gemini-powered HSR relic recommendation system which can analyse a player’s current characters to suggest the best options for them. In the project, I have been storing embeddings in Pinecone.

Embeddings and Vector Database

An embedding is a way to turn data, like text or images, into a list of numbers called a vector. These vectors make it easier for a computer to compare and understand the relationships between different pieces of data.

For example, in the HSR relic recommendation system, we use embeddings to represent descriptions of relic sets. The numbers in the vector capture the meaning behind the words, so similar relics and characters have embeddings that are closer together in a mathematical sense.

This is where vector databases like Pinecone or pgvector come in. Vector databases are designed for performing fast similarity searches on large collections of embeddings. This is essential for building systems that need to recommend, match, or classify data.

pgvector is an open-source extension for PostgreSQL that allows us to store and search for vectors directly in our database. It adds specialised functionality for handling vector data, like embeddings in our HSR project, making it easier to perform similarity searches without needing a separate system.

Unlike managed services like Pinecone, pgvector is open source. This means we can use it freely and avoid vendor lock-in, which is a huge advantage for developers.

Finally, since pgvector runs on PostgreSQL, there is no need for additional managed service fees. This makes it a budget-friendly option, especially for projects that need to scale without breaking the bank.

Choosing the Right Model

While the choice of the vector database is important, it is not the key factor in achieving great results. The quality of our embeddings is actually determined by the model we choose.

For my HSR relic recommendation system, while the embeddings were stored in Pinecone, I used the multilingual-e5-large model from Microsoft Research, which Pinecone offers as a hosted embedding model.

When I migrated to pgvector, I had the freedom to explore other options. For this migration, I chose the all-MiniLM-L6-v2 model hosted on Hugging Face, which is a lightweight sentence-transformer designed for semantic similarity tasks. Switching to this model allowed me to quickly generate embeddings for relic sets and integrate them into pgvector, giving me a solid starting point while leaving room for future experimentation.

The all-MiniLM-L6-v2 model hosted on Hugging Face.

Using all-MiniLM-L6-v2 Model

Once we have decided to use the all-MiniLM-L6-v2 model, the next step is to generate vector embeddings for the relic descriptions. This model is from the sentence-transformers library, so we first need to install the library.

pip install sentence-transformers

The library offers the SentenceTransformer class to load pre-trained models.

from sentence_transformers import SentenceTransformer

model_name = 'all-MiniLM-L6-v2'
model = SentenceTransformer(model_name)

At this point, the model is ready to encode text into embeddings.

The SentenceTransformer model takes care of tokenisation and other preprocessing steps internally, so we can directly pass text to it.

# Function to generate embedding for a single text
def generate_embedding(text):
    # No need to tokenise separately, it's done internally
    # No need to average the token embeddings
    embeddings = model.encode(text)

    return embeddings

In this function, when we call model.encode(text), the model processes the text through its transformer layers, generating an embedding that captures its semantic meaning. The output is already optimised for tasks like similarity search.
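
As a quick sanity check, encoding a single relic bonus description should produce a 384-dimensional vector:

# all-MiniLM-L6-v2 produces 384-dimensional embeddings
embedding = generate_embedding("Increases SPD by 6%")
print(embedding.shape)  # (384,)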

Setting up the Database

After generating embeddings for each relic set using the all-MiniLM-L6-v2 model, the next step is to store them in the PostgreSQL database with the pgvector extension.

For developers using AWS, there is good news. In May 2023, AWS announced that Amazon Relational Database Service (RDS) for PostgreSQL would support pgvector. In November 2024, Amazon RDS started to support pgvector 0.8.0.

pgvector is now supported on Amazon RDS for PostgreSQL.

To install the extension, we will run the following command in our database. This will introduce a new datatype called VECTOR.

CREATE EXTENSION vector;

After this, we can define our table as follows.

CREATE TABLE IF NOT EXISTS embeddings (
    id TEXT PRIMARY KEY,
    vector VECTOR(384),
    text TEXT
);

Besides the id column, which stores the unique identifier, there are two other columns that are important.

The text column stores the original text for each relic (the two-piece and four-piece bonus descriptions).

The vector column stores the embeddings. The VECTOR(384) type is used to store embeddings, and 384 here refers to the number of dimensions in the vector. In our case, the embeddings generated by the all-MiniLM-L6-v2 model are 384-dimensional, meaning each embedding will have 384 numbers.

Here, a dimension refers to one of the “features” that helps describe something. When we talk about vectors and embeddings, each dimension is just one of the many characteristics used to represent a piece of text. These features could be things like the type of words used, their relationships, and even the overall meaning of the text.

Updating the Database

After the table is created, we can proceed to create INSERT INTO SQL statements to insert the embeddings and their associated text into the database.

In this step, I load the relic information from a JSON file and process it.

import json

# Load your relic set data from a JSON file
with open('/content/hsr-relics.json', 'r') as f:
    relic_data = json.load(f)

# Prepare data
relic_info_data = [
    {"id": relic['name'], "text": relic['two_piece'] + " " + relic['four_piece']}  # Combine descriptions
    for relic in relic_data
]

The relic_info_data will then be passed to the following function to generate the INSERT INTO statements.

# Function to generate INSERT INTO statements with vectors
def generate_insert_statements(data):
    # Initialise list to store SQL statements
    insert_statements = []

    for record in data:
        # Extracting text and id from the record
        id = record.get('id')
        text = record.get('text')

        # Generate the embedding for the text
        embedding = generate_embedding(text)

        # Convert the embedding to a list
        embedding_list = embedding.tolist()

        # Create the SQL INSERT INTO statement
        sql_statement = f"""
            INSERT INTO embeddings (id, vector, text)
            VALUES (
                '{id.replace("'", "''")}',
                ARRAY{embedding_list},
                '{text.replace("'", "''")}')
            ON CONFLICT (id) DO UPDATE
            SET vector = EXCLUDED.vector, text = EXCLUDED.text;
        """

        # Append the statement to the list
        insert_statements.append(sql_statement)

    return insert_statements
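
The function above only builds the SQL strings; they still need to be executed against PostgreSQL. Here is a minimal sketch using psycopg2 (the connection string is a placeholder for your own database details):

import psycopg2

# Generate the statements from the prepared relic data
insert_statements = generate_insert_statements(relic_info_data)

# Execute them against the database; the connection string is a placeholder
conn = psycopg2.connect("dbname=hsr user=postgres password=secret host=localhost")
with conn, conn.cursor() as cur:
    for statement in insert_statements:
        cur.execute(statement)
# Using the connection as a context manager commits the transaction on success
conn.close()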

The embeddings of the relic sets are successfully inserted into the database.

How It All Fits Together: Query the Database

Once we have stored the vector embeddings of all the relic sets in our PostgreSQL database, the next step is to find the relic sets that are most similar to a given character’s relic needs.

Just like what we have done for storing relic set embeddings, we need to generate an embedding for the query describing the character’s relic needs. This is done by passing the query through the model as demonstrated in the following code.

def query_similar_embeddings(query_text):
    query_embedding = generate_embedding(query_text)

    return query_embedding.tolist()

The generated embedding is an array of 384 numbers. We simply use this array in our SQL query below.

SELECT id, text, vector <=> '[<embedding here>]' AS distance
FROM embeddings
ORDER BY distance
LIMIT 3;

The key part of the query is the <=> operator. This operator calculates the “distance” between two vectors based on cosine similarity. In our case, it measures how similar the query embedding is to each stored embedding. The smaller the distance, the more similar the embeddings are.

We use LIMIT 3 to get the top 3 most similar relic sets.
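
Putting the pieces together, a small helper can run this query from Python. The sketch below assumes a psycopg2 connection and the query_similar_embeddings function defined above; the connection string and the helper name find_similar_relic_sets are illustrative.

import psycopg2

def find_similar_relic_sets(conn, query_text, top_k=3):
    # Format the embedding as a pgvector literal such as '[0.12,-0.03,...]'
    embedding = query_similar_embeddings(query_text)
    vector_literal = '[' + ','.join(str(x) for x in embedding) + ']'

    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT id, text, vector <=> %s::vector AS distance
            FROM embeddings
            ORDER BY distance
            LIMIT %s;
            """,
            (vector_literal, top_k)
        )
        return cur.fetchall()

# Example usage (connection string is a placeholder)
conn = psycopg2.connect("dbname=hsr user=postgres password=secret host=localhost")
results = find_similar_relic_sets(conn, "Gallagher is a Fire and Abundance character. He can heal allies.")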

Test Case: Finding Relic Sets for Gallagher

Gallagher is a Fire and Abundance character in HSR. He is a sustain unit that can heal allies by inflicting a debuff on the enemy.

According to the official announcement, Gallagher is a healer. (Image Source: Honkai: Star Rail YouTube)

The following screenshot shows the top 3 relic sets most closely related to an HSR character called Gallagher, using the query “Suggest the best relic sets for this character: Gallagher is a Fire and Abundance character in Honkai: Star Rail. He can heal allies.”

The returned top 3 relic sets are indeed recommended for Gallagher.

One of the returned relic sets is called the “Thief of Shooting Meteor”. It is the officially recommended relic set in-game, as shown in the screenshot below.

Gallagher’s official recommended relic set.

Future Work

In our project, we will not be implementing indexing because currently in HSR, there are only a small number of relic sets. Without an index, PostgreSQL will still perform vector similarity searches efficiently because the dataset is small enough that searching through it directly will not take much time. For small-scale apps like ours, querying the vector data directly is both simple and fast.

However, when our dataset grows larger in the future, it is a good idea to explore indexing options, such as the ivfflat index, to speed up similarity searches.
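
For reference, adding such an index is a single statement; a sketch via psycopg2 is shown below. Here vector_cosine_ops matches the cosine distance operator used in our queries, and the lists value of 100 is an assumption to be tuned against the real dataset size.

import psycopg2

conn = psycopg2.connect("dbname=hsr user=postgres password=secret host=localhost")  # placeholder
with conn, conn.cursor() as cur:
    cur.execute(
        "CREATE INDEX IF NOT EXISTS embeddings_vector_idx "
        "ON embeddings USING ivfflat (vector vector_cosine_ops) WITH (lists = 100);"
    )
conn.close()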


From Zero to Gemini: Building an AI-Powered Game Helper

On a chilly November morning, I attended the Google DevFest 2024 in Singapore. Together with my friends, I joined a workshop titled “Gemini Masterclass: How to Unlock Its Power with Prompting, Functions, and Agents.” The session was led by two incredible speakers, Martin Andrews and Sam Witteveen.

Martin holds a PhD in Machine Learning and has been an Open Source advocate since 1999. Sam is a Google Developer Expert in Machine Learning. Both of them are also organisers of the Machine Learning Singapore Meetup group. Together, they delivered an engaging and hands-on workshop about Gemini, the advanced LLM from Google.

Thanks to their engaging Gemini Masterclass, I have taken my first steps into the world of LLMs. This blog post captures what I learned and my journey into the fascinating world of Gemini.

Martin Andrews presenting in Google DevFest 2024 in Singapore.

About LLM and Gemini

LLM stands for Large Language Model. To most people, an LLM is like a smart friend who can answer almost all our questions with responses that are often accurate and helpful.

As an LLM, Gemini is trained on a large amount of text data and can perform a wide range of tasks: answering questions, writing stories, summarising long documents, or even helping to debug code. What makes them special is their ability to “understand” and generate language in a way that feels natural to us.

Many of my developer friends have started using Gemini as a coding assistant in their IDEs. While it is good at that, Gemini is much more than just a coding tool.

Gemini is designed to not only respond to prompts but also act as an assistant with an extra set of tools. To make the most of Gemini, it is important to understand how it works and what it can (and cannot) do. With the knowledge gained from the DevFest workshop, I decided to explore how Gemini could assist with optimising relic choices in a game called Honkai: Star Rail.

Honkai: Star Rail and Gemini for Its Relic Recommendations

Honkai: Star Rail (HSR) is a popular RPG that has captured the attention of players worldwide. One of the key features of the game is its relic system, where players equip their characters with relics like hats, gloves, or boots to boost stats and unlock special abilities. Each relic has unique attributes, and selecting the right sets of relics for a character can make a huge difference in gameplay.

An HSR streamer, MurderofBirds, browsing through thousands of relics. (Image Source: MurderofBirds Twitch)

As a casual player, I often found myself overwhelmed by the number of options and the subtle synergies between different relic sets. Finding a good relic combination for each character was time-consuming.

This is where LLMs like Gemini come into play. With the ability to process and analyse complex data, Gemini can help players make smarter decisions.

In this blog post, I will briefly show how this Gemini-powered relic recommendation system can analyse a player’s current characters to suggest the best options for them. The system also explains the logic behind its recommendations, helping us understand why certain relics are ideal.

Setup the Project

To make my project code available to everyone, I used Google Colab, a hosted Jupyter Notebook service that requires no setup to use and provides free access to computing resources, including GPUs and TPUs. You can access my code by clicking on the button below.

Open In Colab

In my project, I used the google-generativeai Python library, which is pre-installed in Colab. This library serves as a user-friendly API for interacting with Google LLMs, including Gemini. It makes it easy for us to integrate Gemini capabilities directly into our code.

Next, we will need to import the necessary libraries.

Importing the libraries and setup Gemini client.

The first library to import is google.generativeai. Without it, we cannot interact with Gemini easily. Then we have google.colab.userdata, which securely retrieves sensitive data, like our API key, directly from the Colab notebook environment.

We will also use IPython.display for displaying results in a readable format, such as Markdown.

In the Secrets section, we will have two records:

  • HONKAI_STAR_RAIL_PLAYER_ID: Your HSR player UID. It is used later to personalise relic recommendations.
  • GOOGLE_API_KEY: The API key that we can get from Google AI Studio to authenticate with Gemini.

Creating and retrieving our API keys in Google AI Studio.

Once we have initialised the google.generativeai library with the GOOGLE_API_KEY, we can proceed to specify the Gemini model we will be using.

The choice of model is crucial in LLM projects. Google AI Studio offers several options, each representing a trade-off between accuracy and cost. For my project, I chose models/gemini-1.5-flash-8b-001, which provided a good balance for this experiment. Larger models might offer slightly better accuracy but at a significant cost increase.
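
Putting the setup together, the initialisation shown in the screenshot above looks roughly like this (a minimal sketch; the secret name matches the one listed earlier):

import google.generativeai as genai
from google.colab import userdata

# Authenticate with the API key stored in the Colab Secrets section
genai.configure(api_key=userdata.get('GOOGLE_API_KEY'))

# Use the lightweight Flash model chosen for this project
model = genai.GenerativeModel('models/gemini-1.5-flash-8b-001')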

Google AI Studio offers a range of models, from smaller, faster models suitable for quick tasks to larger, more powerful models capable of more complex processing.

Hallucination and Knowledge Limitation

We often think of LLMs like Gemini as our smart friends who can answer any question. But just like even our smartest friend can sometimes make mistakes, LLMs have their limits too.

Gemini’s knowledge is based on the data it was trained on, which means it doesn’t actually know everything. Sometimes, it might hallucinate, i.e. the model invents information that sounds plausible but is not actually true.

Kiana is not a character from Honkai: Star Rail but she is from another game called Honkai Impact 3rd.

While Gemini is trained on a massive dataset, its knowledge is not unlimited. As a responsible AI, it acknowledges its limitations. So, when it cannot find the answer, it will tell us that it lacks the necessary information rather than fabricating a response. This is how Google builds safer AI systems, as part of its Secure AI Framework (SAIF).

Knowledge cutoff in action.

To overcome these constraints, we need to employ strategies to augment the capabilities of LLMs. Techniques such as integrating Retrieval-Augmented Generation (RAG) and leveraging external APIs can help bridge the gap between what the model knows and what it needs to know to perform effectively.

System Instructions

Leveraging System Instructions is a way to improve the accuracy and reliability of Gemini’s responses.

System instructions are prompts given before the main query in order to guide Gemini. These instructions provide crucial context and constraints, significantly enhancing the accuracy and reliability of the generated output.

System Instruction with contextual information about HSR characters ensures Gemini has the necessary background knowledge.

The specific design and phrasing of the system instructions provided to Gemini are crucial. Effective system instructions give Gemini the necessary context and constraints to generate accurate and relevant responses. Without carefully crafted system instructions, even the most well-designed prompt can yield poor results.
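
In the google-generativeai library, system instructions are supplied when the model is constructed. A minimal sketch follows; the instruction text itself is only an illustrative placeholder, not the exact wording used in my notebook.

model = genai.GenerativeModel(
    'models/gemini-1.5-flash-8b-001',
    system_instruction=(
        "You are a Honkai: Star Rail assistant. Only answer questions about "
        "HSR characters and relics, and say so when you lack the information "
        "instead of guessing."
    ),
)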

Context Framing

As we can see from the example above, writing clear and effective system instructions requires careful thought and a lot of testing.

This is just one part of a much bigger picture called Context Framing, which includes preparing data, creating embeddings, and deciding how the system retrieves and uses that data. Each of these steps needs expertise and planning to make sure the solution works well in real-world scenarios.

You might have heard the term “Prompt Engineering.” It sounds technical, but it is really about figuring out how to ask the LLM the right questions in the right way to get the best answers.

While context framing and prompt engineering are closely related and often overlap, they emphasise different aspects of the interaction with the LLM.

Stochasticity

While experimenting with Gemini, I noticed that even if I use the exact same prompt, the output can vary slightly each time. This happens because LLMs like Gemini have a built-in element of randomness, known as Stochasticity.

Lingsha, an HSR character released in 2024. (Image Credit: Game8)

For example, when querying for DPS characters, Lingsha was inconsistently included in the results. While this might seem like a minor variation, it underscores the probabilistic nature of LLM outputs and suggests that running multiple queries might be needed to obtain a more reliable consensus.

Lingsha was inconsistently included in the response to the query about multi-target DPS characters.

According to the official announcement, even though Lingsha is a healer, she can cause significant damage to all enemies too. (Image Source: Honkai: Star Rail YouTube)

Hence, it is important to treat writing effective system instructions and prompts as an iterative process, so that we can experiment with different phrasings to find what works best and yields the most consistent results.

Temperature Tuning

We can also reduce the stochasticity of Gemini’s responses by adjusting parameters like temperature. Lower temperatures typically reduce randomness, leading to more consistent outputs, but they may also reduce creativity and diversity.

Temperature is an important parameter for balancing predictability and diversity in the output. Temperature, a number in the range of 0.0 to 2.0 with a default of 1.0 in the gemini-1.5-flash model, controls how the model samples from its probability distribution over the vocabulary when generating text. A lower temperature makes the model more likely to select words with higher probabilities, resulting in more predictable and focused text.

Having Temperature=0 means that the model will always select the most likely word at each step. The output will be highly deterministic and repetitive.
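
With the google-generativeai library, the temperature can be set per request through the generation config, for example (the prompt is only illustrative):

response = model.generate_content(
    "List the multi-target DPS characters in Honkai: Star Rail.",
    generation_config={"temperature": 0.2},  # lower temperature, more consistent output
)
print(response.text)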

Function Calls

A major limitation of using system instructions alone is their static nature.

For example, my initial system instructions included a list of HSR characters, but this list is static. The list does not include newly released characters or characters specific to the player’s account. In order to dynamically access a player’s character database and provide personalised recommendations, I integrated Function Calls to retrieve real-time data.

For fetching the player’s HSR character data, I leveraged the open-source Python library mihomo. This library provides an interface for accessing game data, enabling dynamic retrieval of a player’s characters and their attributes. This dynamic data retrieval is crucial for generating truly personalised relic recommendations.

Using the mihomo library, I retrieve five of my Starfaring Companions.

Defining the functions in my Python code was only the first step. To use function calls, Gemini needed to know which functions were available. We can provide this information to Gemini as shown below.

model = genai.GenerativeModel('models/gemini-1.5-flash-8b-001', tools=[get_player_name, get_player_starfaring_companions])

After we pass a query to Gemini, the model returns a structured object that includes the names of relevant functions and their arguments based on the prompt, as shown in the screenshot below.

The correct function call is picked up by Gemini based on the prompt.

Using descriptive function names is essential for successful function calling with LLMs because the accuracy of function calls depends heavily on well-designed function names in our Python code. Inaccurate naming can directly impact the reliability of the entire system.

If our Python function is named incorrectly, for example, a function called get_age that actually returns the name of the person, Gemini might wrongly select that function when the prompt is asking for the age.

As shown in the screenshot above, the prompt requested information about all the characters of the player. Gemini simply determines which function to call and provides the necessary arguments. Gemini does not directly execute the functions. The actual execution of the function needs to be handled by us, as demonstrated in the screenshot below.

After Gemini tells us which function to call, our code needs to call the function to get the result.
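
A minimal sketch of that dispatch step, assuming the two functions registered earlier (the real notebook may structure this differently):

# Map the function names Gemini may request to our Python implementations
available_functions = {
    "get_player_name": get_player_name,
    "get_player_starfaring_companions": get_player_starfaring_companions,
}

response = model.generate_content("Who are my Starfaring Companions?")

# Gemini only tells us which function to call; we execute it ourselves
part = response.candidates[0].content.parts[0]
if part.function_call:
    requested = part.function_call
    result = available_functions[requested.name](**dict(requested.args))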

Grounding with Google Search

Function calls are a powerful way to access external data, but they require pre-defined functions and APIs.

To go beyond these limits and gather information from many online sources, we can use Gemini’s grounding feature with Google Search. This feature allows Gemini to perform Google searches and include what it finds in its answers. This makes it easier to get up-to-date information and handle questions that need real-time data.

If you are getting HTTP 429 errors when using the Google Search feature, please make sure you have set up a billing account here with enough quota.

With this feature enabled, we thus can ask Gemini to get some real-time data from the Internet, as shown below.

The upcoming v2.7 patch of HSR is indeed scheduled to be released on 4th December.

Building a Semantic Knowledge Base with Pinecone

System instructions and Google search grounding provide valuable context, but a structured knowledge base is needed to handle the extensive data about HSR relics.

We need a way to store and quickly retrieve this information so that the system can generate timely and accurate relic recommendations. A vector database is ideally suited for managing this dataset of relic information.

Vector databases, unlike traditional databases that rely on keyword matching, store information as vectors, enabling efficient similarity searches. This allows for retrieving relevant relic sets based on the semantic meaning of a query, rather than relying solely on keywords.

There are many options for a vector database, but I chose Pinecone. As a managed service, Pinecone offered the scalability needed to handle the HSR relic dataset and the robust API essential for reliable data access. The availability of a free tier was also a significant factor because it allowed me to keep costs low during the development of my project.

API keys in Pinecone dashboard.

Pinecone’s well-documented API and straightforward SDK make integration surprisingly easy. To get started, simply follow the Pinecone documentation to install the SDK in our code and retrieve the API key.

# Import the Pinecone library
from pinecone.grpc import PineconeGRPC as Pinecone
from pinecone import ServerlessSpec
import time

# Initialize a Pinecone client with your API key
pc = Pinecone(api_key=userdata.get('PINECONE_API_KEY'))

I prepare my Honkai: Star Rail relic data, which I have previously organised into a JSON structure. This data includes information on each relic set’s two-piece and four-piece effects. Here’s a snippet to illustrate the format:

[
    {
        "name": "Sacerdos' Relived Ordeal",
        "two_piece": "Increases SPD by 6%",
        "four_piece": "When using Skill or Ultimate on one ally target, increases the ability-using target's CRIT DMG by 18%, lasting for 2 turn(s). This effect can stack up to 2 time(s)."
    },
    {
        "name": "Scholar Lost in Erudition",
        "two_piece": "Increases CRIT Rate by 8%",
        "four_piece": "Increases DMG dealt by Ultimate and Skill by 20%. After using Ultimate, additionally increases the DMG dealt by the next Skill by 25%."
    },
    ...
]

With the relic data organised, the next challenge is to enable similarity searches with vector embeddings. Vector embeddings capture the semantic meaning of the text, allowing Pinecone to identify similar relic sets based on their inherent properties and characteristics.

Vector embedding representations (Image Credit: Pinecone)

Now, we can generate vector embeddings for the HSR relic data using Pinecone. The following code snippet illustrates this process, converting textual descriptions of relic sets into numerical vector embeddings. These embeddings capture the semantic meaning of the relic set descriptions, enabling efficient similarity searches later.

import json

# Load relic set data from the JSON file
with open('/content/hsr-relics.json', 'r') as f:
    relic_data = json.load(f)

# Prepare data for Pinecone
relic_info_data = [
    {"id": relic['name'], "text": relic['two_piece'] + " " + relic['four_piece']}  # Combine relic set descriptions
    for relic in relic_data
]

# Generate embeddings using Pinecone
embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[d['text'] for d in relic_info_data],
    parameters={"input_type": "passage", "truncate": "END"}
)

print(embeddings)

As shown in the code above, we use the multilingual-e5-large model, a text embedding model from Microsoft Research, to generate a vector embedding for each relic set. The multilingual-e5-large model works well on messy data and is good for short queries.

Pinecone’s ability to perform fast similarity searches relies on its indexing mechanism. Without an index, searching for similar relic sets would require comparing each relic set’s embedding vector to every other one, which would be extremely slow, especially with a large dataset. I chose a Pinecone serverless index hosted on AWS for its automatic scaling and reduced infrastructure management.

# Create a serverless index
index_name = "hsr-relics-index"

if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        dimension=1024,
        metric="cosine",
        spec=ServerlessSpec(
            cloud='aws',
            region='us-east-1'
        )
    )

# Wait for the index to be ready
while not pc.describe_index(index_name).status['ready']:
    time.sleep(1)

The dimension parameter specifies the dimensionality of the vector embeddings. Higher dimensionality generally allows for capturing more nuanced relationships between data points. For example, two relic sets might both increase ATK, but one might also increase SPD while the other increases Crit DMG. A higher-dimensional embedding allows the system to capture these subtle distinctions, leading to more relevant recommendations.

For the metric parameter, which measures the similarity between two vectors (representing relic sets), we use the cosine metric, which is suitable for measuring the similarity between vector embeddings generated from text. This is crucial for understanding how similar two relic descriptions are.

With the vector embeddings generated, the next step was to upload them into my Pinecone index. Pinecone uses the upsert function to add or update vectors in the index. The following code snippet shows how we can upsert the generated embeddings into the Pinecone index.

# Target the index where you'll store the vector embeddings
index = pc.Index("hsr-relics-index")

# Prepare the records for upsert
# Each contains an 'id', the embedding 'values', and the original text as 'metadata'
records = []
for r, e in zip(relic_info_data, embeddings):
    records.append({
        "id": r['id'],
        "values": e['values'],
        "metadata": {'text': r['text']}
    })

# Upsert the records into the index
index.upsert(
    vectors=records,
    namespace="hsr-relics-namespace"
)

The code uses the zip function to iterate through both the list of prepared relic data and the list of generated embeddings simultaneously. For each pair, it creates a record for Pinecone with the following attributes.

  • id: Name of the relic set to ensure uniqueness;
  • values: The vector representing the semantic meaning of the relic set effects;
  • metadata: The original description of the relic effects, which will be used later for providing context to the user’s recommendations. 

Implementing Similarity Search in Pinecone

With the relic data stored in Pinecone now, we can proceed to implement the similarity search functionality.

def query_pinecone(query: str) -> dict:

    # Convert the query into a numerical vector that Pinecone can search with
    query_embedding = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=[query],
        parameters={
            "input_type": "query"
        }
    )

    # Search the index for the three most similar vectors
    results = index.query(
        namespace="hsr-relics-namespace",
        vector=query_embedding[0].values,
        top_k=3,
        include_values=False,
        include_metadata=True
    )

    return results

The function above takes a user’s query as input, converts it into a vector embedding using Pinecone’s inference endpoint, and then uses that embedding to search the index, returning the top three most similar relic sets along with their metadata.

Relic Recommendations with Pinecone and Gemini

With the Pinecone integration in place, we design the initial prompt to pick relevant relic sets from Pinecone. After that, we take the results from Pinecone and combine them with the initial prompt to create a richer, more informative prompt for Gemini, as shown in the following code.

from google.generativeai.generative_models import GenerativeModel

async def format_pinecone_results_for_prompt(model: GenerativeModel, player_id: int) -> dict:
    character_relics_mapping = await get_player_character_relic_mapping(player_id)

    result = {}

    for character_name, (character_avatar_image_url, character_description) in character_relics_mapping.items():
        print(f"Processing Character: {character_name}")

        additional_character_data = character_profile.get(character_name, "")

        character_query = f"Suggest some good relic sets for this character: {character_description} {additional_character_data}"

        pinecone_response = query_pinecone(character_query)

        prompt = f"User Query: {character_query}\n\nRelevant Relic Sets:\n"
        for match in pinecone_response['matches']:
            prompt += f"* {match['id']}: {match['metadata']['text']}\n"  # Extract relevant data
        prompt += "\nBased on the above information, recommend two best relic sets and explain your reasoning. Each character can only equip with either one 4-piece relic or one 2-piece relic with another 2-piece relic. You cannot recommend a combination of 4-piece and 2-piece together. Consider the user's query and the characteristics of each relic set."

        response = model.generate_content(prompt)

        result[character_avatar_image_url] = response.text

    return result

The code shows that we are doing both prompt engineering (designing the initial query to get relevant relics) and context framing (combining the initial query with the retrieved relic information to get a better overall recommendation from Gemini).

First, the code retrieves data about the player’s characters, including their descriptions, images, and the relics the characters are currently wearing. The code then gathers potentially relevant data about each character from a separate data source, character_profile, which has more information, such as gameplay mechanics, that we got from the Game8 Character List. With the character data, the query then finds similar relic sets in the Pinecone database.

After Pinecone returns matches, the code constructs a detailed prompt for the Gemini model. This prompt includes the character’s description, relevant relic sets found by Pinecone, and crucial instructions for the model. The instructions emphasise the constraints of choosing relic sets: either a 4-piece set, or two 2-piece sets, not a mix. Importantly, it also tells Gemini to consider the character’s existing profile and to prioritise fitting relic sets.

Finally, the code sends this detailed prompt to Gemini, receiving back the recommended relic sets.

Knight of Purity Palace is indeed a great option for Gepard!

Enviosity, a popular YouTuber known for his in-depth Honkai: Star Rail strategy guides, introduced Knight of Purity Palace for Gepard too. (Source: Enviosity YouTube)

Langtrace

Using LLMs like Gemini is certainly exciting, but figuring out what is happening “under the hood” can be tricky.

If you are a web developer, you are probably familiar with Grafana dashboards. They show you how your web app is performing, highlighting areas that need improvement.

Langtrace is like Grafana, but specifically for LLMs. It gives us a similar visual overview, tracking our LLM calls, showing us where they are slow or failing, and helping us optimise the performance of our AI app.

Traces for the Gemini calls are displayed individually.

Langtrace is not only useful for tracing our LLM calls; it also offers metrics on token counts and costs, as shown in the following screenshot.

Beyond tracing calls, Langtrace collects metrics too.

Wrap-Up

Building this Honkai: Star Rail (HSR) relic recommendation system has been a rewarding journey into the world of Gemini and LLMs.

I am incredibly grateful to Martin Andrews and Sam Witteveen for their inspiring Gemini Masterclass at Google DevFest in Singapore. Their guidance helped me navigate the complexities of LLM development, and I learned firsthand the importance of careful prompt engineering, the power of system instructions, and the need for dynamic data access through function calls. These lessons underscore the complexities of developing robust LLM apps and will undoubtedly inform my future AI projects.

Building this project has been an enjoyable journey of learning and discovery. I encountered many challenges along the way, but overcoming them deepened my understanding of Gemini. If you’re interested in exploring the code and learning from my experiences, you can access my Colab notebook through the button below. I welcome any feedback you might have!

Open In Colab

References