Configure Portable Object Localisation in .NET 8 Web API

Localisation is an important feature when building apps that cater to users from different countries, allowing them to interact with our app in their native language. In this article, we will walk you through how to set up and configure Portable Object (PO) Localisation in an ASP.NET Core Web API project.

Localisation is about adapting an app for a specific culture or language: translating user-facing text and content into the target language and customising resources accordingly.

While .NET localisation normally uses resource files (.resx) to store localised texts for different cultures, Portable Object files (.po) are another popular choice, especially in apps that use open-source tools or frameworks.

About Portable Object (PO)

PO files are a standard format used for storing localised text. They are part of the gettext localisation framework, which is widely used across different programming ecosystems.

A PO file contains translations in the form of key-value pairs, where:

  • Key: The original text in the source language.
  • Value: The translated text in the target language.

Because PO files are simple, human-readable text files, they are easily accessible and editable by translators. This flexibility makes PO files a popular choice for many open-source projects and apps across various platforms.

You might wonder why we should use PO files instead of the traditional .resx files for localisation. Here are some advantages of PO files:

  • Unlike .resx files, PO files have built-in support for plural forms. This makes it much easier to handle situations where the translation changes based on the quantity, like “1 item” vs. “2 items” (see the sample plural entry after this list).
  • While .resx files require compilation, PO files are plain text files. Hence, we do not need any special tooling or complex build steps to use PO files.
  • PO files work great with collaborative translation tools. For those who are working with crowdsourcing translations, they will find that PO files are much easier to manage in these settings.
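For reference, here is a minimal sketch of what such a plural entry could look like in a PO file, following the standard gettext msgid_plural / msgstr[n] syntax (the French strings below are purely illustrative placeholders):

msgid "1 item"
msgid_plural "{0} items"
msgstr[0] "{0} article"
msgstr[1] "{0} articles"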

SHOW ME THE CODE!

The complete source code of this project can be found at https://github.com/goh-chunlin/Experiment.PO.

Project Setup

Let’s begin by creating a simple ASP.NET Web API project. We can start by generating a basic template with the following command.

dotnet new webapi

This will set up a minimal API with a weather forecast endpoint.

The default /weatherforecast endpoint generated by .NET Web API boilerplate.

The default endpoint in the boilerplate returns a JSON object that includes a summary field. This field describes the weather using terms like freezing, bracing, warm, or hot. Here’s the array of possible summary values:

var summaries = new[]
{
    "Freezing", "Bracing", "Chilly", "Cool",
    "Mild", "Warm", "Balmy", "Hot", "Sweltering", "Scorching"
};

As you can see, currently, it only supports English. To extend support for multiple languages, we will introduce localisation.

Prepare PO Files

Let’s start by adding a translation for the weather summary in Chinese. Below is a sample PO file that contains the Chinese translation for the weather summaries.

#: Weather summary (Chinese)
msgid "weather_Freezing"
msgstr "寒冷"

msgid "weather_Bracing"
msgstr "冷冽"

msgid "weather_Chilly"
msgstr "凉爽"

msgid "weather_Cool"
msgstr "清爽"

msgid "weather_Mild"
msgstr "温和"

msgid "weather_Warm"
msgstr "暖和"

msgid "weather_Balmy"
msgstr "温暖"

msgid "weather_Hot"
msgstr "炎热"

msgid "weather_Sweltering"
msgstr "闷热"

msgid "weather_Scorching"
msgstr "灼热"

In most cases, PO file names are tied to locales, as they represent translations for specific languages and regions. The naming convention typically includes both the language and the region, so the system can easily identify and use the correct file. For example, the PO file above should be named zh-CN.po, which represents the Chinese translation for the China region.

In some cases, if our app supports a language without being region-specific, we could have a PO file named only with the language, such as ms.po for Malay. This serves as a fallback for all Malay speakers, regardless of their region.

We have prepared three Malay PO files: one for Malaysia (ms-MY.po), one for Singapore (ms-SG.po), and one fallback file (ms.po) for all Malay speakers, regardless of region.
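For this demo, all of these PO files live together in a single folder (named Localisation, as we will see in the next step), so the layout looks something like this:

Localisation/
├── ms.po
├── ms-MY.po
├── ms-SG.po
└── zh-CN.po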

After that, since our PO files are placed in the Localisation folder, please do not forget to include them in the .csproj file, as shown below.

<Project Sdk="Microsoft.NET.Sdk.Web">

  ...

  <ItemGroup>
    <Folder Include="Localisation\" />
    <Content Include="Localisation\**">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </Content>
  </ItemGroup>

</Project>

Adding this <ItemGroup> ensures that the localisation files from the Localisation folder are included in our app output. This helps the application find and use the proper localisation resources when running.

Configure Localisation Option in .NET

In an ASP.NET Core Web API project, we have to install a NuGet package from Orchard Core called OrchardCore.Localization.Core (version 2.1.3).

Once the package is installed, we need to tell the application where to find the PO files. This is done by configuring the localisation options in the Program.cs file.

builder.Services.AddMemoryCache();
builder.Services.AddPortableObjectLocalization(options =>
    options.ResourcesPath = "Localisation");

The AddMemoryCache method is necessary here because Orchard Core's LocalizationManager uses the IMemoryCache service. This caching mechanism helps avoid repeatedly parsing and loading the PO files, improving performance by keeping the localised resources in memory.

Supported Cultures and Default Culture

Now, we need to configure how the application will select the appropriate culture for incoming requests.

In .NET, we need to specify which cultures our app supports. While .NET is capable of supporting multiple cultures out of the box, it still needs to know which specific cultures we are willing to support. By defining only the cultures we actually support, we can avoid unnecessary overhead and ensure that our app is optimised.

We have two separate things to manage when making an app available in different languages and regions in .NET:

  • SupportedCultures: This is about how the app displays numbers, dates, and currencies. For example, how a date is shown (like MM/dd/yyyy in the US);
  • SupportedUICultures: This is where we specify the languages our app supports for displaying text (the content inside the PO files).

To keep things consistent and handle both text translations and regional formatting properly, it is a good practice to configure both SupportedCultures and SupportedUICultures.

We also need to set up the DefaultRequestCulture. It is the fallback culture that our app uses when it does not have any explicit culture information from the request.

The following code shows how we configure all of these. To keep our demo simple, we assume the locale that the user wants is passed via a query string.

builder.Services.Configure<RequestLocalizationOptions>(options =>
{
    var supportedCultures = LocaleConstants.SupportedAppLocale
        .Select(cul => new CultureInfo(cul))
        .ToArray();

    options.DefaultRequestCulture = new RequestCulture(
        culture: "en", uiCulture: "en");
    options.SupportedCultures = supportedCultures;
    options.SupportedUICultures = supportedCultures;
    options.AddInitialRequestCultureProvider(
        new CustomRequestCultureProvider(async httpContext =>
        {
            var currentCulture = CultureInfo.InvariantCulture.Name;
            var requestUrlPath = httpContext.Request.Path.Value;

            if (httpContext.Request.Query.ContainsKey("locale"))
            {
                currentCulture = httpContext.Request.Query["locale"].ToString();
            }

            return await Task.FromResult(
                new ProviderCultureResult(currentCulture));
        })
    );
});
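Note that LocaleConstants.SupportedAppLocale above is assumed to be a simple constants class in the demo project that lists the locales we are willing to support; a minimal sketch of it could look like the following (the exact list of locales is up to your app).

public static class LocaleConstants
{
    // Locales that this demo is prepared to serve, including the regionless Malay fallback.
    public static readonly string[] SupportedAppLocale =
    {
        "en", "zh-CN", "ms", "ms-MY", "ms-SG"
    };
}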

Next, we need to add the RequestLocalizationMiddleware in Program.cs to automatically set culture information for requests based on information provided by the client.

app.UseRequestLocalization();

After setting up the RequestLocalizationMiddleware, we can now move on to localising the API endpoint by using IStringLocalizer to retrieve translated text based on the culture information set for the current request.

About IStringLocalizer

IStringLocalizer is a service in ASP.NET Core used for retrieving localised resources, such as strings, based on the current culture of our app. In essence, IStringLocalizer acts as a bridge between our code and the language resources (like PO files) that contain translations. If the localised value of a key is not found, then the indexer key is returned.

We first need to inject IStringLocalizer into our API controllers or any services where we want to retrieve localised text.

app.MapGet("/weatherforecast", (IStringLocalizer<WeatherForecast> stringLocalizer) =>
{
    var forecast = Enumerable.Range(1, 5).Select(index =>
        new WeatherForecast
        (
            DateOnly.FromDateTime(DateTime.Now.AddDays(index)),
            Random.Shared.Next(-20, 55),
            stringLocalizer["weather_" + summaries[Random.Shared.Next(summaries.Length)]]
        ))
        .ToArray();
    return forecast;
})
.WithName("GetWeatherForecast")
.WithOpenApi();

The reason we use IStringLocalizer<WeatherForecast> instead of just IStringLocalizer is that we are relying on the Orchard Core package to handle the PO files. According to Sébastien Ros, the Orchard Core maintainer, we cannot resolve IStringLocalizer directly; we need IStringLocalizer<T>. Using IStringLocalizer<T> instead of plain IStringLocalizer is also in line with how localisation is typically scoped in .NET applications.

Running on Localhost

Now, if we run the project using dotnet run, the Web API should compile successfully. Once the API is running on localhost, visiting the endpoint with zh-CN as the locale should return the weather summary in Chinese, as shown in the screenshot below.

The summary is getting the translated text from zh-CN.po now.
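For reference, the request is simply the boilerplate endpoint with the locale passed as a query string; the port below is only an example of what dotnet run may assign on your machine.

GET http://localhost:5000/weatherforecast?locale=zh-CN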

Dockerisation

Now that the Web API is verified to be working, we can proceed to dockerise it.

We will first create a Dockerfile as shown below to define the environment our Web API will run in. Then we will build the Docker image, using the Dockerfile. After building the image, we will run it in a container, making our Web API available for use.

## Build Container
FROM mcr.microsoft.com/dotnet/sdk:8.0-alpine AS builder
WORKDIR /app

# Copy the project file and restore any dependencies (use .csproj for the project name)
COPY *.csproj ./
RUN dotnet restore

# Copy the rest of the application code
COPY . .

# Publish the application
RUN dotnet publish -c Release -o out

## Runtime Container
FROM mcr.microsoft.com/dotnet/aspnet:8.0-alpine AS runtime

ENV ASPNETCORE_URLS=http://*:80

WORKDIR /app
COPY --from=builder /app/out ./

# Expose the port your application will run on
EXPOSE 80

ENTRYPOINT ["dotnet", "Experiment.PO.dll"]

As shown in the Dockerfile, we are using .NET Alpine images. Alpine is a lightweight Linux distribution often used in Docker images because it is much smaller than other base images. It is a best practice when we want a minimal image with fewer security vulnerabilities and faster performance.

Globalisation Invariant Mode in .NET

When we run our Web API as a Docker container on our local machine, we will soon realise that the container has stopped because the Web API inside it crashed with a System.Globalization.CultureNotFoundException.

Our Web API crashes due to System.Globalization.CultureNotFoundException, as shown in docker logs.

As pointed out in the error message, only the invariant culture is supported in globalization-invariant mode.

The globalization-invariant mode was introduced in .NET Core 2.0 in 2017. It allows our apps to run without using the full globalization data, which can significantly reduce the runtime size and improve the performance of our application, especially in environments like Docker or microservices.

In globalization-invariant mode, only the invariant culture is used. This culture is based on English (United States) but it is not specifically tied to en-US. It is just a neutral culture used to ensure consistent behaviour across environments.

Before .NET 6, globalization-invariant mode allowed us to create any custom culture, as long as its name conformed to the BCP-47 standard. BCP-47 stands for Best Current Practice 47, and it defines a way to represent language tags that include the language, region, and other relevant cultural data. A BCP-47 language tag typically follows the pattern language-region, for example en-US or zh-CN; it can also carry a script subtag, as in zh-Hans.

Thus, before .NET 6, if an app created a culture that was not the invariant culture, the operation still succeeded.

However, starting from .NET 6, an exception is thrown if we create any culture other than the invariant culture in globalization-invariant mode. This explains why our app throws System.Globalization.CultureNotFoundException.

We thus need to disable the globalization-invariant mode in the .csproj file, as shown below, so that we can use the full globalization data, which will allow .NET to properly handle localisation.

<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    ...
    <InvariantGlobalization>false</InvariantGlobalization>
  </PropertyGroup>

  ...

</Project>

Missing ICU in Alpine

Since Alpine is a very minimal Linux distribution, it does not include many libraries, tools, or system components that are present in more standard distributions like Ubuntu.

In terms of globalisation, Alpine does not come pre-installed with ICU (International Components for Unicode), which .NET uses for localisation in our case.

Hence, after we turn off globalization-invariant mode, we encounter another issue: our Web API is not able to locate a valid ICU package.

Our Web API crashes due to the missing ICU package, as shown in docker logs.

As suggested in the error message, we need to install the ICU libraries (icu-libs).

In .NET, icu-libs provides the necessary ICU libraries that allow our Web API to handle globalisation. However, the ICU libraries rely on culture-specific data to function correctly. This culture-specific data is provided by icu-data-full, which includes the full set of localisation and globalisation data for different languages and regions. Therefore, we need to install both icu-libs and icu-data-full, as shown below.

...

## Runtime Container
FROM mcr.microsoft.com/dotnet/aspnet:8.0-alpine AS runtime

# Install cultures
RUN apk add --no-cache \
    icu-data-full \
    icu-libs

...

After installing the ICU libraries, our weather forecast Web API container should be running successfully now. Now, when we visit the endpoint, we will realise that it is able to retrieve the correct value from the PO files, as shown in the following screenshot.

Yay, we can get the translated texts now!

One last thing I would like to share is that, as shown in the screenshot above, since we do not have a PO file for ms-BN (Malay for Brunei), the fallback mechanism automatically uses the ms.po file instead.

Additional Configuration

If you still cannot get the translation with PO files to work, perhaps you can try out some of the suggestions from my teammates below.

Firstly, you may need to set up the AppLocalIcu property in the .csproj file. This setting specifies whether the app should use a local copy of ICU or rely on the system-installed ICU libraries, which is particularly useful in containerised environments like Docker.

<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    ...
    <AppLocalIcu>true</AppLocalIcu>
  </PropertyGroup>

</Project>

Secondly, even though we have installed icu-libs and icu-data-full in our Alpine container, some .NET apps rely on data beyond just having the libraries available. In such cases, we need to turn on the IncludeNativeLibrariesForSelfExtract setting in the .csproj as well.

<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    ...
    <IncludeNativeLibrariesForSelfExtract>true</IncludeNativeLibrariesForSelfExtract>
  </PropertyGroup>

</Project>

Thirdly, please check if you need to configure DOTNET_SYSTEM_GLOBALIZATION_PREDEFINED_CULTURES_ONLY as well. However, please take note that this setting only makes sense when globalization-invariant mode is enabled.
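If you do decide to configure it, the setting can be supplied as an environment variable; below is a minimal sketch of how it could be added to the runtime stage of our Dockerfile (the value false is only an illustration, adjust it to your scenario).

## Runtime Container (excerpt)
ENV DOTNET_SYSTEM_GLOBALIZATION_PREDEFINED_CULTURES_ONLY=false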

Finally, you may also need to include the runtime ICU libraries with the Microsoft.ICU.ICU4C.Runtime NuGet package (Version 72.1.0.3), enabling your app to use culture-specific data for globalisation features.
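Installing that package is a normal NuGet reference; a minimal sketch of the corresponding .csproj entry, using the version mentioned above, is shown below.

<ItemGroup>
  <PackageReference Include="Microsoft.ICU.ICU4C.Runtime" Version="72.1.0.3" />
</ItemGroup>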

From Zero to Gemini: Building an AI-Powered Game Helper

On a chilly November morning, I attended the Google DevFest 2024 in Singapore. Together with my friends, we attended a workshop titled “Gemini Masterclass: How to Unlock Its Power with Prompting, Functions, and Agents.” The session was led by two incredible speakers, Martin Andrews and Sam Witteveen.

Martin holds a PhD in Machine Learning and has been an Open Source advocate since 1999. Sam is a Google Developer Expert in Machine Learning. Both of them are also organisers of the Machine Learning Singapore Meetup group. Together, they delivered an engaging and hands-on workshop about Gemini, the advanced LLM from Google.

Thanks to their engaging Gemini Masterclass, I have taken my first steps into the world of LLMs. This blog post captures what I learned and my journey into the fascinating world of Gemini.

Martin Andrews presenting in Google DevFest 2024 in Singapore.

About LLM and Gemini

LLM stands for Large Language Model. To most people, an LLM is like a smart friend who can answer almost all our questions with responses that are often accurate and helpful.

As an LLM, Gemini is trained on a large amount of text data and can perform a wide range of tasks: answering questions, writing stories, summarising long documents, or even helping to debug code. What makes LLMs special is their ability to “understand” and generate language in a way that feels natural to us.

Many of my developer friends have started using Gemini as a coding assistant in their IDEs. While it is good at that, Gemini is much more than just a coding tool.

Gemini is designed to not only respond to prompts but also act as an assistant with an extra set of tools. To make the most of Gemini, it is important to understand how it works and what it can (and cannot) do. With the knowledge gained from the DevFest workshop, I decided to explore how Gemini could assist with optimising relic choices in a game called Honkai: Star Rail.

Honkai: Star Rail and Gemini for Its Relic Recommendations

Honkai: Star Rail (HSR) is a popular RPG that has captured the attention of players worldwide. One of the key features of the game is its relic system, where players equip their characters with relics like hats, gloves, or boots to boost stats and unlock special abilities. Each relic has unique attributes, and selecting the right sets of relics for a character can make a huge difference in gameplay.

An HSR streamer, MurderofBirds, browsing through thousands of relics. (Image Source: MurderofBirds Twitch)

As a casual player, I often found myself overwhelmed by the number of options and the subtle synergies between different relic sets. Finding a good relic combination for each character was time-consuming.

This is where LLMs like Gemini come into play. With the ability to process and analyse complex data, Gemini can help players make smarter decisions.

In this blog post, I will briefly show how this Gemini-powered relic recommendation system can analyse a player’s current characters to suggest the best options for them. It will also explain the logic behind its recommendations, helping us understand why certain relics are ideal.

Setup the Project

To make my project code available to everyone, I used Google Colab, a hosted Jupyter Notebook service that requires no setup to use and provides free access to computing resources, including GPUs and TPUs. You can access my code by clicking on the button below.

Open In Colab

In my project, I used the google-generativeai Python library, which is pre-installed in Colab. This library serves as a user-friendly API for interacting with Google LLMs, including Gemini. It makes it easy for us to integrate Gemini capabilities directly into our code.

Next, we will need to import the necessary libraries.

Importing the libraries and setting up the Gemini client.

The first library to import is, of course, google.generativeai. Without it, we cannot interact with Gemini easily. Then we have google.colab.userdata, which securely retrieves sensitive data, like our API key, directly from the Colab notebook environment.

We will also use IPython.display for displaying results in a readable format, such as Markdown.

In the Colab Secrets section, we will have two records:

  • HONKAI_STAR_RAIL_PLAYER_ID: Your HSR player UID. It is used later to personalise relic recommendations.
  • GOOGLE_API_KEY: The API key that we can get from Google AI Studio to authenticate with Gemini.
Creating and retrieving our API keys in Google AI Studio.

Once we have initialised the google.generativeai library with the GOOGLE_API_KEY, we can proceed to specify the Gemini model we will be using.

The choice of model is crucial in LLM projects. Google AI Studio offers several options, each representing a trade-off between accuracy and cost. For my project, I chose models/gemini-1.5-flash-8b-001, which provided a good balance for this experiment. Larger models might offer slightly better accuracy but at a significant cost increase.

Google AI Studio offers a range of models, from smaller, faster models suitable for quick tasks to larger, more powerful models capable of more complex processing.

Hallucination and Knowledge Limitation

We often think of LLMs like Gemini as our smart friends who can answer any question. But just like even our smartest friend can sometimes make mistakes, LLMs have their limits too.

Gemini’s knowledge is based on the data it was trained on, which means it does not actually know everything. Sometimes, it might hallucinate, i.e. the model invents information that sounds plausible but is not actually true.

Kiana is not a character from Honkai: Star Rail but she is from another game called Honkai Impact 3rd.

While Gemini is trained on a massive dataset, its knowledge is not unlimited. As a responsible AI, it acknowledges its limitations. So, when it cannot find the answer, it will tell us that it lacks the necessary information rather than fabricating a response. This is how Google builds safer AI systems, as part of its Secure AI Framework (SAIF).

Knowledge cutoff in action.

To overcome these constraints, we need to employ strategies to augment the capabilities of LLMs. Techniques such as integrating Retrieval-Augmented Generation (RAG) and leveraging external APIs can help bridge the gap between what the model knows and what it needs to know to perform effectively.

System Instructions

Leveraging system instructions is a way to improve the accuracy and reliability of Gemini’s responses.

System instructions are prompts given before the main query in order to guide Gemini. These instructions provide crucial context and constraints, significantly enhancing the accuracy and reliability of the generated output.

System Instruction with contextual information about HSR characters ensures Gemini has the necessary background knowledge.

The specific design and phrasing of the system instructions provided to Gemini is crucial. Effective system instructions give Gemini the necessary context and constraints to generate accurate and relevant responses. Without carefully crafted system instructions, even the most well-designed prompt can yield poor results.

Context Framing

As we can see from the example above, writing clear and effective system instructions requires careful thought and a lot of testing.

This is just one part of a much bigger picture called Context Framing, which includes preparing data, creating embeddings, and deciding how the system retrieves and uses that data. Each of these steps needs expertise and planning to make sure the solution works well in real-world scenarios.

You might have heard the term “Prompt Engineering”, and it sounds kind of technical, but it is really about figuring out how to ask the LLM the right questions in the right way to get the best answers.

While context framing and prompt engineering are closely related and often overlap, they emphasise different aspects of the interaction with the LLM.

Stochasticity

While experimenting with Gemini, I noticed that even if I use the exact same prompt, the output can vary slightly each time. This happens because LLMs like Gemini have a built-in element of randomness, known as stochasticity.

Lingsha, an HSR character released in 2024. (Image Credit: Game8)

For example, when querying for DPS characters, Lingsha was inconsistently included in the results. While this might seem like a minor variation, it underscores the probabilistic nature of LLM outputs and suggests that running multiple queries might be needed to obtain a more reliable consensus.

Lingsha was inconsistently included in the response to the query about multi-target DPS characters.
According to the official announcement, even though Lingsha is a healer, she can cause significant damage to all enemies too. (Image Source: Honkai: Star Rail YouTube)

Hence, it is important to treat writing effective system instructions and prompts as an iterative process, so that we can experiment with different phrasings to find what works best and yields the most consistent results.

Temperature Tuning

We can also reduce the stochasticity of Gemini’s responses by adjusting parameters like temperature. Lower temperatures typically reduce randomness, leading to more consistent outputs, but may also reduce creativity and diversity.

Temperature is an important parameter for balancing predictability and diversity in the output. Temperature, a number in the range 0.0 to 2.0 with a default of 1.0 in the gemini-1.5-flash model, controls how the model samples from its probability distribution over the vocabulary when generating text. A lower temperature makes the model more likely to select words with higher probabilities, resulting in more predictable and focused text.

Having Temperature=0 means that the model will always select the most likely word at each step. The output will be highly deterministic and repetitive.

Function Calls

A major limitation of using system instructions alone is their static nature.

For example, my initial system instructions included a list of HSR characters, but this list is static. The list does not include newly released characters or characters specific to the player’s account. In order to dynamically access a player’s character database and provide personalised recommendations, I integrated Function Calls to retrieve real-time data.

For fetching the player’s HSR character data, I leveraged the open-source Python library mihomo. This library provides an interface for accessing game data, enabling dynamic retrieval of a player’s characters and their attributes. This dynamic data retrieval is crucial for generating truly personalised relic recommendations.

Using the mihomo library, I retrieve five of my Starfaring Companions.

Defining the functions in my Python code was only the first step. To use function calls, Gemini needed to know which functions were available. We can provide this information to Gemini as shown below.

model = genai.GenerativeModel('models/gemini-1.5-flash-8b-001', tools=[get_player_name, get_player_starfaring_companions])

After we pass a query to Gemini, the model returns a structured object that includes the names of relevant functions and their arguments based on the prompt, as shown in the screenshot below.

The correct function call is picked up by Gemini based on the prompt.

Using descriptive function names is essential for successful function calling with LLMs because the accuracy of function calls depends heavily on well-designed function names in our Python code. Inaccurate naming can directly impact the reliability of the entire system.

If our Python function is named incorrectly, for example, a function named get_age that actually returns the name of the person, Gemini might select that function wrongly when the prompt is asking for an age.

As shown in the screenshot above, the prompt requested information about all the characters of the player. Gemini simply determines which function to call and provides the necessary arguments. Gemini does not directly execute the functions. The actual execution of the function needs to be handled by us, as demonstrated in the screenshot below.

After Gemini tells us which function to call, our code needs to call the function to get the result.

Grounding with Google Search

Function calls are a powerful way to access external data, but they require pre-defined functions and APIs.

To go beyond these limits and gather information from many online sources, we can use Gemini's grounding feature with Google Search. This feature allows Gemini to perform Google searches and include what it finds in its answers. This makes it easier to get up-to-date information and handle questions that need real-time data.

If you are getting HTTP 429 errors when using the Google Search feature, please make sure you have set up a billing account with enough quota.

With this feature enabled, we thus can ask Gemini to get some real-time data from the Internet, as shown below.

The upcoming v2.7 patch of HSR is indeed scheduled to be released on 4th December.

Building a Semantic Knowledge Base with Pinecone

System instructions and Google search grounding provide valuable context, but a structured knowledge base is needed to handle the extensive data about HSR relics.

We need a way to store and quickly retrieve this information, enabling the system to generate timely and accurate relic recommendations. For this, we will use a vector database, which is ideally suited for managing the vast dataset of relic information.

Vector databases, unlike traditional databases that rely on keyword matching, store information as vectors, enabling efficient similarity searches. This allows relevant relic sets to be retrieved based on the semantic meaning of a query, rather than relying solely on keywords.

There are many vector database options, but I chose Pinecone. As a managed service, Pinecone offers the scalability needed to handle the HSR relic dataset and a robust API essential for reliable data access. The availability of a free tier is also a significant factor because it allows me to keep costs low during the development of my project.

API keys in Pinecone dashboard.

Pinecone’s well-documented API and straightforward SDK make integration surprisingly easy. To get started, simply follow the Pinecone documentation to install the SDK in our code and retrieve the API key.

# Import the Pinecone library
from pinecone.grpc import PineconeGRPC as Pinecone
from pinecone import ServerlessSpec
import time

# Initialize a Pinecone client with your API key
pc = Pinecone(api_key=userdata.get('PINECONE_API_KEY'))

Next, I prepared my Honkai: Star Rail relic data, which I had previously organised into a JSON structure. This data includes information on each relic set’s two-piece and four-piece effects. Here’s a snippet to illustrate the format:

[
  {
    "name": "Sacerdos' Relived Ordeal",
    "two_piece": "Increases SPD by 6%",
    "four_piece": "When using Skill or Ultimate on one ally target, increases the ability-using target's CRIT DMG by 18%, lasting for 2 turn(s). This effect can stack up to 2 time(s)."
  },
  {
    "name": "Scholar Lost in Erudition",
    "two_piece": "Increases CRIT Rate by 8%",
    "four_piece": "Increases DMG dealt by Ultimate and Skill by 20%. After using Ultimate, additionally increases the DMG dealt by the next Skill by 25%."
  },
  ...
]

With the relic data organised, the next challenge is to enable similarity searches with vector embeddings. A vector embedding captures the semantic meaning of the text, allowing Pinecone to identify similar relic sets based on their inherent properties and characteristics.

Vector embedding representations (Image Credit: Pinecone)

Now, we can generate vector embeddings for the HSR relic data using Pinecone. The following code snippet illustrates this process which is to convert textual descriptions of relic sets into numerical vector embeddings. These embeddings capture the semantic meaning of the relic set descriptions, enabling efficient similarity searches later.

# Load relic set data from the JSON file
with open('/content/hsr-relics.json', 'r') as f:
    relic_data = json.load(f)

# Prepare data for Pinecone
relic_info_data = [
    {"id": relic['name'], "text": relic['two_piece'] + " " + relic['four_piece']}  # Combine relic set descriptions
    for relic in relic_data
]

# Generate embeddings using Pinecone
embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[d['text'] for d in relic_info_data],
    parameters={"input_type": "passage", "truncate": "END"}
)

print(embeddings)

As shown in the code above, we use the multilingual-e5-large model, a text embedding model from Microsoft Research, to generate a vector embedding for each relic set. The multilingual-e5-large model works well on messy data and is good for short queries.

Pinecone’s ability to perform fast similarity searches relies on its indexing mechanism. Without an index, searching for similar relic sets would require comparing each relic set’s embedding vector to every other one, which would be extremely slow, especially with a large dataset. I chose a Pinecone serverless index hosted on AWS for its automatic scaling and reduced infrastructure management.

# Create a serverless index
index_name = "hsr-relics-index"

if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        dimension=1024,
        metric="cosine",
        spec=ServerlessSpec(
            cloud='aws',
            region='us-east-1'
        )
    )

# Wait for the index to be ready
while not pc.describe_index(index_name).status['ready']:
    time.sleep(1)

The dimension parameter specifies the dimensionality of the vector embeddings. Higher dimensionality generally allows for capturing more nuanced relationships between data points. For example, two relic sets might both increase ATK, but one might also increase SPD while the other increases Crit DMG. A higher-dimensional embedding allows the system to capture these subtle distinctions, leading to more relevant recommendations.

For the metric parameter, which defines how the similarity between two vectors (representing relic sets) is measured, we use the cosine metric, which is well suited to vector embeddings generated from text. This is crucial for understanding how similar two relic descriptions are.

With the vector embeddings generated, the next step was to upload them into my Pinecone index. Pinecone uses the upsert function to add or update vectors in the index. The following code snippet shows how we can upsert the generated embeddings into the Pinecone index.

# Target the index where you'll store the vector embeddings
index = pc.Index("hsr-relics-index")

# Prepare the records for upsert
# Each contains an 'id', the embedding 'values', and the original text as 'metadata'
records = []
for r, e in zip(relic_info_data, embeddings):
    records.append({
        "id": r['id'],
        "values": e['values'],
        "metadata": {'text': r['text']}
    })

# Upsert the records into the index
index.upsert(
    vectors=records,
    namespace="hsr-relics-namespace"
)

The code uses the zip function to iterate through both the list of prepared relic data and the list of generated embeddings simultaneously. For each pair, it creates a record for Pinecone with the following attributes.

  • id: Name of the relic set to ensure uniqueness;
  • values: The vector representing the semantic meaning of the relic set effects;
  • metadata: The original description of the relic effects, which will be used later for providing context to the user’s recommendations. 

Implementing Similarity Search in Pinecone

With the relic data stored in Pinecone now, we can proceed to implement the similarity search functionality.

def query_pinecone(query: str) -> dict:

    # Convert the query into a numerical vector that Pinecone can search with
    query_embedding = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=[query],
        parameters={
            "input_type": "query"
        }
    )

    # Search the index for the three most similar vectors
    results = index.query(
        namespace="hsr-relics-namespace",
        vector=query_embedding[0].values,
        top_k=3,
        include_values=False,
        include_metadata=True
    )

    return results

The function above takes a user’s query as input, converts it into a vector embedding using Pinecone’s inference endpoint, and then uses that embedding to search the index, returning the top three most similar relic sets along with their metadata.

Relic Recommendations with Pinecone and Gemini

With the Pinecone integration in place, we design the initial prompt to pick relevant relic sets from Pinecone. After that, we take the results from Pinecone and combine them with the initial prompt to create a richer, more informative prompt for Gemini, as shown in the following code.

from google.generativeai.generative_models import GenerativeModel

async def format_pinecone_results_for_prompt(model: GenerativeModel, player_id: int) -> dict:
    character_relics_mapping = await get_player_character_relic_mapping(player_id)

    result = {}

    for character_name, (character_avatar_image_url, character_description) in character_relics_mapping.items():
        print(f"Processing Character: {character_name}")

        additional_character_data = character_profile.get(character_name, "")

        character_query = f"Suggest some good relic sets for this character: {character_description} {additional_character_data}"

        pinecone_response = query_pinecone(character_query)

        prompt = f"User Query: {character_query}\n\nRelevant Relic Sets:\n"
        for match in pinecone_response['matches']:
            prompt += f"* {match['id']}: {match['metadata']['text']}\n"  # Extract relevant data
        prompt += "\nBased on the above information, recommend two best relic sets and explain your reasoning. Each character can only equip with either one 4-piece relic or one 2-piece relic with another 2-piece relic. You cannot recommend a combination of 4-piece and 2-piece together. Consider the user's query and the characteristics of each relic set."

        response = model.generate_content(prompt)

        result[character_avatar_image_url] = response.text

    return result

The code shows that we are doing both prompt engineering (designing the initial query to get relevant relics) and context framing (combining the initial query with the retrieved relic information to get a better overall recommendation from Gemini).

First, the code retrieves data about the player’s characters, including their descriptions, images, and the relics the characters are currently wearing. The code then gathers potentially relevant data about each character from a separate data source, character_profile, which has more information, such as gameplay mechanics that we got from the Game8 Character List. With this character data, the query then finds similar relic sets in the Pinecone database.

After Pinecone returns matches, the code constructs a detailed prompt for the Gemini model. This prompt includes the character’s description, relevant relic sets found by Pinecone, and crucial instructions for the model. The instructions emphasise the constraints of choosing relic sets: either a 4-piece set, or two 2-piece sets, not a mix. Importantly, it also tells Gemini to consider the character’s existing profile and to prioritise fitting relic sets.

Finally, the code sends this detailed prompt to Gemini, receiving back the recommended relic sets.

Knight of Purity Palace, is indeed a great option for Gepard!
Enviosity, a popular YouTuber known for his in-depth Honkai: Star Rail strategy guides, introduced Knight of Purity Palace for Gepard too. (Source: Enviosity YouTube)

Langtrace

Using LLMs like Gemini is certainly exciting, but figuring out what is happening “under the hood” can be tricky.

If you are a web developer, you are probably familiar with Grafana dashboards. They show you how your web app is performing, highlighting areas that need improvement.

Langtrace is like Grafana, but specifically for LLMs. It gives us a similar visual overview, tracking our LLM calls, showing us where they are slow or failing, and helping us optimise the performance of our AI app.

Traces for the Gemini calls are displayed individually.

Langtrace is not only useful for tracing our LLM calls; it also offers metrics on token counts and costs, as shown in the following screenshot.

Beyond tracing calls, Langtrace collects metrics too.

Wrap-Up

Building this Honkai: Star Rail (HSR) relic recommendation system is a rewarding journey into the world of Gemini and LLMs.

I am incredibly grateful to Martin Andrews and Sam Witteveen for their inspiring Gemini Masterclass at Google DevFest in Singapore. Their guidance helped me navigate the complexities of LLM development, and I learned firsthand the importance of careful prompt engineering, the power of system instructions, and the need for dynamic data access through function calls. These lessons underscore the complexities of developing robust LLM apps and will undoubtedly inform my future AI projects.

Building this project is an enjoyable journey of learning and discovery. I encountered many challenges along the way, but overcoming them deepened my understanding of Gemini. If you’re interested in exploring the code and learning from my experiences, you can access my Colab notebook through the button below. I welcome any feedback you might have!

Open In Colab

[KOSD] Change of FromQuery Model Binding from .NET 6 to .NET 8

Recently, while migrating our project from .NET 6 to .NET 8, my teammate Jeremy Chan uncovered an undocumented change in model binding behaviour that appears to have been introduced in .NET 7. This change is not clearly explained in the official .NET documentation, so it is something developers can easily overlook.

To illustrate the issue, let’s begin with a simple Web API project and explore a straightforward controller method that highlights the change.

[ApiController]
[Route("foo")]
public class FooController : ControllerBase
{
    [HttpGet]
    public IActionResult Get([FromQuery] string value = "Hello")
    {
        Console.WriteLine($"Value is {value}");

        return new JsonResult(null) { StatusCode = StatusCodes.Status200OK };
    }
}

Then we assume that we have nullable enabled in both .NET 6 and .NET 8 projects.

<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    <Nullable>enable</Nullable>
    ...
  </PropertyGroup>

  ...

</Project>

Situation in .NET 6

In .NET 6, when we call the endpoint with /foo?value=, we shall receive the following error.

{
  "type": "https://tools.ietf.org/html/rfc7231#section-6.5.1",
  "title": "One or more validation errors occurred.",
  "status": 400,
  "traceId": "00-5bc66c755994b2bba7c9d2337c1e5bc4-e116fa61d942199b-00",
  "errors": {
    "value": [
      "The value field is required."
    ]
  }
}

However, if we change the method to be as follows, the error will no longer occur.

public IActionResult Get([FromQuery] string? value)
{
    if (value is null)
        Console.WriteLine("Value is null!!!");
    else
        Console.WriteLine($"Value is {value}");

    return new JsonResult(null) { StatusCode = StatusCodes.Status200OK };
}

The log when calling the endpoint with /foo?value= will then be “Value is null!!!”.

Hence, we know that a query string parameter without a value is interpreted as null. That is why there is a validation error when value is not nullable.

Thus, in order to make the endpoint work in .NET 6, we need to change the signature as follows to make value optional, so that value is no longer treated as a required field.

public IActionResult Get([FromQuery] string? value = "Hello")

Now, if we call the endpoint with /foo?value=, we shall see the log “Value is Hello” printed.

Situation in .NET 8 (and .NET 7)

Then how about .NET 8 with the same original setup, as shown below?

public IActionResult Get([FromQuery] string value = "Hello")

In .NET 8, when we call the endpoint with /foo?value=, we shall see the log “Value is Hello” printed.

So, what is happening here?

In .NET 7, a new interface, IParsable<TSelf>, was introduced. Starting from .NET 7, the IParsable<TSelf>.TryParse API is used for binding controller action parameter values.

Initial research shows that, under the hood, from .NET 7 onwards, a new model binding implementation is used, and this is what causes the change in behaviour.
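To give a feel for the shape of this interface, below is a minimal sketch of a hypothetical Temperature type implementing IParsable<TSelf>; the static TryParse method is what the newer binding infrastructure can call when such a type appears as an action parameter.

// Hypothetical example type for illustration only.
public readonly struct Temperature : IParsable<Temperature>
{
    public double Celsius { get; }

    public Temperature(double celsius) => Celsius = celsius;

    public static Temperature Parse(string s, IFormatProvider? provider)
        => new Temperature(double.Parse(s, provider));

    public static bool TryParse(string? s, IFormatProvider? provider, out Temperature result)
    {
        if (double.TryParse(s, provider, out var value))
        {
            result = new Temperature(value);
            return true;
        }

        result = default;
        return false;
    }
}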

KOSD, or Kopi-O Siew Dai, is a type of Singapore coffee that I enjoy. It is basically a cup of coffee with a little bit of sugar. This series is meant to blog about technical knowledge that I gained while having a small cup of Kopi-O Siew Dai.

From Legacy to .NET 8: Migrating with NDepend

Quick note: I received a free license for NDepend to try it out and share my experience. All opinions in this blog post are my own.

From O2DES.Net to Ea

In 2019, I had the honour of working closely with the team behind O2DES.NET during my time at the C4NGP research centre in NUS, where I spent around two and a half years. Since I left the team in 2022, O2DES.NET has not been actively updated on its public GitHub repository, and it still targets .NET Standard 2.1.

While .NET Standard 2.1 is not as old as the .NET Framework, it is considered somewhat outdated compared to the latest .NET versions. As explained in the article “The Future of .NET Standard” written by Immo Landwerth, .NET Standard has been largely superseded by .NET 5 (and later versions), which unify the platforms into a single runtime. Hence, moving to .NET 8 is a forward-looking decision that aligns with current and future software development trends.

Immo Landwerth, program manager on the .NET Framework team at Microsoft, talked about .NET Standard 2.0 back in 2016. (Image Credit: dotnet – YouTube Channel)

Hence, in this article, I will walk you through the process of migrating O2DES.NET from targeting .NET Standard 2.1 to supporting .NET 8. To prevent any confusion, I’ve renamed the project to ‘Ea’ because I am no longer the active developer of O2DES.NET. Throughout this article, ‘Ea’ will refer to the version of the project updated to .NET 8.

In this migration journey, I will be relying on NDepend, a static code analysis tool for .NET developers.

Show Me the Code!

The complete source code of my project after migrating O2DES.NET to target .NET 8 can be found on GitHub at https://github.com/gcl-team/Ea.

About NDepend: Why Do We Need a Static Code Analysis?

Why do we need NDepend, a static code analysis tool?

Static code analysis is a way of automatically checking our code for potential issues without actually running our apps. Think of it like a spell-checker, but for programming, scanning our codebase to find bugs, performance issues, and security vulnerabilities early in the development process.

During the migration of an older library, such as moving O2DES.NET from .NET Standard 2.1 to .NET 8, the challenges can add up. We can expect to run into outdated code patterns, performance bottlenecks, or even compatibility issues.

The O2DES.NET on GitHub has some of its NuGet references outdated too.

NDepend is designed to help with this by performing a deep static analysis of the entire codebase. It gives us detailed reports on code quality, shows where our dependencies are, and highlights areas that need attention. We can then focus on modernising the code with confidence, knowing that we are unlikely to introduce new bugs or performance issues as we update the codebase.

NDepend also helps enforce good coding practices by pointing out issues like overly complex methods, dead code, or potential security vulnerabilities. With features like code metrics, dependency maps, and rule enforcement, it acts as a guide to help us write better, more maintainable code.

Bringing Down Debt from 6.22% to 0.35%

One of the standout features of NDepend is its comprehensive dashboard, which I heavily rely on to get an overview of the entire O2DES.NET codebase.

Right after targeting the O2DES.NET library to .NET 8, a lot of issues surfaced.

From code quality metrics to technical debt, the dashboard presents critical insights in a visual and easy-to-understand format. Having all this information in one place is indeed invaluable to us during the migration project.

To help us better understand how much effort is needed to fix or improve the codebase, NDepend uses the Debt Ratio and Debt Rating, both of which are part of the SQALE method.

We can configure the SQALE Debt Ratio and Debt Rating.

In the book The SQALE Method for Managing Technical Debt, written by Jean-Louis Letouzey, SQALE stands for Software Quality Assessment based on Lifecycle Expectations. SQALE is a method used to assess and manage technical debt in software projects. In the context of NDepend, the SQALE method is used to calculate the Debt Ratio and Debt Rating:

  • Debt Ratio: The percentage of effort needed to fix the technical debt compared to rewriting the code from scratch (see the worked example after this list).
  • Debt Rating: A letter-based rating (A to E) derived from the Debt Ratio to give a quick overview of the severity of technical debt.
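As a quick illustrative calculation (the numbers here are made up): if NDepend estimates 12 man-days of technical debt in a codebase that it estimates would take 400 man-days to develop from scratch, the Debt Ratio is 12 / 400 = 3%, and the Debt Rating is whichever letter that percentage maps to under the configured thresholds.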

As shown in one of the earlier screenshots, Ea has a Debt Ratio of 6.22% and a B rating. This means that its technical debt is considered moderate and manageable. Nevertheless, it is a signal that we should start addressing the identified issues before they accumulate.

After just two weeks of code cleanup, we successfully reduced Ea’s Debt Ratio from 6.22% to an impressive 0.35%, elevating its rating to an A. This significant improvement not only enhances the overall quality of the codebase but also positions Ea for better maintainability.

The most recent analysis shows that the Debt Ratio of Ea is down to just 0.35%.

Issues and Trends

In Visual Studio, NDepend also provides an interactive UI which indicates the number of critical rules violated and critical issues to solve. Unlike most static code analysis tools, which show an overwhelming number of issues, NDepend has the concept of a baseline.

When we first set up an NDepend project, the very first analysis of our code becomes the “baseline”. This baseline serves as a starting point, capturing the current state of our code. As we continue to work on the project, future analyses will be compared against this baseline. The idea is to track how our code changes over time so that we can see whether we are improving the codebase or introducing more issues as we change it.

At some point during the code change, we fixed 31 “High” issues (shown in green) while introducing 42 new “High” issues (shown in red).

As shown in the screenshot above, those new issues added since the baseline need to be our priority to fix. This is to make sure the newly written code and refactored code will remain clean.

In fact, when fixing the issues, I get to learn from the NDepend rules. When we click on the numbers, we are shown the corresponding issues, and clicking on each issue shows us detailed information about it. For example, as shown in the screenshot below, when we click on one of the green numbers, it shows us a list of issues that have been fixed by us.

As indicated, the issue is one which has been fixed since the baseline.

When we click on the red numbers, as shown in the following screenshot, we get to see the new issues that we need to fix. The following example shows how the original O2DES.NET has some methods declared with unnecessarily high visibility.

This is an issue that has been newly added since the baseline.

By default, the dashboard also comes with some helpful trend charts. These charts give us a visual overview of how our codebase is evolving over time.

We have made significant progress in Ea library development over the past half month.

For those new to static code analysis, think of these charts as the “health check” of the project. During the migration, they help us track important metrics, like code coverage, issues, or technical debt, and show how they change with each analysis.

Code Dependency Graphs

NDepend offers a Dependency Graph. It is used to visually represent the relationships between different components such as namespaces and classes within our codebase. The graph helps us understand how tightly coupled our code is and how different parts of our codebase depend on each other.

When refactoring Ea during the migration, we depend on the Dependency Graph to visually show us how the different parts of the codebase are connected. We use the insight it provides to plan how to split components, which in turn makes the code easier to manage.

A dependency diagram made of all classes in the Ea project.

As shown in the diagram above, we can see a graph of entangled classes connected with red bi-directional arrows. This is because, in the original O2DES.NET library, some classes have circular dependencies. This makes parts of the code heavily reliant on each other, reducing modularity and making it harder to unit test the code independently.

To further investigate the classes, we can double click the edge between those two classes. Doing so will generate a graph made of methods and fields involved in the dependency between the two classes, as shown in the screenshot below.

The coupling graph between two classes.

This coupling graph is a powerful tool for us as it offers detailed insights into how the two classes interact. This level of detail allows us to focus on the exact code causing the coupling, making it easier to assess whether the dependency is necessary or can be refactored. For instance, if multiple methods are too intertwined, it might be time to extract common logic into a new class or interface.

In addition, the Dependency Matrix is another way to visualise the dependencies between namespaces, classes, or methods. A number in a cell at the intersection of two elements indicates how many times the element in the row depends on the element in the column. This gives us an overview of the dependencies within our codebase.

The Dependency Matrix.

From the Dependency Matrix above, we should first look for cells with large numbers, because a large number indicates that the two elements are highly dependent on each other. We should review those methods to understand why there is so much interaction and to make sure they are not tightly coupled.

If there is a cycle in the codebase, there will be a red square shown on the Dependency Matrix. We then can refactor by breaking the cycle, possibly by introducing new interfaces or decoupling responsibilities between the methods.

Code Metrics View

In the Code Metrics View, each rectangle represents a method. The area of a rectangle is proportional to a metric, such as the number of lines of code (LOC) or the cyclomatic complexity (CC), of the corresponding method, field, type, namespace, or assembly.

This treemap shows the # lines of code (LOC) of the methods in our project.

During the migration, the tree view format enables us to navigate our codebase and prioritise areas that require refactoring by spotting those methods that are too big and too complex. In addition, to help quickly identify problem areas, NDepend uses colour coding in the tree view. For example, red may indicate high complexity or large size, while green might indicate simpler, more maintainable code.

The tree view is interactive. Right-clicking on the rectangles provides options such as opening the source code declaration for the selected element, allowing us to navigate directly to the method.

Right-clicking on the rectangles will show the available actions to perform.

Integrating with GitHub Actions

NDepend integrates well with several CI/CD pipelines, making it a valuable tool for maintaining code quality throughout the development lifecycle. It can automatically analyse our code after each build. This ensures that every change in our codebase adheres to defined quality standards before the merge to the main branch.

NDepend comes with Quality Gates that enforce standards such as unfixed critical issues. If the code fails to meet the required thresholds, the build can fail in the pipelines.

In NDepend, Quality Gates are predefined sets of code quality criteria that our project must meet before it is considered acceptable for deployment. They serve as automated checkpoints to help ensure that our code maintains a certain standard of quality, reducing technical debt and promoting maintainability.

One of our builds failed because there was code violating a critical rule in our codebase.

As shown in the screenshot above, NDepend provides detailed reports on issues and violations after each build. We can also download the detailed report from the CI servers, such as GitHub Actions. These reports help us quickly identify where issues exist in our code.

NDepend report of the build can be found in the Artifacts of the pipeline.

The NDepend report is divided into seven sections, each providing detailed insights into various aspects of your codebase:

  • Overview: It gives a high-level view of the overall code quality and metrics, similar to what is displayed in the NDepend Dashboard within Visual Studio.
  • Issues: A list of source files with unresolved issues. Along with the number of issues, it also shows the “Debt” for each file, which represents the estimated man-time required to resolve the issues.
  • Projects: Similar to the Issues section but focuses on projects instead of individual files. It displays the total issues and associated debt at the project level.
  • Rules: This section highlights the violated rules, showing the issues and debt in terms of the rules that have been broken. It’s another way to assess code quality by focusing on adherence to coding standards.
  • Quality Gates: This section mirrors the Quality Gates you might have seen earlier in the CI/CD pipelines, such as in GitHub Actions.
  • Trend: The Trend section provides a visualisation of trends over time, similar to the trend charts found in the NDepend Dashboard in Visual Studio.
  • Logs: This section contains the logs generated during NDepend analysis.
Number of unresolved issues and debt of the files in our project.

As described in the NDepend documentation, it has complete support for Azure DevOps, meaning it can be seamlessly integrated into CI/CD pipelines without a lot of manual setup. We can thus easily configure NDepend to run as part of our Azure Pipelines, generating code quality reports after each build.

For our Ea project, since it is an open-source project hosted on GitHub, we can also integrate NDepend with our GitHub Actions instead.

To integrate with GitHub Actions, firstly, we need to associate our NDepend license (or a copy of the 28-day trial activation data) with our GitHub account. To link the NDepend license (e.g. ABC012345) with our GitHub account, we will need to visit the link https://www.ndepend.com/activation_githubaction?license=ABC012345, as demonstrated in the screenshot below.

Linking our NDepend license with our GitHub account.

To introduce NDepend to our GitHub Actions workflow, the minimal configuration that we need to add is as follows.

- name: NDepend
  uses: ndepend/ndepend-action@ndependv1.0
  with:
    license: ${{ secrets.NDependLicense }}
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Read More: Complete Build YAML of Ea

Wrap-Up

In conclusion, NDepend has proven to be an invaluable tool in our journey to modernise and maintain the Ea library.

By offering comprehensive static code analysis, insightful metrics, and seamless integration with CI/CD pipelines like GitHub Actions, it empowers us to catch issues early, reduce technical debt, and ensure a high standard of code quality.

NDepend provides the guidance and clarity needed to ensure our code remains clean, efficient, and maintainable. For any .NET individual or development team serious about improving code quality, NDepend is definitely a must-have in the toolkit.