When Pinecone Wasn’t Enough: My Journey to pgvector

If you work with machine learning or natural language processing, you have probably dealt with storing and searching through vector embeddings.

When I created the Honkai: Star Rail (HSR) relic recommendation system using Gemini, I started with Pinecone. Pinecone is a managed vector database that made it easy to index relic descriptions and character data as embeddings. It helped me find the best recommendations based on how similar they were.

Pinecone worked well, but as the project grew, I wanted more control, something open-source, and a cheaper option. That is when I found pgvector, a tool that adds vector search to PostgreSQL and gives the flexibility of an open-source database.

About HSR and Relic Recommendation System

Honkai: Star Rail (HSR) is a popular RPG that has captured the attention of players worldwide. One of the key features of the game is its relic system, where players equip their characters with relics like hats, gloves, or boots to boost stats and unlock special abilities. Each relic has unique attributes, and selecting the right sets of relics for a character can make a huge difference in gameplay.

An HSR streamer, Unreal Dreamer, learning the new relic feature. (Image Source: Unreal Dreamer YouTube)

As a casual player, I often found myself overwhelmed by the number of options and the subtle synergies between different relic sets. Finding the good relic combination for each character was time-consuming.

This is where LLMs like Gemini come into play. With the ability to process and analyse complex data, Gemini can help players make smarter decisions.

In November 2024, I started a project to develop a Gemini-powered HSR relic recommendation system which can analyse a player’s current characters to suggest the best options for them. In the project, I have been storing embeddings in Pinecone.

Embeddings and Vector Database

An embedding is a way to turn data, like text or images, into a list of numbers called a vector. These vectors make it easier for a computer to compare and understand the relationships between different pieces of data.

For example, in the HSR relic recommendation system, we use embeddings to represent descriptions of relic sets. The numbers in the vector capture the meaning behind the words, so similar relics and characters have embeddings that are closer together in a mathematical sense.

This is where vector databases like Pinecone or pgvector come in. Vector databases are designed for performing fast similarity searches on large collections of embeddings. This is essential for building systems that need to recommend, match, or classify data.

pgvector is an open-source extension for PostgreSQL that allows us to store and search for vectors directly in our database. It adds specialised functionality for handling vector data, like embeddings in our HSR project, making it easier to perform similarity searches without needing a separate system.

Unlike managed services like Pinecone, pgvector is open source. This meant we could use it freely and avoid vendor lock-in. This is a huge advantage for developers.

Finally, since pgvector runs on PostgreSQL, there is no need for additional managed service fees. This makes it a budget-friendly option, especially for projects that need to scale without breaking the bank.

Choosing the Right Model

While the choice of the vector database is important, it is not the key factor in achieving great results. The quality of our embeddings actually is determined by the model we choose.

For my HSR relic recommendation system, when our embeddings were stored in Pinecone, I started by using the multilingual-e5-large model from Microsoft Research offered in Pinecone.

When I migrated to pgvector, I had the freedom to explore other options. For this migration, I chose the all-MiniLM-L6-v2 model hosted on Hugging Face, which is a lightweight sentence-transformer designed for semantic similarity tasks. Switching to this model allowed me to quickly generate embeddings for relic sets and integrate them into pgvector, giving me a solid starting point while leaving room for future experimentation.

The all-MiniLM-L6-v2 model hosted on Hugging Face.

Using all-MiniLM-L6-v2 Model

Once we have decided to use the all-MiniLM-L6-v2 model, the next step is to generate vector embeddings for the relic descriptions. This model is from the sentence-transformers library, so we first need to install the library.

pip install sentence-transformers

The library offers SentenceTransformer class to load pre-trained models.

from sentence_transformers import SentenceTransformer

model_name = 'all-MiniLM-L6-v2'
model = SentenceTransformer(model_name)

At this point, the model is ready to encode text into embeddings.

The SentenceTransformer model takes care of tokenisation and other preprocessing steps internally, so we can directly pass text to it.

# Function to generate embedding for a single text
def generate_embedding(text):
# No need to tokenise separately, it's done internally
# No need to average the token embeddings
embeddings = model.encode(text)

return embeddings

In this function, when we call model.encode(text), the model processes the text through its transformer layers, generating an embedding that captures its semantic meaning. The output is already optimised for tasks like similarity search.

Setting up the Database

After generating embeddings for each relic sets using the all-MiniLM-L6-v2 model, the next step is to store them in the PostgreSQL database with the pgvector extension.

For developers using AWS, there is a good news. In May 2023, AWS announced that Amazon Relational Database Service (RDS) for PostgreSQL would be supporting pgvector. In November 2024, Amazon RDS started to support pgvector 0.8.0.

pgvector is now supported on Amazon RDS for PostgreSQL.

To install the extension, we will run the following command in our database. This will introduce a new datatype called VECTOR.

CREATE EXTENSION vector;

After this, we can define our table as follows.

CREATE TABLE IF NOT EXISTS embeddings (
id TEXT PRIMARY KEY,
vector VECTOR(384),
text TEXT
);

Besides the id column which is for the unique identifier, there are two other columns that are important.

The text column stores the original text for each relic (the two-piece and four-piece bonus descriptions).

The vector column stores the embeddings. The VECTOR(384) type is used to store embeddings, and 384 here refers to the number of dimensions in the vector. In our case, the embeddings generated by the all-MiniLM-L6-v2 model are 384-dimensional, meaning each embedding will have 384 numbers.

Here, a dimension refers to one of the “features” that helps describe something. When we talk about vectors and embeddings, each dimension is just one of the many characteristics used to represent a piece of text. These features could be things like the type of words used, their relationships, and even the overall meaning of the text.

Updating the Database

After the table is created, we can proceed to create INSERT INTO SQL statements to insert the embeddings and their associated text into the database.

In this step, I load the relic information from a JSON file and process it.

import json

# Load your relic set data from a JSON file
with open('/content/hsr-relics.json', 'r') as f:
relic_data = json.load(f)

# Prepare data
relic_info_data = [
{"id": relic['name'], "text": relic['two_piece'] + " " + relic['four_piece']} # Combine descriptions
for relic in relic_data
]

The relic_info_data will then be passed to the following function to generate the INSERT INTO statements.

# Function to generate INSERT INTO statements with vectors
def generate_insert_statements(data):
# Initialise list to store SQL statements
insert_statements = []

for record in data:
# Extracting text and id from the record
id = record.get('id')
text = record.get('text')

# Generate the embedding for the text
embedding = generate_embedding(text)

# Convert the embedding to a list
embedding_list = embedding.tolist()

# Create the SQL INSERT INTO statement
sql_statement = f"""
INSERT INTO embeddings (id, vector, text)
VALUES (
'{id.replace("'", "''")}',
ARRAY{embedding_list},
'{text.replace("'", "''")}')
ON CONFLICT (id) DO UPDATE
SET vector = EXCLUDED.vector, text = EXCLUDED.text;
"""

# Append the statement to the list
insert_statements.append(sql_statement)

return insert_statements
The embeddings of the relic sets are successfully inserted to the database.

How It All Fits Together: Query the Database

Once we have stored the vector embeddings of all the relic sets in our PostgreSQL database, the next step is to find the relic sets that are most similar to a given character’s relic needs.

Just like what we have done for storing relic set embeddings, we need to generate an embedding for the query describing the character’s relic needs. This is done by passing the query through the model as demonstrated in the following code.

def query_similar_embeddings(query_text):
query_embedding = generate_embedding(query_text)

return query_embedding.tolist()

The generated embedding is an array of 384 numbers. We simply use this array in our SQL query below.

SELECT id, text, vector <=> '[<embedding here>]' AS distance
FROM embeddings
ORDER BY distance
LIMIT 3;

The key part of the query is the <=> operator. This operator calculates the “distance” between two vectors based on cosine similarity. In our case, it measures how similar the query embedding is to each stored embedding. The smaller the distance, the more similar the embeddings are.

We use LIMIT 3 to get the top 3 most similar relic sets.

Test Case: Finding Relic Sets for Gallagher

Gallagher is a Fire and Abundance character in HSR. He is a sustain unit that can heal allies by inflicting a debuff on the enemy.

According to the official announcement, Gallagher is a healer. (Image Source: Honkai: Star Rail YouTube)

The following screenshot shows the top 3 relic sets which are closely related to a HSR character called Gallagher using the query “Suggest the best relic sets for this character: Gallagher is a Fire and Abundance character in Honkai: Star Rail. He can heal allies.”

The returned top 3 relic sets are indeed recommended for Gallagher.

One of the returned relic sets is called the “Thief of Shooting Meteor”. It is the official recommended relic set in-game, as shown in the screenshot below.

Gallagher’s official recommended relic set.

Future Work

In our project, we will not be implementing indexing because currently in HSR, there are only a small number of relic sets. Without an index, PostgreSQL will still perform vector similarity searches efficiently because the dataset is small enough that searching through it directly will not take much time. For small-scale apps like ours, querying the vector data directly is both simple and fast.

However, when our dataset grows larger in the future, it is a good idea to explore indexing options, such as the ivfflat index, to speed up similarity searches.

References

[KOSD] Solving SQL File Encoding Issues on Git with PowerShell

Few days ago, some of our teammates discovered that the SQL files they tried to pull from our GitHub repo had encoding issue. When they did git pull, there would be an error saying “fatal: failed to encode ‘…/xxxxx.sql’ from UTF-16-LE-BOM to UTF-8”.

In addition, on GitHub, the SQL files we committed to the GitHub are all marked as binary files. Thus we couldn’t view the changes we made to those files in the commit.

Cause of the Issue

It turns out that those SQL files are generated from SQL Server Management Studio (SSMS).

Default file encoding of SSMS is Western European (Windows) – Codepage 1252.

By default, the encoding used to save SQL files in SSMS is UTF-16. For my case, my default encoding is the “Western European (Windows) – Codepage 1252”. Codepage 1252 is a single-byte character encoding of the Latin alphabet that was used in Windows for English and many Romance and Germanic languages. This encoding will cause Git to treat the files as binary files.

Solution

The way to resolve this issue is to force the file to use UTF-8 encoding. We can run the following PowerShell script to change the encoding of all SQL files in a given directory and its subdirectories.

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False

Get-ChildItem "<absolute directory path>" -Recurse *.sql | foreach {
    $FilePath = $_.FullName
    $FileContent = Get-Content $FilePath
    [System.IO.File]::WriteAllLines($FilePath, $FileContent, $Utf8NoBomEncoding)
}

The BOM (Byte Order Mark), a sequence of bytes at the start of a text stream (0xEF, 0xBB, 0xBF), is used to signal the endianness of an encoding, but since endianness is irrelevant to UTF-8, the BOM is unnecessary. This explains why we pass $False to the constructor of UTF8Encoding to indicate that BOM is not needed.

Wrap-Up

That’s all for a short little PowerShell script we used to solve the encoding issue of our SQL files.

There is an interesting discussion on StackOverflow about this issue, please give it a read too.

KOSD, or Kopi-O Siew Dai, is a type of Singapore coffee that I enjoy. It is basically a cup of coffee with a little bit of sugar. This series is meant to blog about technical knowledge that I gained while having a small cup of Kopi-O Siew Dai.

[KOSD] Let’s Talk about CASE

Last week, a developer in our team encountered an interesting question in his SQL script on SQL Server 2019. For the convenience of discussion, I’ve simplified his script as follow.

DECLARE @NUM AS TINYINT = 0
DECLARE @VAL AS VARCHAR(MAX) = '20.50'

SELECT CASE @NUM WHEN 0 THEN CAST(@VAL AS DECIMAL(10, 2))
                 WHEN 1 THEN CAST(@VAL AS DECIMAL(10, 4))
                 ELSE -1 
       END AS Result

The result he expected was 20.50 because @NUM equals to 0, so by right the first result expression should be executed. However, the truth is that it returned 20.5000 as if the second result expression which is casting @VAL into a decimal value with a scale of 4 was run.

So, what is the cause of this issue here?

SQL Data Types Implicit Conversion

First of all, according to the Microsoft Learn documentation, the data types of all result expressions must be the same or must be an implicit conversion.

In the script above, we have two data types in the result expressions, i.e. DECIMAL and INT (-1 in the ELSE result expression). Hence, we need to understand the implicit data type conversions that are allowed for SQL Server system-supplied data types, as shown below. The table below shows that INT can be implicit converted to DECIMAL and vice versa.

All data type conversions allowed for SQL Server system-supplied data types (Image Source: Microsoft Learn)

Data Precendence

While the above chart illustrates all the possible explicit and implicit conversions, we still do not know the resulting data type of the conversion. For our case above, the resulting data type depends on the rules of data type precedence.

According to the data type precedence in SQL Server, we have the following precedence order for data types.

  1. user-defined data types (highest)
  2. sql_variant
  3. xml
  4. datetimeoffset
  5. datetime2
  6. datetime
  7. smalldatetime
  8. date
  9. time
  10. float
  11. real
  12. decimal
  13. money
  14. smallmoney
  15. bigint
  16. int
  17. smallint
  18. tinyint
  19. bit
  20. ntext
  21. text
  22. image
  23. timestamp
  24. uniqueidentifier
  25. nvarchar (including nvarchar(max) )
  26. nchar
  27. varchar (including varchar(max) )
  28. char
  29. varbinary (including varbinary(max) )
  30. binary (lowest)

Since DECIMAL has a higher precedence than INT, hence we are sure that the script above will result in a DECIMAL output with the highest scale, i.e. DECIMAL(10, 4). This explains why the result of his script is 20.5000.

Conclusion

Now, if we change the script above to be something as follows, we should receive an error saying “Error converting data type varchar to numeric”.

DECLARE @NUM AS TINYINT = 0
DECLARE @VAL AS VARCHAR(MAX) = '20.50'

SELECT CASE @NUM WHEN 0 THEN 'A'
                 WHEN 1 THEN CAST(@VAL AS DECIMAL(10, 4))
                 ELSE -1 
       END AS Result

Yup, that’s all about our discussion about the little bug he found in his script. Hope you find it useful. =)

References

KOSD, or Kopi-O Siew Dai, is a type of Singapore coffee that I enjoy. It is basically a cup of coffee with a little bit of sugar. This series is meant to blog about technical knowledge that I gained while having a small cup of Kopi-O Siew Dai.

Unit Test Stored Procedures and Automate Build, Deploy, Test Azure SQL Database Changes with CI/CD Pipelines

I was recently asked about how to unit test stored procedures before deploying to servers. Unfortunately, there are not much discussions about unit testing stored procedures, especially with tools like SSDT and Azure DevOps. Hence, I decide to write this walkthrough to share my approach to this issue.

PROJECT GITHUB REPOSITORY

The complete source code of this project can be found at https://github.com/goh-chunlin/Lunar.Spending.

Unit Testing

According to The Art of Unit Testing, a unit test is an automated piece of code that invokes the unit of work being tested, and then checks some assumptions about a single end result of that unit.

Without unit testing, one has no choice but to rely on system and integration tests which are normally performed in the later stage of the SDLC. Some teams may even resort to the troublesome way, i.e. manually testing the end product they’re developing to invoke their codes.

Unit testing of stored procedures is also very crucial. If the bugs in stored procedures are not caught in the early stage of development, it is very challenging to rollback the data changes that have been made to the database.

Setup SSDT (SQL Server Data Tools)

SSDT is a development tool for building SQL Server relational databases, including databases in Azure SQL. The core SSDT functionality to create database projects has remained integral to Visual Studio 2022. Thus, we can easily include SSDT by selecting the “Data storage and processing” workload in the Visual Studio Installer, as shown below.

Setting up SSDT in Visual Studio 2022.

With SSDT, we can work directly with a connected database instance on/off-premise. We can use SSDT Transact-SQL design capabilities to build, debug, maintain, and refactor databases. In this article, we will be using SSDT to create unit tests that verify the behavior of several stored procedures.

Create a New Database Project

In this article, we will assume that we have an existing database hosted on Azure SQL server.

Firstly, as shown in the screenshot below, we need to create a new database project in order to import database schema and stored procedures from the database on Azure.

Creating a new SQL Server Database project on VS 2022.

Let’s name our project DbCore. We will then see a simple DbCore project shown in our Solution Explorer, as demonstrated in the screenshot below.

The database project is successfully created.

Next, we will import our Azure database into the database project by right-clicking on it in the Solution Explorer.

Select the “Database…” option to import from existing Azure database.

The widget will then import the data from the Azure database based on the given connection string, as shown below.

Importing database to our database project.

Once the import is done, we shall see our tables and stored procedures listed under the dbo directory in Solution Explorer, as shown below.

Table and stored procedures are successfully imported!

Before we continue, we need to edit the Target Platform of our database project accordingly, as shown in the following screenshot, otherwise we will not be able to publish the database later.

Changing the target platform to avoid the publish error.

Create Unit Test for Stored Procedure

Let’s say we would like to unit test the AddSpending stored procedure. What we need to do is simply right-clicking the AddSpending stored procedure and then click on the “Create Unit Tests…” option, as demonstrated below.

Adding unit test for a selected stored procedure.

We will then be asked for the connection string of the database that the unit test project will be connecting to. Once the project has been successfully created, we will be given a template unit test as follows.

A boilerplate code of stored procedure unit test.

We can include pre and post test SQL statements which will be run before and after the test script is executed, respectively.

For example, we would like to have a clean Spendings table before the unit test runs, we can have the following SQL script to delete all rows in the table.

Pre-test script will be run before the test script.

In our test script, we will test to see if the description and amount can be stored correctly in the database. Hence, we need to specify two Test Conditions to verify the two columns, as shown in the following screenshot.

RC means Return Code. It can later be used in a test case assertion.

Now, we can run our very first unit test with the Test Explorer to see if our stored procedure has any issue or not.

The test case fails. We shall check why the number is rounded up.

It turns out that this bug is caused by wrong data type used for the Amount column. Now we can proceed to fix it.

The test passed after we fixed the issue in our table schema and stored procedure.

Getting Ready for Publishing Database

As we discussed earlier, unit testing is not only about writing a piece of code to test our unit of work, but also making it to be automatically testable.

Hence, our next move is to automate the build, test, and release of our database.

Firstly, we shall make sure the source code of our projects is on GitHub (or any source control supported in Azure DevOps).

Secondly, we need to create the Publish Profile of our database. To do so, we simply right click on database project and choose the “Publish…” option. There will be a window popped out, as shown below.

Configuring the Publish Profile of our database.

As shown in the screenshot above, there are many settings that can be configured, including Azure SQL related settings. After configuring them accordingly, please click on the “Create Profile” button. Once it is grayed out, it means that the profile has been generated successfully. We can then proceed to close the popped-out window.

Please make sure the generated Database Profile file is included in the source control. Kindly add it to Git if it is not, as shown in the screenshot below. This is because this file is needed in our Azure DevOps build pipeline later.

Please make sure our Database Profile is included in source control.

Finally, let’s create a project on Azure DevOps which will host the build and release pipelines for our automatic database deployment.

We will configure our project to have only the Pipelines service on Azure DevOps.

Setup Build Pipeline

Once we have created our project on Azure DevOps, we can proceed to create our Build Pipeline.

Firstly, we need to specify the code repository we are using. Since our code sists on GitHub, we will connect Azure DevOps with our project on GitHub as shown in the screenshot below.

Please remember to select the correct branch too.

Next, we will start off from the .NET Desktop template. The reason why we choose this template is because it contains many tasks that we can use in our database build pipeline.

We will make use of the .NET Desktop template for our database build pipeline.

Here we will be using the Microsoft-hosted agent in our build pipeline. With Microsoft-hosted agents, maintenance and upgrades are taken care of for us. Each time we run a pipeline, we will get a fresh virtual machine for each job in the pipeline.

We need to make sure that “windows-2022” (Windows Sever 2022 with VS2022 installed), which is the latest version as of now, is chosen in the Agent Specification field. I have tried with the default “windows-2019” option before and there would be an error message “Error CS0234: The type or namespace name ‘Schema’ does not exist in the namespace ‘Microsoft.Data.Tools’ (are you missing an assembly reference?)”.

Please update to use windows-2022 as our build agent or else there would be issues later.

Next, we need to update the version of NuGet to be the latest, which is now 6.1.0.

After that, we move on to the third task in the pipeline which is building our Solution in Release mode with the following MSBuild Arguments on Any CPU.

/p:DeployOnBuild=true /p:WebPublishMethod=Package /p:PackageAsSingleFile=true /p:SkipInvalidConfigurations=true /p:PackageLocation="$(build.artifactstagingdirectory)\\"
Configuring the Build task.

We will remove the VsTest from the Build Pipeline because it will be proper to run the unit tests after changes have been deployed to the database. Otherwise, we will still be testing against the old schema and old stored procedures. Hence, we will add the testing task in the Release pipeline instead.

Now, since the testing will be done in the Release Pipeline instead, we shall create a task to copy the assemblies of TestDbCore project to the Build Artifact so that the assemblies can be used in the Release Pipeline later. Thus, we will add a new task “Copy Files” as shown in the screenshot below.

TestDbCore assemblies and other relevant files will be copied to DbCoreTest folder in the Build Artifact.

We will also remove the next task, which is publishing symbols because it is not necessary.

After that, we will add a new task to copy DACPAC file, which is generated during Build, to the Build Artifact, as shown in the following screenshot.

DACPAC file is needed to deploy our database to an existing instance of Azure SQL database.

Another file we need to copy is our Database Profile. This is the reason why earlier we have to make sure the profile file needs to be in the source control.

The database publish profile needs to be copied to the Build Artifact directory as well.

There is nothing to change for the Publish Artifact task. So, we can now move on to enable the Continuous Integration for the Build Pipeline, as shown in the screenshot below.

We can enable the continuous CI under the Triggers section of our Build Pipeline.

Finally, we can save and queue our Build Pipeline. If the build is successful, we will be able to see a Build Artifact produced, as shown in the screenshot below.

Build is successfully executed!

Setup Release Pipeline

In order to automatically deploy our database changes after the database project and unit test project are built successfully, we need to configure the Release Pipeline.

First of all, we need to integrate our Build Pipeline with this new Release Pipeline, as demonstrated in the following screenshot.

Adding the Build Artifact generated earlier in the Build Pipeline.

Next, we can enable the Continuous Deployment, as shown in the screenshot below. This is to make sure that we can deploy the database changes and test them automatically right after the build is completed successfully.

Enabling CD in our Release Pipeline.

Now, we can move on to configure the tasks in the stage. Here, I have renamed the stage as “Deploy DbCore”. To be consistent with the Build Pipeline, here we will be using “windows-2022” as the agent in our Release Pipeline too.

Please update to use windows-2022 as our agent or else there would be issues later.

The first task will be deploying database changes to Azure SQL with the task “Azure SQL Database deployment”. In the task, we need to select our Azure subscription and provide our Azure SQL database admin login credential so that the database changes can be deployed to Azure SQL on our behalf.

In the same task, under the Deployment Package section, we need to state that we will be deploy with a DACPAC file. This is also where we will use the DACPAC file and publish profile file in the Build Artifact.

Setting up the deployment package.

Next, we will run into a problem. We are supposed to add the testing task next. However, connection strings to the database are needed. So how could we securely store the connection strings in our pipeline?

Shanmugam Chinnappa proposed three ways to solve this problem. I will demostrate how I use his method of using user-defined secret variables. This is the most straightforward way among the three.

Firstly, we need to edit the app.config file in the TestDbCore project. In this file, two connection strings can be found. The connection string in ExecutionContext is used to execute the test script in our unit test. The PrivilegedContext connection is used to test interactions with the database outside the test script in our unit test.

To keep things simple, we will use the same connection string for both contexts. We thus can replace the connection string in those two contexts with a token #{TestDbCoreConnection}#.

<ExecutionContext Provider="System.Data.SqlClient" 
    ConnectionString="#{TestDbCoreConnection}#" CommandTimeout="30" />

<PrivilegedContext Provider="System.Data.SqlClient" 
    ConnectionString="#{TestDbCoreConnection}#" CommandTimeout="30" />

After committing this change to our GitHub repo, we will specify the actual connection string in the Pipeline Variables section under the same name as the token. Since we use TestDbCoreConnection as our token label, the variable is thus called TestDbCoreConnection as well, as shown in the screenshot below.

Storing the actual database connection string in the Pipeline Variables of our Release Pipeline.

Now we will need a task which would replace the tokens in the app.config file with the actual value. The task we will be using here is the “Replace Tokens” done by Guillaume Rouchon.

Previously we have moved the TestDbCore bin/Release folder to the Build Artifact. In fact, app.config, which is renamed to TestDbCore.dll.config, is in the folder as well. Hence, we can locate the config file easily by pointing the task to the Build Artifact accordingly, as demonstrated in the screenshot below.

We will only need to specify the Root Directory of where our TestDbCore.dll.config is located.

Please take note that since our unit tests need to test the Azure SQL database specified in the connection string, we need to allow Azure services and resources to access our Azure SQL server by configuring its firewall, as shown in the following screenshot. Otherwise, all our unit tests will fail because the Azure SQL server cannot be reached.

Interestingly, the Azure AQL Database deployment task will still be executed successfully even though we do not allow the access mentioned above.

We need to allow Azure services and resources to access the relevant Azure SQL server.

With the actual connection string in place, we can now add our Visual Studio Test (2.*) task back to execute our unit test.

We have our test files in the folder DbCoreTest in the Build Artifact as we designed earlier in the Build Pipeline. Hence, we simply need to point the Search Folder of the Visual Studio Test task to the folder accordingly.

Setting up Visual Studio Test to run our unit test for our stored procedures.

You may have noticed that the test results will be stored in a folder called $(Agent.TempDirectory)\TestResults. So, let’s add our last task of the stage which is to publish the test results.

We will need to specify that our test results are generated by VSTest. Aftervthat, we point the task to look for test results in the $(Agent.TempDirectory)\TestResults folder. Finally, we name our test run.

We can publish our test results to the Release Pipeline.

That’s all! Please remember to save the Release Pipeline changes.

Now when there is a new build completed, the Release Pipeline will be automatically triggered. Once it is completed, not only our database on Azure SQL will be updated accordingly, but also we will have a detailed test result. For example, when all of our unit tests have passed, we will get a test result as shown below.

If all tests have passed, we will receive a trophy. 🙂

However, if there is one or many tests fail, we can easily locate the failed tests easily in the report.

This shows that our stored procedure dbo_AddSpendingTest has issues which need our attention.

That’s all for a simple walkthrough from writing unit tests for stored procedures to automatically deploying and testing them with Azure DevOps CI/CD pipelines.

I actually started learning all these after watching Hamish Watson’s sharing on DevOps Lab show which was released four years ago in 2018. In the video, he also shared about how DBA could do unit testing of their database changes tSQLt. Please watch the full YouTube video if you would like to find out more about unit testing our stored procedures.

Damian meets with Hamish Watson at the MVP Summit to talk about testing our database changes.

References