[KOSD] Solving SQL File Encoding Issues on Git with PowerShell

Few days ago, some of our teammates discovered that the SQL files they tried to pull from our GitHub repo had encoding issue. When they did git pull, there would be an error saying “fatal: failed to encode ‘…/xxxxx.sql’ from UTF-16-LE-BOM to UTF-8”.

In addition, on GitHub, the SQL files we committed to the GitHub are all marked as binary files. Thus we couldn’t view the changes we made to those files in the commit.

Cause of the Issue

It turns out that those SQL files are generated from SQL Server Management Studio (SSMS).

Default file encoding of SSMS is Western European (Windows) – Codepage 1252.

By default, the encoding used to save SQL files in SSMS is UTF-16. For my case, my default encoding is the “Western European (Windows) – Codepage 1252”. Codepage 1252 is a single-byte character encoding of the Latin alphabet that was used in Windows for English and many Romance and Germanic languages. This encoding will cause Git to treat the files as binary files.

Solution

The way to resolve this issue is to force the file to use UTF-8 encoding. We can run the following PowerShell script to change the encoding of all SQL files in a given directory and its subdirectories.

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False

Get-ChildItem "<absolute directory path>" -Recurse *.sql | foreach {
    $FilePath = $_.FullName
    $FileContent = Get-Content $FilePath
    [System.IO.File]::WriteAllLines($FilePath, $FileContent, $Utf8NoBomEncoding)
}

The BOM (Byte Order Mark), a sequence of bytes at the start of a text stream (0xEF, 0xBB, 0xBF), is used to signal the endianness of an encoding, but since endianness is irrelevant to UTF-8, the BOM is unnecessary. This explains why we pass $False to the constructor of UTF8Encoding to indicate that BOM is not needed.

Wrap-Up

That’s all for a short little PowerShell script we used to solve the encoding issue of our SQL files.

There is an interesting discussion on StackOverflow about this issue, please give it a read too.

KOSD, or Kopi-O Siew Dai, is a type of Singapore coffee that I enjoy. It is basically a cup of coffee with a little bit of sugar. This series is meant to blog about technical knowledge that I gained while having a small cup of Kopi-O Siew Dai.

[KOSD Series] Discussion about Cosmos DB Performance

KOSD, or Kopi-O Siew Dai, is a type of Singapore coffee that I enjoy. It is basically a cup of coffee with a little bit of sugar. This series is meant to blog about technical knowledge that I gained while having a small cup of Kopi-O Siew Dai.

kosd-cosmos-db.png

During a late dinner with my friend on 12 January last month, he commented that he encountered a very serious performance problem in retrieving data from Cosmos DB (pka DocumentDB). It’s quite strange because, in our IoT project which also stores millions of data in Cosmos DB, we never had this problem.

Two weeks later, on 27 January, he happily showed me his improved version of the code which could query the data in about one to two seconds.

Yesterday, after having a discussion, we further improved the code. Hence, I’d like to write down this learning experience here.

try-cosmos-db.png
Try Azure Cosmos DB today!

Preparation

Due to the fact that we couldn’t demonstrate using the real project code, I thus created a sample project getting data from database and collection on my personal Azure Cosmos DB account. The database contains one collection which has 23,967 records of Student data.

The Student class and the BaseEntity class that it inherits from are as follows.

public class Student : BaseEntity
{
    public string Name { get; set; }

    public int Age { get; set; }

    public string Description { get; set; }
}
public abstract class BaseEntity
{
    [JsonProperty(PropertyName = "id")]
    public string Id { get; set; }

    public string Type { get; set; }

    public DateTime CreatedAt { get; set; } = DateTime.Now;
}

You may wonder why I have Type defined.

Type and Cost Saving

The reason of having Type is that, before DocumentDB was rebranded as Cosmos DB in May 2017, the DocumentDB pricing is based on collections. Hence, the more collection we have in the database, the more we need to pay.

confused-about-documentdb-pricing.png
DocumentDB was billed per collection in the past. (Source: Stack Overflow)

To overcome that, we squeeze the different types of entities in the same collection. So, in the example above, let’s say we have three classes — Students, Classroom, Teacher that inherit from BaseEntity, then we will put the data of the three classes in the same collection.

Then here comes a problem: How do we know which document in the collection is Student, Classroom or Teacher? There is where the property Type will help us. So in our example above, the possible value for Type will be Student, Classroom, and Teacher.

Hence, when we add a new document through repository design pattern, we have the following method.

public async Task<T> AddAsync(T entity)
{
    ...

    entity.Type = typeof(T).Name;

    var resourceResponse = await _documentDbClient.CreateDocumentAsync(UriFactory.CreateDocumentCollectionUri(_databaseId, _collectionId), entity);

    return resourceResponse.StatusCode == HttpStatusCode.Created ? (dynamic)resourceResponse.Resource : null;
}

Original Version of Query

We used the following code to retrieve data of a class from the collection.

public async Task<IEnumerable<T>> GetAllAsync(Expression<Func<T, bool>> predicate = null)
{
    var query = _documentDbClient.CreateDocumentQuery<T>(UriFactory.CreateDocumentCollectionUri(_databaseId, _collectionId));

    var documentQuery = (predicate != null) ?
        (query.Where(predicate)).AsDocumentQuery():
        query.AsDocumentQuery();

    var results = new List<T>();
    while (documentQuery.HasMoreResults)
    {
        results.AddRange(await documentQuery.ExecuteNextAsync<T>());
    }

    return results.Where(x => x.Type == typeof(T).Name).ToList();
}

This query will run very slow because the line where it filters the class is after querying data from the collection. Hence, in the documentQuery, it may already contain data of three classes (Student, Classroom, and Teacher).

Improved Version of Query

So one obvious way is to move the line of filtering by Type above. The improved version of code now looks as such.

public async Task<IEnumerable<T>> GetAllAsync(Expression<Func<T, bool>> predicate = null)
{
    var query = _documentDbClient
        .CreateDocumentQuery<T>(UriFactory.CreateDocumentCollectionUri(_databaseId, _collectionId))
        .Where(x => x.Type == typeof(T).Name);

    var documentQuery = (predicate != null) ?
        (query.Where(predicate)).AsDocumentQuery():
        query.AsDocumentQuery();

    var results = new List<T>();
    while (documentQuery.HasMoreResults)
    {
        results.AddRange(await documentQuery.ExecuteNextAsync<T>());
    }

    return results;
}

By doing so, we managed to reduce the query time significantly because all the actual filtering will be done at Cosmos DB side. For example, there was one query I managed to reduce the query time of it from 1.38 minutes to 3.42 seconds using the 23,967 records of Student data.

Multiple Predicates

The code above however has a disadvantage. It cannot accept multiple predicates.

I thus changed it to be as follows so that it returns IQueryable.

public IQueryable<T> GetAll()
{
    return _documentDbClient
        .CreateDocumentQuery<T>(UriFactory.CreateDocumentCollectionUri(_databaseId, _collectionId))
        .Where(x => x.Type == typeof(T).Name);
}

This has another inconvenience is there whenever I call GetAll, I need to remember to load the data with HasMoreResults as shown in the code below.

var studentDocuments = _repoDocumentDb.GetAll()
    .Where(s => s.Age == 8)
    .Where(s => s.Name.Contains("Ahmad"))
    .AsDocumentQuery();

var results = new List<T>();
while (studentDocuments.HasMoreResults)
{
    results.AddRange(await studentDocuments.ExecuteNextAsync<T>());
}

Conclusion

This is just an after-dinner discussion about Cosmos DB between my friend and me. If you have any better idea of designing repository for Cosmos DB (pka DocumentDB), please let us know. =)

[KOSD Series] IP Addresses of Our Azure App Services that need to be Whitelisted by Our API Providers

KOSD, or Kopi-O Siew Dai, is a type of Singapore coffee that I enjoy. It is basically a cup of coffee with a little bit of sugar. This series is meant to blog about technical knowledge that I gained while having a small cup of Kopi-O Siew Dai.

kosd-azure_web_app-powershell

It is a common scenario for developers to integrate with different parties by using their APIs. Most of the time, the APIs are located in a locked-down network environment where only whitelisted IP addresses are allowed to access their APIs. We will then be asked to give the API providers the IP addresses of our servers.

If it’s our web back-end calling the APIs and we host our web applications on Microsoft Azure App Services, then how could we get the IP addresses?

As mentioned in a discussion about inbound IP address by Benjamin Perkins, the Escalation Engineer on the Azure team, there are about 4 outgoing IP addresses for an Azure Web Apps normally. To retrieve the outbound IP addresses of an Azure web app, we simply need to get it from the Properties of the web app on Azure Portal.

outbound-ip-addresses-in-azure-app-service.png
Locate the outbound IP addresses here.

We can also get the same result if we use the Azure Resource Explorer which is still in preview now.  Benjamin covered this in a video clip on his article too.

For PowerShell lovers, as pointed out by Adrian Calinescu, one of the commenters on Benjamin’s article, we can use PowerShell to find out the outbound IP addresses too. With the new Azure Cloud Shell, we can simply use the following command to retrieve directly the outbound IP addresses of an Azure web app on Azure Portal directly.

Get-AzureRmResource -ResourceGroupName  -ResourceType Microsoft.Web/sites -ResourceName  | select -expand Properties | Select-Object outboundIpAddresses

outbound-ip-addresses-in-azure-app-service-using-powershell.png
Managing Azure resources using shell directly on a browser.

For those who would like to have your own set of outbound IP addresses, please check out ASE (App Service Environment) which grants users control over inbound and outbound application network traffic.

Finally, we can also whitelist all the IP addresses of the Azure datacentres, which can be downloaded here.

azure-datacenter-ip-range-download.png
List of Microsoft Azure Datacentre IP addresses are available on Microsoft website.

References

[KOSD Series] First Attempt of Deploying ASP .NET Core to Azure Container Service

KOSD, or Kopi-O Siew Dai, is a type of Singapore coffee that I enjoy. It is basically a cup of coffee with a little bit of sugar. This series is meant to blog about technical knowledge that I gained while having a small cup of Kopi-O Siew Dai.

kosd-docker-azure_container_registry-vsts

Last month, after sharing the concepts and use cases of Domain Driven Development, Riza moved on to talk about Containers in the sharing session of Singapore .NET Developers Community.

microservices-not-equal-to-containers.png
Riza’s talking about Containers. Yes, microservices are not containers!

Learning Motivation

In the beginning of Riza’s talk, he mentioned GO-JEK, an Indonesia ride-hailing phone service. Due to their rapid growth, the traditional monolithic architecture can no longer support their business. Hence, they switched to use a modern approach which includes moving apps to containers.

go-jek-containers.png
Go-Jek team is working on moving apps to container.

Hence, after the meetup, I was very excited to find out more about micro-services and Docker containers. With the ability of .NET Core to be cross-platform, as a Azure lover, I am interested to find out more how I can deploy ASP .NET Core web app to a container in Azure. So, I decided to write this short article to share with my teammates about this that they can learn while drinking a cup of coffee.

Creating New Project with Docker Support

Since I am trying it out as personal project, I choose to start it with a new ASP .NET Core project. Then in the Visual Studio, I can easily turn it to be a Docker supporting app easily by checking the “Enable Docker Support” option.

enable-docker-support.png
Enable Docker Support

For existing web application projects, we will not have the screen above. Luckily, it is still easy to add Docker Support to an existing ASP .NET Core project on Visual Studio.

add-docker-support-to-existing-project
Enabling Docker Support in existing projects.

Then by clicking on the “F5” button to run the project, I manage to get the following screen (The background is customized by me). The message is displayed using the following line.

System.Runtime.InteropServices.RuntimeInformation.OSDescription;

launched-at-localhost.png
Yay, we managed to run the web app inside a Linux container locally.

Publishing to Microsoft Azure with Continuous Delivery

Without Continuous Delivery, we also can easily right-click the web application to publish it to the Container Registry on Azure.

publishing-to-container-registry
Creating a new Azure Container Registry which will have the Docker image published to.

Then, on Azure Portal, we will see three new resources added. Firstly, we will have the Container Registry.

Then, we will also have an app service site which is running the image downloaded from the Container Registry. Finally, we have an App Service Plan which needs to be at least B1 because free and shared SKUs are not available for apps running on Linux (The official Microsoft documentation says we should have the VM size of the App Service Plan to be S1 or larger though).

container-registry-on-azure.png
Container Registry for my new web app, Changshi.

To enable Continuous Delivery, I choose to use Github + Visual Studio Team Services (VSTS). By doing so, build and release will be automatically started whenever I check in code to Github.

build-on-vsts
Build history and details on VSTS.

Yup, this is so far what I have tried out in my first step of playing with containers. If you are interested, please check out the references listed below.

References