[KOSD] Solving SQL File Encoding Issues on Git with PowerShell

A few days ago, some of our teammates discovered that the SQL files they tried to pull from our GitHub repo had an encoding issue. When they did a git pull, they would get an error saying “fatal: failed to encode ‘…/xxxxx.sql’ from UTF-16-LE-BOM to UTF-8”.

In addition, the SQL files we committed to GitHub were all marked as binary files there. Thus we couldn’t view the changes we made to those files in the commits.

Cause of the Issue

It turns out that those SQL files are generated from SQL Server Management Studio (SSMS).

Default file encoding of SSMS is Western European (Windows) – Codepage 1252.

By default, SSMS saves the SQL files it generates in UTF-16. In my case, the default file encoding shown in SSMS is “Western European (Windows) – Codepage 1252”, a single-byte character encoding of the Latin alphabet that was used in Windows for English and many Romance and Germanic languages. It is the UTF-16 encoding, however, that causes Git to treat the files as binary.
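
To see what Git sees here, note that UTF-16 encodes every ASCII character with a trailing NUL (0x00) byte, and Git’s binary-detection heuristic treats a file containing NUL bytes as binary. A minimal C# illustration of the bytes involved:

using System;
using System.Text;

class Utf16BytesDemo
{
    static void Main()
    {
        // "SELECT" in UTF-16 LE: every ASCII character is followed by a NUL byte,
        // which Git's binary-detection heuristic reads as a sign of a binary file.
        byte[] utf16Bytes = Encoding.Unicode.GetBytes("SELECT");
        Console.WriteLine(BitConverter.ToString(utf16Bytes));
        // Prints: 53-00-45-00-4C-00-45-00-43-00-54-00
    }
}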

Solution

The way to resolve this issue is to force the files to use UTF-8 encoding. We can run the following PowerShell script to change the encoding of all SQL files in a given directory and its subdirectories.

# Create a UTF-8 encoding that does not emit a BOM
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False

# Re-save every SQL file under the directory (and its subdirectories) in UTF-8
Get-ChildItem "<absolute directory path>" -Recurse *.sql | foreach {
    $FilePath = $_.FullName
    $FileContent = Get-Content $FilePath
    [System.IO.File]::WriteAllLines($FilePath, $FileContent, $Utf8NoBomEncoding)
}

The BOM (Byte Order Mark) is a sequence of bytes at the start of a text stream that signals the byte order (endianness) of the encoding; for UTF-8 it is 0xEF, 0xBB, 0xBF. Since endianness is irrelevant to UTF-8, the BOM is unnecessary, which is why we pass $False to the constructor of UTF8Encoding to indicate that no BOM is needed.
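
Since UTF8Encoding is a .NET class, the effect of that constructor argument can be shown in a few lines of C# by comparing the preamble (the BOM bytes) each variant emits:

using System;
using System.Text;

class BomDemo
{
    static void Main()
    {
        // UTF8Encoding(true) emits the BOM as its preamble;
        // UTF8Encoding(false) emits no preamble at all.
        byte[] withBom = new UTF8Encoding(true).GetPreamble();
        byte[] withoutBom = new UTF8Encoding(false).GetPreamble();

        Console.WriteLine(BitConverter.ToString(withBom)); // EF-BB-BF
        Console.WriteLine(withoutBom.Length);              // 0
    }
}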

Wrap-Up

That’s all for the short PowerShell script we used to solve the encoding issue with our SQL files.

There is an interesting discussion about this issue on Stack Overflow; please give it a read too.

KOSD, or Kopi-O Siew Dai, is a type of Singapore coffee that I enjoy. It is basically a cup of coffee with a little bit of sugar. This series is meant to blog about technical knowledge that I gained while having a small cup of Kopi-O Siew Dai.

Improve Life with Codes

In the realm of software development and business practices, failing to automate a process when automation could bring significant benefits is normally considered a missed opportunity or an inefficient use of resources. It can lead to wasted time, increased chances of errors, and reduced productivity.

Background Story

My teammate encountered a strange issue where a third-party core component in the system, which runs as a Windows service, would stop randomly. The service listens on a certain TCP port. When the service was down, telnetting to that port would show that the connection was unsuccessful.

After weeks of intensive log investigation, my teammate still could not figure out why it stopped working. However, a glimmer of insight emerged: restarting the Windows service would consistently bring the component back online.

Hence, his solution was to create an alert system that would email him and the team to restart the Windows service whenever it went down. The alert system is basically a scheduled job that checks the health of the TCP port the service listens on.

Since my teammate was one of the few people who could log in to the server, he had to stand by during weekends too to restart the Windows service. Not long after that, he submitted his resignation and left the company. Other teammates thus had to take over this manual service-restarting task.

Auto Restart Windows Service with C#

In order to avoid teammates getting burnt out from manually restarting the Windows service frequently, even at night and during weekends, I decided to develop a C# programme to be executed every 10 minutes on the server. The programme makes a connection to the port the Windows service listens on to check whether the service is running. If it is not, the programme restarts the service.

The code is as follows.

// Requires the System.Net.Sockets and System.ServiceProcess namespaces;
// serverIpAddress, port, and targetService are assumed to be defined elsewhere.
try
{
    // Probe the port the Windows service listens on; a failed
    // connection throws and lands us in the catch block below.
    using (TcpClient tcpClient = new())
    {
        tcpClient.Connect(serverIpAddress, port);
    }

    Console.WriteLine("No issue...");
}
catch (Exception)
{
    int timeoutMilliseconds = 120000;

    ServiceController service = new(targetService);

    try
    {
        Console.WriteLine("Restarting...");
        int millisec1 = Environment.TickCount;

        TimeSpan timeout = TimeSpan.FromMilliseconds(timeoutMilliseconds);

        if (service.Status != ServiceControllerStatus.Stopped) 
        {
            Console.WriteLine("Stopping...");
            service.Stop();
            service.WaitForStatus(ServiceControllerStatus.Stopped, timeout);
        }

        Console.WriteLine("Stopped!");
        int millisec2 = Environment.TickCount;
        timeout = TimeSpan.FromMilliseconds(timeoutMilliseconds - (millisec2 - millisec1));

        Console.WriteLine("Starting...");
        service.Start();
        service.WaitForStatus(ServiceControllerStatus.Running, timeout);

        Console.WriteLine("Restarted!");
    }
    catch (Exception ex) 
    {
        Console.WriteLine(ex.Message);
    }
}

In the programme above, we implement an overall timeout of 2 minutes. After waiting for the Windows service to stop, we use whatever time remains to wait for the service to return to the Running status.

After the team had launched this programme as a scheduled job, no one had to wake up at midnight just to log in to the server to restart the Windows service anymore.

Converting Comma-Delimited CSV to Tab-Delimited CSV

Soon, we realised another issue: the input files sent to the Windows service for processing had invalid content. The service expects tab-delimited CSV files, but the actual content was comma-delimited. The problem had been there since the previous year, so there were hundreds of unprocessed files.

In order to save my teammate’s time, I wrote a PowerShell script to do the conversion.

Get-ChildItem "<directory contains the files>" -Filter *.csv | 
Foreach-Object {
    Import-Csv -Path $_.FullName -Header 1,2,3,4,5,6,7,8,9 | Export-Csv -Path ('<output directory>' + $_.BaseName + '_out.tmp') -Delimiter `t -NoTypeInformation 

    Get-Content ('<output directory>' + $_.BaseName + '_out.tmp') | % {$_ -replace '"', ''} | Select-Object -Skip 1 | out-file -FilePath ('<output directory>' + $_.BaseName + '.csv')

    Remove-Item ('<output directory>' + $_.BaseName + '_out.tmp')
}

The CSV files do not have a header row, and they all have 9 columns. That is why I use “-Header 1,2,3,4,5,6,7,8,9” to add a temporary header; otherwise, Import-Csv would treat the first line of each file as the header, and if that line had multiple columns with the same value, Import-Csv would fail. This is why the temporary header needs unique column names.

When using Export-Csv, all fields in the output are enclosed in quotation marks. Hence, we need to remove the quotation marks and the temporary header row before generating the tab-delimited CSV file as the output.

With this, my teammate easily transformed all the files into the correct format in less than 5 minutes.

Searching File Content with PowerShell

A few days after that, I found out that another teammate was reading log files manually to find the lines containing the keyword “access”. I was shocked, because there were hundreds of logs every day, which meant he would need to spend hours or even days on the task.

Hence, I wrote him another simple PowerShell script to do the job.

Get-ChildItem "<directory contains the files>" -Filter *.log | 
Foreach-Object {
    Get-Content $_.FullName | % { if($_ -match "access") {write-host $_}}
}

With this, my teammate could finally finish his task early.

Wrap-Up

Automating software development processes is a common practice in the industry because of the benefits it offers. It saves time, reduces errors, improves productivity, and allows the team to focus on more challenging and creative tasks.

From a broader perspective, doing a process manually instead of automating it might not be a tragedy in the traditional sense, as it involves no loss of life or extreme suffering. However, it can be seen as a missed chance for improvement and growth.

TCP Listener on Microsoft Azure for IoT Devices

cloud-service-worker-role-automation-runbook.png

After working on the beacon projects half a year ago, I was given a new task: building a dashboard for displaying data collected from IoT devices. The IoT devices are basically GPS trackers with a few additional sensors, such as temperature and shake detection.

I’m new to the IoT field, so in this article I’m going to share what I learnt and the challenges I faced in this project, hoping it will benefit juniors who are going to do similar things.

Project Requirements

We plan to host the service that receives data from the IoT devices on Microsoft Azure. There will eventually be thousands or even millions of these devices deployed, so we chose a cloud platform to help us scale up easily.

We also need to store the data in order to display it on dashboards and in reports for business use cases.

Challenge 1: Azure IoT Hub and The Restriction of Device Firmware

In the documentation of the device protocol, there is a set of instructions as follows.

First when device connects to server, module sends its IMEI as login request. IMEI is sent the same way as encoding barcode. First comes short identifying number of bytes written and then goes IMEI as text (bytes).

After receiving IMEI, server should determine if it would accept data from this module. If yes server will reply to module 01 if not 00.

I am not sure who wrote the documentation, but I am certain that its English is not easy to comprehend on a first read.

Anyway, this is a good indication that Azure IoT Hub could be helpful, because it provides secure and reliable C2D (Cloud-to-Device) and D2C (Device-to-Cloud) communication with HTTP, AMQP, and MQTT support.

However, when I read the device documentation further, I realised that the device could only send TCP packets in a protocol defined by the device manufacturer. In addition, the device doesn’t allow us to update its firmware at the moment, which makes it impossible for the device to send data using the protocols accepted by Azure IoT Hub.

There is a fierce discussion about this on Stack Overflow. Unfortunately, none of the respondents understood what the OP was trying to say.

So, I had to say goodbye to Azure IoT Hub and move on to building a TCP listener myself on Azure.

Challenge 2: Hosting TCP Listener on Azure

There is a great code sample on how to build a TCP listener in C# to listen for connections from TCP network clients.

So, where could we host this code?

Could we use Azure App Service, such as Functions or Web Apps? Unfortunately, no, because only ports 80/TCP and 443/TCP are exposed publicly, and the only protocol that works there is HTTP. In addition, App Service runs entirely on IIS; the web server provides the whole platform, so there is no room for a long-running process or thread that can sit and wait for communication on another port outside of IIS.

The only easy option we have is to use an Azure Cloud Service with a Worker Role. A Worker Role does not use IIS, and it can run our app standalone.

creating-worker-role.png
Creating a new Cloud Service project with one Worker Role on Visual Studio 2017.

A default template of the WorkerRole class will be provided.

public class WorkerRole : RoleEntryPoint
{
    private readonly CancellationTokenSource cancellationTokenSource = new CancellationTokenSource();
    private readonly ManualResetEvent runCompleteEvent = new ManualResetEvent(false);

    public override void Run()
    {
        Trace.TraceInformation("TrackerTcpListener is running");

        try
        {
            this.RunAsync(this.cancellationTokenSource.Token).Wait();
        }
        finally
        {
            this.runCompleteEvent.Set();
        }
    }

    public override bool OnStart()
    { 
        // Set the maximum number of concurrent connections
        ServicePointManager.DefaultConnectionLimit = 12;

        // For information on handling configuration changes
        // see the MSDN topic at https://go.microsoft.com/fwlink/?LinkId=166357.

        bool result = base.OnStart();

        Trace.TraceInformation("TrackerTcpListener has been started");

        return result;
    }

    public override void OnStop()
    {
        Trace.TraceInformation("TrackerTcpListener is stopping");

        this.cancellationTokenSource.Cancel();
        this.runCompleteEvent.WaitOne();

        base.OnStop();

        Trace.TraceInformation("TrackerTcpListener has stopped");
    }

    private async Task RunAsync(CancellationToken cancellationToken)
    {
        // TODO: Replace the following with your own logic.
        while (!cancellationToken.IsCancellationRequested)
        {
            Trace.TraceInformation("Working");
            await Task.Delay(1000);
        }
    }
}

It’s obvious that the first method we are going to work on is the RunAsync method with a “TODO” comment.

However, before that, we need to define an IP endpoint for this TCP listener so that we can tell the IoT devices to send their packets to the specified port on the IP address.

worker-role-endpoints.png
Configuring Endpoints of a Cloud Service.

With endpoints defined, we can then proceed to modify the code.

private async Task RunAsync(CancellationToken cancellationToken)
{
    try
    {
        TcpClient client;

        while (!cancellationToken.IsCancellationRequested)
        {
            var ipEndPoint = RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["TcpListeningEndpoint1"].IPEndpoint;
            
            var listener = new System.Net.Sockets.TcpListener(ipEndPoint) { ExclusiveAddressUse = false };
            listener.Start();

            // Perform a blocking call to accept requests.
            client = listener.AcceptTcpClient();

            // Get a stream object for reading and writing
            NetworkStream stream = null;

            try
            {
                stream = client.GetStream();

                await ProcessInputNetworkStreamAsync(stream);
            }
            catch (Exception ex)
            {
                // Log the exception
            }
            finally
            {
                // Shutdown and end connection
                if (stream != null)
                {
                    stream.Close();
                }

                client.Close();

                listener.Stop();
            }
        }
    }
    catch (Exception ex)
    {
        // Log the exception
    }
}

The code for the method ProcessInputNetworkStreamAsync above is as follows.

private async Task ProcessInputNetworkStreamAsync(NetworkStream stream)
{
    Byte[] bytes = new Byte[5120];
    int i = 0;
    byte[] b = null;
    var receivedData = new List<string>();

    while ((i = stream.Read(bytes, 0, bytes.Length)) != 0)
    {
        receivedData = new List<string>();

        for (int reading = 0; reading < i; reading++)
        {
            using (MemoryStream ms = new MemoryStream())
            {
                ms.Write(bytes, reading, 1);
                b = ms.ToArray();
            }
            
            receivedData.Add(ConvertHexadecimalByteArrayToString(b));
        }

        Trace.TraceInformation("Received Data: " + string.Join(",", receivedData.ToArray()));

        // Respond from the server to device
        byte[] serverResponse = ConvertStringToHexadecimalByteArray("<some text to send back to the device>");
        stream.Write(serverResponse, 0, serverResponse.Length);
    }
}

You may wonder what I am doing above with the ConvertHexadecimalByteArrayToString and ConvertStringToHexadecimalByteArray methods. They are needed because the packets used in the TCP protocol of the device are in hexadecimal. There is a very interesting discussion about how to do the conversion on Stack Overflow, so I won’t repeat the details here.
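
For quick reference, though, here is one minimal way such a pair of helpers could be written. This is a sketch based on standard .NET APIs, not necessarily the exact implementation we used:

private static string ConvertHexadecimalByteArrayToString(byte[] bytes)
{
    // e.g. { 0x01, 0xAB } -> "01AB"
    return BitConverter.ToString(bytes).Replace("-", string.Empty);
}

private static byte[] ConvertStringToHexadecimalByteArray(string hex)
{
    // e.g. "01AB" -> { 0x01, 0xAB }; assumes an even-length hexadecimal string
    byte[] bytes = new byte[hex.Length / 2];
    for (int i = 0; i < bytes.Length; i++)
    {
        bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
    }
    return bytes;
}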

Challenge 3: Multiple Devices

The code above handles only one port. Unfortunately, the IoT device doesn’t send its IMEI or any other identification number along when the actual data packet is sent to the server. That means if more than one IoT device sends data to the same port, we have no way to identify the sender on the server side.

Hence, we need to make our TCP listener listen on multiple ports, so that each device can be assigned its own port. The way I chose is to use a List&lt;Task&gt; in the Run method, as shown in the code below.

public override void Run()
{
    try
    {
        // Reading a list of ports assigned for trackers use
        ...

        var tasks = new List<Task>();
        
        foreach (var port in trackerPorts)
        {
            tasks.Add(this.RunAsync(this.cancellationTokenSource.Token, port));
        }
 
        Task.WaitAll(tasks.ToArray());
    }
    finally
    {
       this.runCompleteEvent.Set();
    }
}
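
Note that RunAsync now takes the port as a second parameter. Below is a minimal sketch of how the earlier method could be adapted; it assumes each tracker port has a corresponding endpoint configured for the role, and it is an illustration rather than the exact code we shipped:

private async Task RunAsync(CancellationToken cancellationToken, int port)
{
    // Listen on the given port on all interfaces, assuming the
    // corresponding endpoint has been configured for the role.
    var listener = new System.Net.Sockets.TcpListener(System.Net.IPAddress.Any, port)
    {
        ExclusiveAddressUse = false
    };
    listener.Start();

    while (!cancellationToken.IsCancellationRequested)
    {
        // Blocking call to accept a request, as in the single-port version.
        using (TcpClient client = listener.AcceptTcpClient())
        using (NetworkStream stream = client.GetStream())
        {
            await ProcessInputNetworkStreamAsync(stream);
        }
    }

    listener.Stop();
}

One deliberate difference from the single-port version: the listener for each port is created once and reused across connections, instead of being started and stopped for every client.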

Challenge 4: Worker Role Stops Responding Irregularly

This turned out to be the biggest challenge of using a Worker Role. After receiving data from the IoT devices for one or two days, the server stopped recording any new data even though the devices were working fine. So far, I am still not sure about the cause, even though there are people encountering similar issues as well.

Hence, I had to find a way to restart the Worker Role automatically, and I decided to use a PowerShell script to reboot the instance. There is sample code on the Microsoft TechNet Gallery – Script Center which does a similar thing.

I proceeded to use Azure Automation, which provides Runbooks to help handle the creation, deployment, monitoring, and maintenance of Azure resources. The PowerShell Workflow Runbook that I use to reboot the worker role daily is as follows.

workflow Reboot-CloudService
{
    Write-Output "Started!"
    
    $azureSubscriptionId = Get-AutomationVariable -Name "AzureSubscriptionId"
    $cloudServiceName = Get-AutomationVariable -Name "CloudServiceName"
    $workerRoleInstanceName = Get-AutomationVariable -Name "WorkerRoleInstanceName" 
    
    $myCredential = Get-AutomationPSCredential -Name "Chun Lin"
    Add-AzureAccount -Credential $myCredential
    
    Select-AzureSubscription -SubscriptionId $azureSubscriptionId

    Write-Output "Restarting for cloud service: $cloudServiceName."

    Reset-AzureRoleInstance -ServiceName $cloudServiceName -Slot "Production" -InstanceName $workerRoleInstanceName -Reboot

    Write-Output "Restarted successfully!"
}

In case you wonder where I defined the values of variables such as AzureSubscriptionId, CloudServiceName, and WorkerRoleInstanceName, as well as the Automation PowerShell credential, they are all easily found in the Azure Portal under the “Shared Resources” section of the Azure Automation account.

variables-and-credentials-in-automation.png
Providing credentials and variables for the Runbook.

After setting up the Runbook, we need to define schedules in the Automation account and then link them to the Runbook.

setting-schedules-for-automation.png
Setting up schedule and linking it to the Runbook.

There is another tool in the Azure Portal that I find very useful for debugging the PowerShell script in a Runbook: the “Test Pane”. With it, we can easily find out whether the PowerShell script is written correctly to generate the desired outcome.

test-pane.png
Test Pane available in Runbook.

After that, we can easily get a summary of how the job runs on Azure Portal, as shown in the following screenshot.

azure-automation.png
Job Statistics of Azure Automation.

Yup, that’s all I learnt in December while everyone else was enjoying the winter festivals. Please comment if you find a better alternative to handle the challenges above. Thanks in advance, and happy new year to you!

[KOSD Series] IP Addresses of Our Azure App Services that need to be Whitelisted by Our API Providers

KOSD, or Kopi-O Siew Dai, is a type of Singapore coffee that I enjoy. It is basically a cup of coffee with a little bit of sugar. This series is meant to blog about technical knowledge that I gained while having a small cup of Kopi-O Siew Dai.

kosd-azure_web_app-powershell

It is a common scenario for developers to integrate with different parties by using their APIs. Most of the time, the APIs are located in a locked-down network environment where only whitelisted IP addresses are allowed to access them. We will then be asked to give the API providers the IP addresses of our servers.

If it’s our web back end calling the APIs, and we host our web applications on Microsoft Azure App Service, how do we get those IP addresses?

As mentioned in a discussion about inbound IP addresses by Benjamin Perkins, an Escalation Engineer on the Azure team, an Azure Web App normally has about four outbound IP addresses. To retrieve the outbound IP addresses of an Azure web app, we simply need to get them from the Properties of the web app on the Azure Portal.

outbound-ip-addresses-in-azure-app-service.png
Locate the outbound IP addresses here.

We can also get the same result by using the Azure Resource Explorer, which is still in preview now. Benjamin covered this in a video clip in his article too.

For PowerShell lovers, as pointed out by Adrian Calinescu, one of the commenters on Benjamin’s article, we can use PowerShell to find the outbound IP addresses too. With the new Azure Cloud Shell, we can run the following command directly on the Azure Portal to retrieve the outbound IP addresses of an Azure web app.

Get-AzureRmResource -ResourceGroupName <resource group name> -ResourceType Microsoft.Web/sites -ResourceName <web app name> | select -expand Properties | Select-Object outboundIpAddresses

outbound-ip-addresses-in-azure-app-service-using-powershell.png
Managing Azure resources using shell directly on a browser.

For those who would like to have their own dedicated set of outbound IP addresses, please check out the App Service Environment (ASE), which grants users control over inbound and outbound application network traffic.

Finally, we can also whitelist all the IP addresses of the Azure datacentres, which can be downloaded from the Microsoft website.

azure-datacenter-ip-range-download.png
List of Microsoft Azure Datacentre IP addresses are available on Microsoft website.
