Handwritten Text Recognition, OCR, and Key Vault

Recently, I am glad to have help from Marvin Heng, the Microsoft MVP in Artificial Intelligence category, to work with me on building an experiment tool, FutureNow, to recognize handwritten texts as well as apply OCR technology to automate forms processing.

In January 2019, we also successfully presented our solution during the Singapore .NET Developers Community meetup. Taking the opportunity, I also presented how Azure Key Vault is used in our project to centralize our key and secret management.

Marvin is sharing with the audience about Custom Vision during the meetup.

Hence, in this article, I’d like to share about this project in terms of how we use Cognitive Services and Key Vault.

Code Repository

The code of our project is available in both Azure DevOps and Github. I will update both places to make sure the codes are updated.

The reason I have my codes in both places because the project is originally collaborated in Azure DevOps. However, during meetup, I realized majority of the audience still prefer us to have our codes on Github. Well…

Azure DevOps: https://dev.azure.com/gohchunlin/JobCreationAutomation
Github: https://github.com/sg-dotnet/text-recognition-ocr

Our “FutureNow” tool where user can use it to analyze text on images.

Custom Vision

What Marvin has contributed fully is to implement a function to detect and identify the handwritten texts in the uploaded image.

To do so, he first created a project in Custom Vision to train the model. In the project, he uploaded many images of paper documents and then labelled the handwritten texts found on the paper.

The part where the system analyzes the uploaded image and finds the handwriting part is in the TagAndAnalyzeService.cs.

In the AnalyzeImageAsync method, we first use the Custom Vision API which is linked to Marvin’s project to identify which parts in the image are “probably” handwritten.

At this point of time, the system still cannot be hundred-percent sure the parts it identifies as handwritten text really contain handwritten text. Hence, the result returns from the API contains a probability value. That’s why we have a percentage bar on our front-end to control the threshold for this probability value to accept only those results having a higher probability value will be accepted.

Handwritten Text Extraction with Computer Vision

After the previous step is done, then we will crop those filtered sections out from the uploaded image and then send each of the smaller image to the text recognition API in Cognitive Service to process the image and to extract out the text.

Hence in the code, the HandwrittenRecognitionService will be called to perform the image processing with the Computer Vision API version 1.0 recognizeText method.

There is an interesting do…while loop in the method. The loop is basically used to wait for the API to return the image processing results. It turns out that most of the time, the API will not directly return the result. Instead, it will return a JSON object telling us that it’s still processing the image. Only when it returns the JSON object with status set to “Succeeded”, then we know that the analysis result is returned together in the JSON object.

do
{
    var textOperation = response.Headers.GetValues("Operation-Location").FirstOrDefault();

    var result = await client.GetAsync(textOperation);

    string jsonResponse = result.Content.ReadAsStringAsync().Result;

    var handwrittenAnalyzeResult = JsonConvert.DeserializeObject(jsonResponse);

    isAnalizying = handwrittenAnalyzeResult.Status != "Succeeded";
 
    if (!isAnalizying)
    {
        return handwrittenAnalyzeResult;
    }
} while (isAnalizying);

In order to display to the user in front-end the results, we will store the cropped images in Azure Blob Storage and then display both the images and their corresponding extracted texts on the web page.

Unfortunately, the reading of handwritten text from images is a technology which is still currently in preview and is only available for English text. Hence, we need to wait a while until we can deploy it for production use.

OCR with Computer Vision

Using Computer Vision to perform OCR can better detect and extract text in an image especially when the image is a screenshot of a computer generated PDF file.

In OpticalCharacterRecognitionService, we simply call the Computer Vision API OCR method with the uploaded image and language set to English by default, then we can easily get the result of the OCR back in JSON format.

Key Vault

Key Vault in this project is mainly for managing the keys and connection string to the Azure Blob Storage.

To retrieve any of the secrets, we simply make use of the Microsoft.Azure.KeyVault Nuget package, as shown below.

var azureServiceTokenProvider = new AzureServiceTokenProvider();

var keyVaultClient = new KeyVaultClient(new KeyVaultClient.AuthenticationCallback(azureServiceTokenProvider.KeyVaultTokenCallback));

var secret = await keyVaultClient.GetSecretAsync($"https://futurenow.vault.azure.net/secrets/{ secretName }").ConfigureAwait(false);

According to Microsoft Azure documentation, there are service limits in Key Vault to ensure quality of service provided. Hence, when a service threshold is exceeded, any further requests from the client will not get successful response from Key Vault. Instead, HTTP status code 429 (Too many requests) will be returned.

There is an official guidance to handle Key Vault throttling. Currently, the code sample provided in the sample is not correct because the retry and waitTime variables are not used.

Incorrect sample code provided in Microsoft Docs.

Regarding this problem, I have raised issues (#22859 and #22860) and submitted a pull request to Microsoft on Github. Currently the PR is not yet approved but both Bryan Lamos and Prashanth Yerramilli have agreed that the code is indeed incorrect. Anyway, in our KeyVaultService class, the code has already been corrected.

EDIT (26 January 2019): The pull request has been approved. =)

Conclusion

Even though this project is just an experimental project for us to understand more about the power of Custom Vision and Computer Vision, I am glad that through this project, I manage to learn additional knowledge about Blob Storage, Azure DevOps, Key Vault, etc. and then later share it with the Singapore .NET Developers Community members.

Special thanks to Marvin for helping me in this project.