Pushing Pkl Content from GitHub to AWS S3

In the previous article, we talked about using the S3 Object Lambda to transform the medical records, which are stored in a JSON file, into a presentable web page. However, maintaining medical records in JSON files could be challenging. In this article, we will further investigate how we can generate those JSON files.

We’re going to explore Pkl—pronounced “Pickle”—a configuration-as-code language renowned for its robust validation features and tooling. It was first introduced by Apple as an open-source project in February 2024. Pkl allows us to write configurations as code, validate them, and convert them to existing static formats.

The part highlighted in red will be the focus of this article.

About Pkl

Pkl streamlines the creation of JSON files, enhancing maintainability and reducing verbosity through reuse, templating, and abstraction, all supported seamlessly right out of the box.

As we can expect, our medical records in JSON will grow larger over time, and the JSON files will become increasingly difficult to maintain. Pkl can help reduce the size and complexity of our JSON files by introducing abstractions for common elements and describing similar elements in terms of their differences.

A .pkl file describes a module. Modules are objects that can be referred to from other modules.

Pkl comes with basic types, such as Numbers, Strings, Durations, etc. With a notation for basic types, we can write typed objects. For example, the following module shows how we will define our medical record structure in Pkl.

module medicalVisitTemplate

class MedicalVisit {
  medicalCentreName: String
  centreType: "clinic"|"specialist"|"hospital"
  visitStartDate: Date
  visitEndDate: Date
  remark: String
  treatments: Listing<Treatment>
}

class Treatment {
  name: String
  type: "medicine"|"operation"|"scanning"
  amount: String
  startDate: Date
  endDate: Date
}

class Date {
  year: Int(isBetween(2000, 2100))
  month: Int(isBetween(1, 12))
  day: Int(isBetween(1, 31))
}

visits: Listing<MedicalVisit>

Listing is a collection in Pkl. It contains exclusively Elements, i.e., object members. In the code above, we define visits to be a collection of MedicalVisits. The MedicalVisit class contains information about the visit, for example, the type and name of the medical centre the patient visited, the visiting period, a remark, etc. The visiting period is in turn defined by the Date class, which stores the year, month, and day.

In the Date class, since the month can only be an integer from 1 to 12, we can restrict it to an integer range by using Int with the isBetween constraint. Later, as Pkl evaluates our configuration, if an invalid value, for example 13, is provided for the month, an error will be shown to us, as demonstrated below.

Pkl CLI will evaluate our configuration and show detected invalid values.

Generate JSON with Pkl

So now how do we generate JSON with the module above?

Before we can generate a JSON file, we need to understand the Amending concept in Pkl. As a first intuition, think of “amending a module” as “filling out a form.”

So, to generate the chunlin.json file that was shown in the previous blog post, we can amend the medicalVisitTemplate module above with another Pkl file called chunlin.pkl as shown below.

amends "medicalVisitTemplate.pkl"

visits = new Listing<MedicalVisit> {

...
// Omitted for brevity

new {

medicalCentreName = "Tan Tock Seng Hospital"

centreType = "hospital"

visitStartDate {

year = 2024

month = 3

day = 24

}

visitEndDate {

year = 2024

month = 4

day = 19

}

remark = ""

treatments = new Listing<Treatment> {

...
// Omitted for brevity

new {

name = "Betamethasone (Valerate) 0.025% Cream 15g - Dermasone"

type = "medicine"

amount = "Applied after shower"

startDate {

year = 2024

month = 3

day = 26

}

endDate {

year = 2024

month = 4

day = 19

}

}

new {

name = "Betamethasone (Valerate) 0.1% Cream 15g - Uniflex(TM)"

type = "medicine"

amount = "Applied after shower"

startDate {

year = 2024

month = 3

day = 26

}

endDate {

year = 2024

month = 4

day = 19

}

}

}

}

}

Now we can execute the command below with the Pkl CLI to evaluate the given modules and render the result as JSON.

$ ./pkl eval -f json -o ./output/chunlin.json ./input/chunlin.pkl

With the command above, we can get the same output as we see in chunlin.json.

Maintain Pkl in GitHub

Static files like Pkl or JSON can be easily maintained in code repositories such as GitHub. Using GitHub for version control allows us to track changes to our Pkl files over time. This makes it easy to revert to previous versions if something goes wrong, compare changes, and understand the evolution of our configuration files. Additionally, we can use GitHub Actions to automate various tasks related to our Pkl files, enhancing efficiency and reliability in our workflow.

GitHub Actions is an automation tool that allows us to create workflows triggered by events within our repository. These workflows can automate tasks like testing, building, and deploying code, or even running scripts. By using GitHub Actions, we can streamline the development and transformation process of our Pkl files, ensure consistency, and improve efficiency.

Thus, our mission is now to configure GitHub Actions so that a JSON file can be produced from the Pkl file and sent to the Amazon S3 bucket that we set up in an earlier article.

Configure GitHub Actions Workflow

Firstly, we need to give permission to GitHub Actions to access our S3 bucket. To do so, we will create a new user in AWS Console with appropriate rights.

We only need two permissions, s3:ListBucket and s3:PutObject, to copy files from local to the S3 bucket.
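
For reference, here is a minimal sketch of such a policy, assuming the bucket is lunar.medicalrecords (the bucket used later in this article). It can be pasted into the policy JSON editor in the AWS Console, or attached programmatically with boto3 as shown below; the user name and policy name are placeholders for illustration only.

import json
import boto3

# Minimal policy sketch: list the bucket and put objects into it.
deploy_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::lunar.medicalrecords"
        },
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": "arn:aws:s3:::lunar.medicalrecords/*"
        }
    ]
}

iam_client = boto3.client('iam')

# Attach the policy inline to the user created for GitHub Actions.
iam_client.put_user_policy(
    UserName='github-actions-deployer',      # placeholder user name
    PolicyName='GitHubActionsS3Deploy',      # placeholder policy name
    PolicyDocument=json.dumps(deploy_policy)
)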

After attaching the policy, we proceed to generate an access key for this newly created user.

Secondly, we need to navigate to our repository and then click on the Actions tab to create a new simple workflow, as shown below.

Let’s start with the simple workflow.

To begin, let’s install a linter for Pkl files in the workflow. The linter, known as pkl-linter, is developed by Eduardo Aguilar Yépez, a senior software engineer at Draftea.

name: Evaluate Pkl and store it in S3 as JSON

on:
  push:
    branches: [ "main" ]

  # Allows us to run this workflow manually from the Actions tab
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-go@v5
        with:
          go-version: '>=1.17.0'

      - name: Get Go Version
        run: go version

      - name: Install Linter
        run: go install github.com/Drafteame/pkl-linter@latest

      - name: Run pkl-linter
        run: pkl-linter medical-records

The linter analyses our code and shows detected stylistic errors.

Next, we need to install the Pkl CLI to evaluate Pkl modules and write their output to a file. There are native executables available for us to use. As shown in the workflow above, the GitHub Actions runner is ubuntu-latest, which, as of June 2024, uses the Ubuntu 22.04 LTS image on the amd64 architecture. Hence, we can download the Pkl Linux executable for the amd64 architecture.

name: Evaluate Pkl and store it in S3 as JSON

...
# Omitted for brevity

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      ...
      # Omitted for brevity

      - name: Install Pkl CLI
        run: curl -L -o pkl https://github.com/apple/pkl/releases/download/0.25.3/pkl-linux-amd64

      - name: Grant execute permission to Pkl CLI
        run: chmod +x pkl

      - name: Get Pkl CLI version
        run: ./pkl --version

      - name: Eval the Pkl files
        run: |
          cd medical-records
          files=$(find . -name "*.pkl")
          count=0
          for file in $files; do
            output_filename="${file%.pkl}.json"
            ../pkl eval -f json -o ../output/$output_filename $file
          done
          cd ..

When my workflow was executed in June 2024, the version of the Pkl CLI was “Pkl 0.25.3 (Linux 5.15.0-1053-aws, native)”.

As shown in the last step above, it loops through the Pkl files in the medical-records folder and evaluates them one by one using the Pkl CLI. The generated JSON files are stored in the output folder.

Eventually, what we need to do is upload the files over to our AWS S3 bucket. However, before that, let’s make sure the AWS access key and secret access key we generated earlier are stored securely as GitHub Actions secrets, as shown in the screenshot below.

The AWS access key and secret access key should be stored as GitHub Actions secrets.

Now, we can easily set up the AWS CLI with the secrets above and use the s3 cp command to move the generated JSON files over to our S3 bucket. To do so, we only need to complete our workflow with the following.

name: Evaluate Pkl and store it in S3 as JSON

...
# Omitted for brevity

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      ...
      # Omitted for brevity

      - name: Setup AWS CLI
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ap-southeast-1

      - name: Copy files to S3 bucket
        run: |
          aws s3 cp output s3://lunar.medicalrecords --exclude "*" --include "*.json" --recursive

Please take note that the s3 cp command performs the operation on a single file by default, hence we need to apply the --recursive flag to indicate that the command should run on all files under the specified directory, i.e. output.
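
If we ever want to perform the same copy step without the AWS CLI, a minimal boto3 sketch (assuming the same output folder and the lunar.medicalrecords bucket) could look like this.

import os
import boto3

s3_client = boto3.client('s3')

# Walk the output folder and upload every generated JSON file,
# mirroring what `aws s3 cp --recursive --include "*.json"` does.
for root, _, files in os.walk('output'):
    for file_name in files:
        if file_name.endswith('.json'):
            local_path = os.path.join(root, file_name)
            # The object key is the path relative to the output folder.
            key = os.path.relpath(local_path, 'output')
            s3_client.upload_file(local_path, 'lunar.medicalrecords', key)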

Wrap-Up

In conclusion, utilising Pkl for generating and maintaining JSON files offers significant advantages in terms of reducing complexity and enhancing maintainability. By abstracting common elements and leveraging typed objects, Pkl simplifies the management of large and evolving datasets. The structured approach provided by Pkl not only minimises redundancy but also ensures that configurations remain consistent and error-free through its robust validation features.

Additionally, by using GitHub Actions, we can automate the process of evaluating Pkl files, generating the corresponding JSON files as output, and uploading these JSON files to our S3 bucket. This automation not only enhances efficiency but also ensures that changes are tracked and managed effectively.

In summary, the following diagram concludes the infrastructure that we have gone through in this article and in the previous one.

References

Processing S3 Data before Returning It with Object Lambda (version 2024)

We use Amazon S3 to store data for easy sharing among various applications. However, each application has its unique requirements and might require a different perspective on the data. To solve this problem, at times, we store additional customised datasets of the same data, ensuring that each application has its own unique dataset. This sometimes creates another set of problems because we now need to maintain additional datasets.

In March 2021, a new feature known as S3 Object Lambda was introduced. Similar to the idea of setting up a proxy layer in front of S3 to intercept and process data as it is requested, Object Lambda uses AWS Lambda functions to automatically process and transform your data as it is being retrieved from S3. With Object Lambda, we only need to change our apps to use the new S3 Object Lambda Access Point instead of the actual bucket name to retrieve data from S3.

Simplified architecture diagram showing how S3 Object Lambda works.

Example: Turning JSON to Web Page with S3 Object Lambda

I have been keeping details of my visits to medical centres as well as the treatments and medicines I received in a JSON file. So, I would like to take this opportunity to show how S3 Object Lambda can help in doing data processing.

The JSON file looks something as follows.

{
  "visits": [
    {
      "medicalCentreName": "Tan Tock Seng Hospital",
      "centreType": "hospital",
      "visitStartDate": {
        "year": 2024,
        "month": 3,
        "day": 24
      },
      "visitEndDate": {
        "year": 2024,
        "month": 4,
        "day": 19
      },
      "purpose": "",
      "treatments": [
        {
          "name": "Antibiotic Meixam(R) 500 Cloxacillin Sodium",
          "type": "medicine",
          "amount": "100ml per dose every 4 hours",
          "startDate": {
            "year": 2024,
            "month": 3,
            "day": 26
          },
          "endDate": {
            "year": 2024,
            "month": 4,
            "day": 19
          }
        },
        ...
      ]
    },
    ...
  ]
}

In this article, I will show the steps I took to set up the S3 Object Lambda architecture for this use case.

Step 1: Building the Lambda Function

Before we begin, we need to take note that the maximum duration for a Lambda function used by S3 Object Lambda is 60 seconds.

We need a Lambda Function to do the data format transformation from JSON to HTML. To keep things simple, we will be developing the Function using Python 3.12.

Object Lambda does not need any API Gateway since it should be accessed via the S3 Object Lambda Access Point.

In the beginning, we can have the code as follows. The code basically does two things. Firstly, it performs some logging. Secondly, it reads the JSON file from the S3 bucket.

import json
import os
import logging
import boto3
from urllib import request
from urllib.error import HTTPError
from types import SimpleNamespace

logger = logging.getLogger()
logger.addHandler(logging.StreamHandler())
logger.setLevel(getattr(logging, os.getenv('LOG_LEVEL', 'INFO')))

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    object_context = event["getObjectContext"]
    # Get the presigned URL to fetch the requested original object from S3
    s3_url = object_context["inputS3Url"]
    # Extract the route and request token from the input context
    request_route = object_context["outputRoute"]
    request_token = object_context["outputToken"]

    # Get the original S3 object using the presigned URL
    req = request.Request(s3_url)
    try:
        response = request.urlopen(req)
        responded_json = response.read().decode()
    except Exception as err:
        logger.error(f'Exception reading S3 content: {err}')
        return {'status_code': 500}

    json_object = json.loads(responded_json, object_hook=lambda d: SimpleNamespace(**d))

    visits = json_object.visits

    html = ''

    s3_client.write_get_object_response(
        Body = html,
        ContentType = 'text/html',
        RequestRoute = request_route,
        RequestToken = request_token)

    return {'status_code': 200}

Step 1.1: Getting the JSON File with Presigned URL

In the event that an Object Lambda receives, there is a property known as getObjectContext, which contains useful information such as inputS3Url, the presigned URL of the object in S3.

By default, all S3 objects are private and thus for a Lambda Function to access the S3 objects, we need to configure the Function to have S3 read permissions to retrieve the objects. However, with the presigned URL, the Function can get the object without the S3 read permissions.

In the code above, we retrieve the JSON file from S3 using its presigned URL. After that, we parse the JSON content with the json.loads() method and convert it into an object with SimpleNamespace, so that its fields can be accessed as attributes. Thus, the variable visits now holds all the visit data from the original JSON file.

Step 1.2: Call WriteGetObjectResponse

Since the purpose of Object Lambda is to process and transform our data as it is being retrieved from S3, we need to pass the transformed object back to the GetObject operation via the method write_get_object_response. Without calling this method, the Lambda will return an error complaining that it is missing.

Error: The Lambda exited without successfully calling WriteGetObjectResponse.

The method write_get_object_response requires two compulsory parameters, i.e. RequestRoute and RequestToken. Both of them are available from the property getObjectContext under the names outputRoute and outputToken.

Step 1.3: Get the HTML Template from S3

To make our Lambda code cleaner, we will not write the entire HTML there. Instead, we keep a template of the web page in another S3 bucket.

Now, the architecture above is improved to include a second S3 bucket, which provides the web page template and other necessary static assets.

Introducing a second S3 bucket for storing the HTML template and other assets.

Now, we will replace the line html = '' earlier with the Python code below.

    template_response = s3_client.get_object(
        Bucket = 'lunar.medicalrecords.static',
        Key = 'medical-records.html'
    )

    template_object_data = template_response['Body'].read()
    template_content = template_object_data.decode('utf-8')

    dynamic_table = f"""
    <table class="table accordion">
        <thead>
            <tr>
                <th scope="col">#</th>
                <th scope="col">Medical Centre</th>
                <th scope="col">From</th>
                <th scope="col">To</th>
                <th scope="col">Purpose</th>
            </tr>
        </thead>

        <tbody>
            ...
        </tbody>
    </table>"""

    html = template_content.replace('{{DYNAMIC_TABLE}}', dynamic_table)
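
The tbody content is omitted in the snippet above. As a rough sketch only (an assumption about how the rows might be built, not the exact code used), the rows can be generated from the visits parsed earlier, since SimpleNamespace lets us access the JSON fields as attributes:

# Sketch: build one table row per visit from the parsed JSON.
# Field names follow the JSON structure shown earlier (medicalCentreName, visitStartDate, purpose, ...).
rows = ''
for index, visit in enumerate(visits, start=1):
    start = visit.visitStartDate
    end = visit.visitEndDate
    rows += f"""
        <tr>
            <th scope="row">{index}</th>
            <td>{visit.medicalCentreName}</td>
            <td>{start.year}-{start.month:02d}-{start.day:02d}</td>
            <td>{end.year}-{end.month:02d}-{end.day:02d}</td>
            <td>{visit.purpose}</td>
        </tr>"""

# The rows string would then take the place of the "..." placeholder inside <tbody>.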

Step 2: Give Lambda Function Necessary Permissions

With the setup we have gone through above, we understand that our Lambda Function needs to have the following permissions.

  • s3-object-lambda:WriteGetObjectResponse
  • s3:GetObject
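
For reference, a minimal sketch of how these two permissions could be attached to the function’s execution role with boto3; the role name, policy name, and resource ARNs below are placeholders and assumptions, not the exact values used in this setup.

import json
import boto3

iam_client = boto3.client('iam')

# Sketch only: inline policy with the two permissions the Object Lambda function needs.
object_lambda_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3-object-lambda:WriteGetObjectResponse",
            # Could be scoped down to the Object Lambda Access Point ARN.
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            # The bucket holding the HTML template and static assets.
            "Resource": "arn:aws:s3:::lunar.medicalrecords.static/*"
        }
    ]
}

iam_client.put_role_policy(
    RoleName='lunar-medicalrecords-objectlambda-role',   # placeholder execution role name
    PolicyName='ObjectLambdaTransformAccess',            # placeholder policy name
    PolicyDocument=json.dumps(object_lambda_policy)
)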

Step 3: Create S3 Access Point

Next, we will need to create an S3 Access Point. It will be used to support the creation of the S3 Object Lambda Access Point later.

One of the features that S3 Access Points offer is that we can specify any name, as long as it is unique within the account and region. For example, as shown in the screenshot below, we can actually have a “lunar-medicalrecords” access point in every account and region.

Creating an access point from the navigation pane of S3.

When creating the access point, we need to specify the bucket that we want to use with it, and the bucket must reside in the same region. In addition, since we are not restricting its access to only a specific VPC in our case, we will be choosing “Internet” for the “Network origin” field.

After that, we keep all other defaults as is. We can directly proceed to choose the “Create access point” button.

Our S3 Access Point is successfully created.

Step 4: Create S3 Object Lambda Access Point

After getting our S3 Access Point set up, we can then move on to create our S3 Object Lambda Access Point. This is the actual access point that our app will be using to access the JSON file in our S3 bucket. It should then return an HTML document generated by the Object Lambda function that we built in Step 1.

Creating an object lambda access point from the navigation pane of S3.

On the Object Lambda Access Point creation page, after we give it a name, we need to provide the Supporting Access Point, which is the Amazon Resource Name (ARN) of the S3 Access Point that we created in Step 3. Please take note that both the Object Lambda Access Point and the Supporting Access Point must be in the same region.

Next, we need to set up the transformation configuration. In our case, we will be retrieving the JSON file from the S3 bucket to perform the data transformation via our Lambda Function, so we will be choosing GetObject as the S3 API we will be using, as shown in the screenshot below.

Configuring the S3 API that will be used in the data transformation and the Lambda Function to invoke.

Once all these fields are keyed in, we can proceed to create the Object Lambda Access Point.

Now, we will access the JSON file via the Object Lambda Access Point to verify that the file is really transformed into a web page during the request. To do so, firstly, we need to select the newly created Object Lambda Access Point as shown in the following screenshot.

Locate the Object Lambda Access Point we just created in the S3 console.

Secondly, we will be searching for our JSON file, for example chunlin.json in my case. Then, we will click on the “Open” button to view it. The reason why I name the JSON file containing my medical records after my user name is that later I will be adding authentication and authorisation so that users can only retrieve their own JSON file based on their login user name.

This page looks very similar to the usual S3 objects listing page. So please make sure you are doing this under the “Object Lambda Access Point”.

A new tab will be opened showing the web page, as demonstrated in the screenshot below. As you may have noticed in the URL, it still points to the JSON file, but the returned content is an HTML web page.

The domain name is actually no longer the usual S3 domain name but it is our Object Lambda Access Point.

Using the Object Lambda Access Point from Our App

With the Object Lambda Access Point successfully set up, we will show how we can use it. To not overcomplicate things, for the purposes of this article, I will host a serverless web app on Lambda which will be serving the medical record website above.

In addition, since Lambda Functions are by default not accessible from the Internet, we will be using API Gateway so that we can have a custom REST endpoint in AWS and map this endpoint to the invocation of our Lambda Function. Technically speaking, the architecture diagram now looks as follows.

This architecture allows public to view the medical record website which is hosted as a serverless web app.

We will still be developing this new Lambda with Python 3.12, and we name it lunar-medicalrecords-frontend. We will be using the following code, which retrieves the HTML content from the Object Lambda Access Point.

import json
import os
import logging
import boto3

logger = logging.getLogger()
logger.addHandler(logging.StreamHandler())
logger.setLevel(getattr(logging, os.getenv('LOG_LEVEL', 'INFO')))

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    try:
        bucket_name = 'ol-lunar-medicalreco-t5uumihstu69ie864td6agtnaps1a--ol-s3'
        object_key = 'chunlin.json'

        response = s3_client.get_object(
            Bucket=bucket_name,
            Key=object_key
        )

        object_data = response['Body'].read()
        object_string = object_data.decode('utf-8')

        return {
            'statusCode': 200,
            'body': object_string,
            'headers': {
                'Content-Type': 'text/html'
            }
        }

    except Exception as err:
        return {
            'statusCode': 500,
            'body': json.dumps(str(err))
        }

As shown in the code above, we are still using the same function get_object from the S3 client to retrieve the JSON file, chunlin.json. However, instead of providing the bucket name, we will be using the Object Lambda Access Point Alias, which can be found on the S3 Object Lambda Access Points listing page.

This is where we can find the Object Lambda Access Point Alias.

You can read more about the Boto3 get_object documentation to understand more about its Bucket parameter.

The Boto3 documentation highlights the use of Object Lambda Access Point in get_object.

The API Gateway for the Lambda Function is created with HTTP API through the “Add Trigger” function (which is located at the Function overview page). For the Security field, we will be choosing “Open” for now. We will add the login functionality later.

Adding API Gateway as a trigger to our Lambda.

Once this is done, we will be provided an API Gateway endpoint, as shown in the screenshot below. Visiting the endpoint should render the same web page listing the medical records that we have seen above.

Getting the API endpoint of the API Gateway.

Finally, for the Lambda Function permission, we only need to grant it the following.

  • s3:GetObject.

To make the API Gateway endpoint look more user-friendly, we can also introduce a custom domain to the API Gateway, following the guide in one of our earlier posts.

Assigned medical.chunlinprojects.com to our API Gateway.

Protecting Data with Cognito

In order to ensure that only authenticated and authorised users can access their own medical records, we need to securely control access to our app with help from Amazon Cognito. Cognito is a service that enables us to add user sign-in and access control to our apps quickly and easily. Hence, it helps authenticate and authorise users before they can access the medical records.

Step 1: Setup Amazon Cognito

To set up Cognito, firstly, we need to configure the user pool by specifying sign-in options. A user pool is a managed user directory service that provides authentication and user management capabilities for our apps. It enables us to offload the complexity of user authentication and management to AWS.

Configuring sign-in options and user name requirements.

Please take note that Cognito user pool sign-in options cannot be changed after the user pool has been created. Hence, kindly think carefully during the configuration.

Configuring password policy.

Secondly, we need to configure password policy and choose whether to enable Multi-Factor Authentication (MFA).

By default, Cognito comes with a password policy that ensures our users maintain a password with a minimum length and complexity. For password reset, it will also generate a temporary password for the user, which expires in 7 days by default.

MFA adds an extra layer of security to the authentication process by requiring users to provide additional verification factors to gain access to their accounts. This reduces the risk of unauthorised access due to compromised passwords.

Enabling MFA in our Cognito user pool.

As shown in the screenshot above, one of the methods is called TOTP. TOTP stands for Time-Based One-Time Password. It is a form of multi-factor authentication (MFA) where a temporary passcode is generated by the authenticator app, adding a layer of security beyond the typical username and password.

Thirdly, we will be configuring Cognito to allow user account recovery as well as new user registration. Both of these by default require email delivery. For example, when users request an account recovery code, an email with the code should be sent to the user. Also, when there is a new user signing up, there should be emails sent to verify and confirm the new account of the user. So, how do we handle the email delivery?

We can choose to send email with Cognito in our development environment.

Ideally, we should be setting up another service known as Amazon SES (Simple Email Service), an email sending service provided by AWS, to deliver the emails. However, for testing purposes, we can choose to use the Cognito default email address as well. This approach is normally only suitable for development purposes because we can only use it to send up to 50 emails a day.

Finally, we will be using the hosted authentication pages for user sign-in and sign-up, as demonstrated below.

Using hosted UI so that we can have a simple frontend ready for sign-in and sign-up.

Step 2: Register Our Web App in Cognito

To integrate our app with Cognito, we still need to set up the app client. An app client is a configuration entity that allows our app to interact with the user pool. It is essentially an application-specific configuration that defines how users will authenticate and interact with our user pool. For example, we have set up a new app client for our medical records app as shown in the following screenshot.

We customise the hosted UI with our logo and CSS.

As shown in the screenshot above, we are able to specify customisation settings for the built-in hosted UI experience. Please take note that we are only able to customise the look-and-feel of the default “login box”, so we cannot modify the layout of the entire hosted UI web page, as demonstrated below.

The part with gray background cannot be customised with the CSS.

In the setup of the app client above, we have configured the callback URL to /authy-callback. So where does this lead to? It actually points to a new Lambda function which is in charge of the authentication.

Step 3: Retrieve Access Token from Cognito Token Endpoint

Here, Cognito uses the OAuth 2.0 authorisation code grant flow. Hence, after successful authentication, Cognito redirects the user back to the specified callback URL with an authorisation code included in the query string under the name code. Our authentication Lambda function thus needs to make a back-end request to the Cognito token endpoint, including the authorisation code, client ID, and redirect URI, to exchange the authorisation code for an access token, refresh token, and ID token.

Client ID can be found under the “App client information” section.
auth_code = event['queryStringParameters']['code']

token_url = "https://lunar-corewebsite.auth.ap-southeast-1.amazoncognito.com/oauth2/token"
client_id = "<client ID to be found in AWS Console>"
callback_url = "https://medical.chunlinprojects.com/authy-callback"

params = {
    "grant_type": "authorization_code",
    "client_id": client_id,
    "code": auth_code,
    "redirect_uri": callback_url
}

http = urllib3.PoolManager()
tokens_response = http.request_encode_body(
    "POST",
    token_url,
    encode_multipart = False,
    fields = params,
    headers = {'Content-Type': 'application/x-www-form-urlencoded'})

token_data = tokens_response.data
tokens = json.loads(token_data)

As shown in the code above, the token endpoint URL for a Cognito user pool generally follows the following structure.

https://<your-domain>.auth.<region>.amazoncognito.com/oauth2/token

A successful response from the token endpoint typically is a JSON object which includes:

  • access_token: Used to access protected resources;
  • id_token: Contains identity information about the user;
  • refresh_token: Used to obtain new access tokens;
  • expires_in: Lifetime of the access token in seconds.

Hence we can retrieve the medical records if there is an access_token but return an “HTTP 401 Unauthorized” response if there is no access_token returned.

if 'access_token' not in tokens:
    return {
        'statusCode': 401,
        'body': get_401_web_content(),
        'headers': {
            'Content-Type': 'text/html'
        }
    }

else:
    access_token = tokens['access_token']

    return {
        'statusCode': 200,
        'body': get_web_content(access_token),
        'headers': {
            'Content-Type': 'text/html'
        }
    }

The function get_401_web_content is responsible for retrieving a static web page showing the 401 error message from the S3 bucket and returning it to the frontend, as shown in the code below.

def get_401_web_content():
    bucket_name = 'lunar.medicalrecords.static'
    object_key = '401.html'

    response = s3_client.get_object(
        Bucket=bucket_name,
        Key=object_key
    )

    object_data = response['Body'].read()
    content = object_data.decode('utf-8')

    return content

Step 4: Retrieve Content Based on Username

For the get_web_content function, we will be passing the access token to the Lambda that we developed earlier to retrieve the HTML content from the Object Lambda Access Point. As shown in the following code, we invoke the Lambda function synchronously and wait for the response.

# boto3 Lambda client, defined once alongside the other clients
lambda_client = boto3.client('lambda')

def get_web_content(access_token):
    useful_tokens = {
        'access_token': access_token
    }

    lambda_response = lambda_client.invoke(
        FunctionName='lunar-medicalrecords-frontend',
        InvocationType='RequestResponse',
        Payload=json.dumps(useful_tokens)
    )

    lambda_response_payload = lambda_response['Payload'].read().decode('utf-8')

    web_content = (json.loads(lambda_response_payload))['body']

    return web_content

In the Lambda function lunar-medicalrecords-frontend, we will no longer need to hardcode the object key as chunlin.json. Instead, we can just retrieve the user name from Cognito using the access token, as shown in the code below.

...
import boto3

cognito_idp_client = boto3.client('cognito-idp')

def lambda_handler(event, context):
    if 'access_token' not in event:
        return {
            'statusCode': 200,
            'body': get_homepage_web_content(),
            'headers': {
                'Content-Type': 'text/html'
            }
        }

    else:
        cognito_response = cognito_idp_client.get_user(AccessToken = event['access_token'])

        username = cognito_response['Username']

        try:
            bucket_name = 'ol-lunar-medicalreco-t5uumihstu69ie864td6agtnaps1a--ol-s3'
            object_key = f'{username}.json'

            ...

        except Exception as err:
            return {
                'statusCode': 500,
                'body': json.dumps(str(err))
            }

The get_homepage_web_content function above basically retrieves a static homepage from the S3 bucket, similar to how the get_401_web_content function works.

The homepage comes with a Login button redirecting users to the hosted UI of our Cognito app client.

Step 5: Store Access Token in Cookies

We need to take note that the auth_code above, in the OAuth 2.0 authorisation code grant flow, can only be used once. A single-use auth_code prevents replay attacks, where an attacker intercepts the authorisation code and tries to use it multiple times to obtain tokens. Hence, our implementation above breaks if we refresh the web page after logging in.

To solve this issue, we will be saving the access token in a cookie when the user first signs in. After that, as long as we detect that there is a valid access token in the cookie, we will not use the auth_code.

In order to save an access token in a cookie, there are several important considerations to ensure security and proper functionality:

  • Set the Secure attribute to ensure the cookie is only sent over HTTPS connections. This helps protect the token from being intercepted during transmission;
  • Use the HttpOnly attribute to prevent client-side scripts from accessing the cookie. This helps mitigate the risk of cross-site scripting (XSS) attacks;
  • Set an appropriate expiration time for the cookie. Since access tokens typically have a short lifespan, ensure the cookie does not outlive the token’s validity.

Thus the code at Step 3 above can be improved as follows.

# Needed for the cookie expiry calculation below
from datetime import datetime, timedelta, timezone

def lambda_handler(event, context):
    now = datetime.now(timezone.utc)

    if 'cookies' in event:
        for cookie in event['cookies']:
            if cookie.startswith('access_token='):
                access_token = cookie.replace("access_token=", "")
                break

        if 'access_token' in locals():

            returned_html = get_web_content(access_token)

            return {
                'statusCode': 200,
                'headers': {
                    'Content-Type': 'text/html'
                },
                'body': returned_html
            }

        return {
            'statusCode': 401,
            'body': get_401_web_content(),
            'headers': {
                'Content-Type': 'text/html'
            }
        }

    else:
        ...
        if 'access_token' not in tokens:
            ...

        else:
            access_token = tokens['access_token']

            cookies_expiry = now + timedelta(seconds=tokens['expires_in'])

            return {
                'statusCode': 200,
                'headers': {
                    'Content-Type': 'text/html',
                    'Set-Cookie': f'access_token={access_token}; path=/; secure; httponly; expires={cookies_expiry.strftime("%a, %d %b %Y %H:%M:%S")} GMT'
                },
                'body': get_web_content(access_token)
            }

With this, now we can safely refresh our web page and there should be no case of reusing the same auth_code repeatedly.

Wrap-Up

In summary, we can conclude the infrastructure that we have gone through above in the following diagram.

References

Serverless Web App on AWS Lambda with .NET 6

We have a static website for marketing purposes hosted on Amazon S3 buckets. S3 offers a pay-as-you-go model, which means we only pay for the storage and bandwidth used. This can be significantly cheaper than traditional web hosting providers, especially for websites with low traffic.

However, S3 is designed as a storage service, not a web server. Hence, it lacks many features found in common web hosting providers. We thus decide to use AWS Lambda to power our website.

AWS Lambda and .NET 6

AWS Lambda is a serverless service that runs code for backend services without the need to provision or manage servers. Building serverless apps means that we can focus on our web app business logic instead of worrying about managing and operating servers. Similar to S3, Lambda helps to reduce overhead and lets us reclaim time and energy that we can spend on developing our products and services.

Lambda natively supports several programming languages such as Node.js, Go, and Python. In February 2022, the AWS team announced that the .NET 6 runtime can be officially used to build Lambda functions. That means Lambda now also supports C# 10 natively.

So, to begin, we will set up the following simple architecture to retrieve website content from S3 via Lambda.

Simple architecture to host our website using Lambda and S3.

API Gateway

When we are creating a new Lambda service, we have the option to enable the function URL so that an HTTP(S) endpoint is assigned to our Lambda function. We can then use the URL to invoke our function directly through, for example, an Internet browser.

The Function URL feature is an excellent choice when we seek rapid exposure of our Lambda function to the wider public on the Internet. However, if we are in search of a more comprehensive solution, then opting for API Gateway in conjunction with Lambda may prove to be the better choice.

We can configure API Gateway as a trigger for our Lambda function.

Using API Gateway also enables us to invoke our Lambda function with a secure HTTP endpoint. In addition, it can do a bit more, such as managing large volumes of calls to our function by throttling traffic and automatically validating and authorising API calls.

Keeping Web Content in S3

Now, we will create a new S3 bucket called “corewebsitehtml” to store our web content files.

We then can upload our HTML file for our website homepage to the S3 bucket.

We will store our homepage HTML in the S3 for Lambda function to retrieve it later.

Retrieving Web Content from S3 with C# in Lambda

With our web content in S3, the next issue will be retrieving the content from S3 and returning it as response via the API Gateway.

According to performance evaluations, even though C# is among the slowest on a cold start, it is one of the fastest languages once the function is warm and invocations follow one another.

The code editor on the AWS console does not support the .NET 6 runtime. Thus, we have to install the AWS Toolkit for Visual Studio so that we can easily develop, debug, and deploy .NET applications on AWS, including AWS Lambda.

Here, we will use the AWS SDK for reading the file from S3 as shown below.

public async Task<APIGatewayProxyResponse> FunctionHandler(APIGatewayProxyRequest request, ILambdaContext context)
{
    try 
    {
        RegionEndpoint bucketRegion = RegionEndpoint.APSoutheast1;

        AmazonS3Client client = new(bucketRegion);

        GetObjectRequest s3Request = new()
        {
            BucketName = "corewebsitehtml",
            Key = "index.html"
        };

        GetObjectResponse s3Response = await client.GetObjectAsync(s3Request);

        StreamReader reader = new(s3Response.ResponseStream);

        string content = reader.ReadToEnd();

        APIGatewayProxyResponse response = new()
        {
            StatusCode = (int)HttpStatusCode.OK,
            Body = content,
            Headers = new Dictionary<string, string> { { "Content-Type", "text/html" } }
        };

        return response;
    } 
    catch (Exception ex) 
    {
        context.Logger.LogWarning($"{ex.Message} - {ex.InnerException?.Message} - {ex.StackTrace}");

        throw;
    }
}

As shown in the code above, we first need to specify the region of our S3 bucket, which is Asia Pacific (Singapore). After that, we also need to specify our bucket name “corewebsitehtml” and the key of the file from which we are going to retrieve the web content, i.e. “index.html”, as shown in the screenshot below.

Getting file key in S3 bucket.

Deploy from Visual Studio

After we have done the coding of the function, we can right-click on our project in Visual Studio and then choose “Publish to AWS Lambda…” to deploy our C# code to the Lambda function, as shown in the screenshot below.

Publishing our function code to AWS Lambda from Visual Studio.

After that, we will be prompted to key in the name of the Lambda function as well as the handler in the format of <assembly>::<type>::<method>, for instance something like CoreWebsite::CoreWebsite.Function::FunctionHandler.

Then we are good to proceed to deploy our Lambda function.

Logging with .NET in Lambda Function

Now when we hit the URL of the API Gateway, we will receive an HTTP 500 internal server error. To investigate, we need to check the error logs.

Lambda logs all requests handled by our function and automatically stores logs generated by our code through CloudWatch Logs. By default, info level messages or higher are written to CloudWatch Logs.

Thus, in our code above, we can use the Logger to write a warning message if the file is not found or there is an error retrieving the file.

context.Logger.LogWarning($"{ex.Message} - {ex.InnerException?.Message} - {ex.StackTrace}");

Hence, if we access our API Gateway URL now, we should find a warning log message in CloudWatch, as shown in the screenshot below. The page can be accessed from the “View CloudWatch logs” button under the “Monitor” tab of the Lambda function.

Viewing the log streams of our Lambda function on CloudWatch.

From one of the log streams, we can filter the results to list only those with the keyword “warn”. From the log message, we then know that our Lambda function is denied access to our S3 bucket. So, next, we will set up the access accordingly.

Connecting Lambda and S3

Since both our Lambda function and S3 bucket are in the same AWS account, we can easily grant the access from the function to the bucket.

Step 1: Create IAM Role

By default, Lambda creates an execution role with minimal permissions when we create a function in the Lambda console. So, now we first need to create an AWS Identity and Access Management (IAM) role for the Lambda function that also grants access to the S3 bucket.

In the IAM homepage, we head to the Access Management > Roles section to create a new role, as shown in the screenshot below.

Click on the “Create role” button to create a new role.

In the next screen, we will choose “AWS service” as the Trusted Entity Type and “Lambda” as the Use Case so that Lambda function can call AWS services like S3 on our behalf.

Select Lambda as our Use Case.

Next, we need to select the AWS managed policies AWSLambdaBasicExecutionRole and AWSXRayDaemonWriteAccess.

Attaching two policies to our new role.

Finally, in the Step 3, we simply need to key in a name for our new role and proceed, as shown in the screenshot below.

We will call our new role “CoreWebsiteFunctionToS3”.

Step 2: Configure the New IAM Role

After we have created this new role, we can head back to the IAM homepage. From the list of IAM roles, we should be able to see the role we have just created, as shown in the screenshot below.

Search for the new role that we have just created.

Since the Lambda needs to assume the execution role, we need to add lambda.amazonaws.com as a trusted service. To do so, we simply edit the trust policy under the Trust Relationships tab.

Updating the Trust Policy of the new role.

The trust policy should be updated to be as follows.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "lambda.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

After that, we also need to add one new inline policy under the Permissions tab.

Creating new inline policy.

We need to grant this new role list and read access (s3:ListBucket and s3:GetObject) to our S3 bucket (arn:aws:s3:::corewebsitehtml) and its content (arn:aws:s3:::corewebsitehtml/*) with the following policy in JSON. The reason why we grant the list access is so that our .NET code later can tell whether the bucket listing is empty or not. If we only grant this new role the read access, the AWS S3 SDK will always return 404.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::corewebsitehtml/*",
                "arn:aws:s3:::corewebsitehtml"
            ]
        }
    ]
}

You can switch to the JSON editor, as shown in the following screenshot, to easily paste the JSON above into the AWS console.

Creating inline policy for our new role to access our S3 bucket.

After giving this inline policy a name, for example “CoreWebsiteS3Access”, we can then proceed to create it in the next step. We should now be able to see the policy being created under the Permission Policies section.

We will now have three permission policies for our new role.

Step 3: Set New Role as Lambda Execution Role

So far we have only set up the new IAM role. Now, we need to configure this new role as the Lambda function's execution role. To do so, we have to edit the current execution role of the function, as shown in the screenshot below.

Edit the current execution role of a Lambda function.

Next, we need to change the execution role to the new IAM role that we have just created, i.e. CoreWebsiteFunctionToS3.

After saving the change above, when we visit the Execution Role section of this function again, we should see that it can already access Amazon S3, as shown in the following screenshot.

Yay, our Lambda function can access S3 bucket now.

Step 4: Allow Lambda Access in S3 Bucket

Finally, we also need to make sure that the S3 bucket policy doesn't explicitly deny access to our Lambda function or its execution role. We can grant the execution role access with the following bucket policy.

{
    "Version": "2012-10-17",
    "Id": "CoreWebsitePolicy",
    "Statement": [
        {
            "Sid": "CoreWebsite",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::875137530908:role/CoreWebsiteFunctionToS3"
            },
            "Action": "s3:GetObject",
            "Resource": [
                "arn:aws:s3:::corewebsitehtml/*",
                "arn:aws:s3:::corewebsitehtml"
            ]
        }
    ]
}

The JSON policy above can be entered in the Bucket Policy section, as demonstrated in the screenshot below.

Simply click on the Edit button to input our new bucket policy.

Setup Execution Role During Deployment

Since we have updated our Lambda function to use the new execution role, in our subsequent deployments of the function, we should remember to set the role to the correct one, i.e. CoreWebsiteFunctionToS3, as highlighted in the screenshot below.

Please remember to use the correct execution role during the deployment.

After we have done all these, we shall see the web content stored in the S3 bucket displayed when we visit the API Gateway URL in our browser.
