S3 Static Hosting Fixes

Introduction
We're going to examine what we need to implement in order to productize the workaround discovered in S3 Static Hosting Issues. The goal is to have a hands-free Continuous Integration and Continuous Deployment (CICD) system for deploying a Nuxt.js static web application to an AWS S3 Bucket and Web Server.
The workaround identified in S3 Static Hosting Issues was implemented in the AWS console. To perform the same operation automatically we'll need a method, tool or library that can:
- Identify all the 'index' or flat .html page files in the S3 bucket automatically
- Rearrange each .html page file into an optimal S3 object structure
Identification
The first step is to automatically identify all the 'index' or flat .html page files. This can be done with a variety of methods including:
- List all the files in the bucket and identify the page .html files
  - AWS S3 CLI - use the aws s3 ls command to list all the files in the bucket
  - AWS SDK - use the AWS S3 SDK to list all the files in the bucket
- Scan the site sitemap.xml file
Scanning the sitemap.xml file
I decided to use the sitemap.xml file method because:
- All the data I need is contained in a single sitemap.xml file
- All content sites should have an accurate sitemap.xml file. An accurate sitemap is good for search crawler results and SEO.
- I have other uses for scanning the sitemap.xml file
  - Identify Zombie Routes
  - Identify routes for other processing
The sitemap.xml file is an XML file that contains a list of all the routes in the site. It is typically located at the root of the site and can be accessed with an HTTP GET if hosted by a web server, with the AWS S3 getObject function if in an S3 bucket, or with Node.js file functions if on a file system. Once the file is read, the sitemap.xml file can be parsed to identify all the routes for the site. The routes can then be used with the AWS S3 SDK to identify the .html page objects in an S3 bucket.
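As a rough illustration of this step, here is a minimal sketch using the AWS SDK for JavaScript v3. The bucket name and region are placeholders, and a simple regular expression stands in for a real XML parser; a production tool would also want error handling.

import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3'

// Placeholder region and bucket for illustration only
const s3 = new S3Client({ region: 'us-east-1' })
const bucket = 'stage.example.com'

// Read sitemap.xml from the root of the bucket and return the route paths
async function getSitemapRoutes(): Promise<string[]> {
  const res = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: 'sitemap.xml' }))
  const xml = await res.Body!.transformToString()

  // Each route appears as <loc>https://example.com/about</loc> in the sitemap
  const locs = xml.match(/<loc>(.*?)<\/loc>/g) ?? []
  return locs.map(loc => new URL(loc.replace(/<\/?loc>/g, '')).pathname)
}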
@nuxtjs/sitemap
It is recommended that the site sitemap.xml file be generated by the @nuxtjs/sitemap module. It integrates seamlessly with Nuxt.js and NuxtContent and can be configured in the nuxt.config.ts (or nuxt.config.js) file.
Here are the relevant configuration settings I use for Pennock Projects in nuxt.config.ts. This configuration will automatically generate a sitemap.xml file in the root of the generated site.
export default defineNuxtConfig({
  modules: [
    '@nuxtjs/sitemap', // 1st in modules list
    //... other modules
  ],
  nitro: {
    prerender: {
      autoSubfolderIndex: true,
      crawlLinks: true,
      routes: ['/sitemap.xml', '/']
    }
  },
  site: {
    url: 'https://pennockprojects.com',
    name: 'Pennock Projects'
  },
  sitemap: {
    strictNuxtContentPaths: true
  }
})
The nitro.prerender key controls the generation of the static site. The sitemap.xml file is added to the list of routes to prerender. The crawlLinks: true setting will automatically add all the self-referential links in the site to the list of routes to prerender. The autoSubfolderIndex: true setting will generate an 'index' layout file structure for each page (more on that choice later...).
The site key is used by the @nuxtjs/sitemap module to generate the full URL for each route in the sitemap.xml file, by prepending the site.url to each route.
The sitemap.strictNuxtContentPaths:true will ensure that all the NuxtContent paths are included in the sitemap.xml file.
Optimal S3 Object Structure
Double HTML objects
The most compatible arrangement of S3 objects for an S3-hosted static site is to duplicate the page HTML object. One page object should be named with the route name and no .html extension, and the duplicate should be an index.html file inside a folder named for the route. For example, the /about URL should have an S3 page HTML object at /about and a duplicate page object at /about/index.html. This is the most compatible arrangement for S3 static web hosting, but it comes at the cost of duplicating the object and the S3 storage it uses. In this arrangement, the user can navigate to /about or /about/ and get the same page content: the /about object is served when the user navigates to /about, and the /about/index.html object is served when the user navigates to /about/.
Single HTML object
Another good arrangement is a single page object with no .html extension. For example, the /about URL should have an S3 page HTML object at /about. This arrangement is compatible with S3 static web hosting, but it comes at the cost of not being able to navigate to /about/ and get the same page content. The /about object will be served when the user navigates to /about, but navigating to /about/ will result in a 404 Not Found error. This arrangement works well because most users do not use a trailing slash anyway, there is no duplication of objects, and S3 storage utilization is minimized.
Layout Transformation
The Nuxt static site generation process can generate the page html files in two different layouts.
Nuxt.js Index Layout
The autoSubfolderIndex:true will generate the 'index' layout, which will create a folder for each route and place an index.html file in that folder. For example, the /about url will have a S3 page HTML object at /about/index.html.
Nuxt.js Flat Layout
The autoSubfolderIndex:false will generate the flat layout, which will create a single html file for each route with no folder. For example, the /about url will have a S3 page HTML object at /about.html.
Index to Double
To transform an index layout to a double layout, we need to:
- Copy the page object from /<pagename>/index.html to /<pagename>
- Keep the original page object at /<pagename>/index.html
For example, an index layout of the S3 objects for /example starts with these objects:
example/
example/index.html
example/_payload.json
and after transformation to double looks like:
example/
example/index.html
example/_payload.json
example
with the example object being a copy of the example/index.html object.
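A minimal sketch of this copy step with the AWS SDK for JavaScript v3 might look like the following; the client, region, and bucket name are the same placeholders used in the sitemap sketch above, not the actual tool's implementation.

import { S3Client, CopyObjectCommand } from '@aws-sdk/client-s3'

const s3 = new S3Client({ region: 'us-east-1' }) // placeholder region
const bucket = 'stage.example.com'               // placeholder bucket

// Duplicate <route>/index.html as <route>, keeping the original in place
async function indexToDouble(route: string): Promise<void> {
  await s3.send(new CopyObjectCommand({
    Bucket: bucket,
    CopySource: `${bucket}/${route}/index.html`, // CopySource is "bucket/key"
    Key: route                                   // e.g. 'example'
  }))
}

The same copy is the building block for the remaining transformations; they differ only in the destination keys and in whether the original object is removed afterwards.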
Index to Single
To transform an index layout to a single layout, we need to:
- Copy the page object from /<pagename>/index.html to /<pagename>
- Remove the original page object at /<pagename>/index.html

It has to be a two-step process because there is no 'move' command in S3, so the move is accomplished with a copy and a remove.
For example, an index layout of the S3 objects for /example starts with these objects:
example/
example/index.html
example/_payload.json
and after transformation to single looks like:
example/
example/_payload.json
example
with the example object being a copy of the example/index.html object and the example/index.html object being removed.
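Continuing the sketch above, the single layout adds a delete after the copy; a hedged example using the same placeholder client and bucket might look like this:

import { CopyObjectCommand, DeleteObjectCommand } from '@aws-sdk/client-s3'

// Copy <route>/index.html to <route>, then remove the original object
async function indexToSingle(route: string): Promise<void> {
  await s3.send(new CopyObjectCommand({
    Bucket: bucket,
    CopySource: `${bucket}/${route}/index.html`,
    Key: route
  }))
  await s3.send(new DeleteObjectCommand({
    Bucket: bucket,
    Key: `${route}/index.html`
  }))
}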
Flat to Double
To transform a flat layout to a double layout, we need to:
- Copy the page object from /<pagename>.html to /<pagename>
- Copy the page object from /<pagename>.html to /<pagename>/index.html
- Remove the original page object at /<pagename>.html
For example, a flat layout of the S3 objects for /example starts with these objects:
example.html
example/
example/_payload.json
and after transformation to double looks like:
example/
example/index.html
example/_payload.json
example
with the example object and the example/index.html object being a copy of the removed example.html object.
Flat to Single
To transform a flat layout to a single layout, we need to:
- Copy the page object from /<pagename>.html to /<pagename>
- Remove the original page object at /<pagename>.html

For example, a flat layout of the S3 objects for /example starts with these objects:
example.html
example/
example/_payload.json
and after transformation to single looks like:
example/
example/_payload.json
example
with the example object being a copy of the removed example.html object.
Where, What, and How
Where in the Pipeline
I am using AWS CodePipeline as the CICD system, but the fix I am describing should also be useful on other CICD systems like GitHub Actions, GitLab CI/CD, Jenkins, CircleCI, Travis CI, etc.
In the standard AWS CICD Pipeline for S3 deployment, there are three stages: Source, Build, and Deploy. Each stage has a specific function and AWS service.
The key stage in this process is the Build stage, where a virtual EC2 build machine is requisitioned to build the website. The build machine instructions are in the buildspec.yml file. Located in the root of the GitHub repository, the buildspec.yml file has several phases:
- install
- pre_build
- build
- post_build
and the output location in
- artifacts
In order to run the fix we need to determine where in the pipeline to run it. The fix needs to run after the build is complete and before the deploy stage, which means we need to use the post_build phase of the buildspec.yml file. The post_build phase is executed after the build phase and before the artifacts are gathered, so it should contain the custom scripts to perform the S3 object transformations:
- Copy the build output to an S3 bucket, let's call this the stage bucket
- Run the S3 object transformation scripts on the stage bucket
- Deploy the stage bucket to the prod bucket
Stage Bucket
In order for this to work, I need to create a new stage bucket for the intermediate step. The stage bucket should have the same policies and web-hosting configuration as the prod bucket.
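As one small illustration of keeping the two buckets aligned, a sketch like the following could mirror the static website settings onto a stage bucket using the AWS SDK for JavaScript v3; the bucket name and document keys are placeholders, and the bucket policy would need to be copied separately.

import { S3Client, PutBucketWebsiteCommand } from '@aws-sdk/client-s3'

const s3 = new S3Client({ region: 'us-east-1' }) // placeholder region

// Enable static website hosting on the (hypothetical) stage bucket
await s3.send(new PutBucketWebsiteCommand({
  Bucket: 'stage.example.com',
  WebsiteConfiguration: {
    IndexDocument: { Suffix: 'index.html' },
    ErrorDocument: { Key: '404.html' }
  }
}))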
What the Tool Should Do
I need to develop a new NPM package that can be used in the post_build phase of the buildspec.yml file. The NPM package should be able to:
- Read the sitemap.xml file from the stage bucket
- Parse the sitemap.xml file to get a list of routes
- For each route, determine the S3 object layout (index or flat)
- For each route, determine the desired S3 object layout (double or single)
- For each route, perform the S3 object transformations to achieve the desired layout
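Putting these pieces together, the core loop of such a tool might look roughly like the sketch below. It reuses the hypothetical getSitemapRoutes and indexToSingle helpers from the earlier sketches and a simplified layout check; it is an assumption about how this could be wired up, not the actual nuxtss-s3-fix implementation.

import { HeadObjectCommand } from '@aws-sdk/client-s3'

// Check whether an object exists by issuing a HEAD request
async function objectExists(key: string): Promise<boolean> {
  try {
    await s3.send(new HeadObjectCommand({ Bucket: bucket, Key: key }))
    return true
  } catch {
    return false
  }
}

// Walk every route in the sitemap and normalize its S3 object layout
async function fixBucket(): Promise<void> {
  const routes = await getSitemapRoutes()     // from the earlier sketch
  for (const route of routes) {
    const key = route.replace(/^\/|\/$/g, '') // '/about/' -> 'about'
    if (key === '') continue                  // skip the root route
    if (await objectExists(`${key}/index.html`)) {
      await indexToSingle(key)                // index layout -> single layout
    }
    // a flat layout (`${key}.html`) would be handled with the flat transformations
  }
}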
I've created two new projects to accomplish this; you can read about them in the following articles:
How to Apply the Fix
I created package.json scripts for each action I needed during the post_build stage:
"scripts": {
"deploy:build:stage": "aws s3 sync .output/public s3://stage.pennockprojects.com --delete",
"fix:stage": "npx nuxtss-s3-fix s3://stage.pennockprojects.com --XC",
"deploy:stage:prod": "aws s3 sync s3://stage.pennockprojects.com s3://pennockprojects.com --delete",
}
Then in buildspec.yml I added the following to the post_build phase:
phases:
  install:
    commands:
      - npm install
  build:
    on-failure: ABORT
    commands:
      - npm run generate
      - echo BUILD Check dist!!!
      - ls .output/public
  post_build:
    on-failure: ABORT
    commands:
      - echo -----------DEPLOY build to stage---------
      - npm run deploy:build:stage
      - echo -----------FIX stage---------
      - npm run fix:stage
      - echo -----------DEPLOY stage to prod---------
      - npm run deploy:stage:prod
      - echo -----------DONE---------
      - echo Deploy to Prod completed on `date`
Finally I disabled the AWS CodeDeploy stage and the artifacts directory.
Conclusion
In this article we explored the requirements for a hands-free CICD process for deploying a Nuxt.js static web application to an AWS S3 Bucket and Web Server. We identified the need for a new NPM package to perform the S3 object transformations and created two new projects to accomplish this. We also identified the need for a new stage bucket to perform the intermediate steps. Finally, we added the necessary commands to the buildspec.yml file to perform the fix in the post_build phase.