Nuxt Static Site S3 Fix

nuxtss-s3-fix is a CICD tool that optimizes the HTML page objects of a Nuxt.js static site to work with Amazon S3 static website hosting. These optimizations improve:

Social media sharing, so that shared links show article images and meta data
SEO, so that search crawlers record the page meta data
Direct page loading times

In my testing, I built my sites using the Nuxt.js flat static layout, then ran the commands provided by the tool, which arranged the HTML page objects into a same as directory layout. After applying the fix, I verified that social media sharing and SEO crawlers worked as expected, and I found that direct page loading was faster. I observed no negative effects of this arrangement, but your mileage may vary.

The tool can be used in a local CLI environment or in a CICD pipeline script. It generates AWS S3 CLI commands that you can run in your AWS S3 CLI context. The tool does not make any changes to your S3 bucket; it only generates the commands for you to run.

Prerequisites

For the tool to produce appropriate AWS S3 CLI commands, you must provide the following:

An AWS S3 bucket containing all the objects as built by the Nuxt static site generator using npm run generate with the autoSubfolderIndex setting in the nuxt.config.js file.
- autoSubfolderIndex: true = index
- autoSubfolderIndex: false = flat
An accurate sitemap.xml object describing the page HTML objects. By default, the tool expects it in the bucket root. If you use a sitemap Nuxt.js module, such as @nuxtjs/sitemap, this file is generated automatically.
- Optionally, you can specify a different sitemap.xml file using the --sitemap-file option. This file can be a local file, an https: file, or an object in another S3 bucket.
An inherited AWS S3 CLI context with the getObject permission for the site S3 bucket.
- Important: nuxtss-s3-fix exclusively uses the AWS S3 CLI context in its execution space and does not store or request AWS S3 permissions.
- If you are running the tool in a CICD pipeline action, ensure that the action has the appropriate AWS S3 context permissions.

Static Site HTML Page Layouts

When building a Nuxt static site for AWS S3 bucket hosting, the HTML object for each page (we'll use a page named example) will be in one of two layout arrangements, depending on the autoSubfolderIndex setting in the nuxt.config.js file.

In the flat layout (autoSubfolderIndex: false), the HTML page object has a .html extension and sits next to a directory with the page name:
- example/
- example/_payload.json
- example.html - flat HTML page object
In the index layout (autoSubfolderIndex: true), the HTML page object is placed in the <page> directory with the name index.html:
- example/
- example/_payload.json
- example/index.html - index HTML page object
In the same as directory layout, which is the ideal arrangement for S3 static website hosting, the HTML page object has no extension and has the same name as the directory object. A directory and an object sharing a name is only legal within an AWS S3 bucket; it is not allowed in most file systems. The ideal arrangement of the S3 bucket objects for each page looks like this:
- example/
- example/_payload.json
- example - same as directory HTML page object
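
The three layouts above differ only in how the page path maps to the key of the HTML page object. A minimal Python sketch of that mapping follows. It is an illustration only, not code from the tool; the function name page_object_key and the layout labels are assumptions made for this example.

```python
def page_object_key(page: str, layout: str) -> str:
    """Return the S3 key of the HTML page object for a page path.

    The layout labels ("flat", "index", "same-as-directory") are
    hypothetical names chosen for this sketch.
    """
    page = page.strip("/")  # "example/" and "/example" both become "example"
    if layout == "flat":
        return f"{page}.html"        # autoSubfolderIndex: false
    if layout == "index":
        return f"{page}/index.html"  # autoSubfolderIndex: true
    if layout == "same-as-directory":
        return page                  # no extension, same name as the directory
    raise ValueError(f"unknown layout: {layout}")


for layout in ("flat", "index", "same-as-directory"):
    print(layout, "->", page_object_key("example", layout))
```

In every layout the example/_payload.json object stays where it is; only the HTML page object key changes.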

Flat or Index Problems

When using a Nuxt static site on an Amazon S3 bucket, the flat and index arrangements have the following issues:

Social Media sharing of a page URL does not show the article image or meta data.
SEO search crawlers do not record the page meta data.
Direct page loading times are slower than they could be.

More details about this can be found at: AWS S3 Configuring an index document and Blog on S3 Static Hosting Issues

S3 Web Server Quirks

The key points of the S3 Web Server quirky behavior are:

For each page URL request with a trailing slash, e.g. photo/, the AWS S3 Web Server will return:
- 200 OK if s3://bucket/photo/index.html exists
- 404 Not Found if it does not
For each page URL request without a trailing slash, e.g. photo, the AWS S3 Web Server will return:
- 200 OK if s3://bucket/photo exists
- 302 Temporarily Moved if s3://bucket/photo/index.html exists (it redirects to the URL with the trailing slash, e.g. http://example.com/photo/, which causes a reload)
- 404 Not Found if neither exists
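
To make these rules easier to reason about, here is a small Python model of the lookup order described above. It is a sketch of the documented behavior, not S3 code, and it assumes the index document is configured as index.html. The function name s3_website_response and the sample key sets are made up for this example.

```python
from typing import Set, Tuple


def s3_website_response(request_path: str, keys: Set[str]) -> Tuple[int, str]:
    """Model the S3 website lookup rules for one request.

    request_path is the URL path without the leading slash, e.g. "photo"
    or "photo/". Returns (status code, key served or redirect target).
    Assumes the index document is named index.html.
    """
    if request_path == "" or request_path.endswith("/"):
        index_key = request_path + "index.html"
        return (200, index_key) if index_key in keys else (404, "")
    if request_path in keys:
        return 200, request_path          # object with the exact name exists
    if request_path + "/index.html" in keys:
        return 302, request_path + "/"    # redirect to the trailing-slash URL
    return 404, ""


# Hypothetical bucket contents for one page in each layout.
layouts = {
    "flat": {"index.html", "photo.html", "photo/_payload.json"},
    "index": {"index.html", "photo/index.html", "photo/_payload.json"},
    "same as directory": {"index.html", "photo", "photo/_payload.json"},
}
for name, keys in layouts.items():
    for path in ("photo", "photo/"):
        print(f"{name}: /{path} -> {s3_website_response(path, keys)[0]}")
```

Running the loop gives 404 and 404 for the flat layout, 302 and 200 for the index layout, and 200 and 404 for the same as directory layout.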

Router Navigation vs Direct Load

Once the Nuxt.js web app is loaded from the home page, the Vue.js router will handle any further navigation within the app and update the URL directly. This means that if a user starts at the home page http://example.com and then clicks on a link to the photo page, the URL will be updated to http://example.com/photo and the Nuxt.js app knows how to load that page HTML and content without interacting with the S3 site web hosting quirkiness. It works as expected.

However, when the page URL is directly requested, i.e. http://example.com/photo or http://example.com/photo/, the S3 web server quirkiness comes into play. The page HTML object has all the information it needs to load the Nuxt.js web app and display the proper page information, but the S3 web server has to deliver that page HTML object first. This is where the problems with the flat and index layouts arise.

Flat Direct Load Issues

When the Nuxt static site is built using the flat layout, each S3 bucket HTML page object is named <page>.html. This means that any URL request (except for the home page) that doesn't include the .html extension will get a 404, regardless of the trailing slash. If the URL request includes the .html extension, it will work, but that is not a user-friendly URL.

For example:

If a photo URL request arrives, the AWS S3 Web Server will not find a photo HTML object or a photo/index.html object and will return a 404 Not Found.
If a photo/ URL request arrives, the AWS S3 Web Server will not find a photo/index.html HTML object and will return a 404 Not Found.
If a photo.html URL request arrives, the AWS S3 Web Server will find the photo.html HTML object and return a 200 OK.

Index Direct Load Issues

When the Nuxt static site is built using the index layout, each S3 bucket HTML page object is named <page>/index.html. If the URL request does not include the trailing slash, it will work, but only after a 302 Temporarily Moved redirect that reloads the page at the URL with the trailing slash. If the URL request includes the trailing slash, it will receive a 200 OK.

For example:

If a photo URL request arrives, the AWS S3 Web Server will not find a photo HTML object, but will find the photo/index.html object and return a 302 Temporarily Moved to photo/.
If a photo/ URL request arrives, the AWS S3 Web Server will find the photo/index.html HTML object and return a 200 OK.

Generally, the index layout is better than the flat layout, because the Nuxt app will load for either form of the URL. However, the 302 redirect is not ideal: it makes your site less useful to users for sharing, and the extra round trip can slow down the page load.

Same As Directory Direct Load

When the Nuxt static site is optimized by the nuxtss-s3-fix tool into the same as directory layout, each S3 bucket HTML page object is named <page>. If the URL request does not include the trailing slash, it will receive a 200 OK. If the URL request includes the trailing slash, it will receive a 404 Not Found, but that is a logical response.

For example:

If a photo URL request arrives, the AWS S3 Web Server will find a photo HTML object and return a 200 OK.
If a photo/ URL request arrives, the AWS S3 Web Server will not find a photo/index.html HTML object and will return a 404 Not Found.

The natural user expectation is that photo is the page and photo/ is a directory, and the 404 Not Found response is logical.

Avoiding 404 Not Found for URL with trailing slash

If you want to avoid the 404 Not Found response for URLs with trailing slashes in the Same As Directory layout, you can duplicate the HTML page object so that it exists at both photo and photo/index.html. This seems like overkill, in my opinion, but if you need or prefer this, the procedure is:

Generate an index-layout Nuxt.js web app and deploy it to your bucket. This will contain <page>/index.html HTML bucket objects.
Run the tool just once to generate the copy commands, execute them to copy each page to the <page> HTML bucket object, and never remove the <page>/index.html HTML bucket objects.

This way either URL request will work, but you are duplicating each HTML page object.

Tool Operation

The nuxtss-s3-fix tool will generate commands to copy HTML Page objects into the Same as Directory layout and generate commands to remove duplicate Index or Flat HTML Page objects.

The sequence that the tool was designed to operate in is:

Deploy a Nuxt.js static site (either generated with the Index (recommended) or Flat layout) to an AWS S3 bucket
Run the tool to generate the copy commands.
Execute the copy commands in your AWS S3 CLI context.
Verify that the site works as expected.
Run the tool to generate the remove commands.
Execute the remove commands in your AWS S3 CLI context.
Verify that the site works as expected.

The tool will not generate a copy command for a page if a Same as Directory HTML page object already exists. This allows you to re-run the tool to generate copy commands for any pages that may have failed to copy in a previous run. For example, if you have a large number of pages to copy and your network connection is unstable, you can run the tool again to generate copy commands for only the pages that were not copied.

The tool will not generate a remove command for the Index or Flat HTML pages unless the Same as Directory HTML page object already exists. This allows you to re-run the tool to generate remove commands for any pages that may have failed to be removed. For safety, the tool will not generate remove commands unless HTML page duplication exists. If only one copy of the page exists, it will not be removed.

This means that you can run the tool multiple times to ensure that all pages are copied and removed successfully.

For example, if you have 100 pages to copy, you run the tool to generate the copy commands and execute them. You verify that the site works as expected. You then run the tool again to generate the remove commands and execute them. If there were any errors in the remove commands, you can re-run the tool to generate remove commands for only the pages that were not removed.

The tool will read the sitemap.xml file in the root of the S3 bucket to determine the paths that need to be fixed. It will then check for the existence of the current page HTML object and the new <page> object. Based on this information it will generate the appropriate AWS S3 CLI commands to perform the copy and remove operations.
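
As an illustration of the first step, the Python sketch below extracts page paths from a sitemap using only the standard library. It is not the tool's implementation. The function name sitemap_page_paths and the sample sitemap are assumptions for this example, and skipping the home page is my own simplification based on the note that the home page is not affected.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlparse

# Namespace used by the standard sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_page_paths(sitemap_xml: str) -> List[str]:
    """Return the unique page paths listed in a sitemap, without slashes
    at either end. The home page (empty path) is skipped."""
    root = ET.fromstring(sitemap_xml)
    paths: List[str] = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        path = urlparse((loc.text or "").strip()).path.strip("/")
        if path and path not in paths:
            paths.append(path)
    return paths


# Hypothetical sitemap content for demonstration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/photo</loc></url>
  <url><loc>https://example.com/blog/first-post/</loc></url>
</urlset>"""

print(sitemap_page_paths(SAMPLE))
```

Each returned path is then a candidate <page> whose current and new HTML objects can be checked in the bucket.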

Copy Process

The Copy process will generate copy commands for either Index <page>/index.html or Flat <page>.html S3 page objects to Same as Directory S3 page object <page>.

It will only generate a copy command for a path found in the sitemap.xml when the path corresponds to a Nuxt bucket object in the flat or index arrangement and there is no existing object at the new <page> location. It skips paths that do not have a corresponding HTML object in the S3 bucket. This means that if you have a path in the sitemap.xml that is not a page, such as an image or other asset, no copy command will be generated.

If there were errors when executing the copy commands, you can rerun the tool and generate a new set of copy commands that only apply to the ones that were not copied.
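
The copy rules can be sketched in Python as follows. This is a simplified model under stated assumptions, not the tool's code: the function name copy_commands is hypothetical, the bucket contents are passed in as a plain set of keys instead of being read from S3, and preferring the index object when both originals exist is my own choice for the example.

```python
from typing import Iterable, List, Set


def copy_commands(bucket: str, pages: Iterable[str], keys: Set[str]) -> List[str]:
    """Build 'aws s3 cp' commands for pages that still need copying."""
    commands = []
    for page in pages:
        if page in keys:
            continue  # Same as Directory object already exists: nothing to copy
        # Check the index object first, then the flat object (assumed order).
        for source in (f"{page}/index.html", f"{page}.html"):
            if source in keys:
                commands.append(
                    f"aws s3 cp s3://{bucket}/{source} s3://{bucket}/{page}"
                )
                break
        # Pages with no HTML object in either layout are skipped silently.
    return commands


# Hypothetical bucket state: "photo" was already copied, "about" was not.
keys = {"photo", "photo/index.html", "about/index.html", "logo.png"}
for command in copy_commands("my-bucket", ["photo", "about", "logo.png"], keys):
    print(command)
```

Because pages that already have a <page> object are skipped, running the same function again after a partial failure yields commands only for the pages still missing.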

Remove Process

The Remove Process will only generate remove commands for paths found in the sitemap.xml that:

correspond to a Nuxt bucket object in the flat or index arrangements, and
have an existing object at the new <page> location.

If there were errors when executing the remove commands, you can re-run the tool and generate a new set of remove commands that only apply to the ones that were not removed.
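
The remove rules mirror the copy rules. The Python sketch below models them under the same assumptions as before: remove_commands is a hypothetical name, the bucket contents are given as a set of keys, and nothing here is taken from the tool's source.

```python
from typing import Iterable, List, Set


def remove_commands(bucket: str, pages: Iterable[str], keys: Set[str]) -> List[str]:
    """Build 'aws s3 rm' commands for originals that have been duplicated."""
    commands = []
    for page in pages:
        if page not in keys:
            continue  # safety: no Same as Directory copy yet, keep the original
        for original in (f"{page}/index.html", f"{page}.html"):
            if original in keys:
                commands.append(f"aws s3 rm s3://{bucket}/{original}")
    return commands


# Hypothetical bucket state: "photo" is duplicated, "about" is not copied yet.
keys = {"photo", "photo/index.html", "about/index.html"}
for command in remove_commands("my-bucket", ["photo", "about"], keys):
    print(command)
```

Only the duplicated photo/index.html gets a remove command; about/index.html is the only copy of that page, so it is left alone.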

Flat Example

The new arrangement of the S3 bucket objects uses the feature of Amazon S3 where a directory object and a file object can have the same name at the same level. The ideal arrangement of bucket objects for each page looks like this:

example
example/
example/_payload.json

Flat Site commands

The AWS S3 CLI commands created for each flat page:

copy command: aws s3 cp s3://<bucket>/example.html s3://<bucket>/example
delete command: aws s3 rm s3://<bucket>/example.html

Index in Folder Commands

The AWS S3 CLI commands created for each 'Index in Folder' page:

copy command: aws s3 cp s3://<bucket>/example/index.html s3://<bucket>/example
delete command: aws s3 rm s3://<bucket>/example/index.html

Delete considerations

Using only the tool's copy commands, the Nuxt static site will operate normally. The existing .html file does not cause any problems that I have found, but you are duplicating each page HTML file.

The delete commands are created to remove the original page .html file that was copied; they should only be run if the copy was successful.

Program Requirements

Input S3 bucket location
AWS S3 getObject permission context for the S3 bucket
sitemap.xml in root of bucket
CLI tool for command-line or from CICD script
For Nuxt Static Flat site <page>.html => <page>
For Nuxt Static Subfolder site <page>/index.html => <page>
Option for removing the original .html file
Validate each sitemap path entry to a distinct <page>.html file
Delete command only generated if there is a new <page> object
