AWS CLI CloudFront Invalidation For An S3 Static Site


I recently deployed the new version of this site; it’s being hosted from an AWS S3 bucket with CloudFront sitting in front as a CDN. By default objects are cached by CloudFront for up to 24 hours…I wanted to invalidate that cache when I publish a new post so that the site is up to date immediately.

I used to host this blog using GitHub Pages, which does this for you and is a lot easier to set up compared to AWS. Word to the wise! The only reason I decided not to do the same this time was to avoid having my site on two domains (my own plus the GitHub Pages one), which I found confusing. GitHub Pages is a great solution though if you want to avoid the AWS faff.

I did come across some reasons not to invalidate the cache, and it’s worth noting there is a cost implication to these invalidations if you’re doing more than 1000 per month. This doesn’t apply to my use case though, so I went with invalidations.

There are some more considerations/warnings at the bottom of this post!

The Goal

I have a command line script which commits a new post, pushes the repo to GitHub, generates updated static files, and updates the static site repo*; in turn AWS CodePipeline watches for changes to the static repo and updates the S3 bucket, which is then shipped out (potentially) to CloudFront’s edge locations.

To see the updates immediately, I now need to add a command to invalidate the CloudFront cache (i.e. remove old versions of the files). The final result should be something like this:

# add today's blog post & deploy to site
function atp {
  git add -A
  git commit -m "Add today's post"
  git push
  hexo generate --deploy
  aws invalidateThatCache
}

I don’t have the AWS CLI though and have no idea how to do this, so let’s figure it out…!

Install AWS CLI

I’m on a Mac so I chose Homebrew (see docs for other methods):

$ brew install awscli

Once it’s done, confirm the installation:

$ aws --version
aws-cli/2.0.25 Python/3.8.3 Darwin/18.7.0 botocore/2.0.0dev29


Set Up A New IAM User

In AWS you create different users who each have different permissions. So I’ll create a new user which is authorized to use CloudFront from my command line. This guide helped me with these steps.

  • Open the IAM service in the AWS web console
  • Click the ‘Add user’ button and give the user a name like nia-cli
  • For ‘Access type’ check Programmatic access then click Next
  • On the next screen you set this user’s permissions. I already had an ‘admin’ user group set up and used this, but this is where you could restrict the user to specific services if you want. Click Next
  • Skip the Tags section and click Next: Review
  • Click Create user. Download the credentials CSV if you want to save it, otherwise just keep this tab open to reference this new Access key ID and Secret access key
  • Go to the command line and enter aws configure
  • Copy/paste the access key ID and secret access key when prompted
  • For Default region name I chose us-east-1 because this is where most of my AWS resources are. If you don’t already use a lot of services, choose the region nearest you (AWS publishes a list)
  • For Default output format I chose json
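
For reference, aws configure just writes these values to two plain-text files under ~/.aws in your home directory, roughly like this (the key values below are placeholders, not real credentials):

$ cat ~/.aws/credentials
[default]
aws_access_key_id = AKIA...
aws_secret_access_key = ...

$ cat ~/.aws/config
[default]
region = us-east-1
output = json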

Done! AWS CLI is now set up. I can confirm this worked by checking my CloudFront distributions (for verbose results, remove the --query option and its filter):

$ aws cloudfront list-distributions --query 'DistributionList.Items[*].{Id:Id,Aliases:Aliases.Items[*]}'

NOTE: Later I removed this new user from the ‘admin’ group and created a new group which only has permission to interact with CloudFront and S3. Feels safer!
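
If you want to do the same, a minimal IAM policy for such a group might look something like this (a sketch only; you may want to scope the Action and Resource fields down further, e.g. to your specific distribution and bucket, rather than using wildcards):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["cloudfront:*", "s3:*"],
            "Resource": "*"
        }
    ]
}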

Set Up The Invalidation

This article is a good reference for a lot of the below. The final command I need for my particular use case is:

$ aws cloudfront create-invalidation --distribution-id UDV02B3N97S14 --paths /content.json /feed.xml /code/ /posts/ /categories/code/

To break this down, when I run the filtered list-distributions command above, I get a list of my current CloudFront distributions:

[
    {
        "Id": "UDV02B3N97S14",
        "Aliases": [...]
    },
    {
        "Id": "ELM28VDZF4JNM9",
        "Aliases": [...]
    }
]

This is where the --distribution-id comes from. The arguments after the --paths option are each of the paths that I want to be invalidated, separated by spaces.

The article linked above discusses how you can list the paths in a separate file if you have a lot. You could also set the path to "/*" to invalidate everything in the distribution. AWS charges per invalidation path, and a wildcard like "/*" counts as a single path, so it can be more cost-effective to invalidate everything at once when a lot of individual paths would otherwise need invalidating.
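
As a sketch of the separate-file approach, create-invalidation also accepts an --invalidation-batch option pointing at a JSON file, which is handier than a long command line:

$ aws cloudfront create-invalidation --distribution-id UDV02B3N97S14 --invalidation-batch file://paths.json

where paths.json would look something like this (the CallerReference just needs to be a string that’s unique per request, e.g. a timestamp; the filename and reference value here are my own made-up examples):

{
    "Paths": {
        "Quantity": 2,
        "Items": ["/content.json", "/feed.xml"]
    },
    "CallerReference": "deploy-2020-07-01"
}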

And that’s it! I added the invalidation command to my deploy script and job done.
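
So with the real command substituted in for the placeholder, the deploy function from earlier presumably ends up looking like this:

# add today's blog post & deploy to site
function atp {
  git add -A
  git commit -m "Add today's post"
  git push
  hexo generate --deploy
  aws cloudfront create-invalidation --distribution-id UDV02B3N97S14 --paths /content.json /feed.xml /code/ /posts/ /categories/code/
}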

Did It Work?

I happened to have the CloudFront web console open and indeed, a new invalidation popped up in the console as soon as I ran the command. It’s also possible to confirm programmatically if you’re so inclined.
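
For instance, the CLI ships a waiter that blocks until an invalidation completes. Given the Id field returned by create-invalidation (the --id value below is a placeholder), something like:

$ aws cloudfront wait invalidation-completed --distribution-id UDV02B3N97S14 --id I2EXAMPLEID

This exits once the invalidation’s status reaches Completed, which could be handy at the end of a deploy script.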



Although each of the invalidations I’ve run (all 4 of them!) have completed within a few seconds, there is no way to control the time it will take to invalidate the cache. There is also an unknown time factor with CodePipeline updating S3 from GitHub. But my script deploys the site and starts the invalidation in one go.

All that to say, it’s possible the invalidation finishes before CodePipeline has actually deployed the new files to S3, in which case you’ve invalidated the old content for nothing! For this reason, the article recommends setting up this invalidation as its own Lambda function, and adding it as a pipeline stage in CodePipeline.

I’ll give that a shot if I run into problems with the way I’ve set it up.


I mentioned this above but just to reiterate—S3 updates, CodePipeline pipelines, and CloudFront invalidations all have the potential to be not-insignificantly chargeable at scale. My site is really small but anyone using these tools on bigger sites should consider the costs to avoid surprising bills 😄

On that note, protect the AWS credentials too!

* Update 5 days later: Once I got CloudFront working with the AWS CLI, it was pretty easy to get acclimated with S3 commands as well. I deleted the static site repo and took CodePipeline out of this workflow…now I aws s3 sync the generated files directly with the S3 bucket instead.