Monitor all requests to a CloudFront static site on S3 by adding an OAI

Measure All the Things

Metrics have been a popular concept, and word, for a few years now, and justifiably so. Any organization, from a business to an open source software project, needs a way to assess its productivity, outreach, and general effectiveness at what it’s trying to accomplish. Influential Code is no different; we need to know if we’re reaching an audience in order to provide them with a service we hope they’ll value. One way we do this is by reviewing the requests that our site receives.

Adding an Origin Access Identity to CloudFront FTW

To do that, we rely on CloudFront logs to examine our traffic. If you’re using S3 with CloudFront to host your site, requests made directly to the S3 bucket bypass CloudFront and never appear in its logs, so you could be missing requests. To prevent that, you can add an origin access identity (OAI) to a distribution and allow CloudFront to access your S3 bucket. CloudFront then serves the site to your viewers, while users are prevented from accessing the bucket directly. AWS provides excellent, detailed documentation on how to accomplish this, so I recommend reviewing it after you finish this post. Given the constant evolution of AWS, it’s always best to consult their documentation so that you receive the latest information.
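To make the idea concrete, here is a minimal sketch of the kind of bucket policy that grants an OAI read access to a bucket's objects. The bucket name and OAI ID are hypothetical placeholders, and the exact policy you need may differ, so treat this as an illustration and check the AWS documentation before applying anything.

```python
import json


def build_oai_bucket_policy(bucket_name: str, oai_id: str) -> dict:
    """Return a bucket policy that lets one CloudFront OAI read objects."""
    # An OAI is referenced in a policy as a special CloudFront IAM principal.
    principal_arn = (
        "arn:aws:iam::cloudfront:user/"
        f"CloudFront Origin Access Identity {oai_id}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontOAIRead",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                # Read-only: no write and no bucket listing permissions.
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


# Hypothetical values, for illustration only.
policy = build_oai_bucket_policy("example-site-bucket", "E2EXAMPLE1OAI")
print(json.dumps(policy, indent=2))
```

The policy only allows s3:GetObject for the OAI. Once public read access is removed from the bucket, the site's objects are reachable through CloudFront and not through the bucket directly.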

Real Benefits of Using CloudFront

Enabling an origin access identity provides two tangible benefits and one less tangible benefit, all of which I’ll discuss below. First, when you serve a static site from S3 (without CloudFront), you must allow public read access to the objects that are part of the site. Enabling public access to a bucket is problematic because it’s possible to apply the wrong permissions and allow unintended actions. For instance, you may accidentally enable write access or directory listing, neither of which you intended.

Next, and mistakes aside, forcing all requests through CloudFront lets you take advantage of the benefits this service provides. Among other things, you get a more detailed view of the requests you’re receiving than S3 logs alone provide. This includes user device information, popular requests, and the geography of the requests, to name a few. You can also enable alarms based on preset criteria, such as the number of requests within a defined period of time.
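As a rough illustration of reviewing requests, the sketch below parses lines in the CloudFront standard log layout (tab-separated values preceded by a `#Fields:` header) and counts requests per path. The sample lines are made up for this example and list only a few of the fields a real log contains. The function name is also an assumption, not something from our own tooling.

```python
from collections import Counter


def parse_cloudfront_log(lines):
    """Yield one dict per request, keyed by the names in the #Fields header."""
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # Field names in the header are separated by spaces.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and other comments such as #Version
        # Values in each record are separated by tabs.
        yield dict(zip(fields, line.split("\t")))


# Made-up sample data with a shortened field list, for illustration only.
sample_log = [
    "#Version: 1.0",
    "#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status",
    "2019-01-01\t00:00:01\tIAD89-C1\t512\t192.0.2.10\tGET\texample.cloudfront.net\t/index.html\t200",
    "2019-01-01\t00:00:05\tIAD89-C1\t243\t198.51.100.7\tGET\texample.cloudfront.net\t/wp-login.php\t403",
    "2019-01-01\t00:00:09\tLHR62-C2\t512\t203.0.113.4\tGET\texample.cloudfront.net\t/index.html\t200",
]

records = list(parse_cloudfront_log(sample_log))
requests_per_path = Counter(record["cs-uri-stem"] for record in records)

for path, count in requests_per_path.most_common():
    print(f"{count:>4}  {path}")
```

Reading the field names from the header, instead of hard-coding column positions, keeps the sketch from depending on the exact set of fields present in a given log file.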

wp-login.php on an S3 Bucket?

As I mentioned, there’s a less tangible benefit as well. Recently, we were reviewing our logs and noticed a number of interesting requests: wp-login.php (the WordPress login page), .ssh/id_rsa (an SSH private key), and a number of other requests for objects that we don’t host and will likely never host, such as /.svn/wc.db. As we reviewed these, it was comforting to know that we didn’t have to worry about WordPress logins or keys, because the site is hosted in an S3 bucket and not on a web server that we maintain, which would contain SSH keys among other sensitive files.
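If you want to pick requests like these out of your own logs, a simple substring check is enough to get started. This is a minimal sketch: the marker list contains only the three paths mentioned above, and the function name and sample paths are hypothetical.

```python
# Path fragments taken from the requests described above; extend as needed.
PROBE_MARKERS = ("wp-login.php", ".ssh/id_rsa", ".svn/wc.db")


def is_probe(uri_path: str) -> bool:
    """Return True if the requested path contains a known probe marker."""
    path = uri_path.lower()
    return any(marker in path for marker in PROBE_MARKERS)


# Hypothetical request paths, for illustration only.
requested_paths = [
    "/index.html",
    "/wp-login.php",
    "/.ssh/id_rsa",
    "/.svn/wc.db",
    "/about/",
]

probes = [path for path in requested_paths if is_probe(path)]
print(probes)
```

A fixed list like this will only catch the probes you already know about, so it works best as a starting point for reviewing logs by hand.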

While knowing about these requests won’t change our content or directly help this site, it was an important reminder that regardless of your site’s size, you will receive malicious traffic. When I mentioned this to a former colleague, they made a valid point when they said (paraphrased), “I mean, I look at it this way, it’s free to run a script that scans the Internet”.