Website Design Architecture


In my previous post, I mentioned I was hoping to write up a post about how this website is architected. Actually following through on this hope, here is my first real content post on this site! (I apologize in advance for how long this post got…)

I wanted to make a site that I could easily update, even from my phone or tablet. This is a bold requirement for someone who has never run a content blog before, but I wanted a challenge, so onward I went. I also wanted the website to be highly available and to stay up to date everywhere it runs without any intervention on my part. I have more power outages at my house than seems reasonable, and did not want my web presence to disappear just because I have not forked over the cash for a home generator yet.

When trying to do something new and exciting, why not make the task even more difficult by also running this whole apparatus in Kubernetes? I really do like making my life hard, don’t ask why. I have a multi-node, multi-architecture cluster running in my house, as well as a single-node “cluster” running on Oracle Cloud with their free-forever plan. Another part of Oracle’s free plan that I still don’t understand is that it includes site-to-site VPN tunnels, so I have my cloud cluster connected back home via VPN. This will be important later, I swear.

I have been using Rancher at home on its own virtual machine to manage my two Kubernetes clusters in a nice, easy fashion, and have been a big fan of its built-in Continuous Delivery system. Under the hood it uses their Fleet software to handle the git syncing and keep the clusters matched to the repo. I figured this would be a great way to keep the two clusters in sync and ensure the services are running properly at both sites. I connect to my Oracle cluster over the VPN, which keeps access to that cluster locked away from the internet.
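To give a sense of what that looks like, here is a minimal sketch of a Fleet GitRepo resource; the repo URL, path, and cluster labels below are placeholders for illustration, not my actual setup.

```yaml
# A Fleet GitRepo that tells Rancher Continuous Delivery to sync a git repo
# into every downstream cluster matching the selector.
# Repo URL, path, and labels are placeholders.
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: website
  namespace: fleet-default
spec:
  repo: https://github.com/example/website-manifests
  branch: main
  paths:
    - manifests
  targets:
    - name: all-sites
      clusterSelector:
        matchLabels:
          site: website
```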

I had previously experimented with Hugo for static site generation, and I wanted to use it here too, since trying to keep dynamic content in sync across the world did not seem fun. Oh, that “across the world” bit? My Oracle node is in Melbourne, Australia (for reasons), so syncing between the two sides can take a bit, but it is also about as far away as I could get from my home in the Northeast US, so if anything it’s another fun challenge.

Back to Hugo: I was lucky enough to come across this post a while ago, which details how to sync a git repo into a Kubernetes pod using a git-sync container, a shared filesystem, and a busybox container running the latest Hugo binary, serving the content from the shared filesystem. This configuration satisfies my requirement to simply update a Git repo and have the site update itself. (Or was it what made me add this requirement? I can’t remember anymore.) Either way, it works out very well, and a change to my git repo usually shows up on the site in 30 seconds or less. Pretty impressive, I would say.
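As a rough sketch of that pod (not the exact manifest from the post I linked), it looks something like the deployment below. The repo URL and image tags are placeholders, the exact git-sync flags vary by version, and I am showing a public Hugo image here rather than the busybox-plus-binary trick, just to keep the example short.

```yaml
# Sketch of the Hugo pod: a git-sync sidecar keeps the repo checked out on a
# shared emptyDir volume, and the hugo container serves straight from it.
# Repo URL, image tags, and exact flags are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: homepage
spec:
  replicas: 1
  selector:
    matchLabels:
      app: homepage
  template:
    metadata:
      labels:
        app: homepage
    spec:
      volumes:
        - name: site
          emptyDir: {}
      containers:
        - name: git-sync
          image: registry.k8s.io/git-sync/git-sync:v4.2.3
          args:
            - --repo=https://github.com/example/website-content
            - --root=/site
            - --link=current        # checkout is symlinked at /site/current
            - --period=30s          # poll the repo every 30 seconds
          volumeMounts:
            - name: site
              mountPath: /site
        - name: hugo
          image: hugomods/hugo:latest
          command: ["hugo", "server"]
          args:
            - --source=/site/current
            - --bind=0.0.0.0
            - --port=1313
          ports:
            - containerPort: 1313
          volumeMounts:
            - name: site
              mountPath: /site
              readOnly: true
```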

This deployment alone would work in a single cluster, but I wanted it distributed to at least two sites (I think any number of sites in my design would work equally well). I have been messing around with Cloudflare and their tunnel solution to reverse-proxy connections into a private server. Their blog has many good articles on their tunnels, but this is a good starting point: https://blog.cloudflare.com/ridiculously-easy-to-use-tunnels/. The basic idea, for my use, is to have a pod in each cluster establish outbound connections to Cloudflare’s data centers. My cluster at home connects to two US data centers; the Oracle cluster in Australia connects to data centers in Melbourne and Sydney. This allows me to receive web traffic without exposing my public IPs or even opening any inbound ports to my clusters at all.
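The pod doing that dial-out is just a small cloudflared deployment in each cluster, roughly like the sketch below; the secret holding the tunnel token is a placeholder name.

```yaml
# Sketch of the cloudflared deployment that dials out to Cloudflare's edge.
# Nothing listens for inbound traffic; the tunnel token secret is a placeholder.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudflared
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cloudflared
  template:
    metadata:
      labels:
        app: cloudflared
    spec:
      containers:
        - name: cloudflared
          image: cloudflare/cloudflared:latest
          args:
            - tunnel
            - --no-autoupdate
            - run
            - --token
            - $(TUNNEL_TOKEN)
          env:
            - name: TUNNEL_TOKEN
              valueFrom:
                secretKeyRef:
                  name: cloudflared-token
                  key: token
```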

Since I am using Rancher CD to keep the two clusters in sync, it is trivial to ensure that things like pod and service names match between the two sites. That let me configure my tunnels on Cloudflare so that my public hostname (altheashaheen.com) points to the same destination (homepage.default.svc.cluster.local) everywhere; regardless of which cluster Cloudflare sends a client’s traffic to, that cluster is able to respond.
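For a locally managed tunnel, that mapping lives in cloudflared’s config.yml and looks roughly like the sketch below (with a dashboard-managed tunnel you set the same hostname-to-service mapping in the Cloudflare UI instead); the tunnel ID, credentials path, and port are illustrative.

```yaml
# cloudflared config.yml sketch: the public hostname maps to the same
# in-cluster service name in both clusters. Tunnel ID and credentials
# file are placeholders.
tunnel: 00000000-0000-0000-0000-000000000000
credentials-file: /etc/cloudflared/creds/credentials.json
ingress:
  - hostname: altheashaheen.com
    service: http://homepage.default.svc.cluster.local:1313
  - service: http_status:404   # catch-all for anything else
```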

With these two parts in place, I can now receive traffic to my website, and Cloudflare will route it into a tunnel based on the client’s location. That traffic is reverse-proxied through the cloudflared container in whichever cluster it lands in, and on to my Hugo webserver container. Since both clusters sync from GitHub, their content should always match, so the experience is the same regardless of which cluster a visitor is sent to (except maybe load times, if they are sent to the one further away geographically).

This would all work fine, but I felt like going a step further and serving images from my own S3 buckets, which also run in my clusters. For this I am using MinIO in standalone mode as a free storage solution. Oh, did I mention this whole setup is free, besides the cost of my domain name? Anyway, I decided to use MinIO since I was already using it elsewhere, so the choice was easy. I wanted it to be highly available as well, though, so that my bucket of web media would stay in sync between the two sites.
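A standalone MinIO instance in each cluster is just a single-replica deployment, roughly like this sketch; the credentials secret, PVC name, and sizing are placeholders.

```yaml
# Sketch of a standalone MinIO deployment (single server, so no versioning
# or built-in replication). Credentials and storage are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
        - name: minio
          image: minio/minio:latest
          args: ["server", "/data", "--console-address", ":9001"]
          env:
            - name: MINIO_ROOT_USER
              valueFrom: {secretKeyRef: {name: minio-creds, key: user}}
            - name: MINIO_ROOT_PASSWORD
              valueFrom: {secretKeyRef: {name: minio-creds, key: password}}
          ports:
            - containerPort: 9000   # S3 API
            - containerPort: 9001   # web console
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: minio-data
```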

Since I am using the standalone version of MinIO (mainly for resource-usage reasons), I cannot turn on versioning, which is a requirement for MinIO’s built-in replication. Needing another way to keep my buckets in sync, I discovered that the management command mc can mirror one bucket to another, and adding the --watch flag makes it keep syncing continuously. Since I am already all in on Kubernetes, I set up a simple deployment in my home cluster that authenticates to both S3 instances and syncs any new content I put into my home bucket down to Oracle via my VPN. This one-way sync is good enough for me, though I would like to make multi-site sync work eventually. Like the git sync, it is surprisingly close to instantaneous.
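Sketched out, that sync deployment is just an mc container that registers both MinIO endpoints as aliases and then runs mc mirror --watch; the endpoints, bucket names, and credentials secret below are all placeholders.

```yaml
# Sketch of the one-way bucket sync: mc mirrors the home bucket to the Oracle
# one and keeps watching for new objects. Endpoints, bucket names, and
# credentials are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bucket-sync
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bucket-sync
  template:
    metadata:
      labels:
        app: bucket-sync
    spec:
      containers:
        - name: mc
          image: minio/mc:latest
          command: ["/bin/sh", "-c"]
          args:
            - >
              mc alias set home http://minio.default.svc.cluster.local:9000 "$HOME_KEY" "$HOME_SECRET" &&
              mc alias set oracle http://minio.oracle.internal:9000 "$ORACLE_KEY" "$ORACLE_SECRET" &&
              mc mirror --watch --overwrite home/web-media oracle/web-media
          env:
            - name: HOME_KEY
              valueFrom: {secretKeyRef: {name: bucket-sync-creds, key: home-key}}
            - name: HOME_SECRET
              valueFrom: {secretKeyRef: {name: bucket-sync-creds, key: home-secret}}
            - name: ORACLE_KEY
              valueFrom: {secretKeyRef: {name: bucket-sync-creds, key: oracle-key}}
            - name: ORACLE_SECRET
              valueFrom: {secretKeyRef: {name: bucket-sync-creds, key: oracle-secret}}
```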

To ensure my site can reach this content from either cluster, I am using the subdomain sites.altheashaheen.com and our old pal Cloudflare Tunnels to route the traffic into my clusters. Inside each cluster, again kept in sync via Rancher CD, is an nginx-s3-gateway container, which is simply a front end to the S3 bucket holding my web content. I can then link to objects in the bucket using my public hostname and have them served on my site from either location. An example being this here architecture diagram:

[Image: over-elaborate website architecture diagram from someone with too much time on their hands]
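For completeness, the gateway piece is roughly the deployment below. The environment variable names come from the nginx-s3-gateway project’s docs, but the image path, bucket name, endpoint, and credentials secret are placeholders and may differ from what you need.

```yaml
# Sketch of the nginx-s3-gateway front end for the media bucket.
# Bucket name, endpoint, image, and credentials are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: media-gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: media-gateway
  template:
    metadata:
      labels:
        app: media-gateway
    spec:
      containers:
        - name: nginx-s3-gateway
          image: ghcr.io/nginxinc/nginx-s3-gateway/nginx-oss-s3-gateway:latest
          env:
            - name: S3_BUCKET_NAME
              value: web-media
            - name: S3_SERVER
              value: minio.default.svc.cluster.local
            - name: S3_SERVER_PORT
              value: "9000"
            - name: S3_SERVER_PROTO
              value: http
            - name: S3_REGION
              value: us-east-1
            - name: S3_STYLE
              value: path              # path-style requests for MinIO
            - name: AWS_SIGS_VERSION
              value: "4"
            - name: ALLOW_DIRECTORY_LIST
              value: "false"
            - name: AWS_ACCESS_KEY_ID
              valueFrom: {secretKeyRef: {name: media-gateway-creds, key: access-key}}
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom: {secretKeyRef: {name: media-gateway-creds, key: secret-key}}
          ports:
            - containerPort: 80
```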

I hope this brain dump of a blog post makes some amount of sense to other people, and that maybe, just maybe, it will help out another curious mind looking to set up a static website that can be scaled dynamically across data centers and kept in sync in near real time, with a very easy to manage content-updating system and distributed object storage for media.