What It Is

ChessChat lets two players play chess against each other over live video. There are no matchmaking queues or strangers: users invite their friends to play.

The gameplay model follows video-call session semantics: both players enter a video room that persists for the session, playing as many sequential chess games as they want within the 1-hour room window. Each game uses a 5-minute per-player chess clock. When both players leave, the room tears down automatically — no stale active rooms accumulating AWS Chime costs.
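The per-game clock described above can be pictured with a minimal sketch. It assumes a plain countdown with no increment per move, and the class and method names are hypothetical rather than taken from the ChessChat codebase.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

GAME_CLOCK_SECONDS = 5 * 60  # 5-minute budget per player, as stated above


def _fresh_clocks() -> Dict[str, float]:
    return {"white": float(GAME_CLOCK_SECONDS), "black": float(GAME_CLOCK_SECONDS)}


@dataclass
class ChessClock:
    """Countdown clock for a single game (assumption: no increment)."""
    remaining: Dict[str, float] = field(default_factory=_fresh_clocks)
    to_move: str = "white"

    def press(self, elapsed_seconds: float) -> None:
        """Charge the elapsed time to the side that just moved, then switch sides."""
        left = self.remaining[self.to_move] - elapsed_seconds
        self.remaining[self.to_move] = max(0.0, left)
        self.to_move = "black" if self.to_move == "white" else "white"

    def flagged(self) -> Optional[str]:
        """Return the colour whose time has run out, or None if both have time left."""
        for colour, seconds in self.remaining.items():
            if seconds <= 0:
                return colour
        return None
```

A new `ChessClock` would be created for each game in the room, so sequential games inside the same 1-hour window each start from a full five minutes per side.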

The app covers a range of functionality, and the image below shows the logic flow for each area of the application. I'm still learning how best to create diagrams, but separating each flow felt easier to read. In future, I would publish each flow as its own diagram, since the quality of this large image is low.

ChessChat user flow diagram showing room creation, video session, chess gameplay and session teardown

Demo

This is a functional demo of the core product — video chat and chess gameplay working together in the browser. It's not an infrastructure walkthrough; the AWS architecture is covered in the sections below and in the technical deep-dive.

Architecture Overview

The platform uses a split-host architecture: the static marketing and auth surface (chess-chat.com) runs on CloudFront + S3, while the gameplay runtime (app.chess-chat.com) runs on ECS Fargate behind an ALB. The two are deliberately separate delivery mechanisms with different scaling, caching, and security profiles.

chess-chat.com              app.chess-chat.com
(CloudFront + S3)           (ALB → ECS Fargate)
Static landing / auth       Authenticated gameplay runtime
      │                              │
      └──────── Cognito ─────────────┘
                                     │
                      ┌──────────────┼──────────────┐
                      │              │              │
                 ElastiCache     DynamoDB       Chime SDK
                 Redis            (Users,        (Video)
                 (Room/game        Games,
                  state)           Social)

The underlying network is a three-tier VPC across three Availability Zones in us-east-1. Public subnets host the ALB and NAT Gateway. Private app subnets host Fargate tasks with no public IPs. Private data subnets host Redis with zero internet access. Security groups enforce one-directional trust at every layer: internet → ALB → Fargate → Redis.
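The one-directional trust chain can be modelled as a small allow-list. This is only an illustrative sketch: the tier names and the `can_connect` helper are hypothetical, and the real rules live in the security group definitions, not in application code.

```python
# Which source tier each destination tier accepts inbound connections from.
# Mirrors the chain described above: internet -> ALB -> Fargate -> Redis.
ALLOWED_INGRESS = {
    "alb": {"internet"},
    "fargate": {"alb"},
    "redis": {"fargate"},
}


def can_connect(source: str, destination: str) -> bool:
    """Return True only if the destination tier trusts the source tier."""
    return source in ALLOWED_INGRESS.get(destination, set())
```

Under this model, `can_connect("internet", "redis")` and `can_connect("redis", "fargate")` are both false: no tier can be reached by skipping a layer or by connecting back up the chain.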

ChessChat infrastructure diagram showing three-tier VPC, ECS Fargate, ALB, Redis, DynamoDB, CloudFront and Cognito

Key Design Decisions

Each of these was a deliberate tradeoff, documented in Architecture Decision Records in the repo.

Split-Host Architecture

The static marketing and auth surface runs on CloudFront + S3, while the gameplay runtime runs on ECS Fargate behind an ALB. Keeping them separate means each has its own deployment pipeline, and the ALB handles only authenticated gameplay traffic.

Single NAT Gateway

The VPC spans three AZs, but outbound traffic routes through one NAT Gateway. If that gateway's AZ goes down, there is no other path for outbound traffic. The tradeoff is cost: a NAT gateway is about $35 per month, so running one instead of one per AZ saves money. The Terraform VPC module accepts a nat_gateway_mode variable, so switching to per-AZ NAT is a one-variable change, not a network redesign.
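The cost side of this tradeoff is simple arithmetic, sketched here. The mode values "single" and "per_az" are assumptions for illustration (the text only names the nat_gateway_mode variable), and the $35 figure is the approximate monthly cost quoted above, excluding data processing charges.

```python
MONTHLY_NAT_COST_USD = 35  # approximate per-gateway figure quoted above
AZ_COUNT = 3               # the VPC spans three Availability Zones


def nat_gateway_count(mode: str, az_count: int = AZ_COUNT) -> int:
    """Number of NAT gateways for a given mode (mode names are hypothetical)."""
    if mode == "single":
        return 1
    if mode == "per_az":
        return az_count
    raise ValueError(f"unknown nat_gateway_mode: {mode!r}")


def monthly_nat_cost(mode: str) -> int:
    """Approximate monthly NAT gateway cost in USD for a given mode."""
    return nat_gateway_count(mode) * MONTHLY_NAT_COST_USD
```

With these assumptions, the single-gateway setup costs roughly $35 per month and the per-AZ setup roughly $105 per month, which is the saving being traded against AZ-level resilience.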

Video Chat Room Lifecycle

Rooms exist only during live participation. A 12-second reconnect grace period handles accidental disconnects without triggering teardown. Both participants need to join the video chat before Chime actually starts the video session. Room codes are unique per friend pair: rather than being randomly assigned each time a room is created, a code is locked to the two players and reused whenever they choose to play again.
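One way to get a stable room code per friend pair is to derive it from the two user IDs. The sketch below is an assumption about how that could work, not the project's actual scheme: the function names, the hash choice, and the 8-character length are all hypothetical. It also shows one plausible reading of the grace rule, where teardown happens only once the room is empty and the 12 seconds have elapsed.

```python
import hashlib

RECONNECT_GRACE_SECONDS = 12  # grace period stated above


def room_code_for_pair(user_a: str, user_b: str, length: int = 8) -> str:
    """Derive the same code for a pair of users regardless of argument order."""
    first, second = sorted((user_a, user_b))
    digest = hashlib.sha256(f"{first}:{second}".encode("utf-8")).hexdigest()
    return digest[:length].upper()


def should_tear_down(participants_present: int, seconds_since_empty: float) -> bool:
    """Tear down only when nobody is present and the grace period has passed."""
    return participants_present == 0 and seconds_since_empty >= RECONNECT_GRACE_SECONDS
```

Sorting the IDs before hashing is what makes the code symmetric, so either player can create the room and arrive at the same code.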

Redis + DynamoDB Split

Active room and game state is real-time data, so it lives in Redis. Once a game has finished, the cached game data is written to DynamoDB. DynamoDB receives exactly one write per completed game, regardless of how many moves were played.
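The write pattern can be sketched with in-memory stand-ins: a dict plays the role of Redis and a list plays the role of the DynamoDB games table. The class and field names are hypothetical, and no real Redis or DynamoDB client calls are shown.

```python
from typing import Dict, List


class GameSession:
    """Buffers moves in a cache and persists the game once, on completion."""

    def __init__(self, game_id: str, cache: Dict[str, List[str]], table: List[dict]):
        self.game_id = game_id
        self.cache = cache  # stand-in for Redis (hot, per-move state)
        self.table = table  # stand-in for DynamoDB (durable game history)

    def record_move(self, move: str) -> None:
        """Every move touches only the cache."""
        self.cache.setdefault(self.game_id, []).append(move)

    def finish(self, result: str) -> None:
        """Flush the cached game as a single durable write, then clear the cache."""
        moves = self.cache.pop(self.game_id, [])
        self.table.append({"game_id": self.game_id, "moves": moves, "result": result})
```

However many times `record_move` is called, the durable table only grows by one item when `finish` runs, which is the property described above.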

Chime SDK → WebRTC Migration

Chime SDK provided a fast path to working video, and I wanted to learn more about this AWS service. The planned migration to native browser WebRTC with a self-hosted Coturn server will replace per-minute billing with consistent EC2 compute costs.

How It Was Built

The build order followed the dependencies between services: each layer unlocks the next set of services to build. The VPC and networking had to exist before any compute could be placed inside it. ECS and the container environment came next, since the application itself runs there. The chess engine and game logic were built on top of that working compute layer, and the video integration followed once the core session model was stable.
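The dependency chain in this paragraph can be expressed as a small graph and sorted topologically. This is only an illustration of the ordering idea: the layer names are hypothetical labels, and the graph covers just the four layers named in the paragraph above.

```python
from graphlib import TopologicalSorter

# Each layer maps to the set of layers that must exist before it.
BUILD_DEPENDENCIES = {
    "vpc_networking": set(),
    "ecs_compute": {"vpc_networking"},
    "chess_engine": {"ecs_compute"},
    "video": {"chess_engine"},
}

# static_order() yields every layer after all of its prerequisites.
build_order = list(TopologicalSorter(BUILD_DEPENDENCIES).static_order())
```

Because the dependencies form a single chain, the only valid order is networking, then compute, then the chess engine, then video.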

1 · VPC & Networking

Three-tier VPC across three AZs — public, private app, and private data subnets. Route tables, security groups, NAT gateway, VPC endpoints for ECR, CloudWatch, Secrets Manager, and DynamoDB.

2 · Data Layer

DynamoDB tables for users, games, and the social graph. ElastiCache Redis cluster with multi-AZ replication.

3 · Auth

Cognito user pool with email/password and Google OAuth. The auth surface (chess-chat.com) is a custom single-page website. It handles login and signup, email verification, forgot password, OAuth callback, and session handoff. JWTs are used for both REST and WebSocket authentication.

4 · Compute & Delivery

ECS Fargate service behind an ALB. CloudFront + S3 for the static marketing and auth site. Route 53 for DNS.

5 · CI/CD

GitHub Actions pipeline. A single deploy pipeline on push to main: ECR image tagged by commit SHA, ECS rolling deploy, then static auth assets synced to S3 with CloudFront invalidation. Tests (backend, frontend, Terraform) run as required PR gate checks — not in the deploy pipeline. Branch protection enforces all three checks on main with no admin bypass.

6 · Chess Engine

WebSocket server handling room creation, game state, and real-time move broadcast. Client-side validation with chess.js; server-side validation with chess.js as the authoritative source of truth.

7 · Video

Chime SDK integration for managed WebRTC. The server creates a meeting on room activation and issues attendee tokens to both players. Video runs independently of the chess session — a disconnect from one does not affect the other.

