The Kubernetes industrial complex has convinced the industry that every production workload needs an orchestrator, a service mesh, and a dedicated platform team. But here's a secret many successful companies won't admit publicly: Docker Compose runs their production systems just fine.
Not every system needs auto-scaling across availability zones. Some need to reliably run 5-10 containers on a single VPS and restart them when they crash. For that, Docker Compose isn't a compromise — it's the right tool. Less complexity means fewer failure modes, faster debugging, and deployments any developer can manage without a DevOps certification.
“We run production systems on Docker Compose for clients who don't have a dedicated DevOps team. Zero Kubernetes complexity. One SSH command to deploy. 99.9% uptime on a $20/month VPS.”
— Sindika DevOps
Chapter 1: When Docker Compose Makes Sense
Docker Compose is production-ready when your system meets these conditions: you run on 1-3 servers, you have fewer than 20 containers, your scaling needs are predictable, and you don't need multi-region failover. That describes a surprising number of real-world systems.
✅ Ideal Use Cases for Docker Compose in Production
- ✓Internal tools and dashboards — admin panels, reporting systems, and internal APIs used by 10-200 employees. Traffic is predictable and bounded.
- ✓B2B SaaS with known user counts — platforms serving hundreds of businesses, not millions of consumers. Your daily active users fit in one database.
- ✓IoT data collection endpoints — edge gateways that receive sensor data, process it, and forward to a data lake. Stateless, restartable.
- ✓Staging and QA environments — mirror production topology without the Kubernetes overhead. Perfect for integration testing.
- ✓Background processing systems — document processors, PDF generators, sync engines. They process work queues, not user-facing requests.
Docker Compose vs Kubernetes — Honest Comparison
| Dimension | Docker Compose | Kubernetes |
|---|---|---|
| Setup Time | ✓ 30 minutes | Days to weeks |
| Min RAM Required | ✓ 256 MB | 2+ GB (control plane) |
| Rolling Updates | Near-zero (--no-deps) | ✓ Native rolling |
| Auto-Scaling | Manual | ✓ HPA / VPA |
| Service Discovery | DNS (built-in) | CoreDNS + Services |
| Health Checks | Built-in | ✓ Liveness + Readiness |
| Multi-Node | No (single host) | ✓ Yes (cluster) |
| Learning Curve | ✓ Low | Steep |
| Debugging | ✓ docker logs + exec | kubectl + complex |
| Ops Team Required | ✓ No | Usually yes |
The decision framework: if you need multi-node scheduling, auto-scaling, or service mesh — use Kubernetes. If you need to reliably run containers on 1-3 servers with health checks, restart policies, and easy debugging — Docker Compose is not just adequate; it's better.
Chapter 2: The Production Stack Architecture
A typical production Compose stack runs behind an Nginx reverse proxy that handles SSL termination, rate limiting, and static file serving. Behind that, your application containers talk to data services (PostgreSQL, Redis, MinIO) over a private Docker network that's invisible to the outside world.
Only Nginx is exposed to the internet. All other services communicate over an internal Docker bridge network — zero external attack surface.
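This split can be sketched directly in Compose with two networks, one of them marked `internal` so its containers have no route to or from the outside. A minimal sketch, with assumed network names (`edge`, `backend`) that are illustrative rather than taken from the stack below:

```yaml
# Sketch: only nginx joins the public network; data services are internal-only.
# Network names (edge, backend) are illustrative assumptions.
services:
  nginx:
    image: nginx:1.27-alpine
    ports:
      - "443:443"
    networks: [edge, backend]

  app:
    image: registry.example.com/app:latest
    networks: [backend]          # no published ports, not on the public network

  db:
    image: postgres:16-alpine
    networks: [backend]

networks:
  edge:
  backend:
    internal: true               # containers here cannot reach the internet at all
```

Note the trade-off of `internal: true`: it blocks outbound traffic too, so if your app needs to call external APIs, attach it to both networks like nginx and keep only the data services fully internal.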
Chapter 3: The Production-Grade Compose File
A production Compose file is not your development docker-compose.yml with the volume mounts removed. It needs health checks, restart policies, resource limits, logging configuration, proper networking, and dependency ordering.
```yaml
# docker-compose.prod.yml — battle-tested patterns
services:
  # ─── Reverse Proxy ───
  nginx:
    image: nginx:1.27-alpine
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
      - static-files:/var/www/static:ro
    depends_on:
      app:
        condition: service_healthy
    deploy:
      resources:
        limits:
          memory: 128M
          cpus: "0.5"
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  # ─── Application ───
  app:
    image: registry.example.com/app:${TAG:-latest}
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "1.0"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    environment:
      - ASPNETCORE_ENVIRONMENT=Production
      - ConnectionStrings__Default=${DB_CONNECTION}
      - Redis__Connection=redis:6379
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"

  # ─── Background Worker ───
  worker:
    image: registry.example.com/worker:${TAG:-latest}
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 256M
          cpus: "0.5"
    environment:
      - ConnectionStrings__Default=${DB_CONNECTION}
      - Redis__Connection=redis:6379
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy

  # ─── Database ───
  db:
    image: postgres:16-alpine
    restart: unless-stopped
    shm_size: 256m  # PostgreSQL needs shared memory
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: ${DB_NAME}
      POSTGRES_USER: ${DB_USER}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${DB_USER}"]
      interval: 10s
      timeout: 5s
      retries: 5
    deploy:
      resources:
        limits:
          memory: 1G
          cpus: "1.5"

  # ─── Cache & Queue ───
  redis:
    image: redis:7-alpine
    restart: unless-stopped
    command: redis-server --appendonly yes --maxmemory 128mb --maxmemory-policy allkeys-lru
    volumes:
      - redisdata:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
    deploy:
      resources:
        limits:
          memory: 192M

volumes:
  pgdata:
  redisdata:
  static-files:
```

✅ Production Compose Essentials
- ✓restart: unless-stopped — containers restart automatically after crashes or server reboots. The simplest and most effective self-healing mechanism.
- ✓Health checks on every service — dependent services wait until dependencies are healthy. No more “app started before DB was ready” race conditions.
- ✓Resource limits — cap memory and CPU per container. A memory leak in one service shouldn't bring down the entire server via OOM killer.
- ✓Log rotation — without limits, Docker logs will fill your disk. Set max-size (10m) and max-file (3-5) on every service.
- ✓Named volumes for data — use named volumes for databases and persistent storage. Never use bind mounts for production data — they bypass Docker's volume management.
- ✓Environment variables via .env — never hardcode secrets in the compose file. Use a `.env` file (excluded from Git) or Docker secrets.
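As a concrete sketch, a minimal `.env` next to the compose file might look like the following. The variable names match the compose file above; every value is a placeholder, and the `DB_CONNECTION` format is an assumed Npgsql-style connection string for the .NET app:

```shell
# .env — lives next to docker-compose.prod.yml, never committed.
# Add ".env" to .gitignore BEFORE creating this file.
DB_NAME=appdb
DB_USER=app
DB_PASSWORD=change-me-long-random-string
DB_CONNECTION=Host=db;Database=appdb;Username=app;Password=change-me-long-random-string
GRAFANA_PASSWORD=another-long-random-string
TAG=latest
```

Compose reads this file automatically when it sits in the project directory; `${DB_PASSWORD}` and friends in the compose file are substituted at `docker compose up` time.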
Chapter 4: Near-Zero-Downtime Deployments
The biggest objection to Compose in production is “you can't do rolling deployments.” True — Compose doesn't have native rolling updates like Kubernetes. But you can achieve near-zero-downtime deployments with a simple three-step pattern.
The --no-deps flag is the key: recreate only the app container without touching its dependencies. Total downtime: under 2 seconds.
```shell
#!/bin/bash
# deploy.sh — near-zero-downtime deployment script
set -euo pipefail

TAG=${1:-latest}
COMPOSE_FILE="docker-compose.prod.yml"

# Remember the previously deployed tag so we can roll back on failure
PREVIOUS_TAG=$(cat .deployed-tag 2>/dev/null || echo "latest")

echo "🚀 Deploying version: $TAG"

# 1. Pull the new image (download happens while the old one is still running)
echo "📥 Pulling new image..."
TAG=$TAG docker compose -f "$COMPOSE_FILE" pull app worker

# 2. Recreate ONLY the app + worker containers (DB stays up)
echo "🔄 Recreating services..."
TAG=$TAG docker compose -f "$COMPOSE_FILE" up -d --no-deps app worker

# 3. Wait for the health check to pass
echo "❤️ Waiting for health check..."
TIMEOUT=60
ELAPSED=0
until docker inspect --format='{{.State.Health.Status}}' \
    "$(docker compose -f "$COMPOSE_FILE" ps -q app)" 2>/dev/null | grep -q "healthy"; do
  sleep 2
  ELAPSED=$((ELAPSED + 2))
  if [ $ELAPSED -ge $TIMEOUT ]; then
    echo "❌ Health check timeout! Rolling back..."
    TAG=$PREVIOUS_TAG docker compose -f "$COMPOSE_FILE" up -d --no-deps app worker
    exit 1
  fi
done

# 4. Record the deployed tag and clean up old images
echo "$TAG" > .deployed-tag
echo "🧹 Cleaning up..."
docker image prune -f

echo "✅ Deployment complete — healthy in ${ELAPSED}s"
echo "   Downtime window: ~1-2 seconds"
```

The downtime window is typically under 2 seconds — the time between stopping the old container and the new one opening its listening port. For most internal tools and B2B platforms, users don't even notice. If 2 seconds is too much, add a second app replica behind Nginx and deploy them one at a time.
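If you take that route, the Nginx side is a plain upstream block with one entry per replica. A sketch under assumed names: two app services called `app1` and `app2` on the same Docker network, both listening on port 8080:

```nginx
# Sketch: balance across two app replicas so one can be recreated while
# the other keeps serving. Service names app1/app2 are assumptions.
upstream app_backend {
    server app1:8080 max_fails=3 fail_timeout=5s;
    server app2:8080 max_fails=3 fail_timeout=5s;
}

server {
    listen 443 ssl;
    server_name example.com;
    ssl_certificate     /etc/nginx/ssl/fullchain.pem;
    ssl_certificate_key /etc/nginx/ssl/privkey.pem;

    location / {
        proxy_pass http://app_backend;
        # If one replica is mid-deploy, retry the request on the other.
        proxy_next_upstream error timeout http_502 http_503;
    }
}
```

Deploy `app1`, wait for its health check, then deploy `app2`: Nginx routes around whichever replica is down, and the user-visible downtime drops to zero.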
Chapter 5: Security Hardening
“Docker Compose isn't secure enough for production” is a myth — but only if you actually harden it. Default Docker settings are not secure. Here's what a hardened Compose deployment looks like, layer by layer.
Defense in depth: each layer prevents a different attack vector. No single layer is sufficient — you need all four.
```yaml
# Security-hardened container configuration
services:
  app:
    image: registry.example.com/app:${TAG}
    read_only: true              # Read-only root filesystem
    tmpfs:
      - /tmp:size=64M            # Writable /tmp (limited)
      - /var/run:size=8M         # For PID files
    security_opt:
      - no-new-privileges:true   # Prevent privilege escalation
    cap_drop:
      - ALL                      # Drop all Linux capabilities
    cap_add:
      - NET_BIND_SERVICE         # Only what's needed
    user: "1000:1000"            # Non-root user
```

```shell
# Firewall rules (run once on server setup)
ufw default deny incoming
ufw default allow outgoing
ufw allow 22/tcp    # SSH
ufw allow 80/tcp    # HTTP (redirect to HTTPS)
ufw allow 443/tcp   # HTTPS
ufw enable

# fail2ban for SSH + Nginx brute-force protection
apt install fail2ban
systemctl enable fail2ban
```

🤔 Security Mistakes We See Constantly
- ▸Exposing database ports — PostgreSQL on port 5432, wide open to the internet. Never expose data service ports. Use Docker internal networking only.
- ▸Running as root — by default Docker runs containers as root. One container escape = full server access. Always set `user: "1000:1000"`.
- ▸Passwords in docker-compose.yml — committed to Git for the whole team (and GitHub) to see. Use `.env` files, excluded via `.gitignore`.
- ▸No log rotation — Docker logs grow unbounded by default. A busy API fills a 50GB disk in days. Set `max-size` and `max-file` on every service.
Chapter 6: Backup Strategy
Named volumes make your data survive container recreations, but they don't protect against disk failure, accidental deletion, or ransomware. You need a real backup strategy — and 3-2-1 is the gold standard.
```shell
#!/bin/bash
# backup.sh — automated daily backup with off-site sync
set -euo pipefail

BACKUP_DIR="/backups/$(date +%Y-%m-%d)"
REMOTE="backup-server:/offsite-backups"
mkdir -p "$BACKUP_DIR"

# 1. PostgreSQL dump (consistent snapshot)
echo "💾 Backing up PostgreSQL..."
docker compose exec -T db pg_dump -U "$DB_USER" -Fc "$DB_NAME" > "$BACKUP_DIR/db.dump"

# 2. Redis snapshot
echo "💾 Backing up Redis..."
docker compose exec -T redis redis-cli BGSAVE
sleep 5  # give BGSAVE time to finish writing dump.rdb
docker cp "$(docker compose ps -q redis):/data/dump.rdb" "$BACKUP_DIR/redis.rdb"

# 3. Application uploads / files
echo "💾 Backing up volumes..."
docker run --rm -v app-uploads:/data -v "$BACKUP_DIR":/backup \
  alpine tar czf /backup/uploads.tar.gz -C /data .

# 4. Sync to off-site storage
echo "☁️ Syncing to off-site..."
rsync -avz --delete "$BACKUP_DIR/" "$REMOTE/$(date +%Y-%m-%d)/"

# 5. Clean up old local backups (keep 7 days)
find /backups -maxdepth 1 -type d -mtime +7 -exec rm -rf {} \;

echo "✅ Backup complete: $BACKUP_DIR"

# Run via cron: 0 3 * * * /opt/scripts/backup.sh >> /var/log/backup.log 2>&1
```

The most important rule of backups: test your restores. A backup you've never restored from is not a backup — it's hope. Run a monthly restore drill to a test environment. Time it. Document it. The day you need it, you'll be grateful.
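A restore drill for the `pg_dump` output above can be as simple as loading it into a throwaway Postgres container. A sketch, assuming `DB_NAME` is set in your environment and today's backup exists; the container name `restore-test` is illustrative:

```shell
# Restore drill: load last night's dump into a scratch Postgres container.
docker run -d --name restore-test -e POSTGRES_PASSWORD=scratch postgres:16-alpine

# Wait until the scratch server accepts connections.
until docker exec restore-test pg_isready -U postgres >/dev/null 2>&1; do sleep 1; done

# Copy in the dump and restore it (--create recreates the database from the dump).
docker cp "/backups/$(date +%Y-%m-%d)/db.dump" restore-test:/tmp/db.dump
docker exec restore-test pg_restore -U postgres --create -d postgres /tmp/db.dump

# Sanity-check: the restored database should answer queries.
docker exec restore-test psql -U postgres -d "$DB_NAME" \
  -c "SELECT count(*) FROM information_schema.tables;"

# Tear down the scratch container when done.
docker rm -f restore-test
```

Time the whole sequence: that number is your real recovery time objective, not whatever the runbook claims.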
Chapter 7: Lightweight Monitoring
You don't need Datadog or New Relic for a single-server Compose deployment. A lightweight stack of Promtail → Loki → Grafana gives you centralized logging, metrics dashboards, and alerting — all running as Docker containers alongside your application, using less than 500MB of RAM.
The full monitoring stack runs alongside your app in the same Compose file. No external SaaS costs. No data leaving your server.
```yaml
# Add to docker-compose.prod.yml — monitoring sidecar
services:
  # ─── Monitoring ───
  loki:
    image: grafana/loki:2.9.4
    restart: unless-stopped
    volumes:
      - loki-data:/loki
    deploy:
      resources:
        limits:
          memory: 256M

  promtail:
    image: grafana/promtail:2.9.4
    restart: unless-stopped
    volumes:
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - ./promtail-config.yml:/etc/promtail/config.yml:ro
    depends_on:
      - loki

  grafana:
    image: grafana/grafana:10.3.1
    restart: unless-stopped
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
      GF_SERVER_ROOT_URL: https://monitoring.example.com
    deploy:
      resources:
        limits:
          memory: 256M

# Don't forget to declare the monitoring volumes alongside your existing ones:
volumes:
  loki-data:
  grafana-data:

# Total monitoring RAM: ~500MB
```

✅ Essential Alerts to Configure
- ✓Disk usage > 80% — logs, database WAL files, and Docker images fill disks fast. Alert early, clean proactively.
- ✓Container restart count — if a container restarts more than 3 times in 10 minutes, something is fundamentally broken.
- ✓HTTP 5xx error rate — more than 1% of requests returning 500s means your app has a production bug.
- ✓SSL certificate expiry — set an alert 14 days before expiration. Let's Encrypt auto-renew can silently fail.
- ✓Backup age > 25 hours — if your daily backup hasn't run in over a day, something is wrong with the cron job.
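Even before Grafana alerting is wired up, the first alert on that list can be a five-line cron script. A sketch that logs a warning when the root filesystem crosses 80%; the threshold, mount point, and log path are assumptions to adjust for your server:

```shell
#!/bin/sh
# disk-alert.sh — warn when root filesystem usage crosses a threshold.
# Threshold (80%) and mount point (/) are assumptions; adjust as needed.
THRESHOLD=80

# Column 5 of `df -P` is "Use%"; strip the % sign to get a bare number.
USAGE=$(df -P / | awk 'NR==2 {sub("%", "", $5); print $5}')

if [ "$USAGE" -ge "$THRESHOLD" ]; then
  echo "ALERT: disk usage at ${USAGE}% (threshold ${THRESHOLD}%)"
else
  echo "OK: disk usage at ${USAGE}%"
fi
# Run hourly via cron, e.g.:
# 0 * * * * /opt/scripts/disk-alert.sh >> /var/log/disk-alert.log 2>&1
```

Pipe the ALERT line into `mail`, a Slack webhook, or whatever notification channel you already have; the point is that the check itself costs nothing to run.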
Chapter 8: When to Graduate to Kubernetes
Docker Compose has limits. Being honest about them is as important as knowing its strengths. Here are the signals that it's time to migrate:
🤔 Time to Consider Kubernetes When...
- ▸You need more than 3 servers — Compose doesn't do multi-node scheduling. If your workload genuinely needs horizontal distribution, you need an orchestrator.
- ▸Auto-scaling is a real requirement — not “we might need it someday” but “we get 10x traffic spikes every Black Friday.” Compose can't auto-scale.
- ▸2 seconds downtime is unacceptable — if you need truly zero-downtime rolling updates with canary deployments, Kubernetes does this natively.
- ▸Multi-region failover — if a data center going down must not affect your users, you need cluster-level orchestration across regions.
- ▸You have a platform team — Kubernetes pays dividends when there's a team maintaining it. Without one, it's a maintenance burden, not a force multiplier.
The key word is “when,” not “if.” Not every system will graduate to Kubernetes — and that's fine. Many production systems run on Compose for years without hitting these limits. Don't migrate out of resume-driven development.
“Not every problem needs a platform. Sometimes the most senior engineering decision you can make is choosing the simplest tool that does the job reliably. Docker Compose is that tool for a surprising number of production workloads.”
— Sindika DevOps
The Bottom Line
Docker Compose isn't a stepping stone to Kubernetes — it's a legitimate production deployment strategy for systems that don't need orchestration complexity: health checks, restart policies, resource limits, log rotation, security hardening, and automated backups.
That's all most production systems need. We run client systems on $20/month VPS instances with 99.9% uptime. No Kubernetes. No platform team. Just Docker Compose, done right.