
ECS Production Setup Guide

Complete guide to set up your FishingLog API production environment on AWS ECS.

⚠️ Prerequisites: Complete staging setup first. This guide focuses on production-specific differences.

Endpoint: api.reelog.app


📋 Overview

Production setup follows the same steps as staging, but with:

  • Larger instance sizes (better performance)
  • Multi-AZ deployment (high availability)
  • Enhanced monitoring (Container Insights)
  • Stricter security (no public database access)
  • Longer retention (backups, logs)
  • Deletion protection (prevent accidental deletion)

🎯 Quick Reference

Follow the staging guide (ECS_STAGING_SETUP_GUIDE.md) for step-by-step instructions, but use these production-specific configurations:


🚀 Step 1: Create ECR Repository

Same as staging, but:

  • Name: reelog-api-prod
  • Tags: Environment=Production

Repository URI: 606532921651.dkr.ecr.us-east-2.amazonaws.com/reelog-api-prod


🗄️ Step 2: Create RDS PostgreSQL Database

Key differences from staging:

Configuration:

  • Creation method: Full configuration (not Easy create)
  • Engine: PostgreSQL (not Aurora)
  • Template: Production (not Free tier)
  • DB instance identifier: reelog-db-prod
  • DB instance class: db.t3.medium (2 vCPU, 4 GB RAM)
  • Storage: 100 GB (or more)
  • Multi-AZ deployment: ✅ Enable (high availability)
  • Backup retention: 30 days (vs 7 for staging)
  • Public access: ❌ No (more secure)
  • VPC security group: Create new reelog-db-sg-prod
  • Deletion protection: ✅ Enable

Tags: Environment=Production, Project=Reelog, Purpose=Database, Name=reelog-db-prod
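
If this step is scripted rather than done in the console, the settings above map roughly onto the keyword arguments of boto3's `create_db_instance`. The following is a minimal sketch under stated assumptions: the master username and the `ManageMasterUserPassword` choice are not part of this guide, the security group ID is a placeholder, and the dictionary is only built and printed, never sent to AWS.

```python
import json


def prod_db_params(security_group_id: str) -> dict:
    """Keyword arguments for boto3's rds.create_db_instance (built, not sent)."""
    tags = {
        "Environment": "Production",
        "Project": "Reelog",
        "Purpose": "Database",
        "Name": "reelog-db-prod",
    }
    return {
        "DBInstanceIdentifier": "reelog-db-prod",
        "Engine": "postgres",
        "DBInstanceClass": "db.t3.medium",
        "AllocatedStorage": 100,  # GB
        "MultiAZ": True,
        "BackupRetentionPeriod": 30,  # days
        "PubliclyAccessible": False,
        "DeletionProtection": True,
        "VpcSecurityGroupIds": [security_group_id],
        "MasterUsername": "reelog_admin",  # assumption: not specified in this guide
        "ManageMasterUserPassword": True,  # assumption: RDS keeps the password in Secrets Manager
        "Tags": [{"Key": k, "Value": v} for k, v in tags.items()],
    }


if __name__ == "__main__":
    # Placeholder ID standing in for the reelog-db-sg-prod security group.
    print(json.dumps(prod_db_params("sg-0123456789abcdef0"), indent=2))
```

With boto3 installed and credentials configured, the result could be passed as `rds_client.create_db_instance(**prod_db_params(sg_id))`.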


🔴 Step 3: Create ElastiCache Valkey Cache

Key differences:

Configuration:

  • Name: reelog-valkey-prod
  • Node type: cache.t3.small or cache.t3.medium (more RAM)
  • Number of nodes: 1 primary with 1-2 replicas (high availability)
  • Subnet group: reelog-valkey-subnet-prod
  • Security group: reelog-valkey-sg-prod
  • Encryption in-transit: ✅ Enabled
  • Encryption at-rest: ✅ Enabled
  • Automatic backups: ✅ On (7 days retention)

Tags: Environment=Production, Project=Reelog, Purpose=Cache, Name=reelog-valkey-prod
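
The cache settings above can be expressed in a similar way as parameters for boto3's ElastiCache `create_replication_group`. This is a hedged sketch, not the guide's own tooling: the description text and the failover flags are assumptions chosen to match the "high availability" goal, and the security group ID is a placeholder.

```python
def prod_valkey_params(security_group_id: str, replicas: int = 1) -> dict:
    """Keyword arguments for boto3's elasticache.create_replication_group (sketch)."""
    if replicas not in (1, 2):
        raise ValueError("this guide suggests 1-2 replicas")
    tags = {
        "Environment": "Production",
        "Project": "Reelog",
        "Purpose": "Cache",
        "Name": "reelog-valkey-prod",
    }
    return {
        "ReplicationGroupId": "reelog-valkey-prod",
        "ReplicationGroupDescription": "Reelog production cache",  # assumption
        "Engine": "valkey",
        "CacheNodeType": "cache.t3.small",
        "NumCacheClusters": 1 + replicas,  # one primary plus the replicas
        "AutomaticFailoverEnabled": True,  # assumption: needed for Multi-AZ failover
        "MultiAZEnabled": True,
        "CacheSubnetGroupName": "reelog-valkey-subnet-prod",
        "SecurityGroupIds": [security_group_id],
        "TransitEncryptionEnabled": True,
        "AtRestEncryptionEnabled": True,
        "SnapshotRetentionLimit": 7,  # days of automatic backups
        "Tags": [{"Key": k, "Value": v} for k, v in tags.items()],
    }
```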


🔒 Step 4: Create Security Groups

Same structure as staging, but:

  • Load Balancer: reelog-lb-sg-prod
  • API: reelog-api-sg-prod
  • Database: reelog-db-sg-prod
  • Valkey: reelog-valkey-sg-prod

All tags: Environment=Production
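
The actual rules live in the staging guide, which is not reproduced here. As an illustration only, a typical chain for this architecture lets each tier accept traffic solely from the tier in front of it. The sketch below builds `IpPermissions` entries in the shape boto3's `authorize_security_group_ingress` expects; the ports (443, 80, 5432, 6379) and the open 0.0.0.0/0 rule on the load balancer are assumptions based on the container port and default PostgreSQL/Valkey ports, not confirmed by this guide.

```python
def ingress_from_group(port: int, source_group_id: str) -> dict:
    """Allow TCP traffic on one port from members of another security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": source_group_id}],
    }


def prod_ingress_rules(lb_sg_id: str, api_sg_id: str) -> dict:
    """Map each production security group name to its assumed inbound rules."""
    return {
        "reelog-lb-sg-prod": [{
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [{"CidrIp": "0.0.0.0/0"}],  # assumption: public HTTPS
        }],
        "reelog-api-sg-prod": [ingress_from_group(80, lb_sg_id)],
        "reelog-db-sg-prod": [ingress_from_group(5432, api_sg_id)],
        "reelog-valkey-sg-prod": [ingress_from_group(6379, api_sg_id)],
    }
```

Referencing security groups instead of IP ranges keeps the database and cache unreachable from anything except the API tasks.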


⚖️ Step 5: Create Application Load Balancer

Same as staging, but:

  • Target group: reelog-api-tg-prod
  • Load Balancer: reelog-lb-prod
  • ACM certificate: api.reelog.app
  • DNS name: reelog-lb-prod-xxxxx.us-east-2.elb.amazonaws.com

Tags: Environment=Production, Project=Reelog, Purpose=LoadBalancer, Name=reelog-lb-prod


🏗️ Step 6: Create ECS Cluster

Key differences:

Configuration:

  • Name: reelog-prod
  • Container Insights: ✅ Enable (Container Insights, or Container Insights with enhanced observability)
    • Provides better visibility for production troubleshooting
    • Costs extra (~$0.10 per container/month) but worth it
  • Encryption: Consider custom KMS keys for compliance

Tags: Environment=Production, Project=Reelog, Purpose=ECSCluster, Name=reelog-prod


🔐 Step 7: Create IAM Roles

Option 1: Reuse Staging Roles (Simpler)

  • Can reuse reelog-ecs-task-execution-role and reelog-ecs-task-role from staging
  • Works fine if you want to share roles across environments

Option 2: Create Production-Specific Roles (More Secure - Recommended)

7.1 Task Execution Role

  1. IAM Console → Roles → Create role
  2. Trusted entity type: Select AWS service
  3. Use case: Select Elastic Container Service
    • You'll see: "Allow an AWS service like EC2, Lambda, or others to perform actions in this account"
    • Scroll down to find "Elastic Container Service"
  4. Select: Elastic Container Service Task
    • This is the specific use case for ECS tasks
  5. Click "Next"
  6. Add permissions:
    • Search and select: AmazonECSTaskExecutionRolePolicy
    • Search and select: AmazonEC2ContainerRegistryReadOnly
  7. Click "Next"
  8. Role name: reelog-ecs-task-execution-role-prod
  9. Description: Execution role for ECS tasks - Production - allows ECR access and CloudWatch logging
  10. Tags:
    • Tag 1: Key=Environment, Value=Production
    • Tag 2: Key=Project, Value=Reelog
    • Tag 3: Key=Purpose, Value=TaskExecutionRole
    • Tag 4: Key=Name, Value=reelog-ecs-task-execution-role-prod
  11. Click "Create role"
  12. After creation, click on the role → Add permissions → Create inline policy:
    • JSON tab → Paste:
    {
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ],
        "Resource": "arn:aws:logs:us-east-2:*:*"
      }]
    }
    • Policy name: CloudWatchLogsPolicy
    • Create policy

7.2 Task Role

  1. IAM Console → Roles → Create role
  2. Trusted entity type: Select AWS service
  3. Use case: Select Elastic Container Service
  4. Select: Elastic Container Service Task
  5. Click "Next"
  6. Add permissions:
    • Search and select: AmazonS3FullAccess
    • Or create custom policy for specific bucket access (more secure - recommended for production)
  7. Click "Next"
  8. Role name: reelog-ecs-task-role-prod
  9. Description: Task role for ECS containers - Production - allows S3 access
  10. Tags:
    • Tag 1: Key=Environment, Value=Production
    • Tag 2: Key=Project, Value=Reelog
    • Tag 3: Key=Purpose, Value=TaskRole
    • Tag 4: Key=Name, Value=reelog-ecs-task-role-prod
  11. Click "Create role"

Recommendation: For production, create production-specific roles with more restrictive policies (especially for S3 access - limit to specific buckets).
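
One way to follow this recommendation is to replace AmazonS3FullAccess with an inline policy scoped to the content bucket from Step 10. The sketch below generates such a policy document; the exact action list (read, write, delete, list) is an assumption about what the API needs and should be trimmed to what the application really uses.

```python
import json


def scoped_s3_policy(bucket: str) -> dict:
    """IAM policy document limiting S3 access to a single bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ObjectReadWrite",
                "Effect": "Allow",
                # Assumption: the API uploads, reads, and deletes objects.
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",  # bucket ARN, no trailing /*
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(scoped_s3_policy("reelog-content-prod"), indent=2))
```

The printed JSON can be pasted into the JSON tab of an inline policy on reelog-ecs-task-role-prod.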


📦 Step 8: Create ECS Task Definition

Key differences:

Configuration:

  • Task definition family: reelog-api-prod
    • Family name groups different revisions of the same task definition
    • Must be 1-255 characters, valid: a-z, A-Z, 0-9, hyphens (-), underscores (_)
  • Task size: 1 vCPU, 2 GB memory (vs 0.5 vCPU, 1 GB for staging)
  • Image URI: 606532921651.dkr.ecr.us-east-2.amazonaws.com/reelog-api-prod:latest
  • Environment variables:
    • ASPNETCORE_ENVIRONMENT = Production
    • ConnectionStrings__DefaultConnection = Production database endpoint
    • Redis__ConnectionString = Production Valkey endpoint
  • Logging: awslogs
    • Log group: /ecs/reelog-api-prod
    • Region: us-east-2
    • Stream prefix: ecs
    • awslogs-create-group: ✅ Keep enabled (recommended)
      • Automatically creates CloudWatch log group if it doesn't exist
      • Safety net for production - prevents deployment failures
  • Tags:
    • Tag 1: Key=Environment, Value=Production
    • Tag 2: Key=Project, Value=Reelog
    • Tag 3: Key=Purpose, Value=TaskDefinition
    • Tag 4: Key=Name, Value=reelog-api-prod
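
For reference, the configuration above corresponds roughly to the following arguments for boto3's ECS `register_task_definition`. Treat it as a sketch under stated assumptions: the connection strings are passed in as placeholders, the role ARNs assume the production-specific roles from Step 7, and plain environment variables are used only because this guide does so (Secrets Manager is listed under Next Steps).

```python
def prod_task_definition(account_id: str, db_conn: str, valkey_conn: str) -> dict:
    """Keyword arguments for boto3's ecs.register_task_definition (sketch)."""
    registry = f"{account_id}.dkr.ecr.us-east-2.amazonaws.com"
    role_prefix = f"arn:aws:iam::{account_id}:role"
    tags = {
        "Environment": "Production",
        "Project": "Reelog",
        "Purpose": "TaskDefinition",
        "Name": "reelog-api-prod",
    }
    return {
        "family": "reelog-api-prod",
        "networkMode": "awsvpc",  # required for Fargate
        "requiresCompatibilities": ["FARGATE"],
        "cpu": "1024",     # 1 vCPU, expressed in CPU units
        "memory": "2048",  # 2 GB, expressed in MiB
        "executionRoleArn": f"{role_prefix}/reelog-ecs-task-execution-role-prod",
        "taskRoleArn": f"{role_prefix}/reelog-ecs-task-role-prod",
        "containerDefinitions": [{
            "name": "reelog-api",
            "image": f"{registry}/reelog-api-prod:latest",
            "essential": True,
            "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
            "environment": [
                {"name": "ASPNETCORE_ENVIRONMENT", "value": "Production"},
                {"name": "ConnectionStrings__DefaultConnection", "value": db_conn},
                {"name": "Redis__ConnectionString", "value": valkey_conn},
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/reelog-api-prod",
                    "awslogs-region": "us-east-2",
                    "awslogs-stream-prefix": "ecs",
                    "awslogs-create-group": "true",
                },
            },
        }],
        # ECS tag entries use lowercase "key"/"value".
        "tags": [{"key": k, "value": v} for k, v in tags.items()],
    }
```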

🚀 Step 9: Create ECS Service

Key differences:

Configuration:

  • Task definition: reelog-api-prod:1
  • Service name: reelog-api-prod

Compute configuration - advanced:

Compute options:

  • Select: Launch type ✅ RECOMMENDED
    • Simpler - Direct launch without capacity provider strategy
    • Standard approach - Works well for production
    • Easier to manage - No need to configure capacity providers

Launch type:

  • Select: Fargate
    • Serverless containers - no EC2 instances to manage
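    • AWS provisions and manages the underlying compute; task-count scaling is configured separately under Service auto scaling below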

Alternative: Capacity provider strategy (Advanced - Not needed):

  • ⚠️ Use custom - Only if you need advanced task distribution across multiple capacity providers
  • ⚠️ More complex - Requires configuring capacity providers
  • ⚠️ Use cluster default - Only if cluster has default capacity provider strategy configured
  • For production, use Launch type - Simpler and sufficient for most use cases

Troubleshooting configuration - recommended:

Turn on ECS Exec:

  • Enable ECS Exec: ✅ Enable (recommended for production troubleshooting)
    • Useful for production debugging - Run interactive commands in containers
    • Critical for troubleshooting - Inspect container state, check logs, test connections
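    • Security: Requires ssmmessages permissions on the task role (CreateControlChannel, CreateDataChannel, OpenControlChannel, OpenDataChannel); the S3-only role from Step 7.2 does not include them, so add them first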
    • ⚠️ Best practice: Enable but restrict access via IAM policies if needed

What is ECS Exec?

  • Allows you to run interactive commands in running containers
  • Similar to docker exec but for ECS containers
  • Useful for debugging production issues, checking environment variables, testing database connections
  • Requires a task role that allows the ssmmessages channel actions (not included in the S3-only task role from Step 7.2)

When to use:

  • Production: Enable - Critical for troubleshooting production issues
  • Use cases: Check logs, test database connections, inspect environment variables, debug production issues
  • Security: Access controlled via IAM - ensure only authorized users can use ECS Exec

Note: ECS Exec takes effect only for tasks launched after it is enabled, not for tasks that are already running. To turn it on later, update the service with ECS Exec enabled and force a new deployment so the running tasks are replaced.
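
ECS Exec relies on the SSM agent inside the task opening message channels, so the task role needs four `ssmmessages` actions. The sketch below prints an inline policy granting them; the idea of attaching it to reelog-ecs-task-role-prod is an assumption that you created the production-specific roles in Step 7.

```python
import json

# Permissions the task role needs for ECS Exec sessions.
ECS_EXEC_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "ssmmessages:CreateControlChannel",
            "ssmmessages:CreateDataChannel",
            "ssmmessages:OpenControlChannel",
            "ssmmessages:OpenDataChannel",
        ],
        "Resource": "*",
    }],
}

if __name__ == "__main__":
    print(json.dumps(ECS_EXEC_POLICY, indent=2))
```

Who may start a session is controlled separately, by granting or withholding `ecs:ExecuteCommand` on the IAM users and roles of your operators.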

Load balancer configuration:

  • Load balancer: reelog-lb-prod
  • Listener: Select HTTPS:443 (or HTTP:80 if you only have HTTP listener)
    • Select the HTTPS listener (port 443) since we configured HTTPS
  • Target group: reelog-api-tg-prod
  • Container: reelog-api:80:443
    • Container name: reelog-api, Container port: 80, Listener port: 443

Service auto scaling - optional:

  • Enable (highly recommended for production)

Auto Scaling Configuration:

Use service auto scaling:

  • Check "Use service auto scaling" to enable automatic scaling

Task limits:

  • Minimum number of tasks: 2 (or 4 for higher availability)
    • Lower boundary - ensures high availability
    • Production: Consider 4 for better redundancy
  • Maximum number of tasks: 50 (vs 10 for staging)
    • Upper boundary - allows scaling for high traffic
    • Adjust based on expected traffic and budget

Scaling policy type:

  • Select: Target tracking ✅ RECOMMENDED
    • Simpler - Automatically adjusts based on target metric value
    • Recommended - AWS manages scaling decisions
    • Works well for production workloads

Target tracking policy configuration:

Policy name:

  • Name: cpu-memory-scaling-prod (or leave default)

ECS service metric:

  • Select: ECSServiceAverageCPUUtilization (for CPU-based scaling)
    • Recommended - Scales based on CPU usage
    • Or select ECSServiceAverageMemoryUtilization for memory-based scaling

Target value:

  • CPU target: 70 (percent)
    • Service will scale to maintain ~70% CPU utilization
    • Balances performance and cost

Scale-out cooldown period:

  • Value: 60 seconds (default)
    • Wait 60 seconds after scaling out before scaling again
    • Prevents rapid scaling up

Scale-in cooldown period:

  • Value: 300 seconds (5 minutes) - recommended
    • Wait 5 minutes after scaling in before scaling again
    • Longer cooldown prevents rapid scaling down
    • Important for production stability

Turn off scale-in:

  • Leave unchecked (allow scale-in)
    • Recommended - Allows service to scale down when not needed
    • Cost-effective - Important for production cost management
    • ⚠️ Check only if you want to prevent scale-in (the policy will add tasks but never remove them)

Optional: Add second scaling policy (Memory):

  • Click "Add another policy" (if available)
  • Policy name: memory-scaling-prod
  • Metric: ECSServiceAverageMemoryUtilization
  • Target value: 80 (percent)
  • Same cooldown periods as CPU policy

Production considerations:

  • Consider multiple scaling policies (CPU + Memory) for better coverage
  • Higher max tasks (50) allows handling traffic spikes
  • Longer scale-in cooldown (5 minutes) prevents premature scale-down
  • Monitor scaling activities and adjust targets based on actual usage
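
Outside the console, the same scaling setup is expressed through Application Auto Scaling: one scalable target for the service plus one target tracking policy per metric. The sketch below builds the arguments for boto3's `register_scalable_target` and `put_scaling_policy`; the resource ID assumes the cluster and service names used in this guide, and nothing is sent to AWS.

```python
RESOURCE_ID = "service/reelog-prod/reelog-api-prod"  # service/<cluster>/<service>
DIMENSION = "ecs:service:DesiredCount"


def scalable_target(min_tasks: int = 2, max_tasks: int = 50) -> dict:
    """Arguments for application-autoscaling register_scalable_target."""
    return {
        "ServiceNamespace": "ecs",
        "ResourceId": RESOURCE_ID,
        "ScalableDimension": DIMENSION,
        "MinCapacity": min_tasks,
        "MaxCapacity": max_tasks,
    }


def target_tracking_policy(name: str, metric: str, target_percent: float) -> dict:
    """Arguments for application-autoscaling put_scaling_policy."""
    return {
        "PolicyName": name,
        "ServiceNamespace": "ecs",
        "ResourceId": RESOURCE_ID,
        "ScalableDimension": DIMENSION,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_percent),
            "PredefinedMetricSpecification": {"PredefinedMetricType": metric},
            "ScaleOutCooldown": 60,   # seconds
            "ScaleInCooldown": 300,   # seconds
            "DisableScaleIn": False,  # allow the service to shrink again
        },
    }


CPU_POLICY = target_tracking_policy(
    "cpu-memory-scaling-prod", "ECSServiceAverageCPUUtilization", 70)
MEMORY_POLICY = target_tracking_policy(
    "memory-scaling-prod", "ECSServiceAverageMemoryUtilization", 80)
```

When both policies are attached, the service scales out if either metric exceeds its target and scales in only when both allow it.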

Number of tasks (initial):

  • Desired count: 2-4 (vs 2 for staging)
    • Start with 2-4 tasks for production
    • Auto-scaling will adjust based on traffic

Tags - optional:

Turn on Amazon ECS managed tags:

  • Enable (highly recommended for production)
    • Automatically tags tasks with cluster and service names
    • Critical for cost tracking - See costs per service/cluster in Cost and Usage Report
    • Helps identify tasks in production environments
    • No downside - Free and essential for production cost management

Propagate tags from:

  • Select: Task definition ✅ RECOMMENDED
    • Automatically propagates tags from task definition to tasks
    • Consistent tagging - Tasks inherit task definition tags
    • Less manual work - Don't need to tag tasks separately
    • Important for production - Ensures all tasks are properly tagged
    • Alternative: Service - Propagates service tags to tasks
    • Alternative: Do not propagate - No automatic tag propagation (not recommended)

Service tags:

  • Click "Add tag" and add each of these:
    • Tag 1: Key=Environment, Value=Production
    • Tag 2: Key=Project, Value=Reelog
    • Tag 3: Key=Purpose, Value=ECSService
    • Tag 4: Key=Name, Value=reelog-api-prod
    • Tag 5 (optional): Key=ManagedBy, Value=Manual

What happens:

  • Service tags: Applied to the service itself
  • ECS managed tags: Automatically added to tasks (cluster name, service name)
  • Propagated tags: Task definition tags automatically copied to tasks
  • Result: Tasks have comprehensive tags for cost tracking, compliance, and organization
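
When tagging is scripted, one detail is easy to trip over: ECS expects tag entries with lowercase `key`/`value`, while most other services in this guide (RDS, ElastiCache, ECR) use `Key`/`Value`. The helper names below are hypothetical; the sketch shows both shapes and the tagging-related part of an ECS `create_service` request.

```python
def reelog_tags(purpose: str, name: str) -> dict:
    """The four standard tags used throughout this guide."""
    return {
        "Environment": "Production",
        "Project": "Reelog",
        "Purpose": purpose,
        "Name": name,
    }


def as_ecs_tags(tags: dict) -> list:
    """ECS APIs use lowercase key/value."""
    return [{"key": k, "value": v} for k, v in tags.items()]


def as_aws_tags(tags: dict) -> list:
    """RDS, ElastiCache, ECR and others use capitalized Key/Value."""
    return [{"Key": k, "Value": v} for k, v in tags.items()]


def service_tagging_fragment() -> dict:
    """Tagging-related arguments of ecs.create_service (partial request only)."""
    return {
        "enableECSManagedTags": True,
        "propagateTags": "TASK_DEFINITION",
        "tags": as_ecs_tags(reelog_tags("ECSService", "reelog-api-prod")),
    }
```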

🪣 Step 10: Create S3 Bucket

Single bucket with prefixes (recommended for social media apps):

  • One bucket for all content types (users, posts, groups, logs, models)
  • Organize with prefixes (folders) in your application code
  • Simpler management - Single bucket policy, one set of permissions
  • Cost-effective - No per-bucket charges

Bucket Setup:

  1. S3 Console → Create bucket
  2. Bucket type: Select General purpose ✅ RECOMMENDED
    • Multi-AZ redundancy - Data stored across multiple Availability Zones
    • Cost-effective - Standard S3 pricing
    • Works with Cloudflare CDN - CDN handles latency
    • Supports all storage classes - Standard, IA, Glacier, etc.
    • Standard choice for production applications
    • ⚠️ Directory (Express One Zone) - Only for ultra-low latency needs (not needed here)
  3. Name: reelog-content-prod (globally unique)
    • Generic name allows all content types (images, videos, models, etc.)
  4. Region: us-east-2
  5. Block public access: ✅ Keep all settings enabled (we'll configure this after creation)
    • During creation, leave Block Public Access settings ON
    • We'll configure public read access AFTER bucket creation
  6. Versioning: ✅ Enable (important for production)
    • Protects against accidental deletions
  7. Encryption: SSE-S3
  8. Lifecycle policies: Consider adding (delete old versions after X days)
    • Helps manage costs by automatically archiving/deleting old versions
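
As an illustration of the lifecycle idea in step 8, the sketch below builds a configuration that expires noncurrent object versions after a chosen number of days, in the shape boto3's `put_bucket_lifecycle_configuration` accepts. The 30-day default and the rule ID are assumptions; the guide leaves the number of days open.

```python
def noncurrent_version_lifecycle(noncurrent_days: int = 30) -> dict:
    """LifecycleConfiguration that deletes old object versions."""
    if noncurrent_days < 1:
        raise ValueError("noncurrent_days must be at least 1")
    return {
        "Rules": [{
            "ID": "expire-noncurrent-versions",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # empty prefix applies to the whole bucket
            "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
        }]
    }
```

With boto3 this would be passed as `s3_client.put_bucket_lifecycle_configuration(Bucket="reelog-content-prod", LifecycleConfiguration=noncurrent_version_lifecycle())`. Current versions are untouched; only versions replaced or deleted more than the given number of days ago are removed.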

Tags - optional:

  • Click "Add tag" and add each of these:
    • Tag 1: Key=Environment, Value=Production
    • Tag 2: Key=Project, Value=Reelog
    • Tag 3: Key=Purpose, Value=Storage
    • Tag 4: Key=Name, Value=reelog-content-prod
    • Tag 5 (optional): Key=ManagedBy, Value=Manual
  9. Click "Create bucket"

Configure Block Public Access (required before adding bucket policy):

  1. After bucket creation, go to your bucket → Permissions tab
  2. Block Public Access settings → Edit
  3. Uncheck the following setting:
    • Uncheck: "Block public access to buckets and objects granted through new public bucket or access point policies"
      • This allows the bucket policy to grant public read access
      • Other settings can remain checked for security
  4. Save changes
  5. Confirm by typing confirm in the confirmation dialog

Why this is needed:

  • The bucket policy grants public read access ("Principal": "*")
  • Block Public Access settings prevent public access by default
  • We need to allow public bucket policies to enable public read
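  • The policy grants only GetObject (read), never write or delete. It does make every object readable by anyone who knows its URL, including anything under temp/, so store only content that is meant to be public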

Add Bucket Policy (for public read):

  1. Still in the Permissions tab → Bucket policy → Edit
  2. Paste the following JSON policy:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicReadGetObject",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::reelog-content-prod/*"
  }]
}
  3. Save changes
  4. Verify - You should see a warning that the bucket is publicly accessible (this is expected)

What this policy does:

  • Allows public read access (s3:GetObject) to all objects in the bucket
  • Anyone can view/download files via URL
  • Does NOT allow write, delete, or list operations
  • Safe for serving images/media via Cloudflare CDN

Prefix Structure (organized in application code):

reelog-content-prod/
├── users/{userId}/ # User avatars, cover photos, profile photos
├── posts/{postId}/ # Post images/videos
├── groups/{groupId}/ # Group avatars, cover photos
├── logs/{logId}/ # Fishing log photos, catch photos
├── models/ # 3D models (boats, rods, reels)
│ ├── boats/
│ ├── rods/
│ └── reels/
└── temp/ # Temporary uploads (presigned URLs)

Note: Prefix organization happens in your application code when uploading files. The bucket itself is just a container - AWS doesn't enforce folder structure.
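
Since the prefix layout is enforced only by the application, it helps to build object keys in one place. The real API is an ASP.NET Core service, so the following Python is purely an illustration of the idea; the function names are hypothetical and the validation rules are assumptions.

```python
from pathlib import PurePosixPath

ENTITY_PREFIXES = {"users", "posts", "groups", "logs"}
MODEL_KINDS = {"boats", "rods", "reels"}


def _safe_name(filename: str) -> str:
    """Keep only the final path component so callers cannot escape the prefix."""
    name = PurePosixPath(filename).name
    if not name:
        raise ValueError("filename must not be empty")
    return name


def entity_key(prefix: str, entity_id: str, filename: str) -> str:
    """Key for per-entity content, e.g. users/{userId}/avatar.jpg."""
    if prefix not in ENTITY_PREFIXES:
        raise ValueError(f"unknown prefix: {prefix}")
    return f"{prefix}/{entity_id}/{_safe_name(filename)}"


def model_key(kind: str, filename: str) -> str:
    """Key for shared 3D models, e.g. models/boats/skiff.glb."""
    if kind not in MODEL_KINDS:
        raise ValueError(f"unknown model kind: {kind}")
    return f"models/{kind}/{_safe_name(filename)}"
```

For example, `entity_key("users", "42", "avatar.jpg")` yields `users/42/avatar.jpg`, and a filename such as `../x/photo.png` is reduced to `photo.png` before the key is built.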


📊 Step 11: Create/Update CloudWatch Log Group

Check if log group already exists:

  1. CloudWatch Console → Logs → Log groups
  2. Search for /ecs/reelog-api-prod

If log group exists (update retention):

  1. Click on /ecs/reelog-api-prod
  2. Configuration tab → Edit retention
  3. Retention period: Select 30 days (recommended for production)
    • vs 7 days for staging
    • Longer retention helps with production troubleshooting
  4. Save changes

If log group doesn't exist (create new):

  1. Log groups → Create log group
  2. Log group name: /ecs/reelog-api-prod
  3. Retention: 30 days
  4. Create

Key differences from staging:

  • Log group name: /ecs/reelog-api-prod (vs /ecs/reelog-api-staging)
  • Retention: 30 days (vs 7 for staging)
    • Longer retention for production troubleshooting and compliance

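Note: With awslogs-create-group enabled in the task definition (Step 8), the log group is created automatically the first time a task starts if it is missing. A group created this way has no retention limit (logs never expire), so if it already exists, just update the retention setting.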
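
Scripted, the two console paths above become two CloudWatch Logs calls: create the group if needed, then set retention. The sketch below only builds the arguments for boto3's `create_log_group` and `put_retention_policy`; the function name is hypothetical.

```python
def log_group_requests(name: str = "/ecs/reelog-api-prod",
                       retention_days: int = 30) -> dict:
    """Arguments for the two CloudWatch Logs calls, keyed by method name."""
    return {
        "create_log_group": {"logGroupName": name},
        "put_retention_policy": {
            "logGroupName": name,
            "retentionInDays": retention_days,
        },
    }
```

If the group already exists, only the `put_retention_policy` call is needed; it can be repeated safely and simply overwrites the previous retention value.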


🌐 Step 12: Configure Cloudflare DNS

12.1 ACM Certificate Validation

  1. ACM Console → Certificate → Copy DNS validation record
  2. Cloudflare → DNS → Add record
    • Type: CNAME
    • Name: Validation record name from ACM
    • Target: Validation record value from ACM
    • Proxy: ❌ DNS only (gray cloud) - Required!
    • TTL: Auto
    • Comment (optional): ACM Certificate Validation - api.reelog.app
      • Helps identify this record for future reference
  3. Wait 5-10 minutes → Certificate status: "Issued"

12.2 API DNS Record

  1. Cloudflare → DNS → Add record
    • Type: CNAME
    • Name: api (not api-staging)
    • Target: Production Load Balancer DNS name
    • Proxy: ✅ Proxied (orange cloud)
    • TTL: Auto
    • Comment (optional): Production API - Points to AWS ALB
      • Documents that this routes api.reelog.app to the production load balancer
  2. Save

12.3 SSL/TLS Settings

  1. Cloudflare → SSL/TLS → Overview
  2. SSL/TLS encryption mode: Full (strict)
    • Requires valid ACM certificate on Load Balancer
    • Provides end-to-end encryption

🐳 Step 13: Build and Push Docker Image

Same process, but push to production repository:

docker tag reelog-api-staging:latest 606532921651.dkr.ecr.us-east-2.amazonaws.com/reelog-api-prod:latest
docker push 606532921651.dkr.ecr.us-east-2.amazonaws.com/reelog-api-prod:latest

✅ Step 14: Verify Deployment

curl https://api.reelog.app/ping
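
Right after a first deployment, DNS and target registration can take a short while, so a single curl may fail even though everything is configured correctly. The sketch below retries the same request a few times. It assumes that /ping answers with HTTP 200 when the API is healthy, which this guide does not state explicitly; the function names and retry defaults are illustrative.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code


def wait_until_healthy(url: str, attempts: int = 10, delay: float = 3.0,
                       fetch=fetch_status, sleep=time.sleep) -> bool:
    """Poll url until it returns 200 or the attempts are used up."""
    for attempt in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            pass  # DNS failure, refused connection, or timeout: retry
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    ok = wait_until_healthy("https://api.reelog.app/ping")
    print("healthy" if ok else "not reachable yet")
```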

📋 Production Checklist

  • All staging resources created first
  • Production ECR repository created
  • Production RDS database created (Multi-AZ, larger instance)
  • Production Valkey cache created (with replicas)
  • Production security groups created
  • Production Load Balancer created
  • Production Target group created
  • Production ACM certificate validated
  • Production ECS cluster created (with Container Insights)
  • Production Task definition created (larger resources)
  • Production ECS service running (2-4 tasks)
  • Production S3 bucket created
  • Production CloudWatch log group created (30-day retention)
  • Cloudflare DNS configured for production
  • Production Docker image pushed
  • API accessible at https://api.reelog.app

💰 Estimated Monthly Costs (Production)

  • ECS Fargate (2-4 tasks at 1 vCPU / 2 GB): ~$72-144/month (roughly $36 per always-on task at on-demand Linux/x86 rates)
  • RDS db.t3.medium (Multi-AZ): ~$100/month
  • Valkey cache.t3.small (with replica): ~$24/month
  • Load Balancer: ~$16/month
  • Cloudflare: FREE
  • S3: ~$5-20/month (depends on storage)
  • CloudWatch Container Insights: ~$2-4/month
  • Total: ~$219-308/month

🔒 Production Security Checklist

  • Database has no public access
  • All security groups properly configured
  • Deletion protection enabled on critical resources
  • Multi-AZ deployment for database and cache
  • Encryption enabled (in-transit and at-rest)
  • Backups configured (30-day retention)
  • Monitoring enabled (Container Insights)
  • Logs retained (30 days minimum)
  • ACM certificates validated and configured
  • Cloudflare SSL/TLS set to Full (strict)

🚀 Next Steps

  1. Set up CI/CD (GitHub Actions, AWS CodePipeline)
  2. Configure monitoring alerts (CloudWatch alarms)
  3. Set up automated backups (RDS automated backups)
  4. Configure auto-scaling (tune based on traffic)
  5. Set up production database (migrate from staging/local)
  6. Configure production secrets (Secrets Manager)
  7. Set up production monitoring (CloudWatch dashboards)

Last Updated: Current Date
Region: us-east-2
Domain: reelog.app
Reference: See ECS_STAGING_SETUP_GUIDE.md for detailed step-by-step instructions