ECS Production Setup Guide
Complete guide to set up your FishingLog API production environment on AWS ECS.
⚠️ Prerequisites: Complete staging setup first. This guide focuses on production-specific differences.
Endpoint: api.reelog.app
📋 Overview
Production setup follows the same steps as staging, but with:
- ✅ Larger instance sizes (better performance)
- ✅ Multi-AZ deployment (high availability)
- ✅ Enhanced monitoring (Container Insights)
- ✅ Stricter security (no public database access)
- ✅ Longer retention (backups, logs)
- ✅ Deletion protection (prevent accidental deletion)
🎯 Quick Reference
Follow the staging guide (ECS_STAGING_SETUP_GUIDE.md) for step-by-step instructions, but use these production-specific configurations:
🚀 Step 1: Create ECR Repository
Same as staging, but:
- Name: reelog-api-prod
- Tags: Environment=Production
Repository URI: 606532921651.dkr.ecr.us-east-2.amazonaws.com/reelog-api-prod
🗄️ Step 2: Create RDS PostgreSQL Database
Key differences from staging:
Configuration:
- Creation method: Full configuration (not Easy create)
- Engine: PostgreSQL (not Aurora)
- Template: Production (not Free tier)
- DB instance identifier: reelog-db-prod
- DB instance class: db.t3.medium (2 vCPU, 4 GB RAM)
- Storage: 100 GB (or more)
- Multi-AZ deployment: ✅ Enable (high availability)
- Backup retention: 30 days (vs 7 for staging)
- Public access: ❌ No (more secure)
- VPC security group: Create new reelog-db-sg-prod
- Deletion protection: ✅ Enable
Tags: Environment=Production, Project=Reelog, Purpose=Database, Name=reelog-db-prod
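In Step 8 this database is passed to the API as ConnectionStrings__DefaultConnection, which ASP.NET Core reads as ConnectionStrings:DefaultConnection. A minimal sketch for assembling that value, assuming the API reaches PostgreSQL through Npgsql (keywords Host, Port, Database, Username, Password, SSL Mode); the database name, user, password, and endpoint used below are hypothetical placeholders:

```python
def npgsql_connection_string(host: str, database: str, username: str,
                             password: str, port: int = 5432) -> str:
    """Build an Npgsql-style connection string (assumed client library).

    SSL Mode=Require makes the API use TLS when talking to RDS.
    """
    parts = {
        "Host": host,
        "Port": str(port),
        "Database": database,
        "Username": username,
        "Password": password,
        "SSL Mode": "Require",
    }
    for key, value in parts.items():
        # Keep the sketch simple: refuse values that would need quoting.
        if ";" in value or '"' in value:
            raise ValueError(f"{key} contains a character that needs quoting")
    return ";".join(f"{k}={v}" for k, v in parts.items())


if __name__ == "__main__":
    # Hypothetical endpoint and credentials, for illustration only.
    print(npgsql_connection_string(
        "reelog-db-prod.example.us-east-2.rds.amazonaws.com",
        "reelog", "reelog_app", "change-me"))
```

Because the result contains the password, treat it as a secret rather than a plain environment variable once Secrets Manager is in place.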
🔴 Step 3: Create ElastiCache Valkey Cache
Key differences:
Configuration:
- Name: reelog-valkey-prod
- Node type: cache.t3.small or cache.t3.medium (more RAM)
- Number of nodes: 1 primary with 1-2 replicas (high availability)
- Multi-AZ: ✅ Enable, so a replica in another AZ is promoted automatically if the primary fails
- Subnet group: reelog-valkey-subnet-prod
- Security group: reelog-valkey-sg-prod
- Encryption in-transit: ✅ Enabled
- Encryption at-rest: ✅ Enabled
- Automatic backups: ✅ On (7 days retention)
Tags: Environment=Production, Project=Reelog, Purpose=Cache, Name=reelog-valkey-prod
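Because in-transit encryption is enabled, the API has to connect over TLS or the connection fails. A sketch of the Redis__ConnectionString value, assuming the API uses StackExchange.Redis (comma-separated host:port followed by options such as ssl and abortConnect) and connects to the replication group's primary endpoint as shown in the ElastiCache console:

```python
def valkey_connection_string(primary_endpoint: str, port: int = 6379,
                             use_tls: bool = True) -> str:
    """StackExchange.Redis-style configuration string (assumed client library).

    ssl=true is needed because in-transit encryption is enabled on the cache;
    abortConnect=false lets the app start and keep retrying if the cache is
    briefly unavailable (for example during a failover).
    """
    options = [f"{primary_endpoint}:{port}"]
    if use_tls:
        options.append("ssl=true")
    options.append("abortConnect=false")
    return ",".join(options)
```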
🔒 Step 4: Create Security Groups
Same structure as staging, but:
- Load Balancer: reelog-lb-sg-prod
- API: reelog-api-sg-prod
- Database: reelog-db-sg-prod
- Valkey: reelog-valkey-sg-prod
All tags: Environment=Production
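The four groups form a chain in which each tier only accepts traffic from the tier in front of it. A sketch of that chain as data, with a check that nothing except the load balancer is reachable from the internet; the ports (443 on the ALB, container port 80 from Step 9, PostgreSQL 5432, Valkey 6379) are the usual defaults and are assumptions here, so match them to your staging groups:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    group: str    # security group the rule belongs to
    port: int     # TCP port allowed
    source: str   # CIDR block or another security group name


# Assumed production rules, mirroring the staging structure.
RULES = [
    IngressRule("reelog-lb-sg-prod", 443, "0.0.0.0/0"),
    IngressRule("reelog-api-sg-prod", 80, "reelog-lb-sg-prod"),
    IngressRule("reelog-db-sg-prod", 5432, "reelog-api-sg-prod"),
    IngressRule("reelog-valkey-sg-prod", 6379, "reelog-api-sg-prod"),
]

PUBLIC_SOURCES = {"0.0.0.0/0", "::/0"}


def publicly_reachable(rules: list[IngressRule]) -> set[str]:
    """Return the security groups that accept traffic from the whole internet."""
    return {r.group for r in rules if r.source in PUBLIC_SOURCES}


if __name__ == "__main__":
    exposed = publicly_reachable(RULES)
    # Only the load balancer should be internet-facing in production.
    assert exposed == {"reelog-lb-sg-prod"}, exposed
    print("internet-facing groups:", sorted(exposed))
```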
⚖️ Step 5: Create Application Load Balancer
Same as staging, but:
- Target group: reelog-api-tg-prod
- Load Balancer: reelog-lb-prod
- ACM certificate: api.reelog.app
- DNS name: reelog-lb-prod-xxxxx.us-east-2.elb.amazonaws.com
Tags: Environment=Production, Project=Reelog, Purpose=LoadBalancer, Name=reelog-lb-prod
🏗️ Step 6: Create ECS Cluster
Key differences:
Configuration:
- Name: reelog-prod
- Container Insights: ✅ Container Insights or Enhanced observability
- Provides better visibility for production troubleshooting
- Costs extra (~$0.10 per container/month) but worth it
- Encryption: Consider custom KMS keys for compliance
Tags: Environment=Production, Project=Reelog, Purpose=ECSCluster, Name=reelog-prod
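The same cluster settings can be expressed as input for `aws ecs create-cluster --cli-input-json`. A sketch that prints that input, assuming the CreateCluster `settings` entry named `containerInsights` (value `enabled` for standard Container Insights, `enhanced` for enhanced observability) and ECS's lowercase `key`/`value` tag shape:

```python
import json


def cluster_input(name: str, insights: str = "enabled") -> dict:
    """Request body for ECS CreateCluster with Container Insights turned on."""
    if insights not in {"enabled", "enhanced"}:
        raise ValueError("use 'enabled' or 'enhanced' for production")
    tags = {
        "Environment": "Production",
        "Project": "Reelog",
        "Purpose": "ECSCluster",
        "Name": name,
    }
    return {
        "clusterName": name,
        "settings": [{"name": "containerInsights", "value": insights}],
        # ECS uses lowercase key/value in its tag objects.
        "tags": [{"key": k, "value": v} for k, v in tags.items()],
    }


if __name__ == "__main__":
    print(json.dumps(cluster_input("reelog-prod"), indent=2))
```

Redirect the output to a file (for example cluster.json) and pass it as `--cli-input-json file://cluster.json`.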
🔐 Step 7: Create IAM Roles
Option 1: Reuse Staging Roles (Simpler)
- Can reuse reelog-ecs-task-execution-role and reelog-ecs-task-role from staging
- Works fine if you want to share roles across environments
Option 2: Create Production-Specific Roles (More Secure - Recommended)
7.1 Task Execution Role
- IAM Console → Roles → Create role
- Trusted entity type: Select AWS service
- Use case: Select Elastic Container Service
- You'll see: "Allow an AWS service like EC2, Lambda, or others to perform actions in this account"
- Scroll down to find "Elastic Container Service"
- Select: Elastic Container Service Task
- This is the specific use case for ECS tasks
- Click "Next"
- Add permissions:
- Search and select: AmazonECSTaskExecutionRolePolicy
- Search and select: AmazonEC2ContainerRegistryReadOnly (optional; AmazonECSTaskExecutionRolePolicy already grants ECR pull access)
- Click "Next"
- Role name: reelog-ecs-task-execution-role-prod
- Description: Execution role for ECS tasks - Production - allows ECR access and CloudWatch logging
- Tags:
- Tag 1: Key=Environment, Value=Production
- Tag 2: Key=Project, Value=Reelog
- Tag 3: Key=Purpose, Value=TaskExecutionRole
- Tag 4: Key=Name, Value=reelog-ecs-task-execution-role-prod
- Click "Create role"
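- After creation, click on the role → Add permissions → Create inline policy (AmazonECSTaskExecutionRolePolicy does not include logs:CreateLogGroup, which the awslogs-create-group option in Step 8 needs):
- JSON tab → Paste: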
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource": "arn:aws:logs:us-east-2:*:*"
}]
}
- Policy name: CloudWatchLogsPolicy
- Create policy
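Behind the console's "Elastic Container Service Task" use case is a trust policy that lets `ecs-tasks.amazonaws.com` assume the role. A sketch that prints that trust policy plus a tighter variant of the CloudWatchLogsPolicy above, scoped to the production log group instead of every log group in the region; the account ID is taken from the ECR URI, and the scoping is a suggestion, not what the console generates:

```python
import json

ACCOUNT_ID = "606532921651"
REGION = "us-east-2"
LOG_GROUP = "/ecs/reelog-api-prod"

# Trust policy: lets ECS tasks assume the execution role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ecs-tasks.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}


def scoped_logs_policy(account_id: str, region: str, log_group: str) -> dict:
    """CloudWatch Logs permissions limited to one log group and its streams."""
    group_arn = f"arn:aws:logs:{region}:{account_id}:log-group:{log_group}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
            ],
            # The ":*" form covers the log streams inside the group.
            "Resource": [group_arn, f"{group_arn}:*"],
        }],
    }


if __name__ == "__main__":
    print(json.dumps(TRUST_POLICY, indent=2))
    print(json.dumps(scoped_logs_policy(ACCOUNT_ID, REGION, LOG_GROUP), indent=2))
```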
7.2 Task Role
- IAM Console → Roles → Create role
- Trusted entity type: Select AWS service
- Use case: Select Elastic Container Service
- Select: Elastic Container Service Task
- Click "Next"
- Add permissions:
- Search and select: AmazonS3FullAccess
- Or create a custom policy for specific bucket access (more secure, recommended for production)
- Click "Next"
- Role name: reelog-ecs-task-role-prod
- Description: Task role for ECS containers - Production - allows S3 access
- Tags:
- Tag 1: Key=Environment, Value=Production
- Tag 2: Key=Project, Value=Reelog
- Tag 3: Key=Purpose, Value=TaskRole
- Tag 4: Key=Name, Value=reelog-ecs-task-role-prod
- Click "Create role"
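Recommendation: For production, create production-specific roles with more restrictive policies (especially for S3 access: limit them to specific buckets). If you enable ECS Exec in Step 9, the task role also needs the ssmmessages permissions that ECS Exec uses; S3 policies alone do not grant them.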
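A least-privilege alternative to AmazonS3FullAccess might look like the sketch below. The S3 actions are an assumption about what the API does (upload, read, and delete media in reelog-content-prod from Step 10); trim them to the calls your code actually makes. The optional statement adds the four ssmmessages actions ECS Exec relies on:

```python
import json

BUCKET = "reelog-content-prod"


def task_role_policy(bucket: str, enable_ecs_exec: bool = True) -> dict:
    """Scoped task-role policy; the S3 action list is an assumption."""
    statements = [
        {
            "Sid": "ContentObjects",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
        },
        {
            "Sid": "ContentBucketListing",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{bucket}",
        },
    ]
    if enable_ecs_exec:
        # ECS Exec opens an SSM session from inside the container,
        # which needs these ssmmessages permissions on the task role.
        statements.append({
            "Sid": "EcsExec",
            "Effect": "Allow",
            "Action": [
                "ssmmessages:CreateControlChannel",
                "ssmmessages:CreateDataChannel",
                "ssmmessages:OpenControlChannel",
                "ssmmessages:OpenDataChannel",
            ],
            "Resource": "*",
        })
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    print(json.dumps(task_role_policy(BUCKET), indent=2))
```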
📦 Step 8: Create ECS Task Definition
Key differences:
Configuration:
- Task definition family: reelog-api-prod
- Family name groups different revisions of the same task definition
- Must be 1-255 characters; valid characters: a-z, A-Z, 0-9, hyphens (-), underscores (_)
- Task size: 1 vCPU, 2 GB memory (vs 0.5 vCPU, 1 GB for staging)
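- Image URI: 606532921651.dkr.ecr.us-east-2.amazonaws.com/reelog-api-prod:latest
- Environment variables:
- ASPNETCORE_ENVIRONMENT=Production
- ConnectionStrings__DefaultConnection= production database connection string (RDS endpoint)
- Redis__ConnectionString= production Valkey connection string (primary endpoint)
- These values contain credentials; plan to move them to Secrets Manager (see Next Steps)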
- Logging: awslogs
- Log group: /ecs/reelog-api-prod
- Region: us-east-2
- Stream prefix: ecs
- awslogs-create-group: ✅ Keep enabled (recommended)
- Automatically creates the CloudWatch log group if it doesn't exist
- Safety net for production - prevents deployment failures
- Requires logs:CreateLogGroup on the execution role (the inline policy from Step 7.1)
- Tags:
- Tag 1: Key=Environment, Value=Production
- Tag 2: Key=Project, Value=Reelog
- Tag 3: Key=Purpose, Value=TaskDefinition
- Tag 4: Key=Name, Value=reelog-api-prod
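The same configuration can be registered with `aws ecs register-task-definition --cli-input-json`. A sketch that builds that input; the container name reelog-api and port 80 come from Step 9, the role ARNs assume the production-specific roles from Step 7 Option 2, and the connection strings are placeholders you supply:

```python
import json

ACCOUNT_ID = "606532921651"
REGION = "us-east-2"
IMAGE = f"{ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com/reelog-api-prod:latest"
ROLE_PREFIX = f"arn:aws:iam::{ACCOUNT_ID}:role"


def task_definition(db_conn: str, redis_conn: str) -> dict:
    """Input for `aws ecs register-task-definition --cli-input-json`."""
    env = {
        "ASPNETCORE_ENVIRONMENT": "Production",
        "ConnectionStrings__DefaultConnection": db_conn,
        "Redis__ConnectionString": redis_conn,
    }
    tags = {"Environment": "Production", "Project": "Reelog",
            "Purpose": "TaskDefinition", "Name": "reelog-api-prod"}
    return {
        "family": "reelog-api-prod",
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # required for Fargate
        "cpu": "1024",            # 1 vCPU
        "memory": "2048",         # 2 GB
        "executionRoleArn": f"{ROLE_PREFIX}/reelog-ecs-task-execution-role-prod",
        "taskRoleArn": f"{ROLE_PREFIX}/reelog-ecs-task-role-prod",
        "containerDefinitions": [{
            "name": "reelog-api",
            "image": IMAGE,
            "essential": True,
            "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
            "environment": [{"name": k, "value": v} for k, v in env.items()],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/reelog-api-prod",
                    "awslogs-region": REGION,
                    "awslogs-stream-prefix": "ecs",
                    "awslogs-create-group": "true",
                },
            },
        }],
        "tags": [{"key": k, "value": v} for k, v in tags.items()],
    }


if __name__ == "__main__":
    # Placeholder values; substitute the real connection strings.
    print(json.dumps(task_definition("<db-connection-string>",
                                     "<valkey-connection-string>"), indent=2))
```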
🚀 Step 9: Create ECS Service
Key differences:
Configuration:
- Task definition: reelog-api-prod:1
- Service name: reelog-api-prod
Compute configuration - advanced:
Compute options:
- Select: Launch type ✅ RECOMMENDED
- ✅ Simpler - Direct launch without capacity provider strategy
- ✅ Standard approach - Works well for production
- ✅ Easier to manage - No need to configure capacity providers
Launch type:
- Select: Fargate
- Serverless containers - no EC2 instances to manage
- No servers to provision or scale; the number of tasks is scaled by service auto scaling (configured below)
Alternative: Capacity provider strategy (Advanced - Not needed):
- ⚠️ Use custom - Only if you need advanced task distribution across multiple capacity providers
- ⚠️ More complex - Requires configuring capacity providers
- ⚠️ Use cluster default - Only if cluster has default capacity provider strategy configured
- ✅ For production, use Launch type - Simpler and sufficient for most use cases
Troubleshooting configuration - recommended:
Turn on ECS Exec:
- Enable ECS Exec: ✅ Enable (recommended for production troubleshooting)
- ✅ Useful for production debugging - Run interactive commands in containers
- ✅ Critical for troubleshooting - Inspect container state, check logs, test connections
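- ⚠️ Security: Requires extra IAM permissions. The task role needs ssmmessages:CreateControlChannel, CreateDataChannel, OpenControlChannel, and OpenDataChannel (not part of the Step 7 task role), and users need ecs:ExecuteCommand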
- ⚠️ Best practice: Enable but restrict access via IAM policies if needed
What is ECS Exec?
- Allows you to run interactive commands in running containers
- Similar to docker exec, but for ECS containers
- Useful for debugging production issues, checking environment variables, testing database connections
- Requires ssmmessages permissions on the task role (add them to the Step 7 task role; they are not included by default)
When to use:
- ✅ Production: Enable - Critical for troubleshooting production issues
- Use cases: Check logs, test database connections, inspect environment variables, debug production issues
- Security: Access controlled via IAM - ensure only authorized users can use ECS Exec
Note: ECS Exec applies only to tasks launched after it is enabled; existing tasks are not affected. To turn it on later, update the service with ECS Exec enabled and force a new deployment so the replacement tasks pick it up.
Load balancer configuration:
- Load balancer: reelog-lb-prod
- Listener: Select HTTPS:443 (or HTTP:80 if you only have an HTTP listener)
- Select the HTTPS listener (port 443) since we configured HTTPS
- Target group: reelog-api-tg-prod
- Container: reelog-api:80:443
- Container name: reelog-api, Container port: 80, Listener port: 443
Service auto scaling - optional:
- ✅ Enable (highly recommended for production)
Auto Scaling Configuration:
Use service auto scaling:
- ✅ Check "Use service auto scaling" to enable automatic scaling
Task limits:
- Minimum number of tasks: 2 (or 4 for higher availability)
- Lower boundary - ensures high availability
- Production: Consider 4 for better redundancy
- Maximum number of tasks: 50 (vs 10 for staging)
- Upper boundary - allows scaling for high traffic
- Adjust based on expected traffic and budget
Scaling policy type:
- Select: Target tracking ✅ RECOMMENDED
- ✅ Simpler - Automatically adjusts based on target metric value
- ✅ Recommended - AWS manages scaling decisions
- ✅ Works well for production workloads
Target tracking policy configuration:
Policy name:
- Name: cpu-memory-scaling-prod (or leave default)
ECS service metric:
- Select: ECSServiceAverageCPUUtilization (for CPU-based scaling)
- ✅ Recommended - Scales based on CPU usage
- Or select ECSServiceAverageMemoryUtilization for memory-based scaling
Target value:
- CPU target: 70 (percent)
- Service will scale to maintain ~70% CPU utilization
- Balances performance and cost
Scale-out cooldown period:
- Value: 60 seconds (shorter than the 300-second default for ECS services)
- Wait 60 seconds after a scale-out before scaling out again
- Prevents rapid scaling up
Scale-in cooldown period:
- Value: 300 seconds (5 minutes) - recommended
- Wait 5 minutes after scaling in before scaling in again
- Longer cooldown prevents rapid scaling down
- Important for production stability
Turn off scale-in:
- ❌ Leave unchecked (allow scale-in)
- ✅ Recommended - Allows service to scale down when not needed
- ✅ Cost-effective - Important for production cost management
- ⚠️ Check only if you never want this policy to remove tasks (the service keeps whatever count it has scaled out to)
Optional: Add second scaling policy (Memory):
- Click "Add another policy" (if available)
- Policy name: memory-scaling-prod
- Metric: ECSServiceAverageMemoryUtilization
- Target value: 80 (percent)
- Same cooldown periods as CPU policy
Production considerations:
- Consider multiple scaling policies (CPU + Memory) for better coverage: the service scales out when any policy calls for it and scales in only when all of them agree
- Higher max tasks (50) allows handling traffic spikes
- Longer scale-in cooldown (5 minutes) prevents premature scale-down
- Monitor scaling activities and adjust targets based on actual usage
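The console settings above map onto two Application Auto Scaling calls: `register-scalable-target` (min/max tasks) and one `put-scaling-policy` per target tracking policy. A sketch that builds both inputs for `--cli-input-json`, assuming the cluster reelog-prod and service reelog-api-prod from Steps 6 and 9:

```python
import json

CLUSTER, SERVICE = "reelog-prod", "reelog-api-prod"
RESOURCE_ID = f"service/{CLUSTER}/{SERVICE}"
DIMENSION = "ecs:service:DesiredCount"


def scalable_target(min_tasks: int = 2, max_tasks: int = 50) -> dict:
    """Input for `aws application-autoscaling register-scalable-target`."""
    if not 1 <= min_tasks <= max_tasks:
        raise ValueError("need 1 <= min_tasks <= max_tasks")
    return {"ServiceNamespace": "ecs", "ResourceId": RESOURCE_ID,
            "ScalableDimension": DIMENSION,
            "MinCapacity": min_tasks, "MaxCapacity": max_tasks}


def target_tracking_policy(name: str, metric: str, target: float,
                           scale_out_cooldown: int = 60,
                           scale_in_cooldown: int = 300) -> dict:
    """Input for `aws application-autoscaling put-scaling-policy`."""
    return {
        "PolicyName": name,
        "ServiceNamespace": "ecs",
        "ResourceId": RESOURCE_ID,
        "ScalableDimension": DIMENSION,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target,
            "PredefinedMetricSpecification": {"PredefinedMetricType": metric},
            "ScaleOutCooldown": scale_out_cooldown,
            "ScaleInCooldown": scale_in_cooldown,
            "DisableScaleIn": False,  # allow scale-in, as recommended above
        },
    }


POLICIES = [
    target_tracking_policy("cpu-memory-scaling-prod",
                           "ECSServiceAverageCPUUtilization", 70.0),
    target_tracking_policy("memory-scaling-prod",
                           "ECSServiceAverageMemoryUtilization", 80.0),
]

if __name__ == "__main__":
    print(json.dumps(scalable_target(), indent=2))
    for policy in POLICIES:
        print(json.dumps(policy, indent=2))
```

Each printed document goes into its own file; register the scalable target before adding the policies.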
Number of tasks (initial):
- Desired count: 2-4 (vs 2 for staging)
- Start with 2-4 tasks for production
- Auto-scaling will adjust based on traffic
Tags - optional:
Turn on Amazon ECS managed tags:
- ✅ Enable (highly recommended for production)
- ✅ Automatically tags tasks with cluster and service names
- ✅ Critical for cost tracking - See costs per service/cluster in Cost and Usage Report
- ✅ Helps identify tasks in production environments
- ✅ No downside - Free and essential for production cost management
Propagate tags from:
- Select: Task definition ✅ RECOMMENDED
- ✅ Automatically propagates tags from task definition to tasks
- ✅ Consistent tagging - Tasks inherit task definition tags
- ✅ Less manual work - Don't need to tag tasks separately
- ✅ Important for production - Ensures all tasks are properly tagged
- Alternative: Service - Propagates service tags to tasks
- Alternative: Do not propagate - No automatic tag propagation (not recommended)
Service tags:
- Click "Add tag" and add each of these:
- Tag 1: Key=Environment, Value=Production
- Tag 2: Key=Project, Value=Reelog
- Tag 3: Key=Purpose, Value=ECSService
- Tag 4: Key=Name, Value=reelog-api-prod
- Tag 5 (Optional): Key=ManagedBy, Value=Manual
What happens:
- Service tags: Applied to the service itself
- ECS managed tags: Automatically added to tasks (cluster name, service name)
- Propagated tags: Task definition tags automatically copied to tasks
- Result: Tasks have comprehensive tags for cost tracking, compliance, and organization
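The same tag set is applied to almost every resource in this guide, but the APIs disagree on casing: ECS (clusters, task definitions, services) expects lowercase `key`/`value`, while S3, IAM, RDS, and ElastiCache expect `Key`/`Value`. A sketch that keeps one source of truth and emits either shape; the required-tag check reflects the four tags used throughout this guide:

```python
REQUIRED = ("Environment", "Project", "Purpose", "Name")


def production_tags(purpose: str, name: str, **extra: str) -> dict[str, str]:
    """One source of truth for the tags this guide applies to every resource."""
    tags = {"Environment": "Production", "Project": "Reelog",
            "Purpose": purpose, "Name": name, **extra}
    missing = [k for k in REQUIRED if not tags.get(k)]
    if missing:
        raise ValueError(f"missing tag values: {missing}")
    return tags


def ecs_tag_list(tags: dict[str, str]) -> list[dict[str, str]]:
    """ECS APIs use lowercase 'key'/'value'."""
    return [{"key": k, "value": v} for k, v in tags.items()]


def aws_tag_list(tags: dict[str, str]) -> list[dict[str, str]]:
    """S3, IAM, RDS, and ElastiCache use capitalized 'Key'/'Value'."""
    return [{"Key": k, "Value": v} for k, v in tags.items()]


if __name__ == "__main__":
    service_tags = production_tags("ECSService", "reelog-api-prod",
                                   ManagedBy="Manual")
    print(ecs_tag_list(service_tags))
```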
🪣 Step 10: Create S3 Bucket
Single bucket with prefixes (recommended for social media apps):
- ✅ One bucket for all content types (users, posts, groups, logs, models)
- ✅ Organize with prefixes (folders) in your application code
- ✅ Simpler management - Single bucket policy, one set of permissions
- ✅ Cost-neutral - S3 has no per-bucket charge, so the benefit is simpler management rather than lower cost
Bucket Setup:
- S3 Console → Create bucket
- Bucket type: Select General purpose ✅ RECOMMENDED
- ✅ Multi-AZ redundancy - Data stored across multiple Availability Zones
- ✅ Cost-effective - Standard S3 pricing
- ✅ Works with Cloudflare CDN - CDN handles latency
- ✅ Supports all storage classes - Standard, IA, Glacier, etc.
- ✅ Standard choice for production applications
- ⚠️ Directory (Express One Zone) - Only for ultra-low latency needs (not needed here)
- Name: reelog-content-prod (globally unique)
- Generic name allows all content types (images, videos, models, etc.)
- Region: us-east-2
- Block public access: ✅ Keep all settings enabled (we'll configure this after creation)
- During creation, leave Block Public Access settings ON
- We'll configure public read access AFTER bucket creation
- Versioning: ✅ Enable (important for production)
- Protects against accidental deletions
- Encryption: SSE-S3
- Lifecycle policies: Consider adding (delete old versions after X days)
- Helps manage costs by automatically archiving/deleting old versions
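One possible lifecycle configuration for "delete old versions after X days", as input for `aws s3api put-bucket-lifecycle-configuration --cli-input-json`. The 30-day noncurrent-version window, the 7-day cleanup of incomplete multipart uploads, and the 1-day expiry for the temp/ prefix (from the prefix layout in this step) are assumptions to tune:

```python
import json

BUCKET = "reelog-content-prod"


def lifecycle_input(noncurrent_days: int = 30, temp_days: int = 1) -> dict:
    """Input for `aws s3api put-bucket-lifecycle-configuration --cli-input-json`."""
    return {
        "Bucket": BUCKET,
        "LifecycleConfiguration": {
            "Rules": [
                {
                    # With versioning on, overwritten or deleted objects become
                    # noncurrent versions; remove them after a grace period.
                    "ID": "expire-noncurrent-versions",
                    "Filter": {"Prefix": ""},
                    "Status": "Enabled",
                    "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
                    "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
                },
                {
                    # Temporary uploads (temp/ prefix) should not live long.
                    "ID": "expire-temp-uploads",
                    "Filter": {"Prefix": "temp/"},
                    "Status": "Enabled",
                    "Expiration": {"Days": temp_days},
                },
            ]
        },
    }


if __name__ == "__main__":
    print(json.dumps(lifecycle_input(), indent=2))
```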
Tags - optional:
- Click "Add tag" and add each of these:
- Tag 1: Key=Environment, Value=Production
- Tag 2: Key=Project, Value=Reelog
- Tag 3: Key=Purpose, Value=Storage
- Tag 4: Key=Name, Value=reelog-content-prod
- Tag 5 (Optional): Key=ManagedBy, Value=Manual
Create bucket
Configure Block Public Access (required before adding bucket policy):
- After bucket creation, go to your bucket → Permissions tab
- Block Public Access settings → Edit
- Uncheck the following two settings:
- ✅ Uncheck: "Block public access to buckets and objects granted through new public bucket or access point policies"
- Without this, S3 rejects the public bucket policy when you save it
- ✅ Uncheck: "Block public and cross-account access to buckets and objects through any public bucket or access point policies"
- Without this, S3 ignores the public policy and anonymous reads return 403 Access Denied
- The two ACL-related settings can remain checked for security
- Save changes
- Confirm by typing confirm in the confirmation dialog
Why this is needed:
- The bucket policy grants public read access ("Principal": "*")
- Block Public Access settings prevent public access by default
- We need to allow public bucket policies to enable public read
- This is safe because the policy only allows GetObject (read), not write/delete
Add Bucket Policy (for public read):
- Still in Permissions tab → Bucket policy → Edit
- Paste the following JSON policy:
{
"Version": "2012-10-17",
"Statement": [{
"Sid": "PublicReadGetObject",
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::reelog-content-prod/*"
}]
}
- Save changes
- Verify - You should see a warning that the bucket is publicly accessible (this is expected)
What this policy does:
- Allows public read access (s3:GetObject) to all objects in the bucket
- Anyone can view/download files via URL
- Does NOT allow write, delete, or list operations
- Safe for serving images/media via Cloudflare CDN
Prefix Structure (organized in application code):
reelog-content-prod/
├── users/{userId}/ # User avatars, cover photos, profile photos
├── posts/{postId}/ # Post images/videos
├── groups/{groupId}/ # Group avatars, cover photos
├── logs/{logId}/ # Fishing log photos, catch photos
├── models/ # 3D models (boats, rods, reels)
│ ├── boats/
│ ├── rods/
│ └── reels/
└── temp/ # Temporary uploads (presigned URLs)
Note: Prefix organization happens in your application code when uploading files. The bucket itself is just a container - AWS doesn't enforce folder structure.
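Since the structure lives in application code, it helps to build keys in one place. A sketch of the layout above; the file-naming scheme (caller-supplied filenames, a random hex name under temp/) is hypothetical, and the real API is presumably written in C#, so treat this as an illustration of the scheme rather than the app's implementation:

```python
import uuid

# Top-level prefixes from the layout above; naming details are hypothetical.
ENTITY_PREFIXES = {"users", "posts", "groups", "logs"}
MODEL_KINDS = {"boats", "rods", "reels"}


def entity_key(prefix: str, entity_id: str, filename: str) -> str:
    """users/{userId}/..., posts/{postId}/..., groups/..., logs/..."""
    if prefix not in ENTITY_PREFIXES:
        raise ValueError(f"unknown prefix: {prefix}")
    if "/" in entity_id or "/" in filename:
        raise ValueError("ids and filenames must not contain '/'")
    return f"{prefix}/{entity_id}/{filename}"


def model_key(kind: str, filename: str) -> str:
    """models/{boats|rods|reels}/..."""
    if kind not in MODEL_KINDS:
        raise ValueError(f"unknown model kind: {kind}")
    return f"models/{kind}/{filename}"


def temp_key(extension: str) -> str:
    """temp/<random>.<ext> for presigned uploads; a random name avoids collisions."""
    return f"temp/{uuid.uuid4().hex}.{extension.lstrip('.')}"
```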
📊 Step 11: Create/Update CloudWatch Log Group
Check if log group already exists:
- CloudWatch Console → Logs → Log groups
- Search for /ecs/reelog-api-prod
If log group exists (update retention):
- Click on /ecs/reelog-api-prod
- Configuration tab → Edit retention
- Retention period: Select 30 days (recommended for production)
- vs 7 days for staging
- Longer retention helps with production troubleshooting
- Save changes
If log group doesn't exist (create new):
- Log groups → Create log group
- Log group name: /ecs/reelog-api-prod
- Retention: 30 days
- Create
Key differences from staging:
- Log group name: /ecs/reelog-api-prod (vs /ecs/reelog-api-staging)
- Retention: 30 days (vs 7 for staging)
- Longer retention for production troubleshooting and compliance
Note: Because awslogs-create-group is enabled in Step 8, ECS creates this log group automatically the first time a task starts logging, with retention set to Never expire. If it already exists, just update the retention setting.
🌐 Step 12: Configure Cloudflare DNS
12.1 ACM Certificate Validation
- ACM Console → Certificate → Copy DNS validation record
- Cloudflare → DNS → Add record
- Type: CNAME
- Name: Validation record name from ACM
- Target: Validation record value from ACM
- Proxy: ❌ DNS only (gray cloud) - Required!
- TTL: Auto
- Comment (optional): ACM Certificate Validation - api.reelog.app
- Helps identify this record for future reference
- Wait 5-10 minutes → Certificate status: "Issued"
12.2 API DNS Record
- Cloudflare → DNS → Add record
- Type: CNAME
- Name: api (not api-staging)
- Target: Production Load Balancer DNS name
- Proxy: ✅ Proxied (orange cloud)
- TTL: Auto
- Comment (optional): Production API - Points to AWS ALB
- Documents that this routes api.reelog.app to the production load balancer
- Save
12.3 SSL/TLS Settings
- Cloudflare → SSL/TLS → Overview
- SSL/TLS encryption mode: Full (strict)
- Requires valid ACM certificate on Load Balancer
- Provides end-to-end encryption
🐳 Step 13: Build and Push Docker Image
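Same process, but push to the production repository. Log in to ECR first, and force a new deployment afterwards, because running tasks keep the image they started with even after :latest changes:
aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin 606532921651.dkr.ecr.us-east-2.amazonaws.com
docker tag reelog-api-staging:latest 606532921651.dkr.ecr.us-east-2.amazonaws.com/reelog-api-prod:latest
docker push 606532921651.dkr.ecr.us-east-2.amazonaws.com/reelog-api-prod:latest
aws ecs update-service --cluster reelog-prod --service reelog-api-prod --force-new-deployment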
✅ Step 14: Verify Deployment
curl https://api.reelog.app/ping
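A single curl call can fail while new tasks are still registering with the target group (the ALB answers 502/503 until a target is healthy). A sketch that polls /ping until it returns HTTP 200, using only the standard library; the attempt count and 15-second delay are arbitrary assumptions, and the `fetch` parameter exists so the polling logic can be exercised without network access:

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request (0 on connection errors)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses
        return err.code
    except OSError:  # DNS failures, refused connections, timeouts
        return 0


def wait_for_ping(url: str = "https://api.reelog.app/ping", attempts: int = 10,
                  delay: float = 15.0,
                  fetch: Callable[[str], int] = http_status) -> bool:
    """Poll the endpoint until it returns 200 or the attempts run out."""
    for attempt in range(1, attempts + 1):
        status = fetch(url)
        print(f"attempt {attempt}: HTTP {status}")
        if status == 200:
            return True
        if attempt < attempts:
            time.sleep(delay)
    return False
```

Calling `wait_for_ping()` after the `update-service` deployment returns True once the new tasks are serving traffic.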
📋 Production Checklist
- All staging resources created first
- Production ECR repository created
- Production RDS database created (Multi-AZ, larger instance)
- Production Valkey cache created (with replicas)
- Production security groups created
- Production Load Balancer created
- Production Target group created
- Production ACM certificate validated
- Production ECS cluster created (with Container Insights)
- Production Task definition created (larger resources)
- Production ECS service running (2-4 tasks)
- Production S3 bucket created
- Production CloudWatch log group created (30-day retention)
- Cloudflare DNS configured for production
- Production Docker image pushed
- API accessible at https://api.reelog.app
💰 Estimated Monthly Costs (Production)
- ECS Fargate (2-4 tasks at 1 vCPU / 2 GB): ~$72-144/month
- RDS db.t3.medium (Multi-AZ): ~$100/month
- Valkey cache.t3.small (with replica): ~$24/month
- Load Balancer: ~$16/month
- Cloudflare: FREE
- S3: ~$5-20/month (depends on storage)
- CloudWatch Container Insights: ~$2-4/month
- Total: ~$219-308/month
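The Fargate line can be derived from the task size instead of estimated. Assuming on-demand Linux/x86 rates of $0.04048 per vCPU-hour and $0.004445 per GB-hour (verify against current us-east-2 Fargate pricing) and a 730-hour month, each always-on 1 vCPU / 2 GB task costs about $36/month. The other line items are taken as listed in the estimate:

```python
HOURS_PER_MONTH = 730

# Assumed Fargate on-demand rates (Linux/x86); check current us-east-2 pricing.
VCPU_HOUR = 0.04048
GB_HOUR = 0.004445


def fargate_monthly(tasks: int, vcpu: float = 1.0, memory_gb: float = 2.0) -> float:
    """Monthly cost of always-on Fargate tasks at the Step 8 task size."""
    per_task_hour = vcpu * VCPU_HOUR + memory_gb * GB_HOUR
    return tasks * per_task_hour * HOURS_PER_MONTH


# Other line items as (low, high) in USD/month, copied from the estimate.
OTHER = {
    "RDS db.t3.medium Multi-AZ": (100, 100),
    "Valkey cache.t3.small + replica": (24, 24),
    "Load Balancer": (16, 16),
    "Cloudflare": (0, 0),
    "S3": (5, 20),
    "Container Insights": (2, 4),
}


def monthly_total(min_tasks: int = 2, max_tasks: int = 4) -> tuple[int, int]:
    """Rounded (low, high) monthly total for the given task range."""
    low = fargate_monthly(min_tasks) + sum(lo for lo, _ in OTHER.values())
    high = fargate_monthly(max_tasks) + sum(hi for _, hi in OTHER.values())
    return round(low), round(high)
```

Because Fargate scales linearly with task count, each extra always-on task adds roughly $36/month, which matters when auto scaling is allowed to reach 50 tasks.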
🔒 Production Security Checklist
- Database has no public access
- All security groups properly configured
- Deletion protection enabled on critical resources
- Multi-AZ deployment for database and cache
- Encryption enabled (in-transit and at-rest)
- Backups configured (30-day retention)
- Monitoring enabled (Container Insights)
- Logs retained (30 days minimum)
- ACM certificates validated and configured
- Cloudflare SSL/TLS set to Full (strict)
🚀 Next Steps
- Set up CI/CD (GitHub Actions, AWS CodePipeline)
- Configure monitoring alerts (CloudWatch alarms)
- Verify automated backups (RDS automated backups) and test a restore
- Tune auto-scaling targets based on observed traffic
- Set up production database (migrate from staging/local)
- Configure production secrets (Secrets Manager)
- Set up production monitoring (CloudWatch dashboards)
Last Updated: Current Date
Region: us-east-2
Domain: reelog.app
Reference: See ECS_STAGING_SETUP_GUIDE.md for detailed step-by-step instructions