# Deployment Guide

Production deployment guide for HemoStat across different platforms.

## Production Checklist

Before deploying to production, ensure:

- All credentials in environment variables (`.env` file)
- Redis password configured
- TLS enabled for all network traffic
- Logs shipped to centralized logging (ELK, DataDog, etc.)
- Monitoring and alerting configured
- Backup strategy for Redis data
- Disaster recovery plan
- Security audit completed
- Performance testing done
- Load testing done

## Kubernetes Deployment

### Helm Chart Structure

```text
hemostat-chart/
├── Chart.yaml
├── values.yaml
├── templates/
│   ├── monitor-deployment.yaml
│   ├── analyzer-deployment.yaml
│   ├── responder-deployment.yaml
│   ├── alert-deployment.yaml
│   ├── dashboard-deployment.yaml
│   ├── redis-statefulset.yaml
│   └── configmap.yaml
```

### Deploy to Kubernetes

```bash
# Install the release
helm install hemostat ./hemostat-chart

# Apply chart or values changes to an existing release
helm upgrade hemostat ./hemostat-chart

# Remove the release
helm uninstall hemostat
```

## Docker Registry

### Push Images

```bash
docker tag hemostat-agents-hemostat-monitor:latest myregistry/hemostat-monitor:v1.0
docker push myregistry/hemostat-monitor:v1.0
```

### Use Private Registry

```yaml
# In docker-compose.yml
hemostat-monitor:
  image: myregistry/hemostat-monitor:v1.0
  # Compose uses pull_policy; imagePullPolicy is a Kubernetes field
  pull_policy: missing
```

## Cloud Deployments

### AWS ECS

1. Create ECS cluster
2. Push images to ECR
3. Create ECS task definitions
4. Create ECS services
5. Configure auto-scaling

### Azure Container Instances

1. Push images to ACR
2. Create container groups
3. Configure network
4. Set up monitoring

### Google Cloud Run

1. Push images to GCR or Artifact Registry
2. Deploy services
3. Configure ingress
4. Set up monitoring

## High Availability Setup

### Redis Cluster

```bash
# Multi-node Redis cluster
redis-cli --cluster create node1:6379 node2:6379 node3:6379 ...
```

### Agent Replicas

```yaml
hemostat-analyzer-1:
  build: ./agents/hemostat_analyzer
hemostat-analyzer-2:
  build: ./agents/hemostat_analyzer
hemostat-analyzer-3:
  build: ./agents/hemostat_analyzer
```

### Load Balancing

- Use Redis for load distribution
- Agents pull work from queues
- Automatic failover on agent crash

## Monitoring and Alerting

### Prometheus Metrics

- Agent cycle times
- Redis latency
- Docker API latency
- Remediation success rate
- False alarm rate

### Grafana Dashboards

1. System overview (all agents)
2. Per-agent metrics
3. Redis performance
4. Historical trends

### Alerting Rules

```yaml
- alert: MonitorHighLatency
  expr: avg(monitor_cycle_time) > 60
  for: 5m
- alert: AnalyzerHighErrorRate
  expr: rate(analyzer_errors[5m]) > 0.1
  for: 5m
```

## Scaling Strategies

### Vertical Scaling

- Increase CPU/memory per agent
- Increase container limits
- Better for single-machine deployments

### Horizontal Scaling

- Run multiple agent instances
- Redis handles message distribution
- Better for production

### Auto-Scaling Rules

```yaml
minReplicas: 1
maxReplicas: 10
targetCPUUtilizationPercentage: 70
```

## Backup and Recovery

### Redis Backup

```bash
# Trigger an RDB snapshot in the background
redis-cli BGSAVE
# BGSAVE is asynchronous; confirm it finished (e.g. with LASTSAVE) before copying
docker cp hemostat-redis:/data/dump.rdb ./backup/

# Enable AOF (append-only file)
redis-cli CONFIG SET appendonly yes
```

### Restore Redis

```bash
# Stop Redis first so it cannot overwrite the restored file with a snapshot on shutdown
docker-compose stop hemostat-redis
docker cp ./backup/dump.rdb hemostat-redis:/data/
docker-compose start hemostat-redis
```

If AOF is enabled, Redis loads the append-only file at startup instead of `dump.rdb`, so restore the AOF in that case.

## Security Hardening

### Network

- Use private networks only
- Whitelist allowed IPs
- Disable unnecessary ports

### Secrets Management

```bash
# Use Docker secrets (requires Swarm mode); "-" reads the secret value from stdin
printf '%s' "sk-..." | docker secret create openai_key -
printf '%s' "https://..." | docker secret create slack_webhook -
```

### RBAC

```bash
# Kubernetes RBAC
kubectl create role hemostat-reader --verb=get --verb=list --resource=pods
# A binding needs a subject; NAMESPACE and SERVICE_ACCOUNT are placeholders
kubectl create rolebinding hemostat-reader-binding --role=hemostat-reader \
  --serviceaccount=NAMESPACE:SERVICE_ACCOUNT
```

### Audit Logging

- All container operations logged
- All remediation actions
logged
- Centralized log aggregation
- Regular audit reviews

## Performance Tuning

### Redis Optimization

```bash
# In redis.conf
maxmemory 2gb
maxmemory-policy allkeys-lru
```

### Agent Optimization

```python
# Tuning directions per agent (pseudocode notes, not runnable settings)
# Monitor:   time.sleep(10)     - shorter polling interval, faster detection
# Analyzer:  parallel_analysis  - process multiple alerts at once
# Responder: batch_execution    - execute multiple fixes per cycle
```

### Docker Optimization

```yaml
# In docker-compose.yml (resource settings belong under the deploy key)
hemostat-monitor:
  deploy:
    resources:
      limits:
        cpus: '0.5'
        memory: 512M
      reservations:
        cpus: '0.25'
        memory: 256M
```

## Disaster Recovery

### Failure Scenarios

**Redis Down**

- Agents queue messages in memory
- Messages lost if multiple agents crash
- Recovery: Restart Redis, restart agents

**Monitor Down**

- No new metrics collected
- Stale data in Redis
- Recovery: Restart Monitor

**Analyzer Down**

- Alerts accumulate in Redis
- Recovery: Restart Analyzer, process queue

**Responder Down**

- Remediation requests queue up
- Issues not fixed
- Recovery: Restart Responder, execute queue

**Alert Down**

- Notifications not sent
- Events still recorded in Redis
- Recovery: Restart Alert, send backlog

### RTO/RPO Goals

- **RTO** (Recovery Time Objective): < 5 minutes
- **RPO** (Recovery Point Objective): < 1 minute

For detailed deployment steps, consult your cloud provider's documentation.
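
### Example: Buffering Messages While Redis Is Down

The "Redis Down" scenario under Failure Scenarios says agents queue messages in memory until Redis returns. The sketch below shows one way that buffering could look. It is not HemoStat's actual agent code: `BufferedPublisher`, its `max_buffered` parameter, and the injected `publish` callable are hypothetical names, and the callable is assumed to raise the built-in `ConnectionError` while the broker is unreachable (a real Redis client may raise its own exception type, which you would catch instead).

```python
from collections import deque
from typing import Callable, Deque, Tuple


class BufferedPublisher:
    """Hold messages in memory while the broker is unreachable.

    Hypothetical helper for illustration. `publish` stands in for a
    call such as a Redis client's publish(channel, message).
    """

    def __init__(self, publish: Callable[[str, str], None], max_buffered: int = 1000):
        self._publish = publish
        # Bounded buffer: when full, the oldest pending message is dropped.
        self._pending: Deque[Tuple[str, str]] = deque(maxlen=max_buffered)

    def send(self, channel: str, message: str) -> bool:
        """Queue a message, then try to deliver everything pending.

        Returns True when nothing is left in the buffer.
        """
        self._pending.append((channel, message))
        return self.flush()

    def flush(self) -> bool:
        """Deliver pending messages in order, stopping at the first failure."""
        while self._pending:
            channel, message = self._pending[0]
            try:
                self._publish(channel, message)
            except ConnectionError:
                # Broker still down: keep the message for the next attempt.
                return False
            self._pending.popleft()
        return True

    @property
    def pending(self) -> int:
        """Number of messages waiting for delivery."""
        return len(self._pending)
```

An agent would call `send()` wherever it publishes today and call `flush()` periodically so the backlog drains once Redis is back. The buffer lives only in process memory, so anything still pending is lost if the agent itself crashes, which matches the message-loss risk noted in the scenario.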