Deployment Guide
Production deployment guide for HemoStat across different platforms.
Production Checklist
Before deploying to production, ensure:
All credentials in environment variables (.env file)
Redis password configured
TLS enabled for all network traffic
Logs shipped to centralized logging (ELK, DataDog, etc.)
Monitoring and alerting configured
Backup strategy for Redis data
Disaster recovery plan
Security audit completed
Performance testing done
Load testing done
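The credential items in this checklist can be verified automatically before a rollout. The following is a minimal sketch, not part of HemoStat itself; the variable names in REQUIRED_SETTINGS are assumptions and should be replaced with the names your .env file actually defines.

```python
from typing import Iterable, List, Mapping

# Hypothetical variable names; substitute the ones your .env file defines.
REQUIRED_SETTINGS = ("REDIS_PASSWORD", "OPENAI_API_KEY", "SLACK_WEBHOOK_URL")


def missing_settings(
    env: Mapping[str, str], required: Iterable[str] = REQUIRED_SETTINGS
) -> List[str]:
    """Return the required settings that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]
```

Calling missing_settings(os.environ) in a deploy script and aborting when the returned list is non-empty keeps a half-configured environment from reaching production.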
Kubernetes Deployment
Helm Chart Structure
hemostat-chart/
├── Chart.yaml
├── values.yaml
└── templates/
    ├── monitor-deployment.yaml
    ├── analyzer-deployment.yaml
    ├── responder-deployment.yaml
    ├── alert-deployment.yaml
    ├── dashboard-deployment.yaml
    ├── redis-statefulset.yaml
    └── configmap.yaml
Deploy to Kubernetes
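# First-time install
helm install hemostat ./hemostat-chart
# Roll out chart or values changes to an existing release
helm upgrade hemostat ./hemostat-chart
# Remove the release
helm uninstall hemostat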
Docker Registry
Push Images
docker tag hemostat-agents-hemostat-monitor:latest myregistry/hemostat-monitor:v1.0
docker push myregistry/hemostat-monitor:v1.0
Use Private Registry
# In docker-compose.yml
hemostat-monitor:
  image: myregistry/hemostat-monitor:v1.0
  # Compose uses pull_policy; imagePullPolicy is a Kubernetes field
  pull_policy: missing
Cloud Deployments
AWS ECS
Create ECS cluster
Push images to ECR
Create ECS task definitions
Create ECS services
Configure auto-scaling
Azure Container Instances
Push images to ACR
Create container groups
Configure network
Set up monitoring
Google Cloud Run
Push images to Artifact Registry (the successor to the retired GCR)
Deploy services
Configure ingress
Set up monitoring
High Availability Setup
Redis Cluster
# Multi-node Redis cluster
redis-cli --cluster create node1:6379 node2:6379 node3:6379 ...
Agent Replicas
hemostat-analyzer-1:
  build: ./agents/hemostat_analyzer
hemostat-analyzer-2:
  build: ./agents/hemostat_analyzer
hemostat-analyzer-3:
  build: ./agents/hemostat_analyzer
Load Balancing
Use Redis for load distribution
Agents pull work from queues
Automatic failover on agent crash
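The queue-based distribution described above can be sketched as a competing-consumer loop. This is an illustration under assumptions rather than HemoStat's actual implementation: the queue name is hypothetical, and client is any object exposing a redis-py style brpop method (for example redis.Redis).

```python
import json
from typing import Any, Callable

QUEUE_KEY = "hemostat:analysis_queue"  # hypothetical queue name


def process_next(client: Any, handle: Callable[[dict], None], timeout: int = 5) -> bool:
    """Pop one job from the shared list and handle it.

    BRPOP hands each job to exactly one waiting agent, so running several
    copies of this loop spreads the work across them. Returns False when
    the queue stayed empty for `timeout` seconds.
    """
    item = client.brpop(QUEUE_KEY, timeout=timeout)
    if item is None:
        return False
    _key, payload = item
    handle(json.loads(payload))
    return True
```

If one replica crashes, the remaining replicas keep popping from the same list, which is the failover behavior the list above refers to. A job that was already popped by the crashed agent is lost in this simple form; moving it to a per-agent processing list with BLMOVE is one common way to make it recoverable.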
Monitoring and Alerting
Prometheus Metrics
Agent cycle times
Redis latency
Docker API latency
Remediation success rate
False alarm rate
Grafana Dashboards
System overview (all agents)
Per-agent metrics
Redis performance
Historical trends
Alerting Rules
- alert: MonitorHighLatency
  expr: avg(monitor_cycle_time) > 60
  for: 5m
- alert: AnalyzerHighErrorRate
  expr: rate(analyzer_errors[5m]) > 0.1
  for: 5m
Scaling Strategies
Vertical Scaling
Increase CPU/memory per agent
Increase container limits
Better for single-machine deployments
Horizontal Scaling
Run multiple agent instances
Redis handles message distribution
Better for production
Auto-Scaling Rules
minReplicas: 1
maxReplicas: 10
targetCPUUtilizationPercentage: 70
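To see how these three values interact, the Kubernetes Horizontal Pod Autoscaler computes the desired replica count as ceil(currentReplicas * currentMetric / targetMetric) and clamps it to the configured bounds. The sketch below reproduces that arithmetic with the values above as defaults; it deliberately ignores the autoscaler's tolerance band and stabilization windows.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_cpu_percent: float,
    target_cpu_percent: float = 70,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Apply ceil(current * metric / target), clamped to [min_replicas, max_replicas]."""
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, raw))
```

For example, four replicas averaging 90% CPU against the 70% target scale to six, while eight replicas at 140% would need sixteen and are capped at maxReplicas.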
Backup and Recovery
Redis Backup
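# Trigger an RDB snapshot; BGSAVE runs in the background, so wait for it to finish (check LASTSAVE) before copying
redis-cli BGSAVE
docker cp hemostat-redis:/data/dump.rdb ./backup/
# Enable AOF (append-only file); also set "appendonly yes" in redis.conf so it survives a restart
redis-cli CONFIG SET appendonly yes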
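A single dump.rdb in ./backup/ is overwritten on every copy, so a backup strategy usually keeps several timestamped snapshots. The following sketch shows one way to do that with the standard library; the file naming scheme and the retention count are assumptions, not HemoStat conventions.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def archive_snapshot(dump: Path, backup_dir: Path, keep: int = 7) -> Path:
    """Copy dump into backup_dir under a UTC-timestamped name and prune old copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    target = backup_dir / f"dump-{stamp}.rdb"
    shutil.copy2(dump, target)
    # Timestamped names sort chronologically, so all but the last `keep` are stale.
    snapshots = sorted(backup_dir.glob("dump-*.rdb"))
    for old in snapshots[:-keep]:
        old.unlink()
    return target
```

Run it after each docker cp, pointing dump at the freshly copied file and backup_dir at a separate archive directory.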
Restore Redis
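# Stop Redis first so a shutdown snapshot cannot overwrite the restored file
docker-compose stop hemostat-redis
docker cp ./backup/dump.rdb hemostat-redis:/data/
docker-compose start hemostat-redis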
Security Hardening
Network
Use private networks only
Whitelist allowed IPs
Disable unnecessary ports
Secrets Management
# Use Docker secrets (requires Swarm mode); printf avoids storing a trailing newline
printf '%s' "sk-..." | docker secret create openai_key -
printf '%s' "https://..." | docker secret create slack_webhook -
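Inside a Swarm service, Docker mounts each granted secret as a file under /run/secrets/ named after the secret. An agent could read it roughly as follows; this is a sketch, and the environment-variable fallback (upper-cased secret name) is an assumption added for local runs without Swarm.

```python
import os
from pathlib import Path
from typing import Optional

SECRETS_DIR = Path("/run/secrets")  # default mount point for Docker secrets


def read_secret(name: str, secrets_dir: Path = SECRETS_DIR) -> Optional[str]:
    """Return the secret `name` from its mounted file, else from the environment."""
    secret_file = secrets_dir / name
    if secret_file.is_file():
        return secret_file.read_text(encoding="utf-8").strip()
    # Assumed fallback for local development: OPENAI_KEY for the secret "openai_key".
    return os.environ.get(name.upper())
```

With the secrets created above, read_secret("openai_key") and read_secret("slack_webhook") return the stored values without the credentials ever appearing in the image or the Compose file.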
RBAC
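# Kubernetes RBAC
kubectl create role hemostat-reader --verb=get --verb=list --resource=pods
# A binding needs a subject; replace NAMESPACE:SERVICE_ACCOUNT with the account the agents run as
kubectl create rolebinding hemostat-reader-binding --role=hemostat-reader --serviceaccount=NAMESPACE:SERVICE_ACCOUNT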
Audit Logging
All container operations logged
All remediation actions logged
Centralized log aggregation
Regular audit reviews
Performance Tuning
Redis Optimization
# In redis.conf
maxmemory 2gb
maxmemory-policy allkeys-lru
Agent Optimization
# Tune polling intervals and concurrency
Monitor: time.sleep(10) # Shorter polling interval, faster detection
Analyzer: parallel_analysis # Process multiple alerts concurrently
Responder: batch_execution # Execute multiple fixes per cycle
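The Analyzer and Responder settings above name two general techniques: analyzing alerts concurrently and executing fixes in batches. The sketch below illustrates both with the standard library. The function names are hypothetical and do not correspond to HemoStat's internal API; a thread pool is assumed to fit because analysis is typically I/O-bound (Redis and API calls).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence, TypeVar

A = TypeVar("A")
R = TypeVar("R")


def analyze_in_parallel(
    alerts: Sequence[A], analyze: Callable[[A], R], max_workers: int = 4
) -> List[R]:
    """Run analyze over alerts on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze, alerts))


def batches(items: Sequence[A], size: int) -> Iterator[Sequence[A]]:
    """Yield consecutive slices of at most `size` items, e.g. fixes to run together."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Keep max_workers modest: every worker adds load on Redis and on the Docker API, both of which are listed as latency metrics earlier in this guide.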
Docker Optimization
# In docker-compose.yml
hemostat-monitor:
  # Compose reads resource limits from the deploy section
  deploy:
    resources:
      limits:
        cpus: '0.5'
        memory: 512M
      reservations:
        cpus: '0.25'
        memory: 256M
Disaster Recovery
Failure Scenarios
Redis Down
Agents queue messages in memory
Messages buffered in memory are lost if the agents holding them crash
Recovery: Restart Redis, restart agents
Monitor Down
No new metrics collected
Stale data in Redis
Recovery: Restart Monitor
Analyzer Down
Alerts accumulate in Redis
Recovery: Restart Analyzer, process queue
Responder Down
Remediation requests queue up
Issues not fixed
Recovery: Restart Responder, execute queue
Alert Down
Notifications not sent
Events still recorded in Redis
Recovery: Restart Alert, send backlog
RTO/RPO Goals
RTO (Recovery Time Objective): < 5 minutes
RPO (Recovery Point Objective): < 1 minute
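The RPO target translates directly into a constraint on how often Redis data is persisted: the longest gap between two persisted copies, including the time since the most recent one, bounds the data that can be lost. A minimal sketch of that check follows, assuming you can obtain the persistence timestamps (for example from LASTSAVE or from backup file times).

```python
from datetime import datetime, timedelta
from typing import Sequence

RPO = timedelta(minutes=1)  # target stated above


def worst_case_data_loss(snapshot_times: Sequence[datetime], now: datetime) -> timedelta:
    """Longest stretch without a persisted copy, including time since the last one."""
    if not snapshot_times:
        raise ValueError("no snapshots recorded")
    points = sorted(snapshot_times) + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def meets_rpo(snapshot_times: Sequence[datetime], now: datetime, rpo: timedelta = RPO) -> bool:
    """True when the worst-case loss window stays below the RPO."""
    return worst_case_data_loss(snapshot_times, now) < rpo
```

With RDB snapshots alone, this means snapshotting more often than once a minute; enabling AOF narrows the window without requiring such frequent full snapshots.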
For detailed deployment steps, consult your cloud provider’s documentation.