Deployment Guide

Production deployment guide for HemoStat across different platforms.

Production Checklist

Before deploying to production, ensure:

All credentials in environment variables (.env file)
Redis password configured
TLS enabled for all network traffic
Logs shipped to centralized logging (ELK, DataDog, etc.)
Monitoring and alerting configured
Backup strategy for Redis data
Disaster recovery plan
Security audit completed
Performance testing done
Load testing done

Kubernetes Deployment

Helm Chart Structure

hemostat-chart/
├── Chart.yaml
├── values.yaml
├── templates/
│   ├── monitor-deployment.yaml
│   ├── analyzer-deployment.yaml
│   ├── responder-deployment.yaml
│   ├── alert-deployment.yaml
│   ├── dashboard-deployment.yaml
│   ├── redis-statefulset.yaml
│   └── configmap.yaml

Deploy to Kubernetes

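# Install the chart as release "hemostat"
helm install hemostat ./hemostat-chart
# Apply chart or values changes to the existing release
helm upgrade hemostat ./hemostat-chart
# Remove the release
helm uninstall hemostat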

Docker Registry

Push Images

docker tag hemostat-agents-hemostat-monitor:latest myregistry/hemostat-monitor:v1.0
docker push myregistry/hemostat-monitor:v1.0

Use Private Registry

# In docker-compose.yml
hemostat-monitor:
  image: myregistry/hemostat-monitor:v1.0
  pull_policy: missing  # Compose equivalent of Kubernetes' imagePullPolicy: IfNotPresent

Cloud Deployments

AWS ECS

Create ECS cluster
Push images to ECR
Create ECS task definitions
Create ECS services
Configure auto-scaling

Azure Container Instances

Push images to ACR
Create container groups
Configure network
Set up monitoring

Google Cloud Run

Push images to Artifact Registry (the successor to GCR)
Deploy services
Configure ingress
Set up monitoring

High Availability Setup

Redis Cluster

# Multi-node Redis cluster (at least three master nodes are required)
redis-cli --cluster create node1:6379 node2:6379 node3:6379 ...

Agent Replicas

hemostat-analyzer-1:
  build: ./agents/hemostat_analyzer

hemostat-analyzer-2:
  build: ./agents/hemostat_analyzer

hemostat-analyzer-3:
  build: ./agents/hemostat_analyzer

Load Balancing

Use Redis for load distribution
Agents pull work from queues
Automatic failover on agent crash
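
As a rough illustration of the pull model, the sketch below shows one agent iteration taking work from a shared list with a blocking pop. The queue name hemostat:alerts, the pull_one helper, and the InMemoryQueue stand-in are assumptions made for this example, not HemoStat's actual code. With redis-py, a redis.Redis client offers a blpop method with the same call shape.

```python
import json
from collections import deque
from typing import Callable, Dict, Optional, Tuple


class InMemoryQueue:
    """Stand-in for a Redis client so the sketch runs without a server.

    It only mimics the shape of BLPOP: (queue_name, payload) or None.
    """

    def __init__(self) -> None:
        self._queues: Dict[str, deque] = {}

    def rpush(self, name: str, payload: str) -> None:
        self._queues.setdefault(name, deque()).append(payload)

    def blpop(self, name: str, timeout: int = 0) -> Optional[Tuple[str, str]]:
        # A real BLPOP blocks for up to `timeout` seconds; this returns at once.
        queue = self._queues.get(name)
        if not queue:
            return None
        return name, queue.popleft()


def pull_one(client, queue_name: str, handle: Callable[[dict], None],
             timeout: int = 5) -> bool:
    """Pop one work item and process it. Returns False if nothing was queued."""
    item = client.blpop(queue_name, timeout=timeout)
    if item is None:
        return False
    _name, payload = item
    handle(json.loads(payload))
    return True


queue = InMemoryQueue()
queue.rpush("hemostat:alerts",
            json.dumps({"container": "web-1", "issue": "high_cpu"}))

handled = []
# Two "agents" compete for the same queue; the item goes to exactly one.
first = pull_one(queue, "hemostat:alerts", handled.append)
second = pull_one(queue, "hemostat:alerts", handled.append)
```

Because each pop removes the item, any number of agents can run this loop against the same queue, and the remaining agents keep draining it if one crashes. An item that was already popped by the crashed agent is not redelivered by a plain BLPOP, so that guarantee would need an acknowledgement scheme on top.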

Monitoring and Alerting

Prometheus Metrics

Agent cycle times
Redis latency
Docker API latency
Remediation success rate
False alarm rate

Grafana Dashboards

System overview (all agents)
Per-agent metrics
Redis performance
Historical trends

Alerting Rules

- alert: MonitorHighLatency
  expr: avg(monitor_cycle_time) > 60
  for: 5m

- alert: AnalyzerHighErrorRate
  expr: rate(analyzer_errors[5m]) > 0.1
  for: 5m
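
To make the second threshold concrete: PromQL's rate() returns a per-second average over the range, so "> 0.1" means more than 0.1 errors per second, or roughly 30 errors across the 5-minute window. The sketch below reproduces that arithmetic for two counter samples. It is a simplification that ignores the counter-reset handling and extrapolation Prometheus applies, and the sample values are made up.

```python
def per_second_rate(first_value: float, last_value: float,
                    window_seconds: float) -> float:
    """Average per-second increase of a counter across a window."""
    return (last_value - first_value) / window_seconds


WINDOW_SECONDS = 5 * 60
THRESHOLD = 0.1  # errors per second, as in AnalyzerHighErrorRate

# Number of errors in 5 minutes that corresponds to the threshold (about 30).
errors_at_threshold = THRESHOLD * WINDOW_SECONDS

# 20 errors in the window stays below the threshold; 45 errors exceeds it.
quiet = per_second_rate(100, 120, WINDOW_SECONDS)
noisy = per_second_rate(100, 145, WINDOW_SECONDS)

print(f"quiet={quiet:.3f}/s fires={quiet > THRESHOLD}")
print(f"noisy={noisy:.3f}/s fires={noisy > THRESHOLD}")
```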

Scaling Strategies

Vertical Scaling

Increase CPU/memory per agent
Increase container limits
Better for single-machine deployments

Horizontal Scaling

Run multiple agent instances
Redis handles message distribution
Better for production

Auto-Scaling Rules

minReplicas: 1
maxReplicas: 10
targetCPUUtilizationPercentage: 70
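
These field names match a Kubernetes HorizontalPodAutoscaler spec. As a sketch of how such a rule turns CPU utilization into a replica count, the function below applies the scaling formula from the Kubernetes documentation, desired = ceil(current * currentUtilization / targetUtilization), clamped to the min/max bounds. It leaves out the tolerance band and stabilization windows that the real controller also applies, and the function name is an assumption for illustration.

```python
import math


def desired_replicas(current_replicas: int, current_cpu_percent: float,
                     target_cpu_percent: float = 70,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Replica count suggested by the HPA formula, clamped to the bounds."""
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, raw))


scale_up = desired_replicas(2, 90)     # ceil(2 * 90 / 70) = ceil(2.57)
scale_down = desired_replicas(4, 20)   # ceil(4 * 20 / 70) = ceil(1.14)
capped = desired_replicas(8, 140)      # ceil(16.0) exceeds maxReplicas
floored = desired_replicas(1, 0)       # ceil(0) is below minReplicas
```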

Backup and Recovery

Redis Backup

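# Trigger an RDB snapshot in the background, then copy it out once it completes
redis-cli BGSAVE
docker cp hemostat-redis:/data/dump.rdb ./backup/

# Enable AOF (append-only file); add "appendonly yes" to redis.conf to persist the setting across restarts
redis-cli CONFIG SET appendonly yes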
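
BGSAVE returns immediately and writes the snapshot in the background, so copying dump.rdb straight away can pick up the previous snapshot. One way to handle this, sketched below, is to poll LASTSAVE until its timestamp changes before copying. The sketch assumes redis-cli is available inside the hemostat-redis container and that authentication, if any, is supplied through the container environment (for example REDISCLI_AUTH). The function names are illustrative.

```python
import subprocess
import time
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run_command(argv: List[str]) -> str:
    """Run a command and return its stdout, raising on a non-zero exit."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def backup_redis(container: str, dest: str, run: Runner = run_command,
                 poll_seconds: float = 1.0, max_polls: int = 60) -> None:
    """Trigger BGSAVE, wait for it to finish, then copy dump.rdb to dest."""
    cli = ["docker", "exec", container, "redis-cli"]
    before = run(cli + ["LASTSAVE"])
    run(cli + ["BGSAVE"])
    for _ in range(max_polls):
        # LASTSAVE has one-second resolution, so a snapshot finishing in the
        # same second as the previous one would not be detected here.
        if run(cli + ["LASTSAVE"]) != before:
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("BGSAVE did not finish in time")
    run(["docker", "cp", f"{container}:/data/dump.rdb", dest])
```

Calling backup_redis("hemostat-redis", "./backup/") would then perform the same two steps as the commands above, with the wait in between. The run parameter exists so the logic can be exercised without Docker.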

Restore Redis

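# Stop Redis before copying so a save on shutdown cannot overwrite the restored file
docker-compose stop hemostat-redis
docker cp ./backup/dump.rdb hemostat-redis:/data/
docker-compose start hemostat-redis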

Security Hardening

Network

Use private networks only
Whitelist allowed IPs
Disable unnecessary ports

Secrets Management

# Use Docker secrets (requires Swarm mode); printf avoids storing a trailing newline
printf '%s' "sk-..." | docker secret create openai_key -
printf '%s' "https://..." | docker secret create slack_webhook -
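
Inside a Swarm service that has been granted a secret, Docker mounts it as a file under /run/secrets/ named after the secret. A minimal sketch of how an agent might read it is shown below. The read_secret helper and the fallback to an upper-cased environment variable (for local .env setups) are assumptions for illustration, not part of HemoStat.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> Optional[str]:
    """Return a secret from its Docker secret file, else from the environment."""
    secret_file = Path(secrets_dir) / name
    if secret_file.is_file():
        return secret_file.read_text().strip()
    # Assumed fallback: OPENAI_KEY-style variables loaded from a .env file.
    return os.environ.get(name.upper())


# Demonstration against a temporary directory instead of /run/secrets.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "openai_key").write_text("sk-example\n")
    from_file = read_secret("openai_key", secrets_dir=tmp)

    os.environ["SLACK_WEBHOOK"] = "https://example.invalid/hook"
    from_env = read_secret("slack_webhook", secrets_dir=tmp)
```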

RBAC

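# Kubernetes RBAC: read-only access to pods, bound to a service account
kubectl create role hemostat-reader --verb=get --verb=list --resource=pods
# Replace NAMESPACE and SERVICE_ACCOUNT with the account the agents run as
kubectl create rolebinding hemostat-reader-binding --role=hemostat-reader --serviceaccount=NAMESPACE:SERVICE_ACCOUNT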

Audit Logging

All container operations logged
All remediation actions logged
Centralized log aggregation
Regular audit reviews

Performance Tuning

Redis Optimization

# In redis.conf
maxmemory 2gb
maxmemory-policy allkeys-lru

Agent Optimization

# Tune polling and processing per agent
Monitor: time.sleep(10)      # Faster detection
Analyzer: parallel_analysis  # Process multiple alerts
Responder: batch_execution   # Execute multiple fixes
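
As one possible reading of the Analyzer and Responder options, the sketch below analyzes alerts concurrently with a thread pool and groups the resulting fixes into batches. The analyze_alert logic, the alert fields, and the batch size are hypothetical placeholders. A thread pool suits I/O-bound analysis such as API calls, whereas CPU-bound work would call for processes instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List


def analyze_alert(alert: dict) -> dict:
    """Hypothetical analysis step; a real one might call an external API."""
    severity = "high" if alert["cpu_percent"] > 90 else "low"
    return {"container": alert["container"], "severity": severity}


def analyze_in_parallel(alerts: List[dict], max_workers: int = 4) -> List[dict]:
    """Analyze alerts concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze_alert, alerts))


def batches(items: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


alerts = [
    {"container": "web-1", "cpu_percent": 97},
    {"container": "web-2", "cpu_percent": 40},
    {"container": "db-1", "cpu_percent": 95},
]
results = analyze_in_parallel(alerts)
urgent = [r for r in results if r["severity"] == "high"]
fix_batches = list(batches(urgent, size=2))
```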

Docker Optimization

# In docker-compose.yml
hemostat-monitor:
  deploy:
    resources:
      limits:
        cpus: '0.5'
        memory: 512M
      reservations:
        cpus: '0.25'
        memory: 256M

Disaster Recovery

Failure Scenarios

Redis Down

Agents queue messages in memory
Messages lost if multiple agents crash
Recovery: Restart Redis, restart agents
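
The in-memory queueing described above could look roughly like the sketch below: a bounded buffer that holds messages while publishing fails and flushes them, oldest first, once a publish succeeds again. The OutboundBuffer class and the publish callback are assumptions for illustration. The sketch catches the built-in ConnectionError, whereas redis-py raises its own redis.exceptions.ConnectionError, so a real agent would catch that instead.

```python
from collections import deque
from typing import Callable, Deque


class OutboundBuffer:
    """Bounded in-memory buffer for messages that could not reach Redis."""

    def __init__(self, max_items: int = 1000) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._pending: Deque[str] = deque(maxlen=max_items)

    def __len__(self) -> int:
        return len(self._pending)

    def flush(self, publish: Callable[[str], None]) -> None:
        """Publish buffered messages in order; stop at the first failure."""
        while self._pending:
            publish(self._pending[0])
            self._pending.popleft()

    def send(self, message: str, publish: Callable[[str], None]) -> bool:
        """Publish after draining the backlog; buffer the message on failure."""
        try:
            self.flush(publish)
            publish(message)
            return True
        except ConnectionError:
            self._pending.append(message)
            return False


state = {"redis_up": False}
delivered = []


def publish(message: str) -> None:
    """Fake publisher that fails while Redis is marked as down."""
    if not state["redis_up"]:
        raise ConnectionError("Redis unavailable")
    delivered.append(message)


buffer = OutboundBuffer(max_items=2)
buffer.send("alert-1", publish)
buffer.send("alert-2", publish)
buffer.send("alert-3", publish)  # buffer is full, so alert-1 is dropped
state["redis_up"] = True
buffer.send("alert-4", publish)
```

The buffer lives only in the agent's memory, which is why messages held this way are lost if the agent itself crashes before Redis comes back.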

Monitor Down

No new metrics collected
Stale data in Redis
Recovery: Restart Monitor

Analyzer Down

Alerts accumulate in Redis
Recovery: Restart Analyzer, process queue

Responder Down

Remediation requests queue up
Issues not fixed
Recovery: Restart Responder, execute queue

Alert Down

Notifications not sent
Events still recorded in Redis
Recovery: Restart Alert, send backlog

RTO/RPO Goals

RTO (Recovery Time Objective): < 5 minutes
RPO (Recovery Point Objective): < 1 minute

For detailed deployment steps, consult your cloud provider’s documentation.