# Troubleshooting Guide

Solutions for common issues and debugging techniques.

## Common Issues

### Services Won't Start

**Problem:** `docker-compose up` fails or services crash

**Solutions:**

```bash
# Check Docker is running
docker ps

# View service logs
docker-compose logs hemostat-redis
docker-compose logs hemostat-monitor

# Rebuild from scratch
docker-compose down -v
docker-compose build --no-cache
docker-compose up -d

# Check for port conflicts
lsof -i :8501  # Streamlit
lsof -i :3000  # Arcane
lsof -i :6379  # Redis
```

### Monitor Not Detecting Issues

**Problem:** Container anomalies not appearing in logs

**Check:**

```bash
# View Monitor logs
docker-compose logs -f hemostat-monitor

# Verify Redis connection
docker exec hemostat-redis redis-cli ping

# Check test-api is running
docker-compose ps | grep test-api

# Manually trigger issue
docker exec hemostat-test-api apk add stress
docker exec hemostat-test-api stress --cpu 4 --timeout 10
```

**Solutions:**

- Increase memory/CPU stress to exceed thresholds (80% memory, 85% CPU)
- Check Monitor polling interval (default 30 seconds)
- Verify Redis is healthy: `docker-compose logs hemostat-redis`

### Analyzer Errors

**Problem:** Analyzer crashes or doesn't process alerts

**Check:**

```bash
# View Analyzer logs
docker-compose logs -f hemostat-analyzer

# Check if OpenAI API key is set
echo $OPENAI_API_KEY

# Verify Redis connection
docker exec hemostat-redis redis-cli KEYS "hemostat:*"
```

**Solutions:**

- Without API key, system uses fallback rule-based analysis (normal!)
- Check API key is valid: `curl https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY"`
- Check rate limits on OpenAI account
- Review Analyzer logs for error details

### Analyzer - Anthropic API Authentication Error (401)

**Problem:** `Error code: 401 - invalid x-api-key` when using Claude models

**Root Causes:**

1. Wrong environment file being loaded - Docker Compose uses `.env` by default, not `.env.docker.{platform}`
2. Incorrect ChatAnthropic parameter - Using `model_name` instead of `model`
3. API key not in the correct .env file - Key is in `.env.docker.windows` but not in `.env`

**Check:**

```bash
# Verify API key is in the container
docker exec hemostat-analyzer printenv | grep ANTHROPIC_API_KEY

# Check which env file Docker Compose is using
docker inspect hemostat-analyzer | grep -A 20 "Env"

# Verify the API key format (should start with sk-ant-)
docker exec hemostat-analyzer printenv ANTHROPIC_API_KEY
```

**Solutions:**

**Option 1: Use correct env file with Docker Compose (Recommended)**

Always use the platform-specific env file when building/running:

```bash
# Windows
docker compose -f docker-compose.yml -f docker-compose.windows.yml --env-file .env.docker.windows build analyzer --no-cache
docker compose -f docker-compose.yml -f docker-compose.windows.yml --env-file .env.docker.windows up -d analyzer

# Linux
docker compose -f docker-compose.yml -f docker-compose.linux.yml --env-file .env.docker.linux build analyzer --no-cache
docker compose -f docker-compose.yml -f docker-compose.linux.yml --env-file .env.docker.linux up -d analyzer

# macOS
docker compose -f docker-compose.yml -f docker-compose.macos.yml --env-file .env.docker.macos build analyzer --no-cache
docker compose -f docker-compose.yml -f docker-compose.macos.yml --env-file .env.docker.macos up -d analyzer
```

**Option 2: Add API key to default .env file**

Copy your Anthropic API key from `.env.docker.windows` to `.env`:

```bash
# Edit .env and set:
ANTHROPIC_API_KEY=sk-ant-api03-YOUR_KEY_HERE
AI_MODEL=claude-haiku-4-5-20251001
```

Then rebuild normally:

```bash
docker compose build analyzer --no-cache
docker compose up -d analyzer
```

See README.md section "Building & Rebuilding Services" for complete platform-specific commands.

### Responder Not Fixing Issues

**Problem:** Remediation not executing

**Check:**

```bash
# View Responder logs
docker-compose logs -f hemostat-responder

# Check Docker socket permissions
ls -la /var/run/docker.sock

# Verify container can be restarted
docker exec hemostat-responder docker restart hemostat-test-api
```

**Solutions:**

- Check cooldown not active: `docker exec hemostat-redis redis-cli GET "hemostat:remediation:*"`
- Verify Docker socket is mounted correctly in docker-compose.yml
- Check safety mechanisms (cooldown, max retries) aren't triggered
- Review Responder logs for error details

### Dashboard Not Updating

**Problem:** Streamlit shows no data

**Check:**

```bash
# View Streamlit logs
docker-compose logs -f hemostat-dashboard

# Verify Redis has data
docker exec hemostat-redis redis-cli GET "hemostat:stats:hemostat-test-api"

# Check dashboard can connect to Redis
docker exec hemostat-dashboard ping redis

# Refresh browser
# Streamlit auto-refreshes every 5 seconds
```

**Solutions:**

- Wait 30+ seconds for Monitor to collect first stats
- Manually refresh Streamlit page
- Check Redis is healthy and storing data
- Verify dashboard/app.py is reading correct Redis keys

### Alert Not Sending Slack

**Problem:** No Slack notifications despite fixes

**Check:**

```bash
# View Alert logs
docker-compose logs -f hemostat-alert

# Check Slack webhook URL is set
echo $SLACK_WEBHOOK_URL

# Test webhook manually
curl -X POST $SLACK_WEBHOOK_URL \
  -H 'Content-type: application/json' \
  -d '{
    "attachments": [{
      "color": "good",
      "title": "HemoStat Test",
      "text": "This is a test notification"
    }]
  }'
```

**Solutions:**

- Without Slack webhook, system still works (just no notifications)
- Verify webhook URL is correct and active
- Check Slack workspace permissions
- Review Alert logs for HTTP errors

### Performance Issues

**Problem:** System slow or laggy

**Check:**

```bash
# View system resources
docker stats

# Check individual service performance
docker-compose logs hemostat-monitor | grep "published"
docker-compose logs hemostat-analyzer | grep "Published"

# Monitor Redis performance
docker exec hemostat-redis redis-cli --stat
```

**Solutions:**

- Reduce Monitor polling interval (edit hemostat_monitor.py)
- Increase Docker resource limits (edit docker-compose.yml)
- Check Redis is not full: `docker exec hemostat-redis redis-cli INFO memory`
- Clear old events: `docker exec hemostat-redis redis-cli KEYS "hemostat:events:*" | xargs redis-cli DEL`

### Docker Permissions

**Problem:** Permission denied errors

**Check:**

```bash
# Check Docker socket permissions
ls -la /var/run/docker.sock

# Check if user is in docker group
groups $USER
```

**Solutions:**

```bash
# Add user to docker group (Linux)
sudo usermod -aG docker $USER
newgrp docker

# Restart Docker service
sudo systemctl restart docker

# Or run docker-compose with sudo
sudo docker-compose up -d
```

## Debug Mode

### Enable Verbose Logging

Edit each agent to add more logging:

```python
import logging
logging.basicConfig(level=logging.DEBUG)  # Change from INFO to DEBUG
```

### Monitor Redis Activity

```bash
# Watch all Redis events in real-time
docker exec hemostat-redis redis-cli SUBSCRIBE "hemostat:*"

# List all Redis keys
docker exec hemostat-redis redis-cli KEYS "*"

# View specific key value
docker exec hemostat-redis redis-cli GET "hemostat:stats:hemostat-test-api"

# Monitor Redis traffic
docker exec hemostat-redis redis-cli MONITOR
```

### Test Individual Components

```bash
# Test Monitor independently
docker run -it hemostat-agents-hemostat-monitor bash
python hemostat_monitor.py

# Test Docker SDK
docker run -it python:3.11 bash
pip install docker
python -c "import docker; c = docker.from_env(); print(c.containers.list())"

# Test Redis connection
docker run -it redis:7-alpine redis-cli -h redis ping
```

## Getting Help

1. **Check logs first:** Most issues show up in Docker logs
2. **Search TROUBLESHOOTING.md:** Common issues documented here
3. **Review docker-compose logs:** Full system trace
4. **Check GitHub issues:** If this was forked from a repo
5. **Ask AI/LLM:** Paste logs into Claude/ChatGPT for analysis

## Performance Tuning

### Monitor Polling Interval

```python
# In agents/hemostat_monitor/monitor.py
time.sleep(30)  # Change to 10 for faster detection, 60 for less CPU
```

### Analyzer Thresholds

```python
# In agents/hemostat_analyzer/analyzer.py
if memory_pct > 90:  # Lower for earlier alerts
if cpu_pct > 90:     # Lower for earlier alerts
```

### Responder Cooldown

```python
# In agents/hemostat_responder/responder.py
self.cooldown_period = 3600  # 1 hour, lower for more restarts
self.max_actions_per_hour = 3  # Increase for more remediation attempts
```

### Dashboard Refresh Rate

```python
# In dashboard/app.py - Streamlit auto-refreshes every 5 seconds
# To change, add to Streamlit config
```

## Advanced Debugging

### Network Inspection

```bash
# Check if agents can communicate
docker exec hemostat-monitor ping hemostat-redis
docker exec hemostat-analyzer ping hemostat-redis

# Test inter-container connectivity
docker network inspect hemostat-agents_default
```

### Resource Limits

```bash
# View resource usage
docker stats hemostat-monitor
docker stats hemostat-analyzer
docker stats hemostat-responder

# Set resource limits in docker-compose.yml
# See docker-compose.yml for examples
```

## Collecting Debug Information

If you're still stuck, capture the output of:

```bash
docker-compose logs > hemostat-debug.log
docker ps > containers.log
docker-compose ps > services.log
```

Review the logs carefully for error messages!