Session Management

This document describes the session tracking, persistence, and statistics features in RustSocks.

Overview

RustSocks provides comprehensive session management, including:

- Real-time session tracking
- SQLite persistence (optional)
- Traffic metrics and statistics
- Prometheus metrics integration (optional)
- REST API for session queries

Architecture

In-Memory Storage

// Active sessions
DashMap<String, Session>  // Concurrent hashmap, lock-free reads

// Completed sessions (snapshots)
RwLock<Vec<Session>>      // Read-heavy workload

Benefits:

- Lock-free concurrent access for active sessions
- Efficient lookups without blocking
- Minimal write contention

Session Lifecycle

  1. Creation (new_session()):
     - Generate UUID session ID
     - Store in active sessions map
     - Initialize traffic counters

  2. Traffic Updates (update_traffic()):
     - Called periodically during proxy loop
     - Increment bytes/packets counters
     - Reduces write amplification

  3. Closure (close_session()):
     - Mark session as completed
     - Record duration and close reason
     - Move to completed snapshots
     - Queue for database write (if enabled)

  4. Rejection Tracking (track_rejected_session()):
     - Record ACL-blocked connections
     - Store matched rule and decision
     - Useful for audit and troubleshooting
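The lifecycle above can be sketched with std types only. This is a minimal illustration, not the actual RustSocks API: the real manager uses DashMap for active sessions, RwLock<Vec<Session>> for snapshots, and UUID session IDs; here a Mutex-wrapped HashMap and a counter keep the example self-contained.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Illustrative subset of the real Session struct (which also carries
// addresses, protocol, timestamps, and ACL info).
#[derive(Clone, Debug)]
struct Session {
    id: String,
    bytes_sent: u64,
    bytes_received: u64,
    closed: bool,
}

// Sketch of the manager; the real code uses DashMap + RwLock instead of Mutex.
struct SessionManager {
    active: Mutex<HashMap<String, Session>>,
    completed: Mutex<Vec<Session>>,
    next_id: Mutex<u64>,
}

impl SessionManager {
    fn new() -> Self {
        Self {
            active: Mutex::new(HashMap::new()),
            completed: Mutex::new(Vec::new()),
            next_id: Mutex::new(0),
        }
    }

    // 1. Creation: allocate an ID (a UUID in the real code) with zeroed counters.
    fn new_session(&self) -> String {
        let mut n = self.next_id.lock().unwrap();
        *n += 1;
        let id = format!("session-{}", *n);
        let s = Session { id: id.clone(), bytes_sent: 0, bytes_received: 0, closed: false };
        self.active.lock().unwrap().insert(id.clone(), s);
        id
    }

    // 2. Traffic update: increment the counters in place.
    fn update_traffic(&self, id: &str, sent: u64, received: u64) {
        if let Some(s) = self.active.lock().unwrap().get_mut(id) {
            s.bytes_sent += sent;
            s.bytes_received += received;
        }
    }

    // 3. Closure: remove from the active map and move to the snapshot list.
    fn close_session(&self, id: &str) {
        if let Some(mut s) = self.active.lock().unwrap().remove(id) {
            s.closed = true;
            self.completed.lock().unwrap().push(s);
        }
    }
}

fn main() {
    let m = SessionManager::new();
    let id = m.new_session();
    m.update_traffic(&id, 1500, 64_000);
    m.close_session(&id);
    assert!(m.active.lock().unwrap().is_empty());
    let done = m.completed.lock().unwrap();
    assert_eq!(done.len(), 1);
    assert!(done[0].closed);
}
```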

Database Persistence

Feature Flag: database

SQLite Backend

Uses sqlx for async SQLite operations:

- Connection pooling
- Async migrations
- Type-safe queries
- Transaction support

Schema

CREATE TABLE sessions (
    session_id TEXT PRIMARY KEY,
    user TEXT NOT NULL,
    start_time TEXT NOT NULL,
    end_time TEXT,
    duration_secs INTEGER,
    source_ip TEXT NOT NULL,
    source_port INTEGER NOT NULL,
    dest_ip TEXT NOT NULL,
    dest_port INTEGER NOT NULL,
    protocol TEXT NOT NULL,
    bytes_sent INTEGER NOT NULL,
    bytes_received INTEGER NOT NULL,
    packets_sent INTEGER NOT NULL,
    packets_received INTEGER NOT NULL,
    status TEXT NOT NULL,
    close_reason TEXT,
    acl_rule_matched TEXT,
    acl_decision TEXT
);

CREATE INDEX idx_sessions_user ON sessions(user);
CREATE INDEX idx_sessions_start_time ON sessions(start_time);
CREATE INDEX idx_sessions_status ON sessions(status);

Safety Hardening

To protect the on-disk database:

- On startup, RustSocks verifies the SQLite header. If the file looks corrupt, it is quarantined (sessions.db.corrupt.<timestamp>) and a fresh database is created.
- Before running migrations, a timestamped snapshot (sessions.db.bak.<timestamp>) is created so you can roll back schema mistakes quickly.
- After migrations, PRAGMA integrity_check is executed. A failed check automatically moves the broken file aside and recreates a healthy database.
- Journaling auto-detects whether the filesystem supports WAL. If shared-memory files cannot be created, the server transparently falls back to the safer DELETE journal. In safety-first mode, PRAGMA synchronous = FULL is forced for maximal durability.

Together these steps help ensure that a hostile environment (read-only mounts, sudden power loss, partial writes) cannot silently destroy sessions.db.

Batch Writer

Efficient batch writing reduces database overhead:

[sessions]
batch_size = 100           # Sessions per batch
batch_interval_ms = 1000   # Max time between flushes

Algorithm:

1. Queue sessions in memory
2. Flush when:
   - Batch size reached, OR
   - Interval elapsed, OR
   - Shutdown initiated
3. Single transaction per batch
4. Background task handles writes

Benefits:

- Reduced write I/O (100x fewer transactions)
- Lower contention on the database
- Graceful degradation on database failure
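The flush decision reduces to a single predicate over the three triggers. The function name and signature below are illustrative, not the actual batch-writer API:

```rust
// Returns true when the queued batch should be written to the database.
// Mirrors the documented triggers: size reached, interval elapsed, or shutdown.
fn should_flush(
    queued: usize,
    batch_size: usize,
    elapsed_ms: u64,
    batch_interval_ms: u64,
    shutting_down: bool,
) -> bool {
    if queued == 0 {
        return false; // nothing to write, even on shutdown
    }
    queued >= batch_size || elapsed_ms >= batch_interval_ms || shutting_down
}

fn main() {
    // With the defaults above (batch_size = 100, batch_interval_ms = 1000):
    assert!(should_flush(100, 100, 10, 1000, false)); // size trigger
    assert!(should_flush(5, 100, 1500, 1000, false)); // interval trigger
    assert!(should_flush(1, 100, 10, 1000, true));    // shutdown trigger
    assert!(!should_flush(5, 100, 10, 1000, false));  // keep queuing
}
```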

Cleanup Task

Automatic cleanup of old records:

[sessions]
retention_days = 90           # Keep sessions for 90 days
cleanup_interval_hours = 24   # Run cleanup daily

Algorithm:

1. Runs periodically (configurable interval)
2. Deletes sessions older than the retention period
3. Uses the index on start_time for efficiency
4. Runs in the background, non-blocking
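A sketch of the retention statement such a cleanup pass might issue. The exact SQL is an assumption (the real query lives inside the sessions module); the idx_sessions_start_time index from the schema above is what keeps this scan cheap:

```rust
// Build the DELETE for rows older than the retention period.
// retention_days comes from the [sessions] config block.
fn cleanup_sql(retention_days: u32) -> String {
    format!(
        "DELETE FROM sessions WHERE datetime(start_time) < datetime('now', '-{} days')",
        retention_days
    )
}

fn main() {
    assert_eq!(
        cleanup_sql(90),
        "DELETE FROM sessions WHERE datetime(start_time) < datetime('now', '-90 days')"
    );
}
```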

Traffic Tracking

Configuration

[sessions]
traffic_update_packet_interval = 10

Determines how often update_traffic() is called during proxying.

Trade-offs:

- Lower value (e.g., 1-10): more accurate real-time stats, higher overhead
- Higher value (e.g., 50-100): less overhead, delayed stats updates

Implementation

// In proxy.rs
let mut packet_count = 0;

loop {
    tokio::select! {
        // ... data transfer ...
    }

    packet_count += 1;
    if packet_count >= update_interval {
        session_manager.update_traffic(session_id, bytes_sent, bytes_received);
        packet_count = 0;
    }
}

// Final flush on connection close
session_manager.update_traffic(session_id, bytes_sent, bytes_received);

Statistics API

Rolling Window Aggregation

pub fn get_stats(&self, window_hours: u64) -> SessionStats {
    // Aggregate sessions from last N hours
    // Returns:
    // - Active session count
    // - Total sessions/bytes
    // - Top users (by bandwidth)
    // - Top destinations (by sessions)
}
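The per-user aggregation behind the "top users" list can be sketched as follows. `Snapshot` and `top_users` are illustrative names, and the time-window filter (sessions from the last N hours) is omitted for brevity:

```rust
use std::collections::HashMap;

// One completed-session snapshot; illustrative subset of the real fields.
struct Snapshot {
    user: String,
    bytes: u64,
}

// Sum bytes per user, then rank descending by total bandwidth.
fn top_users(snapshots: &[Snapshot], limit: usize) -> Vec<(String, u64)> {
    let mut totals: HashMap<&str, u64> = HashMap::new();
    for s in snapshots {
        *totals.entry(s.user.as_str()).or_insert(0) += s.bytes;
    }
    let mut ranked: Vec<(String, u64)> =
        totals.into_iter().map(|(u, b)| (u.to_string(), b)).collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1));
    ranked.truncate(limit);
    ranked
}

fn main() {
    let snaps = vec![
        Snapshot { user: "alice".into(), bytes: 300 },
        Snapshot { user: "bob".into(), bytes: 200 },
        Snapshot { user: "alice".into(), bytes: 100 },
    ];
    // alice's two sessions are summed before ranking.
    assert_eq!(top_users(&snaps, 1), vec![("alice".to_string(), 400)]);
}
```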

HTTP Endpoint

GET /api/sessions/stats?window_hours=48

Response:

{
  "window_hours": 48,
  "active_sessions": 42,
  "total_sessions": 15234,
  "total_bytes_sent": 523423123,
  "total_bytes_received": 234234234,
  "top_users": [
    {"user": "alice", "sessions": 523, "bytes": 523423123},
    {"user": "bob", "sessions": 234, "bytes": 234234234}
  ],
  "top_destinations": [
    {"dest": "example.com:443", "sessions": 1234},
    {"dest": "api.github.com:443", "sessions": 456}
  ]
}

Operational Telemetry

RustSocks buffers short-lived operational events alongside the rolling metrics history. These events currently capture:

- Connection pool pressure (drops and evictions when the per-destination or global caps are hit)
- Connection failures when the upstream cannot be reached

Use the [telemetry] config block to control retention:

[telemetry]
enabled = true              # Enable the telemetry buffer
max_events = 256            # Maximum events cached in memory
retention_hours = 6         # How long events stay in the buffer

Fetch events via the REST endpoint:

GET /api/telemetry/events?minutes=60&limit=50

Optional query parameters:

- minutes: look back this many minutes (default: full buffer)
- limit: maximum number of events returned (default 100, max 500)
- severity: filter by info, warning, or error
- category: filter by event category (currently connection_pool)

Each event includes a timestamp, severity, category, message, and optional details such as destination addresses or pool limits. Combine it with the metrics history endpoint in the dashboard to show a small “Operational Telemetry” feed for rapid troubleshooting.
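The documented query parameters suggest filtering logic along these lines. `Event` and `filter_events` are hypothetical names (not the server's actual implementation), and the buffer is assumed to be ordered newest-first:

```rust
// Illustrative event record matching the documented fields.
#[derive(Clone)]
struct Event {
    age_minutes: u64,
    severity: &'static str,
    category: &'static str,
}

// Apply the optional minutes/severity/category filters, then cap the result.
fn filter_events(
    events: &[Event],
    minutes: Option<u64>,
    severity: Option<&str>,
    category: Option<&str>,
    limit: usize,
) -> Vec<Event> {
    events
        .iter()
        .filter(|e| minutes.map_or(true, |m| e.age_minutes <= m))
        .filter(|e| severity.map_or(true, |s| e.severity == s))
        .filter(|e| category.map_or(true, |c| e.category == c))
        .take(limit.min(500)) // documented hard cap of 500
        .cloned()
        .collect()
}

fn main() {
    let buf = vec![
        Event { age_minutes: 5, severity: "warning", category: "connection_pool" },
        Event { age_minutes: 90, severity: "error", category: "connection_pool" },
    ];
    // minutes=60 keeps only the recent event.
    assert_eq!(filter_events(&buf, Some(60), None, None, 100).len(), 1);
    // severity=error matches only the older one.
    assert_eq!(filter_events(&buf, None, Some("error"), None, 100).len(), 1);
}
```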

Prometheus Metrics

Feature Flag: metrics

Note: The /metrics endpoint is served by the stats API server, so sessions.stats_api_enabled must be true.

Available Metrics

# Active session gauge
rustsocks_active_sessions

# Total sessions counter
rustsocks_sessions_total

# Rejected sessions counter
rustsocks_sessions_rejected_total

# Session duration histogram
rustsocks_session_duration_seconds (buckets: 0.1, 0.5, 1, 5, 10, 30, 60, 300)

# Traffic counters
rustsocks_bytes_sent_total
rustsocks_bytes_received_total

# Per-user metrics
rustsocks_user_sessions_total{user="alice"}
rustsocks_user_bandwidth_bytes_total{user="alice", direction="sent"}
rustsocks_user_bandwidth_bytes_total{user="alice", direction="received"}

Integration

Metrics are automatically updated:

- On session creation
- On traffic updates
- On session closure
- On ACL rejection
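The update points above can be sketched with plain integers standing in for Prometheus counters and gauges (illustrative names, not the actual metrics module):

```rust
// Plain-integer stand-ins for the exported metrics.
#[derive(Default)]
struct Metrics {
    active_sessions: i64,   // rustsocks_active_sessions (gauge)
    sessions_total: u64,    // rustsocks_sessions_total (counter)
    sessions_rejected: u64, // rustsocks_sessions_rejected_total (counter)
    bytes_sent: u64,        // rustsocks_bytes_sent_total (counter)
}

impl Metrics {
    fn on_create(&mut self) {
        self.active_sessions += 1;
        self.sessions_total += 1;
    }
    fn on_traffic(&mut self, sent: u64) {
        self.bytes_sent += sent;
    }
    fn on_close(&mut self) {
        self.active_sessions -= 1;
    }
    fn on_reject(&mut self) {
        self.sessions_rejected += 1;
    }
}

fn main() {
    let mut m = Metrics::default();
    m.on_create();
    m.on_traffic(1024);
    m.on_close();
    m.on_reject();
    // The gauge returns to zero; the counters only ever increase.
    assert_eq!(m.active_sessions, 0);
    assert_eq!(m.sessions_total, 1);
    assert_eq!(m.bytes_sent, 1024);
    assert_eq!(m.sessions_rejected, 1);
}
```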

Querying

# Prometheus scrape endpoint
curl http://127.0.0.1:9090/metrics

# Example queries
# Average session duration
rate(rustsocks_session_duration_seconds_sum[5m]) / rate(rustsocks_session_duration_seconds_count[5m])

# Bandwidth by user
rate(rustsocks_user_bandwidth_bytes_total{user="alice"}[5m])

# Rejection rate
rate(rustsocks_sessions_rejected_total[5m]) / rate(rustsocks_sessions_total[5m])

Configuration

Complete Example

[sessions]
enabled = true
storage = "sqlite"  # or "memory" / "mariadb" / "mysql"

# Database settings
database_url = "sqlite://data/sessions.db"
batch_size = 100
batch_interval_ms = 1000
retention_days = 90
cleanup_interval_hours = 24

# Traffic tracking
traffic_update_packet_interval = 10

# Statistics API
stats_api_enabled = true
stats_api_bind_address = "127.0.0.1"
stats_api_port = 9090
stats_window_hours = 24

Database Operations

Running Migrations

Migrations are applied automatically on startup:

# Manual migration (for testing)
sqlx migrate run --database-url sqlite://sessions.db

Migrations are located in migrations/ directory.

Querying Session Data

-- Active sessions
SELECT user, dest_ip, dest_port, bytes_sent, bytes_received
FROM sessions WHERE status = 'active';

-- Rejected by ACL
SELECT user, dest_ip, dest_port, acl_rule_matched
FROM sessions WHERE status = 'rejected_by_acl';

-- Top users by traffic
SELECT user,
       SUM(bytes_sent + bytes_received) as total_bytes,
       COUNT(*) as sessions
FROM sessions
GROUP BY user
ORDER BY total_bytes DESC
LIMIT 10;

-- Sessions in last hour
SELECT * FROM sessions
WHERE datetime(start_time) >= datetime('now', '-1 hour');

-- Traffic over time (hourly buckets)
SELECT
    strftime('%Y-%m-%d %H:00', start_time) as hour,
    COUNT(*) as sessions,
    SUM(bytes_sent + bytes_received) as total_bytes
FROM sessions
WHERE datetime(start_time) >= datetime('now', '-24 hours')
GROUP BY hour
ORDER BY hour;

Backup and Export

# Backup SQLite database
sqlite3 sessions.db ".backup sessions_backup.db"

# Export to CSV
sqlite3 sessions.db -csv -header "SELECT * FROM sessions" > sessions.csv

# Export specific query
sqlite3 sessions.db -csv -header \
  "SELECT user, dest_ip, dest_port, bytes_sent, bytes_received FROM sessions" \
  > traffic_report.csv

Performance Characteristics

| Operation | Latency | Notes |
|-----------|---------|-------|
| Session creation | <100µs | In-memory only |
| Traffic update | <50µs | DashMap lock-free read |
| Session closure | <200µs | Includes snapshot |
| Database batch write | 5-50ms | 100 sessions per batch |
| Stats query (24h window) | 10-100ms | Depends on session count |

Memory Usage

Approximate memory per session:

- Active session: ~500 bytes (in DashMap)
- Snapshot: ~300 bytes (in Vec)
- Database record: ~200 bytes (on disk)

For 10,000 active sessions:

- In-memory: ~5 MB
- With snapshots: ~8 MB
- Database: ~2 MB (compressed)
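Plugging the per-session figures into a quick estimate (the sizes are the approximate values above, so the result is order-of-magnitude only):

```rust
// ~500 B per active session, ~300 B per completed snapshot.
fn estimated_bytes(active: u64, snapshots: u64) -> u64 {
    active * 500 + snapshots * 300
}

fn main() {
    // 10,000 active sessions alone: ~5 MB, matching the figure above.
    assert_eq!(estimated_bytes(10_000, 0), 5_000_000);
    // With 10,000 snapshots on top: ~8 MB.
    assert_eq!(estimated_bytes(10_000, 10_000), 8_000_000);
}
```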

Best Practices

When to Enable Database Persistence

Enable SQLite persistence when:

- You need an audit trail of all connections
- You want historical analysis/reporting
- Compliance requires session logs
- You use the statistics API extensively

When to Use Memory-Only Mode

Use memory-only mode for:

- Temporary/development deployments
- Privacy requirements (no persistence)
- High throughput (>10k sessions/sec)
- Limited disk space

Tuning Guidelines

  1. Batch Size (batch_size):
     - Larger batches: better throughput, higher latency
     - Smaller batches: lower latency, more I/O overhead
     - Recommended: 50-200 for most workloads

  2. Batch Interval (batch_interval_ms):
     - Lower interval: more real-time updates, more writes
     - Higher interval: better batching, delayed persistence
     - Recommended: 1000-5000ms

  3. Traffic Update Interval (traffic_update_packet_interval):
     - Lower: more accurate real-time stats, higher CPU
     - Higher: lower overhead, delayed updates
     - Recommended: 10-50 packets

  4. Retention Period (retention_days):
     - Balance compliance requirements with disk space
     - Monitor database size growth
     - Consider archiving to external storage

Troubleshooting

Problem: High database write latency

Symptoms: Slow session closures, growing batch queue

Solutions:

- Increase batch_size (fewer transactions)
- Increase batch_interval_ms (better batching)
- Check disk I/O performance
- Consider moving the database to faster storage

Problem: Memory usage growing

Symptoms: High memory consumption, OOM crashes

Causes:

- Too many completed session snapshots
- Batch writer queue overflow
- Database connection failure

Solutions:

- Limit snapshot retention (not implemented yet)
- Check database connectivity
- Monitor batch writer queue size
- Reduce traffic_update_packet_interval

Problem: Statistics API slow

Symptoms: High latency on /api/sessions/stats

Causes:

- Large number of sessions in the window
- Complex aggregation queries
- Missing database indexes

Solutions:

- Reduce stats_window_hours
- Ensure the database indexes exist
- Consider caching statistics
- Use Prometheus for aggregations instead