Connection Pool & Optimization¶
RustSocks includes an efficient connection pool for upstream TCP connections, reducing connection establishment overhead and improving performance.
Overview¶
The connection pool reuses TCP connections to frequently accessed destinations, eliminating repeated TCP handshake overhead and improving throughput.
How It Works¶
- Pool Management: Idle upstream connections are stored per-destination
- Connection Reuse: When connecting to the same destination, pooled connections are reused
- Timeout Handling: Connections expire after `idle_timeout_secs` of inactivity
- Background Cleanup: Periodic cleanup task removes expired connections
- Capacity Limits: Both per-destination and global limits prevent resource exhaustion
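The lookup step above can be sketched in plain std Rust. This is a hypothetical, simplified model (no `TcpStream`, no capacity limits, names `IdleConn` and `try_reuse` are illustrative, not the crate's API): pop idle entries newest-first and reuse the first one that has not exceeded the idle timeout.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::{Duration, Instant};

// Hypothetical, simplified per-destination idle entry; the real pool
// wraps a TcpStream and enforces capacity limits as well.
struct IdleConn {
    last_used: Instant,
}

// Pop entries LIFO (newest first) until a non-expired one is found.
// Returns true on a pool hit; false means the caller must dial a new
// connection (pool miss).
fn try_reuse(
    pools: &mut HashMap<SocketAddr, Vec<IdleConn>>,
    dest: &SocketAddr,
    idle_timeout: Duration,
) -> bool {
    if let Some(list) = pools.get_mut(dest) {
        while let Some(conn) = list.pop() {
            if conn.last_used.elapsed() < idle_timeout {
                return true; // reuse: connection is still fresh
            }
            // expired entry: drop it and keep scanning
        }
    }
    false
}

fn main() {
    let dest: SocketAddr = "127.0.0.1:1080".parse().unwrap();
    let mut pools = HashMap::new();
    pools.insert(dest, vec![IdleConn { last_used: Instant::now() }]);
    assert!(try_reuse(&mut pools, &dest, Duration::from_secs(90))); // hit
    assert!(!try_reuse(&mut pools, &dest, Duration::from_secs(90))); // now empty: miss
}
```

Popping LIFO means the most recently used (and therefore most likely still alive) connection is tried first.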
Key Features¶
- ✅ LRU-style connection pooling with timeout management
- ✅ Per-destination and global connection limits
- ✅ Configurable idle timeout and connect timeout
- ✅ Background cleanup of expired connections
- ✅ Thread-safe implementation using DashMap and atomic counters
- ✅ Optional (disabled by default for backward compatibility)
- ✅ Zero-copy connection reuse
- ✅ Automatic eviction when limits are reached
Configuration¶
```toml
[server.pool]
enabled = true             # Enable connection pooling
max_idle_per_dest = 4      # Max idle connections per destination
max_total_idle = 100       # Max total idle connections
idle_timeout_secs = 90     # Keep-alive duration
connect_timeout_ms = 5000  # Connection timeout
```
Configuration Parameters¶
- `enabled`: Enable/disable the connection pool (default: `false`)
- `max_idle_per_dest`: Maximum idle connections per destination (default: 4)
- `max_total_idle`: Maximum total idle connections across all destinations (default: 100)
- `idle_timeout_secs`: How long to keep idle connections alive (default: 90 seconds)
- `connect_timeout_ms`: Timeout for establishing new connections (default: 5000 ms)
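The documented defaults can be captured in a `Default` impl. This is a hedged sketch, not the crate's actual implementation; the field names mirror the `PoolConfig` struct shown under Key Structures, and the values come from the parameter list above.

```rust
// Sketch of PoolConfig with the documented defaults (illustrative only).
pub struct PoolConfig {
    pub enabled: bool,
    pub max_idle_per_dest: usize,
    pub max_total_idle: usize,
    pub idle_timeout_secs: u64,
    pub connect_timeout_ms: u64,
}

impl Default for PoolConfig {
    fn default() -> Self {
        Self {
            enabled: false,        // pooling is opt-in
            max_idle_per_dest: 4,
            max_total_idle: 100,
            idle_timeout_secs: 90,
            connect_timeout_ms: 5000,
        }
    }
}

fn main() {
    let cfg = PoolConfig::default();
    assert!(!cfg.enabled); // disabled by default for backward compatibility
    assert_eq!(cfg.max_idle_per_dest, 4);
}
```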
Benefits¶
- Reduced Latency: Reusing connections eliminates TCP handshake overhead; actual savings depend on network latency and destination reuse
- Lower CPU Usage: Fewer connection establishments reduce CPU overhead
- Better Resource Utilization: Controlled connection limits prevent resource exhaustion
- Improved Throughput: Faster connection reuse for frequent destinations
Implementation Details¶
Location¶
src/server/pool.rs
Key Structures¶
```rust
// Main pool manager
pub struct ConnectionPool {
    config: PoolConfig,
    pools: Arc<DashMap<SocketAddr, Vec<PooledConnection>>>,
    destination_metrics: Arc<DashMap<SocketAddr, DestinationMetrics>>,
    metrics: Arc<PoolMetrics>,
    active_counts: Arc<DashMap<SocketAddr, AtomicUsize>>,
}

// Wrapper with metadata
struct PooledConnection {
    stream: TcpStream,
    created_at: Instant,
    last_used: Instant,
}

// Configuration parameters
pub struct PoolConfig {
    pub enabled: bool,
    pub max_idle_per_dest: usize,
    pub max_total_idle: usize,
    pub idle_timeout_secs: u64,
    pub connect_timeout_ms: u64,
}

// Pool statistics API
pub struct PoolStats {
    pub total_idle: usize,
    pub destinations: usize,
    pub total_created: u64,
    pub total_reused: u64,
    pub pool_hits: u64,
    pub pool_misses: u64,
    pub dropped_full: u64,
    pub expired: u64,
    pub evicted: u64,
    pub connections_in_use: u64,
    pub pending_creates: u64,
}
```
Integration¶
The connection pool is integrated into the handler via ConnectHandlerContext:
```rust
// In handler.rs
let stream = match connect_ctx.connection_pool.get(target).await {
    Ok(stream) => stream,
    Err(e) => return Err(e.into()),
};
```
Connection Lifecycle¶
- Get or Connect: Check pool for idle connection matching destination
- Validation: Verify connection is not closed/expired
- Reuse: Return pooled connection if valid
- New Connection: Establish new connection if pool empty or all expired
- Return to Pool: On connection close, return to pool if limits allow
- Cleanup: Background task periodically removes expired connections
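The cleanup step can be sketched with std types only. This is an illustrative model (the names `IdleConn` and `sweep` are assumptions, and the real task also updates pool metrics): one pass retains only entries still inside the idle timeout.

```rust
use std::time::{Duration, Instant};

// Simplified idle entry; the real PooledConnection also carries the stream.
struct IdleConn {
    last_used: Instant,
}

// One cleanup pass: keep only entries within the idle timeout and report
// how many expired entries were removed.
fn sweep(idle: &mut Vec<IdleConn>, now: Instant, idle_timeout: Duration) -> usize {
    let before = idle.len();
    idle.retain(|c| now.duration_since(c.last_used) < idle_timeout);
    before - idle.len()
}

fn main() {
    let base = Instant::now();
    let now = base + Duration::from_secs(120);
    let mut idle = vec![
        IdleConn { last_used: base + Duration::from_secs(100) }, // 20 s idle: keep
        IdleConn { last_used: base },                            // 120 s idle: expire
    ];
    assert_eq!(sweep(&mut idle, now, Duration::from_secs(90)), 1);
    assert_eq!(idle.len(), 1);
}
```

In the real pool this sweep runs periodically in a background task rather than on the request path.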
Testing¶
Test Suite Overview¶
Note: Test counts change over time. Use cargo test -- --list for current totals.
```shell
# Run all pool tests
cargo test --all-features pool

# Run pool unit tests
cargo test --all-features --lib pool

# Run pool integration tests
cargo test --all-features --test connection_pool

# Run pool edge case tests
cargo test --all-features --test pool_edge_cases

# Run pool SOCKS integration tests
cargo test --all-features --test pool_socks_integration

# Run concurrency stress tests
cargo test --all-features --test pool_concurrency -- --ignored --nocapture
```
Test Coverage¶
- Basic integration (`connection_pool.rs`): Connection reuse, timeout handling, disabled mode
- Edge cases (`pool_edge_cases.rs`):
  - Closed servers
  - Expired connections
  - Per-destination limits
  - Global limits
  - Stats accuracy
  - Concurrent operations
  - LIFO behavior
  - Cleanup tasks
- SOCKS5 integration (`pool_socks_integration.rs`):
  - Full SOCKS5 flows with pooling
  - Error handling
  - Stats reflection
- Stress tests (`pool_concurrency.rs`):
  - 200-500 concurrent operations
  - Contention benchmarks
Performance Under Load¶
Performance depends on your destination mix, idle timeouts, and reuse rate. For current results, run the load tests in loadtests/ and observe the pool metrics and telemetry.
What to Measure¶
- Reuse rate (`total_reused` vs `total_created`)
- Pool hits vs misses
- Evictions and expirations
- Connection latency under load
- Telemetry warnings for pool pressure
Why DashMap + Atomics¶
The pool uses per-destination sharding (DashMap) and atomic counters to avoid a global lock on hot paths while keeping statistics cheap to update.
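The statistics side of this design can be shown with std atomics alone (DashMap is an external crate, so this sketch models only the counters; `record_hit`/`record_miss` are illustrative names, not the crate's API). Because each counter is independent, relaxed ordering is sufficient and updates never take a lock.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hedged sketch of lock-free pool counters, mirroring PoolMetrics' role.
#[derive(Default)]
struct PoolMetrics {
    pool_hits: AtomicU64,
    pool_misses: AtomicU64,
}

impl PoolMetrics {
    // Relaxed is fine: these are independent statistics, not synchronization.
    fn record_hit(&self) {
        self.pool_hits.fetch_add(1, Ordering::Relaxed);
    }
    fn record_miss(&self) {
        self.pool_misses.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    let m = PoolMetrics::default();
    m.record_hit();
    m.record_hit();
    m.record_miss();
    assert_eq!(m.pool_hits.load(Ordering::Relaxed), 2);
    assert_eq!(m.pool_misses.load(Ordering::Relaxed), 1);
}
```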
Best Practices¶
When to Enable¶
Enable connection pooling when:
- Clients repeatedly connect to the same destinations
- Network latency to destinations is significant (>5 ms)
- Connection establishment overhead is noticeable
- You want to reduce CPU usage for connection setup
When to Disable¶
Consider disabling when:
- Destinations are highly varied (low reuse rate)
- Upstream servers close idle connections quickly
- Memory is constrained (the pool adds overhead)
- Connections are long-lived (less benefit from pooling)
Tuning Guidelines¶
- Per-Destination Limit (`max_idle_per_dest`):
  - Start with 4 for most workloads
  - Increase to 8-16 for high-traffic destinations
  - Decrease to 2 if memory is constrained
- Global Limit (`max_total_idle`):
  - Set to `max_idle_per_dest` × typical number of destinations
  - Example: 4 × 50 = 200 for 50 frequent destinations
  - Monitor actual pool size with the statistics API
- Idle Timeout (`idle_timeout_secs`):
  - Set lower than the upstream server's idle timeout
  - Typical values: 60-120 seconds
  - Shorter timeouts reduce stale connections; longer timeouts improve reuse rate
- Connect Timeout (`connect_timeout_ms`):
  - Balance between retry speed and failure detection
  - Typical values: 3000-10000 ms
  - Increase for high-latency networks; decrease for low-latency, reliable networks
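Putting these guidelines together, a hedged example configuration (the destination count and upstream timeout here are assumptions for illustration) for roughly 50 frequently reused destinations behind upstreams with a 120-second idle timeout might look like:

```toml
[server.pool]
enabled = true
max_idle_per_dest = 4      # default; raise for high-traffic destinations
max_total_idle = 200       # 4 × ~50 frequent destinations
idle_timeout_secs = 90     # below the assumed 120 s upstream idle timeout
connect_timeout_ms = 5000
```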
Monitoring¶
Use the pool statistics API to monitor performance (requires sessions.stats_api_enabled = true):
```shell
curl http://127.0.0.1:9090/api/pool/stats
```
Key fields in the response:
- total_idle, destinations, connections_in_use
- total_created, total_reused, pool_hits, pool_misses
- dropped_full, evicted, expired, pending_creates
Key metrics to watch:
- Reuse rate: total_reused / total_created should be high for benefit
- Pool utilization: total_idle / max_total_idle indicates capacity usage
- Per-destination distribution: Check if limits are hit frequently
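The reuse-rate rule of thumb above can be made concrete. This is a hypothetical helper (not part of the crate) that divides `total_reused` by `total_created` from the stats response, guarding against a fresh pool with no connects yet:

```rust
// Hypothetical helper mirroring the documented metric:
// reuse ratio = total_reused / total_created from /api/pool/stats.
fn reuse_ratio(total_reused: u64, total_created: u64) -> f64 {
    if total_created == 0 {
        0.0 // no connects yet: nothing to measure
    } else {
        total_reused as f64 / total_created as f64
    }
}

fn main() {
    // e.g. 400 reuses over 100 fresh connects: each pooled connection
    // served 5 requests on average, so pooling is clearly paying off.
    assert!((reuse_ratio(400, 100) - 4.0).abs() < 1e-9);
    assert_eq!(reuse_ratio(0, 0), 0.0);
}
```

A ratio well above 1 means most requests are served by reused connections; a ratio near 0 suggests the workload's destinations are too varied to benefit.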
The operational telemetry endpoint (GET /api/telemetry/events) surfaces warnings when connections are dropped because a per-destination cap was hit, or when the global idle limit forces an eviction. Pair that feed with the stats API for quick diagnostics in the dashboard.
Troubleshooting¶
Problem: Low reuse rate¶
Symptoms: total_created >> total_reused
Causes:
- Destinations too varied (many unique destinations)
- Idle timeout too short (connections expire before reuse)
- Upstream servers closing connections
Solutions:
- Increase idle_timeout_secs
- Check upstream server keep-alive settings
- Monitor destination distribution
Problem: Connection failures after reuse¶
Symptoms: Errors immediately after get_or_connect()
Causes:
- Upstream server closed the connection while idle
- Connection expired but not yet cleaned up
Solutions:
- Pool validates connections before reuse
- Decrease idle_timeout_secs to match upstream
- Check server logs for connection reset errors
Problem: High memory usage¶
Symptoms: Pool growing unbounded
Causes:
- max_total_idle set too high
- Many unique destinations (pool keeps connections for each)
- Cleanup task not running
Solutions:
- Reduce max_total_idle and max_idle_per_dest
- Monitor pool size with statistics API
- Verify cleanup task is running
Problem: Contention under heavy load¶
Symptoms: High CPU usage, poor throughput scaling
Causes:
- Very high concurrency (1000+ simultaneous connections)
- All connections to the same destination (hot-spot contention)
Solutions:
- Profile with cargo flamegraph to confirm contention
- Consider sharding pool by destination hash
- Check if workload suits pooling (varied vs single destination)