OpenVPN Data Gatherer Analysis

This report details the internal mechanics of openvpn_gatherer_v3.py, the script responsible for collecting, processing, and storing OpenVPN usage metrics.

1. Raw Data Collection (run_monitoring_cycle)

The gatherer runs a continuous loop (default interval: 10 seconds).

A. Log Parsing

  • Source: /var/log/openvpn/openvpn-status.log (Status File v2, CSV format).
  • Target Fields: Common Name, Real Address, Bytes Sent, Bytes Received.
  • Filtering: Only lines starting with CLIENT_LIST.
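The filtering step can be sketched as follows. This is a minimal illustration, not the code from openvpn_gatherer_v3.py: the function name parse_status_v2 and the sample data are hypothetical. It assumes the status file contains a HEADER,CLIENT_LIST row naming the columns, and reads column positions from that row rather than hard-coding indices, since the column layout can differ between OpenVPN releases.

```python
import csv
import io


def parse_status_v2(text):
    """Return one dict per CLIENT_LIST row of a Status File v2 (CSV) dump."""
    columns = None
    clients = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        if row[0] == "HEADER" and len(row) > 1 and row[1] == "CLIENT_LIST":
            # Remember the column names announced for CLIENT_LIST rows.
            columns = row[2:]
        elif row[0] == "CLIENT_LIST" and columns is not None:
            fields = dict(zip(columns, row[1:]))
            clients.append({
                "common_name": fields["Common Name"],
                "real_address": fields["Real Address"],
                "bytes_received": int(fields["Bytes Received"]),
                "bytes_sent": int(fields["Bytes Sent"]),
            })
        # Every other row type (TITLE, TIME, ROUTING_TABLE, ...) is ignored.
    return clients


# Made-up sample data, for illustration only.
SAMPLE = "\n".join([
    "TITLE,OpenVPN example",
    "HEADER,CLIENT_LIST,Common Name,Real Address,Virtual Address,"
    "Bytes Received,Bytes Sent,Connected Since,Connected Since (time_t),Username",
    "CLIENT_LIST,alice,203.0.113.5:51234,10.8.0.2,1200,3400,"
    "Mon Jan  1 00:00:00 2024,1704067200,UNDEF",
    "ROUTING_TABLE,10.8.0.2,alice,203.0.113.5:51234,"
    "Mon Jan  1 00:05:00 2024,1704067500",
    "END",
])

for client in parse_status_v2(SAMPLE):
    print(client)
```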

B. Delta Calculation (update_client_status_and_bytes)

The log provides lifetime counters for a session. The gatherer calculates the traffic delta (increment) since the last check.

  • Logic: Increment = Current Value - Last Saved Value.
  • Reset Detection: If Current Value < Last Saved Value, the counter must have started over, so the gatherer assumes a session or server restart and counts the entire Current Value as the increment.
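The two rules above fit in a few lines. The sketch below is illustrative: the function name compute_increment is hypothetical, and how the real update_client_status_and_bytes stores the last saved value is not shown here.

```python
def compute_increment(current: int, last_saved: int) -> int:
    """Traffic delta between two readings of a cumulative byte counter."""
    if current < last_saved:
        # Counter went backwards: treat it as a session/server restart and
        # count everything seen since the restart.
        return current
    return current - last_saved


print(compute_increment(1500, 1000))  # normal case
print(compute_increment(200, 1000))   # counter reset
```

One consequence of this rule is that traffic sent between the last saved reading and the restart is not counted, because the log never reported it.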

C. Rate Calculation

  • Speed: Calculated as Increment * 8 / (Interval * 1_000_000), with Increment in bytes and Interval in seconds, which yields Mbps (megabits per second).
  • Storage: Raw samples (10s resolution) including speed and volume are stored in the usage_history table.
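As a worked example of the speed formula, here is a small sketch. The function name speed_mbps and the guard against a non-positive interval are assumptions added for illustration.

```python
def speed_mbps(increment_bytes: int, interval_seconds: float = 10.0) -> float:
    """Convert a byte increment over one interval into megabits per second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # bytes -> bits (* 8), then bits per second -> megabits per second.
    return increment_bytes * 8 / (interval_seconds * 1_000_000)


# 12.5 MB transferred during one 10-second cycle.
print(speed_mbps(12_500_000))
```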

2. Data Aggregation (TSDB)

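To support long-term statistics without storing billions of rows, the TimeSeriesAggregator performs real-time rollups into five aggregated tables using an UPSERT strategy: insert a new bucket row if none exists, otherwise add to the existing sum. The raw usage_history table is listed alongside them for comparison.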

| Table         | Resolution | Timestamp Alignment    | Retention (Default) |
|---------------|------------|------------------------|---------------------|
| usage_history | 10 sec     | Exact time             | 7 Days              |
| stats_5min    | 5 min      | 00:00, 00:05...        | 14 Days             |
| stats_15min   | 15 min     | 00:00, 00:15...        | 28 Days             |
| stats_hourly  | 1 Hour     | XX:00:00               | 90 Days             |
| stats_6h      | 6 Hours    | 00:00, 06:00, 12:00... | 180 Days            |
| stats_daily   | 1 Day      | 00:00:00               | 365 Days            |

Logic: On every 10-second cycle, the calculated Increment is added to the running sum of every bucket that covers the current timestamp. A single 5 MB download therefore contributes immediately to the current 5min, 15min, Hourly, 6h, and Daily counters.
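The rollup can be sketched with SQLite's UPSERT syntax. Everything beyond the table names is an assumption made for illustration: the report does not state which database engine is used, and the column names (client, bucket_ts, bytes_total), the use of UTC epoch seconds, and the function names are hypothetical.

```python
import sqlite3

# Bucket width in seconds for each aggregated table.
BUCKET_SECONDS = {
    "stats_5min": 300,
    "stats_15min": 900,
    "stats_hourly": 3600,
    "stats_6h": 21600,
    "stats_daily": 86400,
}


def bucket_start(ts: int, width: int) -> int:
    """Floor an epoch timestamp (UTC seconds) to the start of its bucket."""
    return ts - ts % width


def create_tables(conn: sqlite3.Connection) -> None:
    for table in BUCKET_SECONDS:
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} ("
            "client TEXT NOT NULL, bucket_ts INTEGER NOT NULL, "
            "bytes_total INTEGER NOT NULL, "
            "PRIMARY KEY (client, bucket_ts))"
        )


def add_increment(conn: sqlite3.Connection, client: str, ts: int, increment: int) -> None:
    """Add one cycle's increment to the matching bucket in every table."""
    # Table names come from the fixed dict above, never from user input.
    for table, width in BUCKET_SECONDS.items():
        conn.execute(
            f"INSERT INTO {table} (client, bucket_ts, bytes_total) "
            "VALUES (?, ?, ?) "
            "ON CONFLICT(client, bucket_ts) DO UPDATE SET "
            "bytes_total = bytes_total + excluded.bytes_total",
            (client, bucket_start(ts, width), increment),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
create_tables(conn)
t0 = 1704067200  # 2024-01-01 00:00:00 UTC
add_increment(conn, "alice", t0 + 10, 3_000_000)
add_increment(conn, "alice", t0 + 20, 2_000_000)
print(conn.execute("SELECT bucket_ts, bytes_total FROM stats_5min").fetchall())
```

Because each table keeps one row per client and bucket, the row count grows with elapsed time rather than with the number of 10-second samples.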

3. Data Retention

A cleanup job runs once per day, triggered when the calendar date changes.

  • It executes DELETE FROM table WHERE timestamp < cutoff_date.
  • Thresholds are configurable in config.ini under [retention].
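A minimal sketch of the cleanup step follows. It assumes SQLite, timestamps stored as "YYYY-MM-DD HH:MM:SS" text, and [retention] keys named after the tables with values in days. None of these details are confirmed by this report, and the function names are hypothetical.

```python
import configparser
import sqlite3
from datetime import datetime, timedelta

# Defaults taken from the retention column of the table in section 2.
DEFAULT_RETENTION_DAYS = {
    "usage_history": 7,
    "stats_5min": 14,
    "stats_15min": 28,
    "stats_hourly": 90,
    "stats_6h": 180,
    "stats_daily": 365,
}


def load_retention(config_text: str) -> dict:
    """Read per-table retention (days), falling back to the defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return {
        table: parser.getint("retention", table, fallback=default)
        for table, default in DEFAULT_RETENTION_DAYS.items()
    }


def purge_old_rows(conn: sqlite3.Connection, retention_days: dict, now: datetime) -> None:
    """Delete rows older than each table's cutoff date."""
    for table, days in retention_days.items():
        cutoff_date = (now - timedelta(days=days)).strftime("%Y-%m-%d %H:%M:%S")
        # Table names come from the fixed dict above, never from user input.
        conn.execute(f"DELETE FROM {table} WHERE timestamp < ?", (cutoff_date,))
    conn.commit()


# Hypothetical config.ini content overriding a single threshold.
retention = load_retention("[retention]\nusage_history = 3\n")
print(retention["usage_history"], retention["stats_daily"])
```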

Summary

The system takes a write-optimized approach: instead of computing heavy aggregates at read time, which would be slow, it pre-computes them at write time. The dashboard can then read from the pre-aggregated tables instead of scanning raw samples, which keeps load times short across the full retention window.