# RIPE AS CIDR & FQDN IP Collector

This project collects CIDR prefixes for specified Autonomous Systems (AS) from the RIPE NCC API and resolves IP addresses for specified FQDNs. It accumulates these addresses over time, maintaining a history of discovered prefixes, and provides a FastAPI-based HTTP interface to retrieve the collected data.

## 1. Preparation and Installation

### Prerequisites

- Python 3.8+
- `pip` and `venv`

### Installation Steps

1. **Clone the repository** (or copy the files) to your desired location, e.g., `/opt/ripe_collector`:

   ```bash
   mkdir -p /opt/ripe_collector
   cd /opt/ripe_collector
   # Copy files: cidr_collector.py, api_server.py, requirements.txt, config.json
   ```

2. **Create a Virtual Environment**:

   ```bash
   python3 -m venv venv
   ```

3. **Install Dependencies**:

   ```bash
   source venv/bin/activate
   pip install -r requirements.txt
   deactivate
   ```

4. **Initial Configuration**: Edit `config.json` to set your initial ASNs and FQDNs:

   ```json
   {
     "asns": [62041],
     "fqdns": ["google.com"]
   }
   ```

---

## 2. Running the Collector (Periodic Task)

The collector script `cidr_collector.py` is designed to run once per day to fetch updates.

### Manual Run

```bash
/opt/ripe_collector/venv/bin/python3 /opt/ripe_collector/cidr_collector.py run
```

### Setup Cron Job (Recommended)

To run daily at 02:00 AM:

1. Open the crontab:

   ```bash
   crontab -e
   ```

2. Add the line:

   ```cron
   0 2 * * * /opt/ripe_collector/venv/bin/python3 /opt/ripe_collector/cidr_collector.py run >> /var/log/ripe_collector.log 2>&1
   ```

---

## 3. Application Setup: Systemd (Ubuntu, Debian)

This section describes how to run the **API Server** (`api_server.py`) as a system service.
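Both the collector and the API server read `config.json` (step 4 above), so it is worth sanity-checking that file before wiring the server into a service manager. A minimal stdlib sketch; this helper is not part of the project files, and the validation rules are assumed from the example config shape:

```python
# Hypothetical helper -- not part of the project files; validation rules
# are assumed from the config.json example shown in step 4.
import json

def load_config(path: str = "config.json") -> dict:
    """Load config.json and sanity-check the fields this project uses."""
    with open(path) as fh:
        cfg = json.load(fh)
    if not all(isinstance(asn, int) for asn in cfg.get("asns", [])):
        raise ValueError("asns must be a list of integers")
    if not all(isinstance(fqdn, str) for fqdn in cfg.get("fqdns", [])):
        raise ValueError("fqdns must be a list of strings")
    return cfg
```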
### Create Service File

Create `/etc/systemd/system/ripe-api.service`:

```ini
[Unit]
Description=RIPE CIDR Collector API
After=network.target

[Service]
# Change User=root to an unprivileged user if desired; ensure it has
# write access to data.json/fqdn_data.json
User=root
WorkingDirectory=/opt/ripe_collector
ExecStart=/opt/ripe_collector/venv/bin/uvicorn api_server:app --host 0.0.0.0 --port 8000
Restart=always

[Install]
WantedBy=multi-user.target
```

### Enable and Start

```bash
# Reload systemd
sudo systemctl daemon-reload
# Enable service to start on boot
sudo systemctl enable ripe-api
# Start service immediately
sudo systemctl start ripe-api
# Check status
sudo systemctl status ripe-api
```

---

## 4. Application Setup: RC-Script (Alpine Linux)

For Alpine Linux using OpenRC.

### Create Init Script

Create `/etc/init.d/ripe-api`:

```sh
#!/sbin/openrc-run

name="ripe-api"
description="RIPE CIDR Collector API"
command="/opt/ripe_collector/venv/bin/uvicorn"
# module:app, --host, and --port are passed as arguments
command_args="api_server:app --host 0.0.0.0 --port 8000"
command_background="yes"
pidfile="/run/${RC_SVCNAME}.pid"
directory="/opt/ripe_collector"

depend() {
    need net
}
```

### Make Executable

```bash
chmod +x /etc/init.d/ripe-api
```

### Enable and Start

```bash
# Add to default runlevel
rc-update add ripe-api default
# Start service
service ripe-api start
# Check status
service ripe-api status
```

---

## 5. API Usage Documentation

The API runs by default on port `8000` and returns the collected data as a flat JSON list.

### Base URL

`http://<server-ip>:8000`

### Endpoint: Get Addresses

**GET** `/addresses`

Retrieves the list of collected IP addresses/CIDRs.

| Parameter | Type | Required | Default | Description |
| :--- | :--- | :--- | :--- | :--- |
| `type` | string | No | `all` | Filter by source type. Options: `cidr` (ASNs only), `fqdn` (Domains only), `all` (Both). |
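Conceptually, the handler behind this endpoint just merges and filters the two data files written by the collector. A dependency-free sketch of that logic; the record layout of `data.json`/`fqdn_data.json` is assumed here and may differ from the real files:

```python
# Dependency-free sketch of the /addresses logic -- the record layout of
# data.json/fqdn_data.json is an assumption, not the project's actual schema.
import json
from pathlib import Path

DATA_FILES = {"cidr": Path("data.json"), "fqdn": Path("fqdn_data.json")}

def collect_addresses(kind: str = "all") -> list[str]:
    """Return the flat, sorted list served by GET /addresses."""
    wanted = DATA_FILES if kind == "all" else {kind: DATA_FILES[kind]}
    merged: set[str] = set()
    for path in wanted.values():
        if path.exists():
            # Assumed shape: {"<resource>": {"prefixes": ["1.2.3.0/24", ...]}}
            for record in json.loads(path.read_text()).values():
                merged.update(record.get("prefixes", []))
    return sorted(merged)
```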
#### Example 1: Get All Addresses (Default)

**Request:**

```bash
curl -X GET "http://localhost:8000/addresses"
```

**Response (JSON):**

```json
[
  "142.250.1.1",
  "149.154.160.0/22",
  "149.154.160.0/23",
  "2001:4860:4860::8888",
  "91.108.4.0/22"
]
```

#### Example 2: Get Only CIDRs (from ASNs)

**Request:**

```bash
curl -X GET "http://localhost:8000/addresses?type=cidr"
```

**Response (JSON):**

```json
[
  "149.154.160.0/22",
  "149.154.160.0/23",
  "91.108.4.0/22"
]
```

#### Example 3: Get Only Resolved IPs (from FQDNs)

**Request:**

```bash
curl -X GET "http://localhost:8000/addresses?type=fqdn"
```

**Response (JSON):**

```json
[
  "142.250.1.1",
  "2001:4860:4860::8888"
]
```

### Endpoint: Manage Schedule

**GET** `/schedule`

Returns the current cron schedules.

**POST** `/schedule`

Updates the schedule for a specific collector type. Body:

```json
{
  "type": "asn",
  "cron": "*/15 * * * *"
}
```

*Note: `type` can be `asn` or `fqdn`.*

---

## 6. Advanced CLI Usage

The collector script supports running its modes independently:

```bash
# Run both (default)
python3 cidr_collector.py run

# Run only ASN collection
python3 cidr_collector.py run --mode asn

# Run only FQDN collection
python3 cidr_collector.py run --mode fqdn
```

---

## 7. Internal Logic & Architecture

### Collector Logic

When the collector runs (whether manually or via the scheduler):

1. **Instantiation**: Creates a new instance of `CIDRCollector` or `FQDNCollector`. This forces a fresh read of `config.json`, so any newly added ASNs/FQDNs are processed immediately.
2. **Fetching**:
   * **ASN**: Queries the RIPE NCC API (`stat.ripe.net`).
   * **FQDN**: Uses Python's `socket.getaddrinfo` to resolve A and AAAA records.
3. **Comparison**: Reads the existing `data.json`/`fqdn_data.json` and compares the fetched set with the stored set.
4. **Accumulation**: Effectively performs a union of the old and new sets (Old ∪ New).
   * **If new items are found**: The list is updated and sorted, and the `last_updated` timestamp is refreshed for that specific resource.
   * **If no new items are found**: The file is left untouched.
5. **Persistence**: Data is written to disk only if changes actually occurred.

### Scheduler Logic

`api_server.py` uses `APScheduler` (`BackgroundScheduler`).

1. **Startup**: When the server starts (`uvicorn`), `start_scheduler` is called. It loads the `schedule` block from `config.json` and creates two independent jobs (`asn_job`, `fqdn_job`).
2. **Runtime Updates (POST `/schedule`)**:
   * The server validates the new cron expression.
   * It updates `config.json` so the change survives restarts.
   * It calls `scheduler.add_job(..., replace_existing=True)`, which hot-swaps the trigger for the running job.
3. **Concurrency**: If a scheduled job is already running when a new schedule is posted, the running job completes normally; the new schedule applies to the *next* calculated run time.