# RIPE AS CIDR & FQDN IP Collector
This project collects CIDR prefixes for specified Autonomous Systems (AS) from the RIPE NCC API and resolves IP addresses for specified FQDNs. It accumulates these addresses over time, maintaining a history of discovered prefixes. It also provides a FastAPI-based HTTP interface to retrieve the collected data.
## 1. Preparation and Installation
### Prerequisites
- Python 3.8+
- `pip` and `venv`
### Installation Steps
1. **Clone the repository** (or copy the files) to your desired location, e.g., `/opt/ripe_collector`.
```bash
mkdir -p /opt/ripe_collector
cd /opt/ripe_collector
# Copy files: cidr_collector.py, api_server.py, requirements.txt, config.json
```
2. **Create a Virtual Environment**:
```bash
python3 -m venv venv
```
3. **Install Dependencies**:
```bash
source venv/bin/activate
pip install -r requirements.txt
deactivate
```
4. **Initial Configuration**:
Edit `config.json` to set your initial ASNs and FQDNs.
```json
{
  "asns": [62041],
  "fqdns": ["google.com"]
}
```
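Before the first run, it can help to sanity-check that `config.json` has the expected shape. A minimal sketch in Python (the `validate_config` helper below is illustrative, not part of the project):

```python
import json

def validate_config(path="config.json"):
    """Load config.json and sanity-check its shape before the first run.

    The field names match the example above; extra keys are ignored.
    """
    with open(path) as f:
        cfg = json.load(f)
    if not isinstance(cfg.get("asns"), list) or not all(
            isinstance(a, int) for a in cfg["asns"]):
        raise ValueError("asns must be a list of plain AS numbers, e.g. [62041]")
    if not isinstance(cfg.get("fqdns"), list) or not all(
            isinstance(d, str) for d in cfg["fqdns"]):
        raise ValueError('fqdns must be a list of hostnames, e.g. ["google.com"]')
    return cfg
```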
---
## 2. Running the Collector (Periodic Task)
The collector script `cidr_collector.py` is designed to run once per day to fetch updates.
### Manual Run
```bash
/opt/ripe_collector/venv/bin/python3 /opt/ripe_collector/cidr_collector.py run
```
### Setup Cron Job (Recommended)
To run daily at 02:00:
1. Open crontab:
```bash
crontab -e
```
2. Add the line:
```cron
0 2 * * * /opt/ripe_collector/venv/bin/python3 /opt/ripe_collector/cidr_collector.py run >> /var/log/ripe_collector.log 2>&1
```
---
## 3. Application Setup: Systemd (Ubuntu, Debian)
This section describes how to run the **API Server** (`api_server.py`) as a system service.
### Create Service File
Create `/etc/systemd/system/ripe-api.service`:
```ini
[Unit]
Description=RIPE CIDR Collector API
After=network.target

[Service]
User=root
# Replace root with a dedicated unprivileged user if desired; it needs write access to data.json/fqdn_data.json
WorkingDirectory=/opt/ripe_collector
ExecStart=/opt/ripe_collector/venv/bin/uvicorn api_server:app --host 0.0.0.0 --port 8000
Restart=always

[Install]
WantedBy=multi-user.target
```
### Enable and Start
```bash
# Reload systemd
sudo systemctl daemon-reload
# Enable service to start on boot
sudo systemctl enable ripe-api
# Start service immediately
sudo systemctl start ripe-api
# Check status
sudo systemctl status ripe-api
```
---
## 4. Application Setup: RC-Script (Alpine Linux)
For Alpine Linux using OpenRC.
### Create Init Script
Create `/etc/init.d/ripe-api`:
```sh
#!/sbin/openrc-run
name="ripe-api"
description="RIPE CIDR Collector API"
command="/opt/ripe_collector/venv/bin/uvicorn"
# module:app and the --host/--port flags are passed as arguments
command_args="api_server:app --host 0.0.0.0 --port 8000"
command_background="yes"
pidfile="/run/${RC_SVCNAME}.pid"
directory="/opt/ripe_collector"
depend() {
	need net
}
```
### Make Executable
```bash
chmod +x /etc/init.d/ripe-api
```
### Enable and Start
```bash
# Add to default runlevel
rc-update add ripe-api default
# Start service
service ripe-api start
# Check status
service ripe-api status
```
---
## 5. API Usage Documentation
The API listens on port `8000` by default and returns the collected data as a flat JSON list.
### Base URL
`http://<server-ip>:8000`
### Endpoint: Get Addresses
**GET** `/addresses`
Retrieves the list of collected IP addresses/CIDRs.
| Parameter | Type | Required | Default | Description |
| :--- | :--- | :--- | :--- | :--- |
| `type` | string | No | `all` | Filter by source type. Options: `cidr` (ASNs only), `fqdn` (Domains only), `all` (Both). |
#### Example 1: Get All Addresses (Default)
**Request:**
```bash
curl -X GET "http://localhost:8000/addresses"
```
**Response (JSON):**
```json
[
  "142.250.1.1",
  "149.154.160.0/22",
  "149.154.160.0/23",
  "2001:4860:4860::8888",
  "91.108.4.0/22"
]
```
#### Example 2: Get Only CIDRs (from ASNs)
**Request:**
```bash
curl -X GET "http://localhost:8000/addresses?type=cidr"
```
**Response (JSON):**
```json
[
  "149.154.160.0/22",
  "149.154.160.0/23",
  "91.108.4.0/22"
]
```
#### Example 3: Get Only Resolved IPs (from FQDNs)
**Request:**
```bash
curl -X GET "http://localhost:8000/addresses?type=fqdn"
```
**Response (JSON):**
```json
[
  "142.250.1.1",
  "2001:4860:4860::8888"
]
```
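The same queries can be issued from Python using only the standard library. A small client sketch (the helper names `addresses_url` and `get_addresses` are illustrative, not part of the project):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def addresses_url(base_url="http://localhost:8000", kind="all"):
    """Build the request URL; `kind` maps to the `type` query parameter."""
    return f"{base_url}/addresses?{urlencode({'type': kind})}"

def get_addresses(base_url="http://localhost:8000", kind="all"):
    """Fetch the flat JSON list of addresses from a running API server."""
    with urlopen(addresses_url(base_url, kind)) as resp:
        return json.loads(resp.read())
```

For example, `get_addresses(kind="cidr")` returns only the ASN-derived prefixes, mirroring the `type=cidr` curl call above.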
### Endpoint: Manage Schedule
**GET** `/schedule`
Returns the current cron schedules.
**POST** `/schedule`
Updates the schedule for a specific collector type.
Body:
```json
{
  "type": "asn",
  "cron": "*/15 * * * *"
}
```
*Note: `type` can be `asn` or `fqdn`.*
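A schedule update can be posted from Python as well. A hedged sketch using only the standard library (`schedule_payload` and `update_schedule` are illustrative helpers, not part of the project):

```python
import json
from urllib.request import Request, urlopen

def schedule_payload(sched_type, cron):
    """Build the POST /schedule body; sched_type must be 'asn' or 'fqdn'."""
    if sched_type not in ("asn", "fqdn"):
        raise ValueError("type must be 'asn' or 'fqdn'")
    return json.dumps({"type": sched_type, "cron": cron}).encode()

def update_schedule(sched_type, cron, base_url="http://localhost:8000"):
    """POST a new cron expression to a running API server."""
    req = Request(f"{base_url}/schedule",
                  data=schedule_payload(sched_type, cron),
                  headers={"Content-Type": "application/json"},
                  method="POST")
    with urlopen(req) as resp:
        return json.loads(resp.read())
```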
---
## 6. Advanced CLI Usage
The collector script can run each collection mode independently:
```bash
# Run both (Default)
python3 cidr_collector.py run
# Run only ASN collection
python3 cidr_collector.py run --mode asn
# Run only FQDN collection
python3 cidr_collector.py run --mode fqdn
```
---
## 7. Internal Logic & Architecture
### Collector Logic
When the collector runs (whether manually or via schedule):
1. **Instantiation**: Creates a new instance of `CIDRCollector` or `FQDNCollector`. This forces a fresh read of `config.json`, ensuring any added ASNs/FQDNs are immediately processed.
2. **Fetching**:
* **ASN**: Queries RIPE NCC API (`stat.ripe.net`).
* **FQDN**: Uses Python's `socket.getaddrinfo` to resolve A and AAAA records.
3. **Comparison**: Reads the existing `data.json`/`fqdn_data.json` and compares the fetched set with the stored set.
4. **Accumulation**: The stored and fetched sets are merged with a set union (Old ∪ New), so previously discovered items are never dropped.
* **If new items found**: The list is updated, sorting is applied, and `last_updated` timestamp is refreshed for that specific resource.
* **If no new items**: The file is untouched.
5. **Persistence**: Data is written back to disk only when the merged set actually differs from the stored one.
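The comparison, accumulation, and persistence steps above can be sketched in a few lines. This assumes the data file holds a flat JSON list of strings; the real `data.json`/`fqdn_data.json` schema (per-resource `last_updated` timestamps, etc.) is richer:

```python
import json
import os

def accumulate(path, fetched):
    """Sketch of steps 3-5: union newly fetched items into the stored list."""
    stored = set()
    if os.path.exists(path):
        with open(path) as f:
            stored = set(json.load(f))
    merged = stored | set(fetched)   # Old ∪ New: items are never dropped
    if merged == stored:             # nothing new: leave the file untouched
        return False
    with open(path, "w") as f:
        json.dump(sorted(merged), f) # sorted for stable diffs
    return True
```

The boolean return mirrors the "only write on change" rule: a second run with the same input is a no-op.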
### Scheduler Logic
The `api_server.py` uses `APScheduler` (BackgroundScheduler).
1. **Startup**: When the server starts (`uvicorn`), `start_scheduler` is called. It loads the `schedule` block from `config.json` and creates two independent jobs (`asn_job`, `fqdn_job`).
2. **Runtime Updates (POST /schedule)**:
* The server validates the new cron expression.
* It updates `config.json` so the change survives restarts.
* It calls `scheduler.add_job(..., replace_existing=True)`. This hot-swaps the trigger for the running job.
3. **Concurrency**: If a scheduled job is already running when a new schedule is posted, the running job completes normally. The new schedule applies to the *next* calculated run time.
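The runtime-update flow can be sketched as follows. The function name, the `"schedule"` block layout in `config.json`, and the `<type>_job` job IDs are assumptions about `api_server.py` internals, not its actual code; real validation would use APScheduler's `CronTrigger.from_crontab`, which raises `ValueError` on a malformed expression:

```python
import json

def apply_schedule_update(cfg_path, sched_type, cron, scheduler=None, jobs=None):
    """Sketch of the POST /schedule flow: validate, persist, hot-swap."""
    if len(cron.split()) != 5:  # crude stand-in for real cron validation
        raise ValueError(f"expected a 5-field cron expression, got: {cron!r}")
    with open(cfg_path) as f:   # persist so the change survives restarts
        cfg = json.load(f)
    cfg.setdefault("schedule", {})[sched_type] = cron
    with open(cfg_path, "w") as f:
        json.dump(cfg, f, indent=2)
    if scheduler is not None:   # hot-swap the trigger on the live scheduler
        from apscheduler.triggers.cron import CronTrigger
        scheduler.add_job(jobs[sched_type], CronTrigger.from_crontab(cron),
                          id=f"{sched_type}_job", replace_existing=True)
    return cfg["schedule"]
```

`replace_existing=True` is what makes the swap safe: the job keeps its ID, only the trigger changes, and the new cron applies from the next computed run time.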