Webhook Ingestion support for query parameters

MITM Proxy Setup

Background

The DataBee HTTP collector is a webhook endpoint to which data sources can send data. DataBee expects key parameters such as the API key, tenant name, and datasource ID to be specified in the HTTP headers. Some data sources, such as GitHub, send these parameters as HTTP query parameters instead.

The purpose of this guide is to describe additional infrastructure that can be deployed to transform these parameters into the format DataBee expects.
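As a minimal illustration of the transformation (the parameter names here are examples, not DataBee's actual names), query-string parameters can be lifted into an HTTP header dictionary like so:

```python
from urllib.parse import urlparse, parse_qsl

def query_params_to_headers(url: str, wanted: set) -> dict:
    """Lift selected query-string parameters into an HTTP header dict."""
    params = dict(parse_qsl(urlparse(url).query))
    return {name: value for name, value in params.items() if name in wanted}
```

The mitmproxy addon later in this guide performs the same idea in reverse-proxy position, then forwards the rewritten request to the collector.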

Steps

  • Install and run mitmproxy on an Ubuntu host. This document shows how to

    • install MITM Proxy using pipx

    • configure systemd service

    • run mitmproxy/mitmdump in reverse-proxy mode on your on-prem VM, making it reachable on TCP 443 (HTTPS)

  • Configure mitmdump with a small Python addon that adds the required headers and forwards GitHub webhook requests to the DataBee HTTP collector endpoint.

What you need (checklist)

  • GitHub Enterprise Cloud (test datasource)

  • mitmproxy installed on Ubuntu VM

  • Public DNS name pointing to your network’s public IP (e.g. proxy.cdsys.io)

  • Router/firewall rule forwarding TCP 443 to the mitmproxy VM

  • TLS cert + key for that public DNS (Let’s Encrypt certbot or other CA)

  • The HTTP collector endpoint URL and the header values: TenantID, Authorization (API key), DataSourceID, Content-Type (application/json)
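Before deploying any proxy, it can help to confirm the header format the collector expects by building a test POST directly. In this sketch, the URL and all header values are placeholders to be replaced with the ones from your DataBee feed configuration:

```python
import json
import urllib.request

# All values below are placeholders; substitute the ones from your
# DataBee feed configuration.
COLLECTOR_URL = "https://<collector-host>/http/ingest?v=1"
HEADERS = {
    "TenantID": "<tenant-id>",
    "Authorization": "<api-key>",
    "DataSourceID": "<datasource-id>",
    "Content-Type": "application/json",
}

def build_collector_post(url: str, events: list) -> urllib.request.Request:
    # The collector expects a JSON array of events in the request body.
    body = json.dumps(events).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=HEADERS, method="POST")
```

Sending the request with `urllib.request.urlopen(build_collector_post(COLLECTOR_URL, [event]))` should succeed once the header values are filled in.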

Assumptions / environment

  • Ubuntu (22.04 / 24.04 compatible)

  • You have a non-root service user (example: serviceuser) with a normal home directory (/home/serviceuser)

  • Ubuntu host IP: 10.60.0.200 (replace with your host IP throughout)

  • mitmproxy defaults to proxy port 8080 and web UI port 8081; the reverse-proxy service in this guide listens on 8443. Adjust ports if necessary

  1. Step-by-step: mitmproxy server config

  • Pick a public DNS name and make it point to your public IP

    • Example: proxy.cdsys.io -> PUBLIC_IP

  • Port forwarding / firewall

    • Forward public TCP 443 → internal_IP:8443 for the mitmproxy VM

  • Obtain TLS certificate for proxy.cdsys.io

    • Copy the TLS certificate to the mitmproxy VM. You need fullchain.pem and privkey.pem for the domain.

  • Create a mitmproxy addon to add headers and forward

    • Create the Python script (shown below) and replace ALL placeholder values

  • Create a GitHub Enterprise Cloud Audit feed on the DataBee platform and obtain the required header values using the HTTP ingestion method.

  2. Prerequisites for mitmproxy installation

  • update packages

    • sudo apt update && sudo apt upgrade -y

  • install Python tooling (if not present)

    • sudo apt install -y python3 python3-venv python3-pip

    • python3 --version

  • install pipx as the non-root service user (recommended)

    • sudo apt install pipx -y

  3. Install mitmproxy (as the service user)

  • pipx install mitmproxy

  • pipx ensurepath

  • ~/.local/bin/mitmproxy --version

  • exit and log in again

  • export PATH="$PATH:/home/serviceuser/.local/bin" - to set the PATH manually

  • source ~/.bashrc - reload the shell (if you set the PATH manually)

  • which mitmproxy

  • sudo ufw allow 8080,8081,8443,443/tcp

  • sudo ufw reload

  • mitmproxy - run it once to generate the CA files, then quit

  4. Certificate location (Ubuntu)

  • mitmproxy generates and stores root CA files in the service user's home directory

    • ls -l /home/serviceuser/.mitmproxy/

mitmproxy-ca-cert.cer

mitmproxy-ca-cert.p12

mitmproxy-ca-cert.pem

mitmproxy-ca.pem

  • sudo cp ~/.mitmproxy/mitmproxy-ca-cert.pem /usr/local/share/ca-certificates/mitmproxy-ca.crt (the destination filename must end in .crt for update-ca-certificates to pick it up)

  • sudo update-ca-certificates

  5. Create a directory and grant permissions for the Python script and public certificate (replace the placeholder values with your own)

  • sudo mkdir -p /opt/mitm

  • sudo vi /opt/mitm/databee_forwarder.py

Copy the content below and make the necessary changes.

# /opt/mitm/databee_forwarder.py
from mitmproxy import http, ctx
import os
from urllib.parse import urlparse, urlencode
import json
import threading

# Config (override via env vars)
TENANT_ID = os.getenv("TENANT_ID", "<your-tenant-id>")
API_KEY = os.getenv("API_KEY", "<your-api-key>")
DATASOURCE_ID = os.getenv("DATASOURCE_ID", "<your-datasource-id>")
DEST_BASE = os.getenv("DEST_BASE", "https://stg-test-api.us-staging.databee.buzz")

_parsed_dest = urlparse(DEST_BASE)

def _build_qs(query):
    try:
        items = list(query.items(multi=True))
        if not items:
            return ""
        return urlencode(items, doseq=True)
    except Exception:
        try:
            return str(query)
        except Exception:
            return ""

def _wrap_for_collector_array(flow: http.HTTPFlow):
    """
    Transform GitHub webhook body to collector format and return bytes containing a JSON array:
      [ { "api":"github", "event":..., "delivery":..., "payload": ... } ]
    """
    try:
        original_json = json.loads(flow.request.get_text())
    except Exception as e:
        ctx.log.warn(f"Failed to parse JSON body for wrapping: {e}")
        original_json = None

    wrapped = {
        "api": "github",
        "event": flow.request.headers.get("X-GitHub-Event", "unknown"),
        "delivery": flow.request.headers.get("X-GitHub-Delivery", "")
    }
    if original_json is None:
        # keep raw text if parse failed
        wrapped["payload_raw"] = flow.request.get_text()
    else:
        wrapped["payload"] = original_json

    # Important: collector expects an array of events => wrap as array
    arr = [wrapped]
    try:
        return json.dumps(arr).encode("utf-8")
    except Exception as e:
        ctx.log.warn(f"Failed to json.dumps wrapped array: {e}")
        return json.dumps([{"api":"github","event":wrapped.get("event"),"delivery":wrapped.get("delivery")}]).encode("utf-8")

def _async_forward_to_collector(body_bytes: bytes, forward_url: str, delivery_id: str):
    """
    Send a POST to the collector using a minimal, canonical header set.
    Logs full collector response body on error for debugging.
    """
    import urllib.request, urllib.error
    # build canonical headers (capitalize keys like successful curl)
    forward_headers = {
        "TenantID": TENANT_ID,
        "Authorization": API_KEY,
        "DataSourceID": DATASOURCE_ID,
        "Content-Type": "application/json",
        "Host": _parsed_dest.hostname or ""
    }

    # Log what we're sending (mask Authorization)
    masked_auth = (API_KEY[:4] + "..." + API_KEY[-4:]) if API_KEY else ""
    try:
        ctx.log.info(f"[ASYNC SEND] delivery={delivery_id or '[no-id]'} url={forward_url} headers={{'TenantID':TENANT_ID,'Authorization':{masked_auth},'DataSourceID':DATASOURCE_ID}} body_snippet={body_bytes[:800]!r}")
    except Exception:
        pass

    req = urllib.request.Request(forward_url, data=body_bytes, headers=forward_headers, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            status = resp.getcode()
            resp_body = resp.read().decode("utf-8", errors="replace")
            ctx.log.info(f"[ASYNC OK] delivery={delivery_id or '[no-id]'} status={status} body_snippet={resp_body[:1000]}")
    except urllib.error.HTTPError as e:
        # read collector error body and log it fully (bounded)
        try:
            err_body = e.read().decode("utf-8", errors="replace")
        except Exception:
            err_body = "<no-body>"
        ctx.log.warn(f"[ASYNC ERROR] delivery={delivery_id or '[no-id]'} HTTPError status={getattr(e,'code',None)} body={err_body[:8000]}")
    except Exception as e:
        ctx.log.warn(f"[ASYNC ERROR] delivery={delivery_id or '[no-id]'} Exception: {e}")

def request(flow: http.HTTPFlow) -> None:
    path = flow.request.path or ""
    if not path.startswith("/http/ingest"):
        return

    if flow.request.method.upper() != "POST":
        ctx.log.info(f"Ignoring non-POST {flow.request.method} at {path}")
        return

    delivery = flow.request.headers.get("X-GitHub-Delivery", "[no-id]")
    event = flow.request.headers.get("X-GitHub-Event", "[unknown]")

    # Log incoming headers/body (truncated)
    try:
        ctx.log.info(f"[INCOMING] delivery={delivery} event={event} path={path}")
        ctx.log.info(f"[INCOMING HEADERS] {dict(flow.request.headers)}")
        ctx.log.info(f"[INCOMING BODY] {flow.request.get_text()[:1000]}{'...<truncated>' if len(flow.request.get_text())>1000 else ''}")
    except Exception:
        pass

    # Build the collector body (array)
    new_body_bytes = _wrap_for_collector_array(flow)

    # Prepare canonical headers for forwarded request (capitalized)
    # Also set Host to collector host
    flow.request.headers["TenantID"] = TENANT_ID
    flow.request.headers["Authorization"] = API_KEY
    flow.request.headers["DataSourceID"] = DATASOURCE_ID
    flow.request.headers["Content-Type"] = "application/json"
    if _parsed_dest.hostname:
        flow.request.headers["Host"] = _parsed_dest.hostname

    # Build sanitized query string (keep any non-sensitive params)
    qs = _build_qs(flow.request.query)
    pure_path = flow.request.path.split("?", 1)[0]
    new_url = DEST_BASE.rstrip("/") + pure_path
    if qs:
        new_url += "?" + qs
    else:
        # ensure ?v=1 if your collector expects it; add only if not present
        if "?v=1" not in flow.request.path:
            new_url += "?v=1"

    ctx.log.info(f"[FORWARDING PREP] delivery={delivery} -> {new_url}")
    ctx.log.info(f"[OUTGOING HEADERS] {{'TenantID':TENANT_ID,'Authorization':'{API_KEY[:4]}...','DataSourceID':DATASOURCE_ID,'Content-Type':'application/json','Host':'{_parsed_dest.hostname}'}}")

    # Respond 202 to GitHub immediately
    try:
        flow.response = http.Response.make(
            202,
            b'{"message": "Accepted for processing!"}',
            {"Content-Type": "application/json"}
        )
        ctx.log.info(f"[REPLIED] 202 to GitHub for delivery={delivery}")
    except Exception as e:
        ctx.log.warn(f"Failed to send immediate 202: {e}")

    # Start async forward using the prepared body and URL
    try:
        threading.Thread(target=_async_forward_to_collector, args=(new_body_bytes, new_url, delivery), daemon=True).start()
        ctx.log.info(f"[ASYNC STARTED] delivery={delivery} -> {new_url}")
    except Exception as e:
        ctx.log.warn(f"Failed to start async forward: {e}")

def response(flow: http.HTTPFlow) -> None:
    try:
        code = flow.response.status_code
        snippet = flow.response.content[:500] if flow.response.content else b""
        snippet_text = snippet.decode("utf-8", errors="replace")
        ctx.log.info(f"[FLOW RESPONSE] status={code} snippet={snippet_text}")
    except Exception as e:
        ctx.log.warn(f"Error logging response: {e}")
  • Save and exit.
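Before wiring the addon into systemd, the body transformation can be sanity-checked offline. This standalone sketch mirrors the addon's _wrap_for_collector_array logic without requiring mitmproxy:

```python
import json

def wrap_github_event(body_text: str, event: str, delivery: str) -> bytes:
    """Standalone mirror of the addon's _wrap_for_collector_array logic."""
    wrapped = {"api": "github", "event": event, "delivery": delivery}
    try:
        wrapped["payload"] = json.loads(body_text)
    except ValueError:
        # Keep the raw text when the body is not valid JSON.
        wrapped["payload_raw"] = body_text
    # The collector expects an array of events, so wrap in a list.
    return json.dumps([wrapped]).encode("utf-8")
```

Feeding it a sample GitHub payload should produce a one-element JSON array with `api`, `event`, `delivery`, and `payload` keys, matching what the addon forwards to the collector.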

  • sudo mkdir -p /etc/mitmproxy

  • sudo mv /home/serviceuser/fullchain.pem /etc/mitmproxy/ (concatenate your public certificate chain and private key into a single PEM file and copy it to this location)

  • sudo chown -R serviceuser:serviceuser /etc/mitmproxy

  • sudo chmod 600 /etc/mitmproxy/fullchain.pem
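mitmdump's --certs option expects a single PEM containing both the private key and the full certificate chain. A quick structural check of the merged file (the path below is an example):

```python
# Verify the combined PEM contains both a private key and at least one
# certificate block; the path in the usage comment is an example.
def pem_has_key_and_chain(pem_text: str) -> bool:
    return ("PRIVATE KEY-----" in pem_text
            and "-----BEGIN CERTIFICATE-----" in pem_text)

# Usage:
# with open("/etc/mitmproxy/fullchain.pem") as f:
#     assert pem_has_key_and_chain(f.read())
```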

6. Create systemd service (replace the placeholder values with your own)

  • sudo vi /etc/systemd/system/mitmdump-proxy.service

[Unit]
Description=mitmdump reverse proxy for databee
After=network.target
Wants=network-online.target
 
[Service]
Type=simple
User=serviceuser
Group=serviceuser
WorkingDirectory=/home/serviceuser
Environment=PYTHONUNBUFFERED=1
# allow binding to low ports if you ever change to a privileged port (optional)
AmbientCapabilities=CAP_NET_BIND_SERVICE
# full command (no trailing backslashes)
ExecStart=/home/serviceuser/.local/bin/mitmdump --mode "reverse:https://stg-test-api.us-staging.databee.buzz/" --listen-host 0.0.0.0 --listen-port 8443 --set confdir=/etc/mitmproxy --set tls_ciphers_client='ECDHE+AESGCM:!RC4:!MD5:!DES:!3DES' --certs "proxy.cdsys.io=/etc/mitmproxy/fullchain.pem" -s /opt/mitm/databee_forwarder.py
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
StandardOutput=journal
StandardError=journal
 
[Install]
WantedBy=multi-user.target
  • Save and exit

Run the following commands

  • sudo systemctl daemon-reload

  • sudo systemctl enable mitmdump-proxy.service

  • sudo systemctl start mitmdump-proxy.service

  • sudo systemctl status mitmdump-proxy.service

  • sudo journalctl -u mitmdump-proxy.service -f
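Once the service is running, a GitHub-style test POST makes a simple smoke test; the URL below is an example, and the addon should answer 202 immediately while forwarding asynchronously:

```python
import json
import urllib.request

# Example URL; replace with your public DNS name.
PROXY_URL = "https://proxy.cdsys.io/http/ingest?v=1"

def build_webhook_request(url: str, payload: dict) -> urllib.request.Request:
    """Construct a GitHub-style webhook POST for manual smoke testing."""
    headers = {
        "Content-Type": "application/json",
        "X-GitHub-Event": "ping",
        "X-GitHub-Delivery": "00000000-0000-0000-0000-000000000000",
    }
    return urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST"
    )

if __name__ == "__main__":
    req = build_webhook_request(PROXY_URL, {"zen": "Keep it simple."})
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(resp.getcode())  # the addon replies 202 before forwarding
```

The corresponding [INCOMING] and [ASYNC OK]/[ASYNC ERROR] lines should appear in the journalctl output above.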

7. GitHub side (webhook configuration)

  • In your GitHub repo (or org) -> Settings -> Webhooks -> Add webhook.

    • Payload URL: https://proxy.cdsys.io/http/ingest?v=1

    • Content type: application/json

    • Secret: keep it blank

    • Which events: choose “Just the push event” for tests, then expand later.

    • Active: checked.

  • Save. Click "Recent deliveries" & "Redeliver" to test.

  • Important: GitHub requires HTTPS with a certificate trusted by public CAs, which is why a real certificate is used for proxy.cdsys.io.

  • Ref: Creating webhooks - GitHub Docs