MITM Proxy Setup
Background
The DataBee HTTP collector is a webhook endpoint to which data sources can send data. DataBee expects key parameters such as the API key, tenant name, and data source ID to be specified in the HTTP headers. Some data sources, such as GitHub, send these additional parameters as HTTP query parameters instead.
The purpose of this guide is to provide a solution in which additional infrastructure is deployed to transform these parameters into the format that DataBee expects.
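Conceptually, the solution lifts values out of the URL query string and re-sends them as HTTP headers. The minimal sketch below illustrates that mapping; the query-parameter names (apikey, tenantname, datasourceid) and the example URL are illustrative assumptions, not DataBee's documented contract:

```python
# Hypothetical sketch: map webhook query parameters to the header names
# the DataBee HTTP collector expects. Parameter names are illustrative.
from urllib.parse import urlparse, parse_qs

def query_params_to_headers(url):
    """Lift auth-related values from the query string into headers."""
    params = parse_qs(urlparse(url).query)
    # Assumed query-parameter -> header mapping
    mapping = {
        "tenantname": "TenantID",
        "apikey": "Authorization",
        "datasourceid": "DataSourceID",
    }
    headers = {"Content-Type": "application/json"}
    for qp, header in mapping.items():
        if qp in params:
            headers[header] = params[qp][0]
    return headers

# Example: a webhook URL that carries auth values as query parameters
url = "https://collector.example/http/ingest?tenantname=acme&apikey=secret&datasourceid=ds-1"
print(query_params_to_headers(url))
```

The rest of this guide implements the same idea with mitmproxy sitting in front of the collector.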

Steps
Install and run mitmproxy on an Ubuntu host. This document shows how to:
install mitmproxy using pipx
configure a systemd service
run mitmproxy/mitmdump in reverse-proxy mode on your on-prem VM, making it reachable on TCP 443 (HTTPS)
configure mitmdump with a small Python addon that adds the required headers and forwards GitHub webhook requests to the DataBee HTTP collector endpoint
What you need (checklist)
GitHub Enterprise Cloud (test datasource)
mitmproxy installed on Ubuntu VM
Public DNS name pointing to your network’s public IP (e.g. proxy.cdsys.io)
Router/firewall rule forwarding TCP 443 to the mitmproxy VM
TLS cert + key for that public DNS (Let’s Encrypt certbot or other CA)
The HTTP collector endpoint URL and the header values: TenantID, Authorization (API key), DataSourceID, Content-Type (application/json)
Assumptions / environment
Ubuntu (22.04 / 24.04 compatible)
You have a non-root service user (example: serviceuser) with a normal home directory (/home/serviceuser)
Ubuntu host IP: 10.60.0.200 (replace with your host IP throughout)
mitmproxy uses proxy port 8080 and web UI port 8081 by default; the reverse-proxy service below listens on 8443. Adjust ports if necessary
Step-by-step: mitmproxy server config
Pick a public DNS name and make it point to your public IP
Example: proxy.cdsys.io -> PUBLIC_IP
Port forwarding / firewall
Forward public TCP 443 → internal_IP:8443 for the mitmproxy VM
Obtain TLS certificate for proxy.cdsys.io
Copy the TLS certificate to the mitmproxy VM. You need fullchain.pem and privkey.pem for the domain.
Create a mitmproxy addon to add headers and forward
Create the Python script (shown below) and replace ALL placeholder values
Create a GitHub Enterprise Cloud audit feed on the DataBee platform and obtain the required header values for the HTTP ingestion method.
Prerequisites for mitmproxy installation
update packages
sudo apt update && sudo apt upgrade -y
install Python tooling (if not present)
sudo apt install -y python3 python3-venv python3-pip
python3 --version
install pipx as the non-root user (recommended)
sudo apt install pipx -y
Install mitmproxy (as the service user)
pipx install mitmproxy
pipx ensurepath
~/.local/bin/mitmproxy --version
Log out and log back in so the PATH change takes effect, or set it manually:
export PATH="$PATH:/home/serviceuser/.local/bin"   # set PATH manually
source ~/.bashrc   # reload the shell if you set PATH manually
which mitmproxy
sudo ufw allow 8080,8081,8443,443/tcp
sudo ufw reload
mitmproxy   # run once to generate the root CA files, then press q to quit
Certificate location (Ubuntu)
mitmproxy generates and stores root CA files in the service user's home directory
ls -l /home/serviceuser/.mitmproxy/
mitmproxy-ca-cert.cer
mitmproxy-ca-cert.p12
mitmproxy-ca-cert.pem
mitmproxy-ca.pem
sudo cp ~/.mitmproxy/mitmproxy-ca-cert.pem /usr/local/share/ca-certificates/mitmproxy-ca.crt   (update-ca-certificates only processes files with a .crt extension)
sudo update-ca-certificates
Create a directory for the Python script and public certificate, and grant permissions (replace the highlighted placeholder values with your own)
sudo mkdir -p /opt/mitm
sudo vi /opt/mitm/databee_forwarder.py
Copy the content below and make the necessary changes.
# /opt/mitm/databee_forwarder.py
from mitmproxy import http, ctx
import os
from urllib.parse import urlparse, urlencode
import json
import threading

# Config (override via env vars)
TENANT_ID = os.getenv("TENANT_ID", "crest-sf")
API_KEY = os.getenv("API_KEY", "7264da9f-b88c-49b6-8f4d-55b45c61e9b6")
DATASOURCE_ID = os.getenv("DATASOURCE_ID", "github_enterprise_cloud_audit-42f8153a-2413-4da4-b7c8-04877727881f")
DEST_BASE = os.getenv("DEST_BASE", "https://stg-test-api.us-staging.databee.buzz")
_parsed_dest = urlparse(DEST_BASE)

def _build_qs(query):
    try:
        items = list(query.items(multi=True))
        if not items:
            return ""
        return urlencode(items, doseq=True)
    except Exception:
        try:
            return str(query)
        except Exception:
            return ""

def _wrap_for_collector_array(flow: http.HTTPFlow):
    """
    Transform GitHub webhook body to collector format and return bytes containing a JSON array:
    [ { "api":"github", "event":..., "delivery":..., "payload": ... } ]
    """
    try:
        original_json = json.loads(flow.request.get_text())
    except Exception as e:
        ctx.log.warn(f"Failed to parse JSON body for wrapping: {e}")
        original_json = None
    wrapped = {
        "api": "github",
        "event": flow.request.headers.get("X-GitHub-Event", "unknown"),
        "delivery": flow.request.headers.get("X-GitHub-Delivery", "")
    }
    if original_json is None:
        # keep raw text if parse failed
        wrapped["payload_raw"] = flow.request.get_text()
    else:
        wrapped["payload"] = original_json
    # Important: collector expects an array of events => wrap as array
    arr = [wrapped]
    try:
        return json.dumps(arr).encode("utf-8")
    except Exception as e:
        ctx.log.warn(f"Failed to json.dumps wrapped array: {e}")
        return json.dumps([{"api": "github", "event": wrapped.get("event"), "delivery": wrapped.get("delivery")}]).encode("utf-8")

def _async_forward_to_collector(body_bytes: bytes, forward_url: str, delivery_id: str):
    """
    Send a POST to the collector using a minimal, canonical header set.
    Logs full collector response body on error for debugging.
    """
    import urllib.request, urllib.error
    # build canonical headers (capitalize keys like successful curl)
    forward_headers = {
        "TenantID": TENANT_ID,
        "Authorization": API_KEY,
        "DataSourceID": DATASOURCE_ID,
        "Content-Type": "application/json",
        "Host": _parsed_dest.hostname or ""
    }
    # Log what we're sending (mask Authorization)
    masked_auth = (API_KEY[:4] + "..." + API_KEY[-4:]) if API_KEY else ""
    try:
        ctx.log.info(f"[ASYNC SEND] delivery={delivery_id or '[no-id]'} url={forward_url} headers={{'TenantID':TENANT_ID,'Authorization':{masked_auth},'DataSourceID':DATASOURCE_ID}} body_snippet={body_bytes[:800]!r}")
    except Exception:
        pass
    req = urllib.request.Request(forward_url, data=body_bytes, headers=forward_headers, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            status = resp.getcode()
            resp_body = resp.read().decode("utf-8", errors="replace")
            ctx.log.info(f"[ASYNC OK] delivery={delivery_id or '[no-id]'} status={status} body_snippet={resp_body[:1000]}")
    except urllib.error.HTTPError as e:
        # read collector error body and log it fully (bounded)
        try:
            err_body = e.read().decode("utf-8", errors="replace")
        except Exception:
            err_body = "<no-body>"
        ctx.log.warn(f"[ASYNC ERROR] delivery={delivery_id or '[no-id]'} HTTPError status={getattr(e, 'code', None)} body={err_body[:8000]}")
    except Exception as e:
        ctx.log.warn(f"[ASYNC ERROR] delivery={delivery_id or '[no-id]'} Exception: {e}")

def request(flow: http.HTTPFlow) -> None:
    path = flow.request.path or ""
    if not path.startswith("/http/ingest"):
        return
    if flow.request.method.upper() != "POST":
        ctx.log.info(f"Ignoring non-POST {flow.request.method} at {path}")
        return
    delivery = flow.request.headers.get("X-GitHub-Delivery", "[no-id]")
    event = flow.request.headers.get("X-GitHub-Event", "[unknown]")
    # Log incoming headers/body (truncated)
    try:
        ctx.log.info(f"[INCOMING] delivery={delivery} event={event} path={path}")
        ctx.log.info(f"[INCOMING HEADERS] {dict(flow.request.headers)}")
        ctx.log.info(f"[INCOMING BODY] {flow.request.get_text()[:1000]}{'...<truncated>' if len(flow.request.get_text()) > 1000 else ''}")
    except Exception:
        pass
    # Build the collector body (array)
    new_body_bytes = _wrap_for_collector_array(flow)
    # Prepare canonical headers for forwarded request (capitalized)
    # Also set Host to collector host
    flow.request.headers["TenantID"] = TENANT_ID
    flow.request.headers["Authorization"] = API_KEY
    flow.request.headers["DataSourceID"] = DATASOURCE_ID
    flow.request.headers["Content-Type"] = "application/json"
    if _parsed_dest.hostname:
        flow.request.headers["Host"] = _parsed_dest.hostname
    # Build sanitized query string (keep any non-sensitive params)
    qs = _build_qs(flow.request.query)
    pure_path = flow.request.path.split("?", 1)[0]
    new_url = DEST_BASE.rstrip("/") + pure_path
    if qs:
        new_url += "?" + qs
    else:
        # ensure ?v=1 if your collector expects it; add only if not present
        if "?v=1" not in flow.request.path:
            new_url += "?v=1"
    ctx.log.info(f"[FORWARDING PREP] delivery={delivery} -> {new_url}")
    ctx.log.info(f"[OUTGOING HEADERS] {{'TenantID':TENANT_ID,'Authorization':'{API_KEY[:4]}...','DataSourceID':DATASOURCE_ID,'Content-Type':'application/json','Host':'{_parsed_dest.hostname}'}}")
    # Respond 202 to GitHub immediately
    try:
        flow.response = http.Response.make(
            202,
            b'{"message": "Accepted for processing!"}',
            {"Content-Type": "application/json"}
        )
        ctx.log.info(f"[REPLIED] 202 to GitHub for delivery={delivery}")
    except Exception as e:
        ctx.log.warn(f"Failed to send immediate 202: {e}")
    # Start async forward using the prepared body and URL
    try:
        threading.Thread(target=_async_forward_to_collector, args=(new_body_bytes, new_url, delivery), daemon=True).start()
        ctx.log.info(f"[ASYNC STARTED] delivery={delivery} -> {new_url}")
    except Exception as e:
        ctx.log.warn(f"Failed to start async forward: {e}")

def response(flow: http.HTTPFlow) -> None:
    try:
        code = flow.response.status_code
        snippet = flow.response.content[:500] if flow.response.content else b""
        snippet_text = snippet.decode("utf-8", errors="replace")
        ctx.log.info(f"[FLOW RESPONSE] status={code} snippet={snippet_text}")
    except Exception as e:
        ctx.log.warn(f"Error logging response: {e}")

Save and exit.
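Before wiring up GitHub, the wrapping logic can be sanity-checked offline. The sketch below mirrors the addon's _wrap_for_collector_array without any mitmproxy dependency (the sample event name and delivery ID are made up):

```python
import json

def wrap_github_event(body_text, event, delivery):
    # Mirrors the addon's _wrap_for_collector_array: the DataBee
    # collector expects a JSON *array* of event objects.
    try:
        payload = json.loads(body_text)
    except ValueError:
        payload = None
    wrapped = {"api": "github", "event": event, "delivery": delivery}
    if payload is None:
        wrapped["payload_raw"] = body_text  # keep raw text if parse failed
    else:
        wrapped["payload"] = payload
    return json.dumps([wrapped]).encode("utf-8")

body = wrap_github_event('{"action": "created"}', "repository", "d-123")
print(body.decode())
```

Running this prints a one-element JSON array with api, event, delivery, and the original body nested under payload, which is the shape the addon forwards to the collector.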
sudo mkdir -p /etc/mitmproxy
sudo mv /home/serviceuser/fullchain.pem /etc/mitmproxy/   (first concatenate your public fullchain.pem and private key into a single PEM file; the --certs option expects both in one file)
sudo chown -R serviceuser:serviceuser /etc/mitmproxy
sudo chmod 600 /etc/mitmproxy/fullchain.pem
9. Create the systemd service (replace the highlighted placeholder values with your own)
sudo vi /etc/systemd/system/mitmdump-proxy.service
[Unit]
Description=mitmdump reverse proxy for databee
After=network.target
Wants=network-online.target
[Service]
Type=simple
User=serviceuser
Group=serviceuser
WorkingDirectory=/home/serviceuser
Environment=PYTHONUNBUFFERED=1
# allow binding to low ports if you ever change to a privileged port (optional)
AmbientCapabilities=CAP_NET_BIND_SERVICE
# full command (no trailing backslashes)
ExecStart=/home/serviceuser/.local/bin/mitmdump --mode "reverse:https://stg-test-api.us-staging.databee.buzz/" --listen-host 0.0.0.0 --listen-port 8443 --set confdir=/etc/mitmproxy --set tls_ciphers_client='ECDHE+AESGCM:!RC4:!MD5:!DES:!3DES' --certs "proxy.cdsys.io=/etc/mitmproxy/fullchain.pem" -s /opt/mitm/databee_forwarder.py
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target

Save and exit.
Run the following commands
sudo systemctl daemon-reload
sudo systemctl enable mitmdump-proxy.service
sudo systemctl start mitmdump-proxy.service
sudo systemctl status mitmdump-proxy.service
sudo journalctl -u mitmdump-proxy.service -f
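Once the service is running, a quick end-to-end check is to hand-craft a GitHub-style delivery and POST it at the proxy. The sketch below uses only the standard library; proxy.cdsys.io and the sample header values are illustrative, and the addon should answer with its immediate 202:

```python
import json
import urllib.request

# Hypothetical smoke test: build a fake GitHub "ping" delivery aimed at
# the proxy. Replace PROXY_URL with your own public DNS name.
PROXY_URL = "https://proxy.cdsys.io/http/ingest?v=1"

def build_test_delivery(url):
    body = json.dumps({"zen": "Keep it logically awesome."}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-GitHub-Event": "ping",           # mirrors GitHub's headers
            "X-GitHub-Delivery": "test-0001",
        },
        method="POST",
    )

req = build_test_delivery(PROXY_URL)
print(req.get_method(), req.full_url)
# To actually send it (requires the service to be running and reachable):
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.getcode(), resp.read().decode())  # expect the addon's 202
```

If the POST succeeds, journalctl should show the [INCOMING], [REPLIED], and [ASYNC OK] log lines from the addon.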
10. GitHub side (Webhook configuration)
In your GitHub repo (or org) -> Settings -> Webhooks -> Add webhook.
Payload URL: https://proxy.cdsys.io/http/ingest?v=1 (this must match the /http/ingest path and v=1 query parameter that the addon checks for)
Content type: application/json
Secret: leave it blank
Which events: choose “Just the push event” for tests, then expand later.
Active: checked.
Save. Click "Recent deliveries" & "Redeliver" to test.
Important: GitHub expects HTTPS and a certificate trusted by public CAs. That is why we used a real certificate for proxy.cdsys.io.
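Since a publicly trusted certificate is a hard requirement, it can be worth verifying the served chain from outside your network. A sketch using the standard library's default trust store (the hostname is the example used throughout this guide):

```python
import socket
import ssl

# Hypothetical check that the proxy presents a certificate GitHub will
# accept: create_default_context() validates against the system's public
# CA bundle, so a successful handshake implies a publicly trusted chain.
def check_public_trust(host, port=443, timeout=10):
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
            return {"issuer": cert.get("issuer"), "expires": cert.get("notAfter")}

if __name__ == "__main__":
    # Replace with your public DNS name from the checklist
    print(check_public_trust("proxy.cdsys.io"))
```

A raised ssl.SSLCertVerificationError here usually means an incomplete chain (missing intermediate in the combined PEM) or an expired or self-signed certificate, any of which would also cause GitHub deliveries to fail.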