01 — The overview
Ghost Permits fuses three independent data streams to produce timestamped, reproducible evidence of unpermitted industrial combustion. The streams are physically distinct — atmospheric chemistry, optical reflectance, and municipal paperwork — and none of them see each other until the final fusion step. That independence is the point: if a permit filing and the atmosphere disagree, only one of them is subject to human editing.
02 — Atmospheric signal
Instrument
Copernicus Sentinel-5P TROPOMI, in orbit since 2017. It retrieves daily column densities of NO₂, HCHO, SO₂, CH₄ and CO at a nadir resolution of 3.5 × 5.5 km (since August 2019); for NO₂ the product separates out the tropospheric column. We retrieve L2 offline products via the ESA Copernicus Data Space STAC API.
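A minimal sketch of the search request, assuming a catalogue that implements the standard STAC API item search (`POST /search` with `collections`, `bbox` and `datetime`). The collection id is copied from the parameter table and should be confirmed against the catalogue's collection listing. The helper names are hypothetical, and the endpoint URL is deliberately not filled in.

```python
import json
from datetime import date

# Product id as listed in the parameter table; confirm it against the
# catalogue's collection listing before relying on it.
NO2_COLLECTION = "sentinel-5p-l2-no2-offl"


def facility_bbox(lat: float, lon: float, size_deg: float = 0.25) -> list:
    """Square box of size_deg x size_deg degrees centred on the facility,
    in STAC order: [min_lon, min_lat, max_lon, max_lat]."""
    half = size_deg / 2
    return [lon - half, lat - half, lon + half, lat + half]


def item_search_body(lat: float, lon: float, start: date, end: date,
                     limit: int = 100) -> dict:
    """JSON body for a STAC API item search (POST <catalogue>/search)."""
    return {
        "collections": [NO2_COLLECTION],
        "bbox": facility_bbox(lat, lon),
        # RFC 3339 interval covering whole days at both ends.
        "datetime": f"{start.isoformat()}T00:00:00Z/{end.isoformat()}T23:59:59Z",
        "limit": limit,
    }


if __name__ == "__main__":
    # Baseline window for the example coordinate used in section 07.
    body = item_search_body(35.0577, -90.1534, date(2019, 1, 1), date(2023, 12, 31))
    print(json.dumps(body, indent=2))
```

The dictionary would be posted as JSON to the catalogue's search endpoint, following the `next` links in each response to page through the results.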
Signal of interest
- NO₂ — the primary fingerprint of high-temperature combustion. Every gas turbine produces it.
- HCHO — formaldehyde. Elevated when methane combustion is incomplete or uncontrolled.
- SO₂ — secondary marker for diesel backup generators.
Baseline construction
For each site, we compute a 2019–2023 pre-operational baseline from cloud-filtered readings (qa_value > 0.75, cloud_fraction < 0.5) in a 0.25° × 0.25° bounding box centred on the facility. The baseline is the mean column density over those five years; the alert threshold is 30% above baseline, conservative by design.
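The baseline step can be sketched as follows. It assumes each reading is one value per overpass, already averaged over the 0.25° box and carrying the QA fields from the parameter table; in a real retrieval the gate would be applied pixel by pixel before averaging. `Reading` and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean

BASELINE_START = date(2019, 1, 1)
BASELINE_END = date(2023, 12, 31)
ALERT_MARGIN = 0.30  # alert threshold: +30% over the baseline mean


@dataclass
class Reading:
    day: date
    no2: float             # NO2 column, box-averaged (any consistent unit)
    qa_value: float        # per-pixel quality value shipped with the L2 product, 0-1
    cloud_fraction: float  # 0-1


def passes_qa(r: Reading) -> bool:
    # QA gate from the parameter table.
    return r.qa_value > 0.75 and r.cloud_fraction < 0.5


def baseline_and_threshold(readings: list) -> tuple:
    """Return (baseline mean, alert threshold) from QA-passing readings
    inside the 2019-2023 baseline window."""
    values = [r.no2 for r in readings
              if passes_qa(r) and BASELINE_START <= r.day <= BASELINE_END]
    if not values:
        raise ValueError("no QA-passing readings in the baseline window")
    baseline = mean(values)
    return baseline, baseline * (1 + ALERT_MARGIN)
```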
Anomaly onset
We define anomaly onset as the first day on which the 14-day rolling mean reaches or exceeds the alert threshold. This is the single date against which we timestamp the atmospheric record: the dataset's version of a witness testifying under oath.
| Parameter | Value |
|---|---|
| Product | sentinel-5p-l2-no2-offl |
| Bounding box | 0.25° × 0.25° centred on facility |
| QA gate | qa_value > 0.75 & cloud_fraction < 0.5 |
| Baseline window | 2019-01-01 → 2023-12-31 |
| Alert threshold | +30% over baseline mean |
| Onset rule | 14-day rolling mean ≥ threshold |
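A sketch of the onset rule from the table, assuming the 14-day window is measured in calendar days and averages whichever QA-passing days fall inside it, so cloudy days shrink the window rather than being filled in. The `min_obs` guard is a hypothetical addition, off by default, for rejecting windows with too few cloud-free days.

```python
from datetime import date, timedelta
from statistics import mean


def anomaly_onset(daily: list, threshold: float, window_days: int = 14,
                  min_obs: int = 1):
    """Return the first date whose trailing rolling mean reaches the threshold.

    daily: (date, value) pairs for QA-passing days only, one per date.
    The window spans window_days calendar days ending on (and including)
    each observed date. Cloudy days are simply absent, never interpolated.
    min_obs is a hypothetical guard; the source does not specify one.
    """
    series = sorted(daily)
    for i, (day, _) in enumerate(series):
        start = day - timedelta(days=window_days - 1)
        window = [v for d, v in series[: i + 1] if d >= start]
        if len(window) >= min_obs and mean(window) >= threshold:
            return day
    return None  # threshold never reached


if __name__ == "__main__":
    obs = [(date(2024, 7, 1), 15.0), (date(2024, 7, 2), 18.0), (date(2024, 7, 3), 30.0)]
    print(anomaly_onset(obs, threshold=19.5))  # prints 2024-07-03
```

Evaluating the window only on observed days keeps the onset date free of interpolated values, in line with the cloud-cover caveat in section 06.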
03 — Visual signal (LFM2.5-VL)
Instrument
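Copernicus Sentinel-2 L2A at 10-metre ground sample distance, fetched via the free Element84 earth-search STAC. We composite true-colour RGB from B04/B03/B02, plus a short-wave infrared (SWIR) composite from B11/B12 (native 20 m resolution) for heat-signature verification. B11/B12 are SWIR rather than thermal bands, but they respond to emission from sufficiently hot sources, which is what the heat check relies on.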
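The source does not say how the heat-signature boolean is computed from B11/B12. One published option, sketched here under that caveat, is the SWIR form of the Normalized Hotspot Index from Sentinel-2 volcanic hot-spot work: NHI = (B12 - B11) / (B12 + B11), with positive values marking hot pixels. The sketch assumes L2A bands already converted to 0-1 surface reflectance; the display stretch, the pixel-count rule and all function names are hypothetical.

```python
import numpy as np


def upsample_20m_to_10m(band: np.ndarray) -> np.ndarray:
    # B11/B12 are delivered at 20 m; repeat each pixel 2 x 2 so the SWIR
    # bands line up with the 10 m visible bands (nearest-neighbour).
    return np.repeat(np.repeat(band, 2, axis=0), 2, axis=1)


def rgb_composite(b04, b03, b02, max_reflectance: float = 0.3) -> np.ndarray:
    # True-colour stack (red, green, blue), linearly stretched to 0-1 for display.
    rgb = np.stack([b04, b03, b02], axis=-1).astype(np.float64)
    return np.clip(rgb / max_reflectance, 0.0, 1.0)


def nhi_swir(b11, b12) -> np.ndarray:
    # Normalized Hotspot Index, SWIR form: (B12 - B11) / (B12 + B11).
    # Most land surfaces reflect more at B11 than at B12, so the index is
    # negative; strong emission from hot sources pushes B12 above B11.
    b11 = np.asarray(b11, dtype=np.float64)
    b12 = np.asarray(b12, dtype=np.float64)
    total = b11 + b12
    return np.divide(b12 - b11, total, out=np.zeros_like(total), where=total > 0)


def heat_signature(b11, b12, min_pixels: int = 1) -> bool:
    # Hypothetical rule: flag the scene if at least min_pixels pixels are hot.
    return int(np.count_nonzero(nhi_swir(b11, b12) > 0)) >= min_pixels
```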
Model
We run LiquidAI/LFM2.5-VL-450M locally via Hugging Face transformers. The 450M-parameter vision-language model is designed to run anywhere from a laptop to a satellite payload: small enough for on-orbit inference on DPhi-class hardware, yet capable enough to reason about industrial equipment in 10-metre pixels.
Fine-tuning
We fine-tune on the VRSBench baseline dataset (123k satellite VQA pairs), augmented with an additional industrial-facility dataset covering gas turbines, data-centre cooling infrastructure, and construction-phase imagery. Training recipe:
```shell
git clone https://github.com/Liquid4All/leap-finetune/
cd leap-finetune
uv run leap-finetune ./ghost_permits_config.yaml  # ~2 hours on a single H100
```
Inference outputs
- Turbine count — integer estimate with model confidence in 0–1.
- Heat signature — boolean, derived from the SWIR composite.
- Construction phase — one of: bare, prep, foundations, equipment, operational.
- Bounding boxes — per-unit rectangles in scene coordinates.
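The four outputs map naturally onto a small validated record. The sketch below assumes the model is prompted to answer in JSON with the keys shown (an assumption, not a documented LFM2.5-VL output format) and adds one hypothetical way to turn the point count into the range recommended in section 06.

```python
import json
import math
from dataclasses import dataclass, field

PHASES = ("bare", "prep", "foundations", "equipment", "operational")


@dataclass
class SceneAssessment:
    turbine_count: int
    confidence: float          # model confidence in [0, 1]
    heat_signature: bool       # derived from the SWIR composite
    construction_phase: str    # one of PHASES
    # Per-unit rectangles in scene coordinates: (x_min, y_min, x_max, y_max).
    boxes: list = field(default_factory=list)

    def __post_init__(self):
        # Reject malformed model output instead of passing it downstream.
        if self.turbine_count < 0:
            raise ValueError("turbine_count must be non-negative")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        if self.construction_phase not in PHASES:
            raise ValueError(f"unknown construction phase: {self.construction_phase!r}")

    def count_range(self) -> tuple:
        # Hypothetical widening rule: slack grows as confidence falls,
        # and is never narrower than +/-1 unit.
        slack = max(1, math.ceil((1.0 - self.confidence) * self.turbine_count))
        return max(0, self.turbine_count - slack), self.turbine_count + slack


def parse_assessment(raw: str) -> SceneAssessment:
    # Assumes the model answered with a JSON object using these keys.
    d = json.loads(raw)
    return SceneAssessment(
        turbine_count=int(d["turbine_count"]),
        confidence=float(d["confidence"]),
        heat_signature=bool(d["heat_signature"]),
        construction_phase=str(d["construction_phase"]),
        boxes=[tuple(float(x) for x in b) for b in d.get("boxes", [])],
    )
```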
04 — Permit records
We pull filings directly from the authoritative public source for each jurisdiction. For the Memphis case, that is the Tennessee Department of Environment & Conservation (TDEC) and the Mississippi Department of Environmental Quality (MDEQ). National cross-checks come from the EPA ECHO database.
For each facility we record: applicant name, filing date, permit reference, units covered, and status. The permit gap is the number of days from the atmospheric onset date to the first filing that names the facility's address; a positive gap means emissions were detected before any filing existed.
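A sketch of the gap calculation. The `Filing` record mirrors the fields listed above plus the filing's address. Exact matching on a crudely normalised address string is a simplifying assumption (real filings may spell an address several ways or need geocoding), and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Filing:
    applicant: str
    filing_date: date
    permit_ref: str
    units_covered: int
    status: str
    address: str  # physical address as written in the filing


def normalise_address(addr: str) -> str:
    # Deliberately crude: lower-case, drop punctuation, collapse whitespace.
    kept = "".join(c for c in addr.lower() if c.isalnum() or c.isspace())
    return " ".join(kept.split())


def permit_gap_days(onset: date, filings: list, site_address: str):
    """Days from atmospheric onset to the first filing naming the address.

    Positive: emissions preceded any filing. None: no filing names the address.
    """
    target = normalise_address(site_address)
    matching = [f.filing_date for f in filings
                if normalise_address(f.address) == target]
    if not matching:
        return None
    return (min(matching) - onset).days
```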
05 — Fusion & confidence score
The confidence score is a 0–100 sum across four components. Each component is bounded so no single stream can dominate, and the score is capped at 100.
| Component | Max | Basis |
|---|---|---|
| Atmospheric | 40 | Peak anomaly %, days above threshold, HCHO corroboration |
| Visual (VLM) | 25 | Units detected, confidence, heat-signature boolean |
| Permit gap | 25 | Days unpermitted, unpermitted unit count |
| Corroboration | 10 | Independent third-party record (press, aerial survey, lawsuit) |
A score of 70+ corresponds to our verdict “Strong evidence of unpermitted emissions.” Lower scores are not exonerations — they mean the pipeline found the signal insufficient for a public finding.
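The scoring rule can be sketched as a clamped sum. How raw evidence (peak anomaly %, units detected, days unpermitted and so on) becomes points within each component is not specified here, so the sketch takes per-component points as input; the key names and the below-threshold wording are hypothetical.

```python
CAPS = {"atmospheric": 40, "visual": 25, "permit_gap": 25, "corroboration": 10}
STRONG_EVIDENCE = 70


def confidence_score(components: dict) -> float:
    """Sum the four components after clamping each to [0, its maximum]."""
    total = 0.0
    for name, cap in CAPS.items():
        points = float(components.get(name, 0.0))
        total += min(max(points, 0.0), cap)
    # The maxima already sum to 100; the final cap mirrors the stated rule.
    return min(total, 100.0)


def verdict(score: float) -> str:
    if score >= STRONG_EVIDENCE:
        return "Strong evidence of unpermitted emissions"
    # Hypothetical wording for the below-threshold case (not an exoneration).
    return "Insufficient signal for a public finding"
```

Because the atmospheric component tops out at 40, no single stream can reach the 70-point verdict on its own.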
06 — Limits & caveats
- Resolution. Each TROPOMI pixel averages over ~19 km² (3.5 × 5.5 km), so nearby sources can contribute to the measured column. We counter this by comparing pre- and post-operational means on the same pixel; the differential is the contribution attributable to the new source.
- Cloud cover. Roughly 30% of days in Memphis are cloudy. We exclude those days and interpolate only for visualisation, never for the onset date itself.
- VLM hallucinations. LFM2.5-VL, like all VLMs, can produce confident-sounding descriptions that are wrong. We therefore treat the VLM's count as a range, and corroborate with aerial or ground imagery wherever possible.
- Permit complexity. Some jurisdictions issue multiple overlapping permits. We record the first filing that names the physical address; we do not interpret whether amendments retroactively cover earlier operation.
- Attribution. A NO₂ plume is not proof of a specific operator. We rely on the coincident visual and permit signals to establish that link.
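The pre/post differential from the resolution caveat reduces to a difference of means. This sketch assumes QA-passing daily values for one pixel or box and an `operational_since` date like the pipeline's `--operational-since` flag. It attributes the whole difference to the new source, so it does not adjust for seasonality or for changes in other nearby emitters; the function name is hypothetical.

```python
from datetime import date
from statistics import mean


def attributable_increment(daily: list, operational_since: date) -> tuple:
    """Post-minus-pre mean on the same pixel, absolute and as % of the pre mean.

    daily: (date, value) pairs for QA-passing days on one pixel or box.
    """
    pre = [v for d, v in daily if d < operational_since]
    post = [v for d, v in daily if d >= operational_since]
    if not pre or not post:
        raise ValueError("need readings on both sides of operational_since")
    pre_mean, post_mean = mean(pre), mean(post)
    delta = post_mean - pre_mean
    pct = 100.0 * delta / pre_mean if pre_mean else float("nan")
    return delta, pct
```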
07 — Reproduce this
The pipeline is open-source and runs end-to-end on a laptop. No API keys required — everything uses free, public data.
```shell
git clone https://github.com/ghostpermits/ghost-permits
cd ghost-permits
pip install -r requirements.txt

# Run the Memphis case
python pipeline.py --site colossus_1 --start 2023-01-01
# → writes evidence_brief.json

# Or any coordinate on Earth
python pipeline.py \
  --lat 35.0577 --lon -90.1534 \
  --name "My site" \
  --operational-since 2024-07-01
```
Running python app.py starts the web dashboard, which serves the Sentinel console at localhost:8000. The VLM weights (~900 MB) download automatically from Hugging Face on first run.
08 — Citing this work
If Ghost Permits appears in reporting, briefs, or filings, please cite it as:
Ghost Permits Team (2026). Ghost Permits v1.0 — Unpermitted Industrial Emissions Intelligence via Satellite. DataOil St. / Built with LiquidAI LFM2.5-VL on DPhi Space infrastructure. https://ghost-permits.vercel.app
For the Memphis case specifically, reference brief CLX-01, reanalysis date 16 April 2026.