Methodology · v1.0 · April 2026

How we turn orbital physics into legal evidence.

Every number on this site is produced by a reproducible pipeline. No proprietary data, no closed models, no hand-tuned site lists. This is the full recipe.

01 — The overview

Ghost Permits fuses three independent data streams to produce timestamped, reproducible evidence of unpermitted industrial combustion. The streams are physically distinct — atmospheric chemistry, optical reflectance, and municipal paperwork — and none of them see each other until the final fusion step. That independence is the point: if a permit filing and the atmosphere disagree, only one of them is subject to human editing.

The satellite archive is the primary source. Everything else — VLM counts, permit filings, our narrative — exists to make that archive legible.

02 — Atmospheric signal

Instrument

Copernicus Sentinel-5P TROPOMI, in orbit since 2017. It provides daily column measurements of NO₂, HCHO, SO₂, CH₄ and CO. Since August 2019 the nadir resolution is 3.5 × 5.5 km for NO₂, HCHO and SO₂, and 7 × 5.5 km for CH₄ and CO. We retrieve L2 offline (OFFL) products via the ESA Copernicus Data Space STAC API.

Signal of interest

Baseline construction

For each site, we compute a 2019–2023 pre-operation baseline from cloud-filtered (QA > 0.75) readings in a 0.25° bounding box centred on the facility. The baseline is the mean column density over those five years; the alert threshold is set 30% above that mean, which is conservative by design.
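
The bounding box, QA gate and threshold described above can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: the function names, the (qa_value, column_density) input shape and the sample numbers are assumptions made for the example.

```python
from statistics import mean


def facility_bbox(lat, lon, size_deg=0.25):
    """Return (west, south, east, north) for a square box centred on the site."""
    half = size_deg / 2
    return (lon - half, lat - half, lon + half, lat + half)


def baseline_and_threshold(readings, qa_min=0.75, uplift=0.30):
    """Compute the baseline mean and alert threshold.

    readings: iterable of (qa_value, column_density) pairs taken from the
    baseline window (hypothetical input shape, chosen for illustration).
    """
    # Keep only readings that pass the QA gate.
    kept = [value for qa, value in readings if qa > qa_min]
    if not kept:
        raise ValueError("no readings passed the QA gate")
    baseline = mean(kept)
    # Alert threshold sits `uplift` (30% by default) above the baseline mean.
    return baseline, baseline * (1 + uplift)


if __name__ == "__main__":
    print(facility_bbox(35.0, -90.0))
    # Illustrative values only; the second reading fails the QA gate.
    print(baseline_and_threshold([(0.9, 10.0), (0.5, 100.0), (0.8, 30.0)]))
```

Keeping the gate and the uplift as parameters makes it easy to check how sensitive a finding is to either choice.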

Anomaly onset

We define anomaly onset as the first day the 14-day rolling mean reaches or exceeds the alert threshold. This is the single date we timestamp the atmospheric record against: it is the dataset's version of a witness testifying under oath.

Parameter        Value
Product          sentinel-5p-l2-no2-offl
Bounding box     0.25° × 0.25° centred on facility
QA gate          qa_value > 0.75 & cloud_fraction < 0.5
Baseline window  2019-01-01 → 2023-12-31
Alert threshold  +30% over baseline mean
Onset rule       14-day rolling mean ≥ threshold
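
The onset rule can be sketched as follows. The sketch assumes a trailing window, a gap-free daily series and complete windows only; how the real pipeline handles days removed by the QA gate is not stated here, so treat those choices as assumptions. The function name and the sample series are hypothetical.

```python
from datetime import date, timedelta


def anomaly_onset(daily, threshold, window=14):
    """Return the first date whose trailing rolling mean is >= threshold.

    daily: list of (date, value) pairs sorted by date, one entry per day
    (assumed gap-free for this sketch). Returns None if never reached.
    """
    values = [value for _, value in daily]
    # Start at the first index where a full window is available.
    for i in range(window - 1, len(daily)):
        window_mean = sum(values[i - window + 1 : i + 1]) / window
        if window_mean >= threshold:
            return daily[i][0]
    return None


if __name__ == "__main__":
    start = date(2024, 1, 1)
    # Illustrative series: 20 quiet days, then 20 elevated days.
    series = [(start + timedelta(days=d), 1.0 if d < 20 else 2.0) for d in range(40)]
    print(anomaly_onset(series, threshold=1.3))
```

Because the mean is trailing, the reported onset lags the first elevated day by a few days; that lag is the price of not reacting to a single noisy reading.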

03 — Visual signal (LFM2.5-VL)

Instrument

Copernicus Sentinel-2 L2A, fetched via the free Element 84 Earth Search STAC API. We composite RGB from B04/B03/B02 at 10-metre ground sample distance, plus a short-wave infrared (SWIR) channel from B11/B12 (20-metre native resolution) for heat-signature verification.
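
As a rough sketch of the fetch step, the snippet below builds a STAC item-search request body for Earth Search without sending it. The cloud-cover filter, the result limit and the helper name are assumptions added for illustration, and the asset keys should be verified against the items the API actually returns.

```python
import json

# Public Earth Search v1 search endpoint (POST a JSON body to it).
EARTH_SEARCH_URL = "https://earth-search.aws.element84.com/v1/search"

# Sentinel-2 bands named in the text, mapped to the asset keys Earth Search
# is understood to use; check these against a real response before relying on them.
BAND_ASSETS = {"B04": "red", "B03": "green", "B02": "blue", "B11": "swir16", "B12": "swir22"}


def sentinel2_search_body(bbox, start, end, max_cloud=20, limit=10):
    """Build a STAC search body for Sentinel-2 L2A scenes over a bounding box.

    bbox is (west, south, east, north); start and end are YYYY-MM-DD strings.
    max_cloud is a hypothetical scene-level cloud filter, not a documented setting.
    """
    return {
        "collections": ["sentinel-2-l2a"],
        "bbox": list(bbox),
        "datetime": f"{start}T00:00:00Z/{end}T23:59:59Z",
        "query": {"eo:cloud_cover": {"lt": max_cloud}},
        "limit": limit,
    }


if __name__ == "__main__":
    body = sentinel2_search_body((-90.2784, 34.9327, -90.0284, 35.1827), "2024-06-01", "2024-06-30")
    print(json.dumps(body, indent=2))
```

Sending the body is left out on purpose so the sketch runs offline; any HTTP client that can POST JSON to EARTH_SEARCH_URL would do.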

Model

We run LiquidAI/LFM2.5-VL-450M locally via HuggingFace transformers. The 450M-parameter vision-language model is small enough for on-orbit inference on DPhi-class hardware — designed to run anywhere from a laptop to a satellite payload — and capable enough to reason about industrial equipment in 10-metre pixels.

Fine-tuning

We fine-tune from the VRSBench baseline (123k satellite VQA pairs) with an additional industrial-facility dataset covering gas turbines, data-centre cooling infrastructure, and construction-phase imagery. Training recipe:

git clone https://github.com/Liquid4All/leap-finetune/
cd leap-finetune
uv run leap-finetune ./ghost_permits_config.yaml
# ~2 hours on a single H100

Inference outputs

04 — Permit records

We pull filings directly from the authoritative public source for each jurisdiction. For the Memphis case, that is the Tennessee Department of Environment & Conservation (TDEC) and the Mississippi Department of Environmental Quality (MDEQ). National cross-checks come from the EPA ECHO database.

For each facility we record: applicant name, filing date, permit reference, units covered, and status. The gap is computed as the difference in days between the atmospheric onset date and the first filing that names the address.
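
The gap calculation is simple date arithmetic. In this sketch the sign convention (positive means emissions were detected before any filing) and the function name are assumptions, and the dates are made up for illustration.

```python
from datetime import date


def permit_gap_days(onset, filing_dates):
    """Days from atmospheric onset to the earliest filing naming the address.

    Positive values mean the atmospheric signal preceded the paperwork.
    Returns None when no filing exists at all.
    """
    if not filing_dates:
        return None
    first_filing = min(filing_dates)
    return (first_filing - onset).days


if __name__ == "__main__":
    # Hypothetical dates, not taken from any real case.
    onset = date(2024, 6, 1)
    filings = [date(2025, 1, 15), date(2024, 8, 30)]
    print(permit_gap_days(onset, filings))
```

Returning None rather than a number when no filing exists keeps "never filed" distinct from "filed on the onset date".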

05 — Fusion & confidence score

The confidence score is a 0–100 sum across four components. Each component is bounded so no single stream can dominate, and the score is capped at 100.

Component      Max  Basis
Atmospheric     40  Peak anomaly %, days above threshold, HCHO corroboration
Visual (VLM)    25  Units detected, confidence, heat-signature boolean
Permit gap      25  Days unpermitted, unpermitted unit count
Corroboration   10  Independent third-party record (press, aerial survey, lawsuit)

A score of 70+ corresponds to our verdict “Strong evidence of unpermitted emissions.” Lower scores are not exonerations — they mean the pipeline found the signal insufficient for a public finding.
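
The bounding and capping logic can be sketched as below. How each component's points are derived from its inputs is not specified in this section, so the sketch takes already-computed points as input; the dictionary keys and function names are hypothetical.

```python
# Per-component maxima from the table above.
COMPONENT_MAX = {"atmospheric": 40, "visual": 25, "permit_gap": 25, "corroboration": 10}

# Scores at or above this value map to the "strong evidence" verdict.
STRONG_EVIDENCE_MIN = 70


def confidence_score(points):
    """Sum component points, clamping each to [0, its maximum], capped at 100.

    points: dict of component name -> raw points (hypothetical input shape).
    Missing components contribute zero.
    """
    total = 0.0
    for name, cap in COMPONENT_MAX.items():
        raw = points.get(name, 0.0)
        total += min(max(raw, 0.0), cap)
    return min(total, 100.0)


def is_strong_evidence(score):
    """True when the score reaches the public-finding threshold."""
    return score >= STRONG_EVIDENCE_MIN


if __name__ == "__main__":
    # Illustrative inputs; the atmospheric value is clamped to its maximum.
    score = confidence_score({"atmospheric": 55, "visual": 20, "permit_gap": 25, "corroboration": 0})
    print(score, is_strong_evidence(score))
```

Clamping per component is what stops a very large atmospheric anomaly from carrying a case on its own: at most 40 of the 70 points needed can come from that stream.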

06 — Limits & caveats

07 — Reproduce this

The pipeline is open-source and runs end-to-end on a laptop. No API keys required — everything uses free, public data.

git clone https://github.com/ghostpermits/ghost-permits
cd ghost-permits
pip install -r requirements.txt

# Run the Memphis case
python pipeline.py --site colossus_1 --start 2023-01-01
# → writes evidence_brief.json

# Or any coordinate on Earth
python pipeline.py \
  --lat 35.0577 --lon -90.1534 \
  --name "My site" \
  --operational-since 2024-07-01

The web dashboard (python app.py) serves the Sentinel console at localhost:8000. The VLM (~900 MB) auto-downloads from HuggingFace on first run.

08 — Citing this work

If Ghost Permits appears in reporting, briefs, or filings, please cite it as:

Ghost Permits Team (2026). Ghost Permits v1.0 —
Unpermitted Industrial Emissions Intelligence via Satellite.
DataOil St. / Built with LiquidAI LFM2.5-VL on DPhi Space infrastructure.
https://ghost-permits.vercel.app

For the Memphis case specifically, reference brief CLX-01, reanalysis date 16 April 2026.

Methodology · v1.0 · published 18 Apr 2026 · Apache-2.0 · open for correction