Methodology · Ghost Permits

The overview

Ghost Permits fuses three independent data streams to produce timestamped, reproducible evidence of unpermitted industrial combustion. The streams are physically distinct (atmospheric chemistry, optical reflectance, and municipal paperwork), and none of them sees the others until the final fusion step. That independence is the point: if a permit filing and the atmosphere disagree, only one of them is subject to human editing.

The satellite archive is the primary source. Everything else — VLM counts, permit filings, our narrative — exists to make that archive legible.

Atmospheric signal

Instrument

Copernicus Sentinel-5P TROPOMI, in orbit since 2017. It measures tropospheric column densities of NO₂, HCHO, SO₂, CH₄ and CO daily, at a nadir resolution of 3.5 × 5.5 km for NO₂ (since August 2019). We retrieve L2 offline products via the ESA Copernicus Data Space STAC API.

Signal of interest

NO₂ — the primary fingerprint of high-temperature combustion. Every gas turbine produces it.
HCHO — formaldehyde. Elevated when methane combustion is incomplete or uncontrolled.
SO₂ — secondary marker for diesel backup generators.

Baseline construction

For each site, we compute a 2019–2023 pre-operational baseline from cloud-filtered readings (qa_value > 0.75) in a 0.25° bounding box centred on the facility. The baseline is the mean column density over those five years; the alert threshold is 30% above that mean, which is conservative by design.
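
The baseline arithmetic can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: it assumes each reading is a hypothetical (column_density, qa_value, cloud_fraction) tuple already clipped to the bounding box, and the function names are invented for the example.

```python
from statistics import mean

QA_MIN = 0.75        # QA gate stated in the methodology
CLOUD_MAX = 0.5      # cloud-fraction gate from the parameter table
ALERT_FACTOR = 1.30  # alert threshold: 30% above the baseline mean


def passes_qa(qa_value, cloud_fraction):
    """Return True when a reading clears both quality gates."""
    return qa_value > QA_MIN and cloud_fraction < CLOUD_MAX


def baseline_and_threshold(readings):
    """Return (baseline mean, alert threshold) for one site.

    `readings` is an iterable of (column_density, qa_value, cloud_fraction)
    tuples. This record layout is an assumption made for illustration.
    """
    kept = [value for value, qa, cloud in readings if passes_qa(qa, cloud)]
    if not kept:
        raise ValueError("no readings passed the QA gate")
    baseline = mean(kept)
    return baseline, baseline * ALERT_FACTOR
```

Readings that fail either gate are dropped before averaging, so a cloudy day never pulls the baseline up or down.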

Anomaly onset

We define anomaly onset as the first day the 14-day rolling mean crosses the alert threshold. This is the single date we timestamp the atmospheric record against — it is the dataset's version of a witness testifying under oath.
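
A sketch of the onset rule, under two assumptions that the text above does not settle: the daily series is gap-free (one value per calendar day), and a full 14-day window is required before the rule can fire. The function name and the synthetic series are hypothetical.

```python
from collections import deque
from datetime import date, timedelta

WINDOW_DAYS = 14  # rolling-mean window from the onset rule


def anomaly_onset(daily, threshold):
    """Return the first date whose trailing 14-day mean reaches `threshold`.

    `daily` is a list of (date, value) pairs sorted by date. Returns None
    when the rolling mean never reaches the threshold.
    """
    window = deque(maxlen=WINDOW_DAYS)  # oldest value drops out automatically
    for day, value in daily:
        window.append(value)
        if len(window) == WINDOW_DAYS and sum(window) / WINDOW_DAYS >= threshold:
            return day
    return None


# Synthetic series: 20 quiet days at 10.0, then a step up to 20.0.
start = date(2024, 1, 1)
series = [(start + timedelta(days=i), 10.0 if i < 20 else 20.0) for i in range(40)]
print(anomaly_onset(series, 13.0))  # 2024-01-25
```

In the synthetic series the step occurs on day 20, but onset is reported five days later: the rolling mean deliberately lags a sudden change, which trades promptness for resistance to single-day spikes.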

Parameter	Value
Product	sentinel-5p-l2-no2-offl
Bounding box	0.25° × 0.25° centred on facility
QA gate	qa_value > 0.75 & cloud_fraction < 0.5
Baseline window	2019-01-01 → 2023-12-31
Alert threshold	+30% over baseline mean
Onset rule	14-day rolling mean ≥ threshold
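
The parameters above map naturally onto a STAC item-search request. The sketch below only builds the request body (collections, bbox, datetime and limit are standard STAC API search fields); it makes no network call, and the helper names and the `limit` value are assumptions for illustration.

```python
def facility_bbox(lat, lon, size_deg=0.25):
    """Return [west, south, east, north] for a box centred on the facility."""
    half = size_deg / 2
    return [lon - half, lat - half, lon + half, lat + half]


def stac_search_body(lat, lon, start, end,
                     collection="sentinel-5p-l2-no2-offl"):
    """Build a STAC item-search body for one site and date range.

    `start` and `end` are ISO dates (YYYY-MM-DD). The collection ID is the
    product name from the parameter table.
    """
    return {
        "collections": [collection],
        "bbox": facility_bbox(lat, lon),
        "datetime": f"{start}T00:00:00Z/{end}T23:59:59Z",
        "limit": 100,  # page size; an arbitrary choice for this sketch
    }


# Baseline window for the example coordinates used in the reproduce section.
body = stac_search_body(35.0577, -90.1534, "2019-01-01", "2023-12-31")
```

The QA gate is not part of the search: qa_value and cloud_fraction live inside each product, so filtering on them happens after download.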

Visual signal (LFM2.5-VL)

Instrument

Copernicus Sentinel-2 L2A, fetched via the free Element84 earth-search STAC. We composite RGB from B04/B03/B02 at 10-metre ground sample distance, plus a shortwave-infrared (SWIR) channel from B11/B12 (20-metre native resolution) for heat-signature verification.

Model

We run LiquidAI/LFM2.5-VL-450M locally via HuggingFace transformers. The 450M-parameter vision-language model is small enough for on-orbit inference on DPhi-class hardware — designed to run anywhere from a laptop to a satellite payload — and capable enough to reason about industrial equipment in 10-metre pixels.

Fine-tuning

We fine-tune from the VRSBench baseline (123k satellite VQA pairs) with an additional industrial-facility dataset covering gas turbines, data-centre cooling infrastructure, and construction-phase imagery. Training recipe:

git clone https://github.com/Liquid4All/leap-finetune/
cd leap-finetune
uv run leap-finetune ./ghost_permits_config.yaml
# ~2 hours on a single H100

Inference outputs

Turbine count — integer estimate with model confidence in 0–1.
Heat signature — boolean, derived from the SWIR composite.
Construction phase — one of: bare, prep, foundations, equipment, operational.
Bounding boxes — per-unit rectangles in scene coordinates.
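
One way to hold these four outputs is a small validated record. The class and field names below are hypothetical, as is the (x_min, y_min, x_max, y_max) box layout; only the value ranges and the phase vocabulary come from the list above.

```python
from dataclasses import dataclass, field

# Construction phases, in the order listed in the methodology.
PHASES = ("bare", "prep", "foundations", "equipment", "operational")


@dataclass
class VlmReading:
    """One scene-level inference result (illustrative schema)."""
    turbine_count: int
    confidence: float            # model confidence in 0-1
    heat_signature: bool         # derived from the SWIR composite
    construction_phase: str      # must be one of PHASES
    boxes: list = field(default_factory=list)  # (x_min, y_min, x_max, y_max)

    def __post_init__(self):
        # Reject values outside the ranges the methodology allows.
        if self.turbine_count < 0:
            raise ValueError("turbine_count must be non-negative")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        if self.construction_phase not in PHASES:
            raise ValueError(f"unknown phase: {self.construction_phase!r}")
```

Validating at construction time means a malformed model response fails loudly before it can reach the fusion step.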

Permit records

We pull filings directly from the authoritative public source for each jurisdiction. For the Memphis case, that is the Tennessee Department of Environment & Conservation (TDEC) and the Mississippi Department of Environmental Quality (MDEQ). National cross-checks come from the EPA ECHO database.

For each facility we record: applicant name, filing date, permit reference, units covered, and status. The gap is computed as the difference in days between the atmospheric onset date and the first filing that names the address.
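
The gap calculation can be sketched as below. The record layout (dicts with `filing_date` and `address` keys), the exact-match address comparison, and the sign convention (positive when the filing follows onset) are all assumptions for illustration; the dates in the usage example are synthetic.

```python
from datetime import date


def permit_gap_days(onset, filings, address):
    """Days between atmospheric onset and the first filing naming `address`.

    Returns a positive number when the first filing came after onset,
    a negative number when it preceded onset, and None when no filing
    names the address.
    """
    matching = [f["filing_date"] for f in filings if f["address"] == address]
    if not matching:
        return None
    return (min(matching) - onset).days


# Synthetic example: two filings for site A, one for an unrelated site B.
filings = [
    {"filing_date": date(2024, 9, 1), "address": "A"},
    {"filing_date": date(2024, 8, 15), "address": "A"},
    {"filing_date": date(2024, 6, 1), "address": "B"},
]
gap = permit_gap_days(date(2024, 7, 1), filings, "A")  # 45
```

Real addresses rarely match character for character across agencies, so a production version would need normalisation that this sketch omits.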

Fusion & confidence score

The confidence score is a 0–100 sum across four components. Each component is bounded so no single stream can dominate, and the score is capped at 100.

Component	Max	Basis
Atmospheric	40	Peak anomaly %, days above threshold, HCHO corroboration
Visual (VLM)	25	Units detected, confidence, heat-signature boolean
Permit gap	25	Days unpermitted, unpermitted unit count
Corroboration	10	Independent third-party record (press, aerial survey, lawsuit)

A score of 70+ corresponds to our verdict “Strong evidence of unpermitted emissions.” Lower scores are not exonerations — they mean the pipeline found the signal insufficient for a public finding.
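
A minimal sketch of the bounded sum and the 70-point cut-off. How each component's points are derived from its basis is not specified here, so the function takes already-computed points as input; the function and key names are hypothetical.

```python
# Component maxima from the table above.
COMPONENT_MAX = {
    "atmospheric": 40,
    "visual": 25,
    "permit_gap": 25,
    "corroboration": 10,
}
STRONG_EVIDENCE_MIN = 70  # score at or above which the strong verdict applies


def fuse(points):
    """Clamp each component to [0, max], sum them, and cap the total at 100.

    `points` maps component name to raw points; missing components count
    as zero.
    """
    total = 0.0
    for name, cap in COMPONENT_MAX.items():
        raw = points.get(name, 0.0)
        total += min(max(raw, 0.0), cap)
    return min(total, 100.0)


def is_strong_evidence(score):
    """True when the score meets the 70+ verdict threshold."""
    return score >= STRONG_EVIDENCE_MIN


score = fuse({"atmospheric": 55, "visual": 20, "permit_gap": 25})  # 85.0
```

In the example the atmospheric stream offers 55 raw points but contributes only 40, which is the bounding behaviour that keeps one stream from dominating.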

Limits & caveats

Resolution. Each TROPOMI pixel averages over ~19 km² (3.5 × 5.5 km), so nearby sources can contribute to the measured column. We counter this by comparing pre- and post-operational means on the same pixel; the differential is the contribution attributable to the new source.
Cloud cover. Roughly 30% of days in Memphis are cloudy. We exclude those days and interpolate only for visualisation, never for the onset date itself.
VLM hallucinations. LFM2.5-VL, like all VLMs, can produce confident-sounding descriptions that are wrong. We therefore treat the VLM's count as a range, and corroborate with aerial or ground imagery wherever possible.
Permit complexity. Some jurisdictions issue multiple overlapping permits. We record the first filing that names the physical address; we do not interpret whether amendments retroactively cover earlier operation.
Attribution. A NO₂ plume is not proof of a specific operator. We rely on the coincident visual and permit signals to establish that link.
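
The same-pixel differential described under Resolution reduces to simple arithmetic. The sketch below assumes two lists of QA-filtered column densities for one pixel, split at the operational date; the function name and return layout are hypothetical.

```python
from statistics import mean


def attributable_differential(pre_values, post_values):
    """Compare pre- and post-operational means on the same pixel.

    Returns the absolute differential (same units as the input) and the
    percent change relative to the pre-operational mean.
    """
    pre_mean = mean(pre_values)
    post_mean = mean(post_values)
    diff = post_mean - pre_mean
    return {
        "pre_mean": pre_mean,
        "post_mean": post_mean,
        "differential": diff,
        "percent_change": 100.0 * diff / pre_mean,
    }


result = attributable_differential([10.0, 12.0], [15.0, 17.0])
# result["differential"] is 5.0: the post-operational mean (16.0)
# minus the pre-operational mean (11.0).
```

Because both means come from the same pixel, sources that were already present before the facility opened appear in both terms and cancel in the differential, provided their output did not change over the period.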

Reproduce this

The pipeline is open-source and runs end-to-end on a laptop. No API keys required — everything uses free, public data.

git clone https://github.com/ghostpermits/ghost-permits
cd ghost-permits
pip install -r requirements.txt

# Run the Memphis case
python pipeline.py --site colossus_1 --start 2023-01-01
# → writes evidence_brief.json

# Or any coordinate on Earth
python pipeline.py \
  --lat 35.0577 --lon -90.1534 \
  --name "My site" \
  --operational-since 2024-07-01

The web dashboard (python app.py) serves the Sentinel console at localhost:8000. The VLM (~900 MB) auto-downloads from HuggingFace on first run.

Citing this work

If Ghost Permits appears in reporting, briefs, or filings, please cite it as:

Ghost Permits Team (2026). Ghost Permits v1.0 —
Unpermitted Industrial Emissions Intelligence via Satellite.
DataOil St. / Built with LiquidAI LFM2.5-VL on DPhi Space infrastructure.
https://ghost-permits.vercel.app

For the Memphis case specifically, reference brief CLX-01, reanalysis date 16 April 2026.

Methodology · v1.0 · published 18 Apr 2026 · Apache-2.0 · open for correction

How we turn orbital physics into legal evidence.
