
Validating Detection Against Staged Li-Ion Tests

By Engineering — Sensing · June 15, 2026 · 8 min read

A detection claim is only as good as the test it survives. Here is the protocol — staged cell-level abuse, replayable data, no cherry-picked runs.

Validating thermal anomaly detection means proving two things separately: that it catches real lithium-ion events early, and that it does not trip on the much larger population of non-events. Both have to be measured against staged, repeatable tests — not anecdotes from a single dramatic run. The honest version of the claim is a detection rate and a false-positive rate, each tied to a documented protocol a third party can repeat.

The staged abuse protocol

True positives are generated by driving individual lithium-ion cells into thermal runaway under controlled abuse — overcharge, mechanical penetration, and external heating — inside a representative vehicle and deck geometry. Each run is fully instrumented: cell temperature, off-gas onset, surface thermal signature, and the detection system's own per-vehicle baseline are logged on a common time base, so the moment of alarm can be placed exactly against the moment of physical onset.

What 'early' is measured against

Lead time is reported relative to the conventional baseline a ship already has: deck-zone smoke detection. Across staged cell-level abuse runs, the per-vehicle thermal anomaly crossed alarm 18–25 minutes before any deck-zone smoke detector tripped. That gap is the deliverable — not an absolute detection time, which depends on cell chemistry and abuse mode, but the margin over the equipment already fitted to the vessel.
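As a rough illustration of how that margin could be computed from logged runs, here is a minimal Python sketch. The `StagedRun` record, its field names, and the timestamps are hypothetical and invented for the example; they are not measured data and not the actual log format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StagedRun:
    """One staged abuse run; times are seconds on the shared time base (hypothetical layout)."""
    run_id: str
    thermal_alarm_s: Optional[float]  # per-vehicle thermal anomaly alarm, None if it never tripped
    smoke_trip_s: Optional[float]     # first deck-zone smoke detector trip, None if it never tripped


def lead_time_min(run: StagedRun) -> Optional[float]:
    """Minutes by which the thermal alarm preceded the first smoke detector trip."""
    if run.thermal_alarm_s is None or run.smoke_trip_s is None:
        return None  # margin is undefined; such runs should be reported separately
    return (run.smoke_trip_s - run.thermal_alarm_s) / 60.0


# Illustrative timestamps only, not results from the validation program.
runs = [
    StagedRun("overcharge-01", thermal_alarm_s=600.0, smoke_trip_s=1800.0),
    StagedRun("penetration-01", thermal_alarm_s=300.0, smoke_trip_s=1680.0),
]
leads = [lead_time_min(r) for r in runs]
print(leads)
```

Because both timestamps sit on the same time base, the margin is a plain difference. A negative value would mean the smoke detector tripped first.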

18–25 min: lead time over deck-zone smoke detection

~0.04%: residual false-positive rate against the bench-rig nuisance catalogue

0 [VERIFY]: staged true-positive replays missed at tuned thresholds

The other half: the false-positive catalogue

A detection rate without a false-positive rate is half a claim. The nuisance side is validated against a catalogue of recorded non-events (solar gain through deck openings, exhaust plumes, ambient swings, vibration) replayed against the trip engine. Against that catalogue the residual false-positive rate sits near 0.04%. Thresholds are accepted only if they hold that nuisance rate without dropping any staged true-positive replay; a threshold change that improves one at the cost of the other is rejected.

The test that matters is the one where the same tuned parameters pass both sets — true positives and the nuisance catalogue — without per-run hand adjustment. Tuning to each run individually is how detection demos lie.
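The acceptance rule can be sketched as a single gate applied to both sets with one shared parameter. This is a toy Python model under stated assumptions, not the actual trip engine: a replay is reduced to a sequence of anomaly scores, the engine to a single threshold, and the false-positive rate is counted per replayed non-event. All names and data are hypothetical.

```python
from typing import Sequence, Tuple

Replay = Sequence[float]  # hypothetical: one anomaly score per recorded sample


def trips(replay: Replay, threshold: float) -> bool:
    """Toy trip engine: alarm if any sample reaches the threshold."""
    return any(score >= threshold for score in replay)


def evaluate(threshold: float,
             true_positives: Sequence[Replay],
             nuisance: Sequence[Replay]) -> Tuple[int, float]:
    """Return (missed true-positive replays, false-positive rate) for one threshold."""
    missed = sum(1 for r in true_positives if not trips(r, threshold))
    false_trips = sum(1 for r in nuisance if trips(r, threshold))
    fp_rate = false_trips / len(nuisance) if nuisance else 0.0
    return missed, fp_rate


def accept(threshold: float,
           true_positives: Sequence[Replay],
           nuisance: Sequence[Replay],
           max_fp_rate: float) -> bool:
    """Accept only if no staged replay is missed AND the nuisance rate holds."""
    missed, fp_rate = evaluate(threshold, true_positives, nuisance)
    return missed == 0 and fp_rate <= max_fp_rate


# Illustrative data only.
true_positives = [[0.1, 0.9], [0.2, 0.7]]
nuisance = [[0.1, 0.3], [0.2, 0.4], [0.1, 0.65], [0.0, 0.2]]

for threshold in (0.5, 0.68, 0.8):
    print(threshold, accept(threshold, true_positives, nuisance, max_fp_rate=0.1))
```

In this toy data the lowest threshold trips on a nuisance replay and the highest misses a staged event, so only the middle value passes. The same threshold is evaluated against both sets in one call, which is the property the protocol insists on.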

Reproducibility is the point

Every run is stored as replayable data, so a classification-society witness or an insurer's assessor can re-run the trip engine against the recorded inputs and reproduce the alarm timing themselves. No live fire required to audit the result. This is what separates a validation program from a marketing video: the evidence is a dataset and a documented procedure, not a one-time event nobody can repeat.
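A replay audit of this kind could look roughly like the following Python sketch: re-run a trip rule over the recorded samples and compare the resulting alarm time with the one logged during the original run. The `(time_s, score)` record layout, the single-threshold rule, and the tolerance parameter are assumptions made for illustration, not the stored data format.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # hypothetical record: (time_s, anomaly_score)


def first_alarm_s(samples: Sequence[Sample], threshold: float) -> Optional[float]:
    """Time of the first sample at or above the threshold, or None if no alarm."""
    for time_s, score in samples:
        if score >= threshold:
            return time_s
    return None


def reproduces(samples: Sequence[Sample],
               threshold: float,
               recorded_alarm_s: Optional[float],
               tolerance_s: float = 0.0) -> bool:
    """True if replaying the recorded inputs yields the logged alarm timing."""
    replayed = first_alarm_s(samples, threshold)
    if replayed is None or recorded_alarm_s is None:
        # Both must agree that no alarm occurred.
        return replayed is None and recorded_alarm_s is None
    return abs(replayed - recorded_alarm_s) <= tolerance_s


# Illustrative recording only.
samples = [(0.0, 0.1), (1.0, 0.4), (2.0, 0.8), (3.0, 0.9)]
print(reproduces(samples, threshold=0.7, recorded_alarm_s=2.0))
```

An assessor running this against the stored inputs needs only the dataset and the tuned parameters, not access to the original rig.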

What it means for class and insurers

For a classification society, replayable staged-test evidence supports a witnessed test program without staging a full-scale fire for every review. For underwriters, a detection rate paired with a false-positive rate, both tied to a repeatable protocol, is the difference between a rateable risk feature and an unverifiable claim, and it maps directly onto the fault-to-response interval behind total-loss casualties like Felicity Ace (2022) and Fremantle Highway (2023).

Sources

1. RoRoSafe bench-rig validation — staged Li-ion abuse runs and nuisance catalogue replay (internal, [VERIFY: 18–25 min lead, ~0.04% FP, zero missed replays])
2. Lithium-ion thermal-runaway staging methodology — public-domain abuse-test physics (overcharge, penetration, external heating)
3. Casualty context — Felicity Ace (2022), Fremantle Highway (2023) — commercial.allianz.com, gcaptain.com, maritime-executive.com

Frequently asked

Questions, answered

How is thermal anomaly detection validated?

Against two staged test sets. True positives come from driving individual lithium-ion cells into thermal runaway under controlled abuse inside representative vehicle and deck geometry, fully instrumented on a common time base. False positives are validated against a catalogue of recorded non-events replayed through the trip engine. A detection rate and a false-positive rate are reported together, each tied to a repeatable protocol.

What lead time does validation show?

Across staged cell-level abuse runs, the per-vehicle thermal anomaly crossed alarm 18–25 minutes before any deck-zone smoke detector tripped. The figure is reported as a margin over the conventional detection a ship already carries, not as an absolute time — absolute onset depends on cell chemistry and abuse mode, but the lead over fitted equipment is the deliverable.

How are false positives tested?

A catalogue of recorded non-events — solar gain through deck openings, exhaust plumes, ambient swings, vibration — is replayed against the trip engine. The residual false-positive rate sits near 0.04%. Thresholds are only accepted if they hold that nuisance rate without dropping any staged true-positive replay, so neither side is improved at the other's expense.

Can a third party reproduce the results?

Yes. Every run is stored as replayable data, so a classification-society witness or an insurer's assessor can re-run the trip engine against the recorded inputs and reproduce the alarm timing without staging a live fire. The evidence is a dataset and a documented procedure rather than a one-time event, which is what makes it auditable.