Make every filter queryable in plain English.
We package each of the 224 convolutional filters as a typed tuple (E, S, R, D, G), ground it in the network's actual weights and class-conditional activations, and expose it through hybrid retrieval. Ask a question. Get a grounded answer with traceable citations.
SimpleCNN on CIFAR-10 - the model under the microscope.
Three conv blocks (32 / 64 / 128 filters) followed by a fully-connected classifier. Each filter shown above the diagram is one of the 224 manifestation units (MUs) we extract and index.
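As a quick check on the count, the three blocks contribute 32 + 64 + 128 = 224 filters. Below is a minimal sketch of one way to map a global filter index onto a (layer, local index) pair. It assumes filters are numbered layer by layer starting at zero; the names `LAYER_SIZES` and `locate_filter` are illustrative and not taken from the project.

```python
# Filter counts per conv block, as stated in the text.
LAYER_SIZES = {"conv1": 32, "conv2": 64, "conv3": 128}

TOTAL_FILTERS = sum(LAYER_SIZES.values())  # 32 + 64 + 128


def locate_filter(global_index):
    """Map a global filter index to (layer name, index within that layer).

    Assumption: filters are numbered conv1 first, then conv2, then conv3.
    """
    if not 0 <= global_index < TOTAL_FILTERS:
        raise IndexError(f"filter index {global_index} out of range")
    offset = global_index
    for layer, size in LAYER_SIZES.items():
        if offset < size:
            return layer, offset
        offset -= size
    raise AssertionError("unreachable: index was range-checked above")
```

Under this numbering, index 0 is the first conv1 filter and index 223 is the last conv3 filter.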
The full process. Click any stage to jump.
Six stages, one continuous pipeline. The final stage points back to the demo above - that's where every grounded answer surfaces.
Where exactly the manifestation units come from.
The MUs are not abstractions - they are derived directly from the trained network's weights and class-conditional activations.
A. The actual learned kernels
Each tile below is a single filter, drawn straight from the trained checkpoint. Click any cell in the grid further down to see the MU we extracted from it.



B. Class-conditional activation evidence
For each of the 10 CIFAR-10 classes we record the post-ReLU activations across all 224 filters on the full CIFAR-10 test set (N=10,000 images · 1,000 per class). Click a class to (i) outline its top-1 filters in the 224-filter matrix below (section D) and (ii) swap section C’s layer-progression to that class.
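The page does not say how the recorded activations are reduced to a per-class statistic. The sketch below assumes the simplest option: each image yields one non-negative value per filter (for example a spatial mean of the post-ReLU feature map), values are averaged per class, and a filter's top-1 class is the one with the highest mean. Function names and the toy data are illustrative.

```python
def class_conditional_means(activations, labels, num_classes=10):
    """Average each filter's activation separately for every class.

    activations: one list per image, holding one post-ReLU value per filter.
    labels:      the class index of each image.
    Returns means[class][filter].
    """
    num_filters = len(activations[0])
    sums = [[0.0] * num_filters for _ in range(num_classes)]
    counts = [0] * num_classes
    for row, label in zip(activations, labels):
        counts[label] += 1
        for f, value in enumerate(row):
            sums[label][f] += value
    return [
        [s / counts[c] if counts[c] else 0.0 for s in sums[c]]
        for c in range(num_classes)
    ]


def top1_class(means, filter_index):
    """Class whose mean activation is highest for the given filter."""
    return max(range(len(means)), key=lambda c: means[c][filter_index])


# Toy data: 3 images, 2 filters, 2 classes (values are made up).
toy_means = class_conditional_means(
    [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]], [0, 0, 1], num_classes=2
)
```

On the real data the same reduction would run over 10,000 images and 224 filters, giving a 10 × 224 table of class means.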
C. Layer progression
Viewing default example. An example image as it walks through conv1 → conv2 → conv3. Click a class above to view that class’s progression.
conv1 low-level features
conv2 mid-level features
conv3 high-level class-specific
Each MU renders into multiple deterministic templates.
224 filters × typed templates per filter = 943 indexed documents. Templates substitute stored field values directly - every claim traces back to a number in the per-filter JSON.
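To make "templates substitute stored field values directly" concrete, here is a minimal sketch of deterministic rendering. The record fields, their values, and the two doc types are hypothetical placeholders; the project's actual JSON schema and template set are not shown on this page.

```python
import json

# Hypothetical per-filter record; field names and values are illustrative only.
record = json.loads(
    '{"filter_id": "conv3_017", "layer": "conv3",'
    ' "top_class": "ship", "selectivity": 0.42}'
)

# Hypothetical doc types. Each template only substitutes stored fields,
# so every rendered claim can be traced back to a value in the record.
TEMPLATES = {
    "identity": "Filter {filter_id} belongs to layer {layer}.",
    "selectivity": (
        "Filter {filter_id} responds most strongly to class {top_class} "
        "(selectivity {selectivity:.2f})."
    ),
}


def render_documents(rec):
    """Render one document per template from a single record."""
    return {doc_type: tpl.format(**rec) for doc_type, tpl in TEMPLATES.items()}


docs = render_documents(record)
```

Because rendering is a pure function of the record, re-running it yields identical documents, which is what makes the citations traceable.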
Sample: top-5 conv3 filters by selectivity. Click a doc-type tab to see how the same filter renders across templates.
Hybrid retrieval - exact match meets class-conditional semantic search.
Two rails run in parallel: exact match on E for filter-bearing queries, dense semantic search over S / R / G for class-bearing queries. The typed primary-class rule then promotes only filters whose top-1 class equals the target.
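The two rails and the primary-class rule can be sketched as follows. This is a toy illustration under stated assumptions, not the project's retriever: the index entries are made up, the filter-id pattern is hypothetical, and `dense_score` is a fixed stand-in for an embedding similarity that a real system would compute per query.

```python
import re

# Toy index; ids, classes and scores are made-up placeholders.
INDEX = [
    {"filter_id": "conv3_005", "top1_class": "cat", "dense_score": 0.91},
    {"filter_id": "conv3_017", "top1_class": "ship", "dense_score": 0.88},
    {"filter_id": "conv2_040", "top1_class": "ship", "dense_score": 0.61},
    {"filter_id": "conv3_101", "top1_class": "dog", "dense_score": 0.75},
]
CLASSES = {"cat", "dog", "ship"}


def retrieve(query, k=2):
    """Exact rail for filter ids, class rail gated by the primary-class rule."""
    # Rail 1: exact match when the query names a filter explicitly.
    ids = set(re.findall(r"conv\d_\d{3}", query))
    exact = [d for d in INDEX if d["filter_id"] in ids]

    # Rail 2: class-bearing query, ranked by the stand-in dense score.
    words = set(re.findall(r"[a-z]+", query.lower()))
    targets = CLASSES & words
    ranked = sorted(INDEX, key=lambda d: d["dense_score"], reverse=True)
    # Primary-class rule: keep only filters whose top-1 class is the target.
    promoted = [d for d in ranked if d["top1_class"] in targets]

    merged = exact + [d for d in promoted if d not in exact]
    return [d["filter_id"] for d in merged[:k]]
```

Note that for a "ship" query the highest-scoring entry overall (`conv3_005`, a cat filter) is not returned: the typed rule drops it even though its dense score is higher.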
Click a class. Watch the framework run end-to-end.
One pick drives all five stages: extract the target → retrieve top-k primary-class filters from the 943-doc index → inject their templates → amplify by α → verify on the same sufficiency heatmap that anchors section 05. The big matrix above and the heatmap below are live targets of this pipeline.
Sufficiency = fraction of trials where amplifying the retrieved filters by α flips the prediction toward the target.
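A minimal sketch of the amplification step and the sufficiency metric. It assumes that "flips the prediction toward the target" means the predicted class becomes the target after amplification, and that activations are represented as one scalar per filter; both are simplifications for illustration.

```python
def amplify(activations, filter_ids, alpha):
    """Scale the selected filters' activations by alpha; leave others unchanged."""
    return [
        value * alpha if i in filter_ids else value
        for i, value in enumerate(activations)
    ]


def sufficiency(trials, target):
    """Fraction of trials where the post-amplification prediction is the target.

    trials: list of (predicted_before, predicted_after) class labels.
    Trials that already predicted the target beforehand do not count as flips.
    """
    if not trials:
        return 0.0
    flips = sum(
        1 for before, after in trials if before != target and after == target
    )
    return flips / len(trials)
```

For example, if two of four trials move from another class to "ship", the sufficiency for "ship" is 0.5.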
MIB causal mediation - sufficiency and necessity, matched-budget.
The CNN demo carries the paper's strongest causal-faithfulness claim. Every number below was registered before outcomes were computed.
Per-pair sufficiency rate
10×10 source→target class matrix from the paper's manipulation heatmap. Values are success rates (%) across 540 trials (90 ordered source→target pairs, diagonal excluded, × 6 α levels). Failures cluster within a category (vehicles or animals): bird → animals fails, while automobile → animals succeeds at 100%.
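The trial count follows from the grid itself: 10 classes give 10 × 9 = 90 ordered pairs, and 6 α levels per pair give 540 trials. The sketch below enumerates that grid. The α values themselves are not stated here, so levels are represented only by their index; if each cell pools the six levels for its pair, cell values are multiples of 1/6 (for example 4 of 6 ≈ 67%).

```python
from itertools import permutations

CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]
NUM_ALPHA_LEVELS = 6  # the actual alpha values are not given in the text

# Ordered (source, target) pairs with source != target.
pairs = list(permutations(CLASSES, 2))
trials = [(s, t, level) for s, t in pairs for level in range(NUM_ALPHA_LEVELS)]


def success_rate_percent(outcomes):
    """Percentage of successful trials for one source -> target pair."""
    return 100.0 * sum(outcomes) / len(outcomes)
```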
Sufficiency: framework vs random
k=8 filters, α amplification, n=270 trials.
Gap: +69.6 pp. Typed primary-class rule contributes +17.4 pp on top of selectivity-only ranking.
Necessity scales monotonically with k
Top-k retrieved primary-class filters zeroed; natural top-1 accuracy drop on 100 correctly-classified test images per class (n=1,000 total across 10 classes).
Framework drop at k=8: −38.3 pp · random matched-budget drop: −3.6 pp · gap +34.7 pp. Distributed mediation by populations of filters, with the typed rule ordering them by mediation strength. 9/10 classes hit the gap.
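The gap arithmetic can be reproduced with a short sketch. It assumes the drop is pooled over all 1,000 images, which were all classified correctly before ablation. The counts 617 and 964 in the usage note are back-computed from the reported percentages under that assumption; they are not figures taken from the paper.

```python
def zero_ablate(activations, filter_ids):
    """Set the selected filters' activations to zero; leave others unchanged."""
    return [0.0 if i in filter_ids else v for i, v in enumerate(activations)]


def accuracy_drop_pp(correct_after, n_images):
    """Accuracy drop in percentage points, given all images were correct before."""
    return 100.0 * (n_images - correct_after) / n_images


def necessity_gap_pp(framework_correct, random_correct, n_images):
    """How much larger the framework's drop is than the matched random drop."""
    return (
        accuracy_drop_pp(framework_correct, n_images)
        - accuracy_drop_pp(random_correct, n_images)
    )
```

With 617 of 1,000 images still correct after zeroing the retrieved filters and 964 of 1,000 after zeroing a random set of the same size, the drops are 38.3 pp and 3.6 pp, and the gap is 34.7 pp.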
Four-way decomposition of sufficiency
Localises the framework's H2 contribution to the typed primary-class rule operating over a faithfully-stored S.
A vs C3 = +17.4 pp (typed rule). C3 = C4 confirms S-fidelity.
Scope and limitations
- Within-category overlap: failures cluster in vehicle and animal blocks (cat ↔ dog at ~67%).
- Deer outlier on necessity: smaller selective pool (24 filters vs 29-57) gives only +4.7 pp.
- k-budget sensitivity: reported numbers use k=8.
- Single architecture: SimpleCNN on CIFAR-10. Generalisation to deeper / vision-transformer architectures is future work.