A typed-tuple protocol applied to a CIFAR-10 CNN

Make every filter queryable in plain English.

We package each of the 224 convolutional filters as a typed tuple (E, S, R, D, G), ground it in the network's actual weights and class-conditional activations, and expose it through hybrid retrieval. Ask a question. Get a grounded answer with traceable citations.

SimpleCNN · 224 filters (conv1 32 / conv2 64 / conv3 128) · 10 CIFAR-10 classes · 943 indexed templates


76.3% sufficiency under matched intervention, vs 6.7% for a random matched-budget baseline (+69.6 pp).

−38.3 pp necessity drop under ablation: the loss in natural top-1 accuracy when the retrieved primary-class filters are zeroed.

+17.4 pp gain from the typed primary-class rule over selectivity-only ranking (paper Sec. 5.2).

01 / The CNN we trained

SimpleCNN on CIFAR-10 - the model under the microscope.

Three conv blocks (32 / 64 / 128 filters) feed a fully connected classifier. Each of the 224 filters becomes one manifestation unit (MU), which we extract and index.


ReLU + MaxPool sit between the conv blocks, so feature maps shrink and become more abstract as depth grows. Test accuracy on CIFAR-10: 84.66%.
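A quick shape walk-through makes "feature maps shrink" concrete. This is a minimal Python sketch, not the project's code. The 32 / 64 / 128 widths and the 3 / 32 / 64 input channels come from this page. The 3×3 kernels with padding 1 and the 2×2, stride-2 max-pool between blocks are assumptions.

```python
CONV_WIDTHS = [32, 64, 128]  # filters in conv1 / conv2 / conv3, as stated on this page


def conv_out(size: int, kernel: int = 3, padding: int = 1, stride: int = 1) -> int:
    """Spatial size after a convolution (standard output-size formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, window: int = 2, stride: int = 2) -> int:
    """Spatial size after an unpadded max-pool."""
    return (size - window) // stride + 1


def walk_shapes(size: int = 32, in_channels: int = 3):
    """(layer, in channels, out channels, feature-map side) for each conv block."""
    rows = []
    for i, out_channels in enumerate(CONV_WIDTHS, start=1):
        if i > 1:
            # ReLU + MaxPool between blocks; ReLU leaves the shape unchanged.
            size = pool_out(size)
        size = conv_out(size)  # assumed 3x3 kernel, padding 1: side preserved
        rows.append((f"conv{i}", in_channels, out_channels, size))
        in_channels = out_channels
    return rows


for name, c_in, c_out, side in walk_shapes():
    print(f"{name}: {c_in} -> {c_out} channels, {side}x{side} feature maps")
print("filters:", sum(CONV_WIDTHS))
```

Under these assumptions the feature maps go 32×32 → 16×16 → 8×8, and the filter count sums to the 224 MUs.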

From weights to conversation

The full process, stage by stage.

Six stages, one continuous pipeline. The final stage points back to the demo above - that's where every grounded answer surfaces.

Manipulation: a peek. The matrix on the left shows the source-to-target sufficiency rate (light = within-category overlap, dark = clean cross-category flips). Section 05 contains the full 10×10 heatmap and the necessity-vs-k curve.

02 / From the weights

Where exactly the manifestation units come from.

The MUs are not abstractions - they are derived directly from the trained network's weights and class-conditional activations.

A. The actual learned kernels

Each tile below is a single filter, drawn straight from the trained checkpoint. Section D shows the MU extracted from each one.

conv1 · 32 filters · 3 input channels: low-level edges, colours, oriented gradients

conv2 · 64 filters · 32 input channels: mid-level shapes, parts, textures

conv3 · 128 filters · 64 input channels: high-level class-specific concepts

B. Class-conditional activation evidence

For each of the 10 CIFAR-10 classes, we record the post-ReLU activations of all 224 filters on the full CIFAR-10 test set (N=10,000 images · 1,000 per class). Selecting a class does two things: it outlines that class's top-1 filters in the 224-filter matrix (section D), and it switches section C's layer progression to that class.
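The per-class statistics can be sketched in plain Python. Two things here are assumptions rather than this page's definitions. First, each filter's feature map is reduced to a single scalar before averaging (for example, its spatial mean). Second, selectivity uses a common class-selectivity index, (μ_max − μ_rest) / (μ_max + μ_rest), as a stand-in because the exact formula is not given here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# (one scalar post-ReLU activation per filter, class label) -- assumed input format
Sample = Tuple[Sequence[float], str]


def class_means(samples: Sequence[Sample]) -> Dict[str, List[float]]:
    """Mean activation of every filter, computed separately for each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for acts, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(acts)
        for j, a in enumerate(acts):
            sums[label][j] += a
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def top1_and_selectivity(means: Dict[str, List[float]], j: int) -> Tuple[str, float]:
    """Top-1 class of filter j and its selectivity (assumed index, see lead-in)."""
    per_class = {c: m[j] for c, m in means.items()}
    best = max(per_class, key=per_class.get)
    rest = [v for c, v in per_class.items() if c != best]
    mu_max, mu_rest = per_class[best], sum(rest) / len(rest)
    denom = mu_max + mu_rest
    # Post-ReLU means are non-negative, so denom == 0 only for a filter that never fires.
    return best, ((mu_max - mu_rest) / denom if denom > 0 else 0.0)
```

On the full test set, `samples` would hold 10,000 pairs of 224 activations each. The function needs at least two classes to compute the "rest" average.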

C. Layer progression

An example image as it walks through conv1 → conv2 → conv3. Selecting a class in section B shows that class's progression instead.

D. The 224 filters as Manifestation Units

Each cell is one filter. Colour encodes class selectivity, and the inner dot encodes mean activation. Opening a cell shows the filter's full M_k = (E, S, R, D, G).

conv1 low-level features

conv2 mid-level features

conv3 high-level class-specific

Legend: colour = selectivity (0 to 0.7) · dot = mean activation (0 to max) · ring = top-1 filters of the selected class.
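A minimal sketch of M_k = (E, S, R, D, G) as a Python record. Two elements come from this page: E's role as the field exact-matched for filter-bearing queries, and the notion of a top-1 class (`best_class`, used by the retrieval rule). Storing S as per-class mean activations is an assumption. R, D and G are kept as opaque text because their contents are not spelled out here. `ManifestationUnit` and the `conv3_017` ID format are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class ManifestationUnit:
    """Hypothetical record for one filter's MU, M_k = (E, S, R, D, G)."""

    E: str               # assumed: the filter identifier matched by the exact-match rail
    S: Dict[str, float]  # assumed: mean post-ReLU activation per CIFAR-10 class
    R: str = ""          # contents not specified on this page; opaque text
    D: str = ""          # contents not specified on this page; opaque text
    G: str = ""          # contents not specified on this page; opaque text

    @property
    def best_class(self) -> str:
        """Top-1 class: the class with the highest stored mean activation."""
        return max(self.S, key=self.S.get)

    def to_json(self) -> str:
        """Serialise the stored fields, e.g. into a per-filter JSON file."""
        return json.dumps(asdict(self), sort_keys=True)
```

For example, `ManifestationUnit(E="conv3_017", S={"cat": 0.9, "dog": 0.4}).best_class` is `"cat"`.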

03 / Index

Each MU renders into multiple deterministic templates.

224 filters × typed templates per filter = 943 indexed documents. Templates substitute stored field values directly - every claim traces back to a number in the per-filter JSON.
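Deterministic rendering can be sketched with Python's `str.format`. The doc-type names, template wording and JSON keys (`best_class`, `selectivity`, `best_mean`) are hypothetical, and only two templates are shown. The property that matters is the one stated above: every value comes from the stored record, and a missing field raises `KeyError` instead of being made up.

```python
from typing import Dict, List

# Hypothetical doc-type templates; names and wording are placeholders.
TEMPLATES: Dict[str, str] = {
    "summary": "Filter {E} responds most strongly to {best_class} (selectivity {selectivity:.2f}).",
    "class_profile": "Mean activation of {E} on {best_class}: {best_mean:.3f}.",
}


def render_docs(record: Dict[str, object]) -> List[Dict[str, str]]:
    """Render one per-filter JSON record into indexed documents, one per doc-type."""
    return [
        # str.format substitutes stored values only; unknown keys raise KeyError.
        {"filter_id": str(record["E"]), "doc_type": name, "text": tpl.format(**record)}
        for name, tpl in TEMPLATES.items()
    ]
```

Because rendering is pure substitution, re-running it on the same JSON always yields byte-identical documents, which keeps the index reproducible.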

Sample: the top-5 conv3 filters by selectivity, each rendered across the index's four doc-types to show how the same filter reads in different templates.

04 / Ask

Hybrid retrieval - exact match meets class-conditional semantic search.

Two rails run in parallel: exact match on E for filter-bearing queries, dense semantic search over S / R / G for class-bearing queries. The typed primary-class rule then promotes only filters whose top-1 class equals the target.
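The two rails and the typed rule can be sketched as follows. This illustration rests on stated assumptions:

- A bag-of-words cosine stands in for the dense encoder.
- Filter IDs are assumed to look like `conv3_017`.
- The target class is the first CIFAR-10 class name found in the query.
- The primary-class rule is applied as a hard filter; the page says it "promotes" matching filters, which could also be implemented as a re-ranking.

```python
import math
import re
from collections import Counter
from typing import Dict, List

CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]
FILTER_ID = re.compile(r"\bconv[123]_\d+\b")  # hypothetical filter_id format


def embed(text: str) -> Counter:
    """Stand-in for a dense encoder: a bag-of-words count vector (assumption)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: List[Dict[str, str]], k: int = 8) -> List[Dict[str, str]]:
    """Merge the exact-match rail and the class-gated semantic rail; return top-k docs."""
    # Rail 1: exact match on E for filter-bearing queries.
    ids = set(FILTER_ID.findall(query))
    exact = [d for d in docs if d["E"] in ids]

    # Rail 2: class-bearing queries, gated by the typed primary-class rule:
    # only filters whose top-1 class equals the target are ranked at all.
    lowered = query.lower()
    target = next((c for c in CLASSES if re.search(rf"\b{c}\b", lowered)), None)
    semantic: List[Dict[str, str]] = []
    if target is not None:
        q = embed(query)
        pool = [d for d in docs if d["best_class"] == target]
        semantic = sorted(pool, key=lambda d: cosine(q, embed(d["text"])), reverse=True)

    # Exact matches first, then semantic hits, one entry per filter.
    seen, merged = set(), []
    for d in exact + semantic:
        if d["E"] not in seen:
            seen.add(d["E"])
            merged.append(d)
    return merged[:k]
```

Gating before ranking means a highly similar filter whose top-1 class is wrong can never outrank a correctly typed one, which is the behaviour the +17.4 pp rule gain is attributed to.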

04½ / Pipeline

Pick a class and the framework runs end to end.

One pick drives all five stages: extract the target → retrieve top-k primary-class filters from the 943-doc index → inject their templates → amplify by α → verify on the same sufficiency heatmap that anchors section 05. The 224-filter matrix in section D and the heatmap in section 05 are live targets of this pipeline.

01 · extract: parse target_class from the query.

02 · retrieve: top-8 filters where best_class == target.

03 · inject: templated context for each retrieved filter.

04 · amplify: post-ReLU multiplication by α. Sufficiency = fraction of trials where amplifying the retrieved filters by α flips the prediction to the target.

05 · verify: causal-mediation heatmap, source → target, built from the real per-pair sufficiency rates in the paper's manipulation heatmap.
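The amplify step and one sufficiency trial can be sketched as follows. Only the retrieved filters' post-ReLU activations are scaled by α. The `predict` function is a toy linear readout standing in for the rest of SimpleCNN (an assumption), and all function names are hypothetical.

```python
from typing import List, Sequence


def relu(x: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def amplify(post_relu: Sequence[float], retrieved: Sequence[int], alpha: float) -> List[float]:
    """Scale the retrieved filters' post-ReLU activations by alpha; leave the rest unchanged."""
    chosen = set(retrieved)
    return [a * alpha if j in chosen else a for j, a in enumerate(post_relu)]


def predict(features: Sequence[float], weights: Sequence[Sequence[float]]) -> int:
    """Toy linear readout standing in for the rest of the network (assumption)."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def flips_to_target(pre_relu: Sequence[float], retrieved: Sequence[int], alpha: float,
                    weights: Sequence[Sequence[float]], target: int) -> bool:
    """One sufficiency trial: does amplification make the readout predict `target`?"""
    return predict(amplify(relu(pre_relu), retrieved, alpha), weights) == target
```

Amplifying after the ReLU, rather than before, keeps inactive filters at zero, so α can only strengthen evidence a filter already carries.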

05 / Verify

MIB causal mediation - sufficiency and necessity, matched-budget.

The CNN demo carries the paper's strongest causal-faithfulness claim. Every number below was registered before outcomes were computed.

Per-pair sufficiency rate

10×10 source→target class matrix from the paper's manipulation heatmap. Values are success rates (%) across 540 trials (90 ordered pairs × 6 α levels). Failures cluster within category (vehicle → vehicle, animal → animal): bird → other animals fails, while automobile → animals succeeds at 100%.

Vehicle and animal blocks emerge in the matrix without supervision.
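The trial bookkeeping behind the matrix can be sketched as follows. There are 10 × 9 = 90 ordered source ≠ target pairs, each run at every α level, and each cell pools its trials. The α values themselves are not listed on this page, so any values passed in are placeholders, and the function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]

Trial = Tuple[str, str, float]  # (source class, target class, alpha)


def trial_grid(alphas: Sequence[float]) -> List[Trial]:
    """Every ordered (source, target) pair with source != target, at every alpha."""
    return [(s, t, a) for s, t in permutations(CLASSES, 2) for a in alphas]


def per_pair_rates(outcomes: Dict[Trial, bool]) -> Dict[Tuple[str, str], float]:
    """Success rate (%) per source -> target cell, pooled over alpha levels."""
    totals: Dict[Tuple[str, str], int] = {}
    hits: Dict[Tuple[str, str], int] = {}
    for (source, target, _alpha), flipped in outcomes.items():
        key = (source, target)
        totals[key] = totals.get(key, 0) + 1
        hits[key] = hits.get(key, 0) + int(flipped)
    return {key: 100.0 * hits[key] / totals[key] for key in totals}
```

With six α levels, `trial_grid` yields 90 × 6 = 540 trials, matching the count above.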

Sufficiency: framework vs random

k=8 filters, α amplification, n=270 trials.

Framework 76.3% vs random 6.7%: gap +69.6 pp. The typed primary-class rule contributes +17.4 pp on top of selectivity-only ranking.
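The sufficiency rate and the framework-vs-random gap reduce to simple arithmetic over trial outcomes. This is a minimal sketch with hypothetical function names; the only real numbers used are the headline 76.3% and 6.7%.

```python
from typing import Sequence


def sufficiency_rate(flipped: Sequence[bool]) -> float:
    """Percentage of trials in which amplification flipped the prediction to the target."""
    if not flipped:
        raise ValueError("no trials")
    return 100.0 * sum(flipped) / len(flipped)


def gap_pp(framework_pct: float, random_pct: float) -> float:
    """Framework-minus-random gap in percentage points, rounded to one decimal."""
    return round(framework_pct - random_pct, 1)


print(gap_pp(76.3, 6.7))  # 69.6
```

The random baseline draws the same number of filters (k=8) with the same α, so the gap isolates the effect of which filters are chosen, not how many.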

Necessity scales monotonically with k

Top-k retrieved primary-class filters zeroed; natural top-1 accuracy drop on 100 correctly-classified test images per class (n=1,000 total across 10 classes).

Framework drop at k=8: −38.3 pp · random matched-budget drop: −3.6 pp · gap +34.7 pp. The pattern indicates mediation distributed across populations of filters, with the typed rule ordering them by mediation strength. 9 of 10 classes show the gap; deer is the exception (see Scope and limitations).
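Necessity can be sketched the same way: zero the chosen filters' post-ReLU activations, then measure the fall in natural top-1 accuracy. Every evaluation image starts out correctly classified, so the baseline is 100% and the drop is simply the post-ablation accuracy minus 100. The helper names are hypothetical; −38.3 and −3.6 are the reported drops.

```python
from typing import List, Sequence


def ablate(post_relu: Sequence[float], filters: Sequence[int]) -> List[float]:
    """Zero the chosen filters' post-ReLU activations (ablation)."""
    off = set(filters)
    return [0.0 if j in off else a for j, a in enumerate(post_relu)]


def top1_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Natural top-1 accuracy in percent."""
    return 100.0 * sum(p == y for p, y in zip(preds, labels)) / len(labels)


def necessity_drop(preds_after: Sequence[int], labels: Sequence[int]) -> float:
    """Accuracy change in pp; images start correctly classified, so the baseline is 100%."""
    return round(top1_accuracy(preds_after, labels) - 100.0, 1)


def necessity_gap(framework_drop: float, random_drop: float) -> float:
    """How much more the framework's ablation hurts than the random one, in pp."""
    return round(random_drop - framework_drop, 1)


print(necessity_gap(-38.3, -3.6))  # 34.7
```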

Four-way decomposition of sufficiency

Localises the framework's H2 contribution to the typed primary-class rule operating over a faithfully stored S.

A vs C3 = +17.4 pp (typed rule). C3 = C4 confirms S-fidelity.

Scope and limitations

Within-category overlap: failures cluster in vehicle and animal blocks (cat <-> dog at ~67%).
Deer outlier on necessity: a smaller selective pool (24 filters vs 29-57 for the other classes) yields a gap of only +4.7 pp.
k-budget sensitivity: necessity grows with k, and all reported numbers use k=8.
Single architecture: SimpleCNN on CIFAR-10. Generalisation to deeper / vision-transformer architectures is future work.