Point of Care

MobileFetalCLIP: Selective Repulsive KD for Mobile Fetal Ultrasound Analysis

Numan Saeed1* · Fadillah Adamsyah Maani1 · Mohammad Yaqub1
1Computer Vision Department, Mohamed bin Zayed University of AI (MBZUAI), Abu Dhabi, UAE
* Corresponding author

Bringing expert-level fetal ultrasound AI to handheld devices. 26× smaller, 24× faster, and more accurate than the teacher.

Upcoming iPhone app SonoSight

App Store release coming soon. I am actively building SonoSight and will publish it soon as the mobile companion for on-device fetal ultrasound AI.

Clinical Focus: Biometry validity and brain sub-plane assistance
Deployment Target: Real-time inference on handheld and phone-class hardware
Validation: Zero-shot gains over the FetalCLIP teacher on fetal benchmarks
MobileFetalCLIP running on iPhone 16 Pro with real-time fetal ultrasound AI
Deployment View

Designed around handheld workflows where latency, memory footprint, and clarity matter as much as benchmark performance.

HC18 Validity: 88.6% (+5.1pp over teacher)
Brain Sub-plane F1: 0.784 (+8.2pp over teacher)
iPhone 16 Pro Latency: 1.6 ms (24× faster than teacher)
Visual Parameters: 11.4M (26× fewer than teacher)

Bringing Fetal Ultrasound AI to the Point of Care

Prenatal Care
Low-Resource Settings
Mobile Deployment
SOTA Results

Fetal ultrasound AI could transform prenatal care in low-resource settings, yet current foundation models exceed 300M visual parameters, precluding deployment on point-of-care devices. Standard knowledge distillation fails under such extreme capacity gaps (~26×), as compact students waste capacity mimicking architectural artifacts of oversized teachers.

We introduce Selective Repulsive Knowledge Distillation, which decomposes contrastive KD into diagonal and off-diagonal components: matched pair alignment is preserved while the off-diagonal weight decays into negative values, repelling the student from the teacher's inter-class confusions and forcing discovery of architecturally native features.

Our 11.4M parameter student surpasses the 304M-parameter FetalCLIP teacher on zero-shot HC18 biometry validity (88.6% vs. 83.5%) and brain sub-plane F1 (0.784 vs. 0.702), while running at 1.6 ms on iPhone 16 Pro, enabling real-time assistive AI on handheld ultrasound devices.

Key Contributions

01

Selective Repulsive KD

A novel architecture-agnostic methodology decomposing contrastive KD into diagonal (matched-pair) and off-diagonal (non-target) components. Repulsion is applied selectively to off-diagonal while preserving matched-pair alignment.

02

MobileFetalCLIP Model

A mobile-scale vision-language model (75M total, 11.4M visual parameters) that surpasses the 427M FetalCLIP teacher on HC18 validity (+5.1pp) and brain sub-plane F1 (+8.2pp), while retaining 97–98% of linear probing performance.

03

Mechanistic Analysis

Comprehensive analysis via embedding geometry, logit distributions, and controlled ablations demonstrating that Selective Repulsive KD produces structured decorrelation, with a silhouette score 40% higher than static KD.

Selective Repulsive Knowledge Distillation

Overcoming the 26× capacity gap by learning what the teacher doesn't know

The Problem

Standard KD forces a small student to strictly mimic a massive teacher (304M visual parameters). At a 26× capacity gap, the student wastes parameters learning the teacher's architectural artifacts (ViT-specific self-attention confusions) instead of discriminative medical features.

Our Solution

During the Repulsive Phase, the off-diagonal loss weight β(t) becomes negative. Instead of copying the teacher's mistakes, the student is actively repelled from the teacher's confusion patterns, forcing discovery of native local-texture features.

Phase 1: Attractive Phase, β(t) > 0. The student absorbs domain knowledge from the teacher's similarity structure.
Phase 2: Transition, β(t) ≈ 0. The KD term contributes negligibly; the student is driven by the L_CLIP objective.
Phase 3: Repulsive Phase, β(t) < 0. The gradient inverts; the student learns to separate classes differently from the teacher.
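The three phases can be illustrated with a simple schedule. The page reports β₀=2 and r=−0.8 but does not state the functional form of β(t), so the sketch below assumes a linear decay from β₀ to r·β₀. Both the linear shape and the reading of r as a final ratio are assumptions, and `beta_schedule` and `phase` are hypothetical names.

```python
def beta_schedule(step: int, total_steps: int,
                  beta0: float = 2.0, r: float = -0.8) -> float:
    """Hypothetical linear schedule: beta0 at step 0, r * beta0 at the last step.

    ASSUMPTION: the source gives beta0 and r but not the shape of beta(t).
    """
    if total_steps <= 1:
        return beta0
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    return beta0 * (1.0 + (r - 1.0) * progress)


def phase(beta: float, eps: float = 0.05) -> str:
    """Label the training phase from the sign of beta (eps is an illustrative band)."""
    if beta > eps:
        return "attractive"
    if beta < -eps:
        return "repulsive"
    return "transition"


if __name__ == "__main__":
    total = 101
    for step in (0, 25, 50, 56, 75, 100):
        b = beta_schedule(step, total)
        print(f"step {step:3d}  beta = {b:+.3f}  phase = {phase(b)}")
```

Under this assumed linear form, β(t) crosses zero at a fraction 1/(1−r) of training, roughly 56% of the way through for r=−0.8.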

Method Overview: Selective Repulsive Knowledge Distillation

MobileFetalCLIP method overview showing paired ultrasound inputs, frozen teacher and trainable student encoders, teacher and student similarity matrices, selective repulsive knowledge distillation, and the attractive-to-repulsive phase schedule

Overview of the training signal used in MobileFetalCLIP. The frozen FetalCLIP teacher and the trainable FastViT student each produce a similarity matrix, and the two are compared through a diagonal-protected decomposition: the matched-pair (diagonal) term is preserved, while the off-diagonal term is scheduled from attractive to repulsive to encourage architecturally native fetal ultrasound representations.
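The diagonal-protected decomposition can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a row-wise cross-entropy form of KD between softmaxed teacher and student similarity matrices, a fixed diagonal weight `alpha`, and a temperature `tau`. The function name and all three parameters are hypothetical.

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable log-softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def selective_kd_loss(sim_teacher: np.ndarray, sim_student: np.ndarray,
                      beta: float, alpha: float = 1.0, tau: float = 1.0):
    """Split a cross-entropy KD loss into matched-pair and non-target parts.

    sim_teacher, sim_student: (B, B) image-text similarity matrices where
    entry (i, i) is the matched pair. ASSUMPTION: cross-entropy KD form.
    """
    log_p_teacher = log_softmax(sim_teacher / tau)
    log_p_student = log_softmax(sim_student / tau)
    p_teacher = np.exp(log_p_teacher)

    # Per-entry cross-entropy contribution, shape (B, B); every entry is >= 0.
    ce = -p_teacher * log_p_student
    batch = ce.shape[0]
    eye = np.eye(batch, dtype=bool)

    diag_term = ce[eye].sum() / batch    # matched pairs: always attractive
    off_term = ce[~eye].sum() / batch    # non-targets: weighted by beta
    total = alpha * diag_term + beta * off_term
    return total, diag_term, off_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(4, 4))
    student = rng.normal(size=(4, 4))
    for beta in (2.0, 0.0, -1.6):
        total, diag, off = selective_kd_loss(teacher, student, beta)
        print(f"beta = {beta:+.1f}  diag = {diag:.3f}  off = {off:.3f}  total = {total:.3f}")
```

With `beta` positive, both terms pull the student toward the teacher; with `beta` negative, minimizing the total pushes the student's off-diagonal distribution away from the teacher's while the diagonal term still rewards matched-pair alignment.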

Surpassing the Teacher at 26× Fewer Parameters

Zero-shot evaluation on fetal ultrasound benchmarks

HC18 Biometry Validity (%), Zero-Shot

MobileFetalCLIP (Ours), 75M · FastViT: 88.6%
FetalCLIP Teacher, 427M · ViT-L/14: 83.5% (reference line)
Static KD Baseline, 75M: 79.4%
BiomedCLIP, 150M · ViT-B/16: 24.0%
CLIP, 427M · ViT-L/14: 11.0%

Full Zero-Shot Comparison on Fetal Ultrasound Benchmarks

| Group | Model | Params | HC18 (%) | F1-5Plane | F1-3Brain | F1-all |
|---|---|---|---|---|---|---|
| Teacher | FetalCLIP (ViT-L/14) | 427M | 83.5 | 0.973 | 0.702 | 0.871 |
| General VLMs (not fetal-specific) | CLIP (ViT-L/14) | 427M | 11.0 | 0.308 | 0.206 | 0.270 |
| General VLMs (not fetal-specific) | BiomedCLIP (ViT-B/16) | 150M | 24.0 | 0.603 | 0.236 | 0.466 |
| General VLMs (not fetal-specific) | UniMed-CLIP (ViT-B/16) | 150M | 9.0 | 0.679 | 0.187 | 0.495 |
| MobileFetalCLIP variants (FastViT, 75M total) | No KD (CLIP only) | 75M | 71.3 | 0.889 | 0.712 | 0.823 |
| MobileFetalCLIP variants (FastViT, 75M total) | Static Logit KD (CLIP-KD baseline) | 75M | 79.4 | 0.946 | 0.715 | 0.859 |
| MobileFetalCLIP variants (FastViT, 75M total) | Coupled Repulsive KD (r=−0.8) | 75M | 84.4 | 0.933 | 0.763 | 0.869 |
| MobileFetalCLIP variants (FastViT, 75M total) | Selective Repulsive KD (β₀=2, r=−0.8) (Ours) | 75M | 88.6 | 0.946 | 0.784 | 0.886 |
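For readers unfamiliar with zero-shot evaluation of CLIP-style models, the sketch below shows the usual recipe: embed one text prompt per class, embed each image, and predict the class whose prompt has the highest cosine similarity. It is a generic illustration with toy stand-in embeddings, not the evaluation code behind the table. The function name, the toy vectors, and the third class label are hypothetical.

```python
import numpy as np


def zero_shot_predict(image_emb: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Return the index of the best-matching class prompt for each image.

    image_emb: (N, D) image embeddings; text_emb: (C, D), one prompt per class.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    similarity = img @ txt.T            # (N, C) cosine similarities
    return similarity.argmax(axis=1)


if __name__ == "__main__":
    # Toy stand-ins for encoder outputs; real embeddings would come from
    # the image and text encoders of the model under evaluation.
    class_names = ["transthalamic", "transventricular", "other (hypothetical)"]
    text_emb = np.eye(3)
    image_emb = np.array([[0.9, 0.1, 0.0],
                          [0.1, 0.2, 0.95]])
    for idx in zero_shot_predict(image_emb, text_emb):
        print(class_names[idx])
```

No classifier is trained in this setting, which is why the comparison isolates the quality of the learned image-text embedding space.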

Feature Space Analysis: t-SNE Projections of Brain Sub-plane Embeddings

t-SNE visualizations comparing No KD, Static KD, and Selective Repulsive KD brain sub-plane cluster separation

(a) No KD: Overlapping transthalamic/transventricular clusters. (b) Static KD: Marginal improvement. (c) Selective Repulsive KD: Well-separated, compact clusters consistent with the +8.2pp F1-3Brain gain over the teacher (silhouette score +40% over static KD).

Embedding Geometry on Planes DB (5-plane, 8,187 images)

Ablation Study Results for FetalCLIP Knowledge Distillation

| Method | d_eff ↑ | Silhouette ↑ | Intra ↑ | Inter ↓ | Uniformity ↓ |
|---|---|---|---|---|---|
| Static KD (λ=1.0) | 8.0 | 0.375 | 0.712 | 0.445 | −1.662 |
| Confidence Penalty | 9.0 | 0.406 | 0.693 | 0.389 | −1.811 |
| Coupled r=−0.8 | 6.4 | 0.509 | 0.645 | 0.010 | −2.231 |
| Selective β₀=2 (Ours) | 10.0 | 0.525 | 0.623 | 0.076 | −2.308 |
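The silhouette column can be read with the standard definition s = (b − a) / max(a, b), where a is a sample's mean distance to its own class and b is its mean distance to the nearest other class. The sketch below computes it with cosine distance on toy embeddings and then reproduces the "+40% over static KD" figure from the table values. The choice of cosine distance and the function name are assumptions, since the page does not state which metric was used.

```python
import numpy as np


def silhouette_cosine(embeddings: np.ndarray, labels) -> float:
    """Mean silhouette score using cosine distance (metric is an ASSUMPTION)."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    dist = 1.0 - x @ x.T
    labels = np.asarray(labels)
    classes = np.unique(labels)
    scores = []
    for i in range(len(x)):
        same = labels == labels[i]
        same[i] = False                      # exclude the sample itself
        if not same.any():                   # singleton class: score 0 by convention
            scores.append(0.0)
            continue
        a = dist[i, same].mean()
        b = min(dist[i, labels == c].mean() for c in classes if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


if __name__ == "__main__":
    toy = np.array([[1.0, 0.01], [1.0, -0.01], [0.01, 1.0], [-0.01, 1.0]])
    print(f"toy silhouette: {silhouette_cosine(toy, [0, 0, 1, 1]):.3f}")

    # Relative gain from the table: Selective (0.525) vs. Static KD (0.375).
    gain_pct = (0.525 / 0.375 - 1.0) * 100.0
    print(f"relative silhouette gain: {gain_pct:.0f}%")
```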

Real-Time AI at the Point of Care

32× fewer GMACs · 26× fewer parameters · 24× lower latency on iPhone 16 Pro

iPhone 16 Pro: MobileFetalCLIP 1.6 ms vs. FetalCLIP Teacher 37.6 ms (24× speedup). At 1.6 ms per frame the student exceeds 600 FPS, 10–20× the 30–60 fps of diagnostic ultrasound.
iPhone 14: MobileFetalCLIP 3.8 ms vs. FetalCLIP Teacher out of memory. The teacher cannot run on this device; MobileFetalCLIP runs without issue.

Inference Efficiency Comparison

Mobile Device Efficiency and Speed Metrics

| Model | Visual params | GMACs | iPhone 14 | iPhone 16 Pro |
|---|---|---|---|---|
| FetalCLIP (Teacher) | 304M | 49.4G | OOM | 37.6 ms |
| Static KD (Baseline) | 11.4M | 1.5G | 3.8 ms | 1.6 ms |
| MobileFetalCLIP (Ours) | 11.4M | 1.5G | 3.8 ms | 1.6 ms |

32× fewer GMACs · 26× fewer params · 24× faster on device
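The headline ratios follow directly from the table values. The short check below recomputes them, along with the implied frame rate at 1.6 ms per inference. The dictionary keys are illustrative names; the numbers are taken from the table.

```python
# Values from the efficiency table (visual encoder only).
teacher = {"visual_params_m": 304.0, "gmacs": 49.4, "latency_ms": 37.6}
student = {"visual_params_m": 11.4, "gmacs": 1.5, "latency_ms": 1.6}

# Teacher-to-student ratio for each metric.
ratios = {key: teacher[key] / student[key] for key in teacher}

# Frames per second implied by the iPhone 16 Pro latency.
fps = 1000.0 / student["latency_ms"]

if __name__ == "__main__":
    for key, value in ratios.items():
        print(f"{key}: {value:.1f}x")
    print(f"throughput: {fps:.0f} FPS")
    print(f"headroom over 60 fps: {fps / 60:.1f}x, over 30 fps: {fps / 30:.1f}x")
```

The unrounded ratios are about 26.7 (parameters), 32.9 (GMACs), and 23.5 (latency), so the page's 26× and 32× appear to be rounded down and its 24× rounded up.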

Linear Probing: Frozen Feature Quality

MobileFetalCLIP retains 97–98% of the FetalCLIP teacher's linear probing performance at 26× fewer visual parameters. Protocol: frozen encoder + single linear layer, 5-fold × 5 seeds, 95% CI.

| Model | 6-View F1 | Brain F1 | CHD AUROC |
|---|---|---|---|
| CLIP (ViT-L/14) | .867 | .634 | .679 |
| BiomedCLIP (ViT-B/16) | .856 | .582 | .643 |
| UniMed-CLIP (ViT-B/16) | .860 | .607 | .718 |
| FetalCLIP (ViT-L/14) | .947 | .820 | .787 |
| MobileFetalCLIP (FastViT) | .930 (98.2% of teacher) | .799 (97.4% of teacher) | .769 (97.7% of teacher) |
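The retention percentages in the last row are the student's score divided by the teacher's score on each task. A quick recomputation from the table values, with illustrative variable names:

```python
# Linear probing scores from the table above.
teacher = {"6-View F1": 0.947, "Brain F1": 0.820, "CHD AUROC": 0.787}
student = {"6-View F1": 0.930, "Brain F1": 0.799, "CHD AUROC": 0.769}

# Retention: student score as a percentage of the teacher score.
retention = {task: 100.0 * student[task] / teacher[task] for task in teacher}

if __name__ == "__main__":
    for task, pct in retention.items():
        print(f"{task}: {pct:.1f}% of teacher")
```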

Cite This Work

BibTeX
@article{saeed2026mobilefetalclip,
  title     = {MobileFetalCLIP: Selective Repulsive Knowledge Distillation
               for Mobile Fetal Ultrasound Analysis},
  author    = {Saeed, Numan and Maani, Fadillah Adamsyah and Yaqub, Mohammad},
  journal   = {arXiv preprint arXiv:2603.05421},
  year      = {2026}
}