Adapting Vision Foundation Models for Real-time Ultrasound Image Segmentation

Xiaoran Zhang¹, Eric Z. Chen², Lin Zhao², Xiao Chen², Yikang Liu², Boris Maihe², James Duncan¹, Terrence Chen², Shanhui Sun²

¹Yale University, ²United Imaging Intelligence

MICCAI 2025 (Early accept)


Motivation

Ultrasound image segmentation is a long-standing problem in medical imaging because ultrasound images have a low signal-to-noise ratio, indistinct and ambiguous anatomical boundaries, and high anatomical variability across patients.
Existing ultrasound segmentation methods often adapt poorly to new tasks and rely on costly manual annotations.
Current real-time segmentation approaches fail to match state-of-the-art performance.

Method

We adapt Hiera to extract multi-scale features, which are interleaved with DINOv2 features and decoded by a hierarchical decoder. In the overview figure, red blocks denote trainable parameters.

Adapters: We introduce a lightweight adapter positioned after the skip connection in Hiera’s multi-scale attention block built on MViTv2.
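To make the idea of a lightweight adapter concrete, here is a minimal NumPy sketch of a generic bottleneck adapter: a down-projection, a nonlinearity, an up-projection, and a residual connection. The class name `BottleneckAdapter`, the ReLU activation, the bottleneck width, and the zero initialization of the up-projection are all assumptions made for illustration; the text above does not specify the adapter's internal design, so this should not be read as the authors' implementation.

```python
import numpy as np


class BottleneckAdapter:
    """Hypothetical bottleneck adapter: x + up(relu(down(x)))."""

    def __init__(self, dim, bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        # Down-projection to a small bottleneck keeps the parameter count low.
        self.w_down = rng.normal(0.0, 0.02, size=(dim, bottleneck))
        self.b_down = np.zeros(bottleneck)
        # Assumption: the up-projection starts at zero, so the adapter is an
        # identity mapping before any training step.
        self.w_up = np.zeros((bottleneck, dim))
        self.b_up = np.zeros(dim)

    def num_parameters(self):
        return sum(p.size for p in (self.w_down, self.b_down, self.w_up, self.b_up))

    def __call__(self, x):
        # x: (num_tokens, dim) token features from a frozen backbone block.
        hidden = np.maximum(x @ self.w_down + self.b_down, 0.0)
        return x + hidden @ self.w_up + self.b_up
```

In a setup like this, only the adapter weights would be trained while the backbone block stays frozen, which is what keeps the number of trainable parameters small.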

Feature interleaving: To enhance semantic representation, we incorporate an auxiliary DINOv2 encoder and interleave its features with the Hiera features, merging them slice by slice along the channel dimension.
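One way to read "slice by slice along the channel dimension" is that channels from the two encoders alternate in the merged tensor. The NumPy sketch below illustrates that reading under two assumptions that the text does not state: both feature maps have already been brought to the same shape `(C, H, W)`, and the interleaving granularity is a single channel. The function name `interleave_channels` is hypothetical.

```python
import numpy as np


def interleave_channels(hiera_feat, dino_feat):
    """Alternate channels from two (C, H, W) feature maps into (2C, H, W)."""
    if hiera_feat.shape != dino_feat.shape:
        raise ValueError("this sketch assumes both feature maps share one shape")
    channels, height, width = hiera_feat.shape
    # Stacking on a new axis right after the channel axis gives (C, 2, H, W);
    # flattening the first two axes then yields h0, d0, h1, d1, ...
    stacked = np.stack([hiera_feat, dino_feat], axis=1)
    return stacked.reshape(2 * channels, height, width)
```

Compared with plain concatenation, which would place all Hiera channels before all DINOv2 channels, this ordering puts corresponding channels from the two encoders next to each other.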

Hierarchical decoder: We propose a hierarchical decoder that progressively fuses coarse-to-fine representations in a UNet-like style.
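The coarse-to-fine fusion can be sketched as follows. This is a simplified NumPy illustration of a generic UNet-like decoding loop, not the decoder described in the paper: it assumes every stage has the same channel count, that each stage halves the spatial resolution of the previous one, that upsampling is nearest-neighbor, and that fusion is a 1x1 projection followed by ReLU. The names `upsample2x`, `fuse`, and `decode` are hypothetical.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def fuse(coarse, fine, weight):
    """Upsample the coarse map, concatenate with the fine map, project to C channels."""
    merged = np.concatenate([upsample2x(coarse), fine], axis=0)  # (2C, H, W)
    # A 1x1 convolution is a per-pixel linear map over channels.
    projected = np.einsum("oc,chw->ohw", weight, merged)
    return np.maximum(projected, 0.0)


def decode(features, weights):
    """features: list of (C, H, W) maps ordered fine to coarse.
    weights: one (C, 2C) projection per fusion step, coarsest step first."""
    x = features[-1]
    for fine, weight in zip(reversed(features[:-1]), weights):
        x = fuse(x, fine, weight)
    return x
```

Starting from the coarsest map and moving toward the finest one means each fusion step adds spatial detail to features that already carry the most semantic context.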

Results

Data efficiency and adaptability on cardiac ultrasound

Our approach remains highly effective under limited supervision, significantly outperforming baselines when trained with only 1% and 10% of the training data.

Cross-dataset generalization on thyroid ultrasound

Our method generalizes well when trained on TN3K and tested on DDTI, and it outperforms existing state-of-the-art methods on other thyroid ultrasound datasets.

Citation

@article{zhang2025adapting,
  title={Adapting Vision Foundation Models for Real-time Ultrasound Image Segmentation},
  author={Zhang, Xiaoran and Chen, Eric Z and Zhao, Lin and Chen, Xiao and Liu, Yikang and Maihe, Boris and Duncan, James S and Chen, Terrence and Sun, Shanhui},
  journal={arXiv preprint arXiv:2503.24368},
  year={2025}
}