Progressive Test Time Energy Adaptation for Medical Image Segmentation

Xiaoran Zhang¹, Byung-Woo Hong², Hyoungseob Park¹, Daniel H. Pak¹, Anne-Marie Rickmann¹, Lawrence H. Staib¹, James S. Duncan¹*, Alex Wong¹*

¹Yale University, ²Chung-Ang University
*Joint supervision

ICCV 2025 (Highlight)


Motivation

Covariate shifts caused by nuisances such as heteroscedastic noise and inconsistent imaging protocols limit the fidelity of medical image segmentation models.

Because collecting a target dataset in advance is often impractical, test-time adaptation (TTA) instead calibrates the model on the fly during inference, without assuming access to such a dataset.

Given a segmentation model trained solely on a source dataset, our goal is to adapt it to target data at test time without access to the entire target dataset.

Key components

Region-based Shape Energy Model

Why use an energy-based model?

Energy-based models naturally capture distribution shifts because their energy reflects sample likelihood, which makes them well suited to quantifying distribution misalignment in TTA.
We propose a shape energy model trained on source data, which assigns an energy score at the region level:
- Low energy → in-distribution (ID), accurate shapes
- High energy → out-of-distribution (OOD), erroneous predictions
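As a reading aid, energy-based models conventionally tie energy to likelihood through a Boltzmann (Gibbs) form. Writing \( E_k \) for the energy the model assigns to region (patch) \( k \) of a predicted mask \( \hat{Y} \), the textbook relation is shown below. The per-region notation is ours, and the model described here is trained with binary region labels, so this illustrates the general principle rather than its exact formulation.

\[
p(\hat{Y}_k) = \frac{\exp(-E_k)}{Z}, \qquad k = 1, \dots, K,
\]

where \( Z \) is a normalizing constant: a lower \( E_k \) means region \( k \) is more likely under the source shape distribution (ID), and a higher \( E_k \) marks it as unlikely (OOD).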

Curate Negative Examples for Energy Model Training

Impossible to obtain real negative examples (erroneous predictions at inference)?

Since the input distribution to the energy model is constrained by the predictions the segmentation model can produce, we explore the data space by probing the segmentation model with inputs optimized to simulate OOD examples.

Method

We assume a segmentation model \( f_\theta(\cdot) \) pretrained on a source dataset. (a) The energy model \( g_\phi(\cdot) \) is trained to estimate patchwise energy values, using binary reference energy labels derived from the mismatch between perturbed predictions \( \hat{Y}_s \) and the ground-truth shapes \( Y_s \) on the source dataset. (b) During adaptation, the trained energy model \( g_\phi(\cdot) \) is applied to predictions on the test-time distribution, and the BatchNorm layers of \( f_\theta(\cdot) \) are updated iteratively to match a uniform low-energy target.

Perturbation curation: We generate negative (implausible) examples by applying fast gradient sign method (FGSM) adversarial noise and spatial affine transformations to the input images.
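The perturbation step can be illustrated with a dependency-free Python sketch. It assumes a hypothetical per-pixel logistic model (`toy_segmenter`, with made-up weights `w` and `b`) standing in for \( f_\theta(\cdot) \), and pixelwise binary cross-entropy as the attacked loss. FGSM moves each pixel by \( \epsilon \) in the sign of the loss gradient, \( x' = \mathrm{clip}(x + \epsilon\,\mathrm{sign}(\nabla_x \mathcal{L})) \), and the hypothetical `affine_warp` applies a rotation plus translation with nearest-neighbor sampling. The step size `eps`, the angle, and the shift are illustrative values, not settings from the paper.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def toy_segmenter(image, w=8.0, b=-4.0):
    """Hypothetical per-pixel logistic 'segmenter' standing in for f_theta."""
    return [[sigmoid(w * x + b) for x in row] for row in image]


def fgsm_perturb(image, mask, eps=0.1, w=8.0, b=-4.0):
    """FGSM: x_adv = clip(x + eps * sign(dL/dx), 0, 1), with L = pixelwise BCE.

    For p = sigmoid(w * x + b), the BCE gradient w.r.t. x is (p - y) * w.
    """
    adv = []
    for img_row, mask_row in zip(image, mask):
        new_row = []
        for x, y in zip(img_row, mask_row):
            grad = (sigmoid(w * x + b) - y) * w
            sign = 1.0 if grad > 0 else (-1.0 if grad < 0 else 0.0)
            # Step in the direction that increases the loss, then clip to [0, 1].
            new_row.append(min(1.0, max(0.0, x + eps * sign)))
        adv.append(new_row)
    return adv


def affine_warp(image, angle_deg=10.0, shift=(0.0, 0.0), fill=0.0):
    """Rotate about the image center, then translate by (dy, dx); nearest-neighbor sampling."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[fill] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Inverse-map each output pixel back into the source image.
            y, x = i - cy - shift[0], j - cx - shift[1]
            src_y = cos_a * y + sin_a * x + cy
            src_x = -sin_a * y + cos_a * x + cx
            si, sj = round(src_y), round(src_x)
            if 0 <= si < h and 0 <= sj < w:
                out[i][j] = image[si][sj]
    return out
```

For example, `fgsm_perturb([[0.2, 0.8], [0.7, 0.1]], [[0, 1], [1, 0]])` nudges every pixel by 0.1 in the direction that increases the toy model's error, raising its loss against the mask. In a real pipeline the input gradient would come from automatic differentiation through the full network rather than a closed form.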

Label curation: For each perturbed segmentation, we compare it with ground truth and assign categorical energy labels to each region, where regions dissimilar to the ground truth are labeled as high-energy.
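Region-level label curation can be sketched as follows. The sketch assumes square, non-overlapping patches, uses patch Dice overlap between the perturbed prediction and the ground truth as the similarity measure, and applies a hypothetical threshold `dice_threshold=0.8`. The paper's actual similarity criterion, patch size, and threshold are not given here, so all three are placeholders; the function names are likewise illustrative.

```python
def patch_dice(pred, gt):
    """Dice overlap of two flattened binary patches; two empty patches count as a match."""
    inter = sum(p * g for p, g in zip(pred, gt))
    total = sum(pred) + sum(gt)
    return 1.0 if total == 0 else 2.0 * inter / total


def extract_patches(mask, patch):
    """Split a 2D mask into non-overlapping patch x patch blocks, row-major, each flattened."""
    h, w = len(mask), len(mask[0])
    blocks = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            blocks.append([mask[r][c]
                           for r in range(i, min(i + patch, h))
                           for c in range(j, min(j + patch, w))])
    return blocks


def curate_energy_labels(pred_mask, gt_mask, patch=2, dice_threshold=0.8):
    """One label per patch: 1 = high energy (dissimilar to ground truth), 0 = low energy."""
    labels = []
    for p, g in zip(extract_patches(pred_mask, patch), extract_patches(gt_mask, patch)):
        labels.append(1 if patch_dice(p, g) < dice_threshold else 0)
    return labels
```

On a 4×4 example in which the prediction drops one foreground pixel in the top-left patch (Dice 6/7 ≈ 0.86) and adds a spurious blob in the bottom-right patch (Dice 0), only the bottom-right patch receives the high-energy label 1.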

Shape energy model training: A region-based model learns patchwise energy values, assigning high energy to implausible regions and low energy to anatomically valid ones.
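A minimal training sketch, under stated assumptions: the energy model is reduced to one shared linear head applied to every flattened patch (equivalent to a convolution whose stride equals the patch size), fit to binary region labels with binary cross-entropy by plain gradient descent. The real \( g_\phi(\cdot) \) is a learned network whose architecture and loss are not specified here; the helper names, toy patches, labels, learning rate, and epoch count are all illustrative.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_with_logits(z: float, y: float) -> float:
    # Stable binary cross-entropy between an energy logit z and a 0/1 region label y.
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def energy_logits(weights, bias, patches):
    # One shared linear head scores every flattened patch (hypothetical stand-in for g_phi).
    return [sum(w * x for w, x in zip(weights, p)) + bias for p in patches]


def train_energy_head(patches, labels, lr=0.5, epochs=200):
    weights, bias = [0.0] * len(patches[0]), 0.0
    n = len(patches)
    history = []
    for _ in range(epochs):
        logits = energy_logits(weights, bias, patches)
        history.append(sum(bce_with_logits(z, y) for z, y in zip(logits, labels)) / n)
        # d(BCE)/dz = sigmoid(z) - y; chain rule through the linear head.
        errors = [sigmoid(z) - y for z, y in zip(logits, labels)]
        grad_w = [sum(e * p[i] for e, p in zip(errors, patches)) / n
                  for i in range(len(weights))]
        grad_b = sum(errors) / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias, history


# Toy 2x2 patches (flattened); isolated single-pixel specks are labeled high energy.
patches = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
labels = [0, 0, 0, 1, 1]
weights, bias, history = train_energy_head(patches, labels)
```

With zero-initialized weights every logit starts at 0, so the loss starts at log 2 ≈ 0.693 and decreases as the head learns to assign higher energy to the patches labeled 1.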

Progressive test-time adaptation: At inference, the segmentation model is iteratively updated to minimize the predicted energy, aligning outputs with plausible anatomical shapes.
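The adaptation loop can be sketched in dependency-free Python. Assumptions, all hypothetical: the segmentation model is reduced to one normalization step (statistics taken from the test batch, followed by the affine parameters `gamma` and `beta`, the only parameters adapted) and a fixed sigmoid head; `frozen_energy` is a stand-in energy head that scores a patch as high-energy when its predictions are ambiguous, not a learned shape energy; and the loss is binary cross-entropy of each patch's energy logit against the low-energy target 0, which reduces to softplus. Central finite differences replace backpropagation only to keep the example free of dependencies.

```python
import math


def softplus(z):
    # Stable log(1 + exp(z)); equals BCE-with-logits against target 0 ("low energy").
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def segment(x, gamma, beta):
    """Toy segmenter: normalize with test-batch statistics, apply gamma/beta, then a sigmoid head."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x) + 1e-5)
    return [sigmoid(gamma * (v - mu) / sd + beta) for v in x]


def frozen_energy(p, patch=2, a=8.0, b=-1.0):
    """Hypothetical frozen energy head: one logit per patch, high when predictions are ambiguous."""
    return [a * sum(q * (1 - q) for q in p[k:k + patch]) + b
            for k in range(0, len(p), patch)]


def tta_loss(x, gamma, beta):
    energies = frozen_energy(segment(x, gamma, beta))
    return sum(softplus(z) for z in energies) / len(energies)


def progressive_tta(x, steps=20, lr=0.5, h=1e-4):
    gamma, beta = 1.0, 0.0  # only the normalization affine parameters are adapted
    history = [tta_loss(x, gamma, beta)]
    for _ in range(steps):
        # Central finite differences stand in for backpropagation in this sketch.
        g_gamma = (tta_loss(x, gamma + h, beta) - tta_loss(x, gamma - h, beta)) / (2 * h)
        g_beta = (tta_loss(x, gamma, beta + h) - tta_loss(x, gamma, beta - h)) / (2 * h)
        gamma, beta = gamma - lr * g_gamma, beta - lr * g_beta
        history.append(tta_loss(x, gamma, beta))
    return gamma, beta, history
```

Calling `progressive_tta([0.1, 0.4, 0.6, 0.9])` returns the adapted `gamma`, `beta`, and the loss after each iteration; the loss falls as the outputs of the frozen energy head are driven toward low energy. In a framework with automatic differentiation, the same idea is typically expressed by freezing every parameter except the BatchNorm weight and bias and stepping an optimizer on the energy loss.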

Results

Adversarial perturbation visualization

Our adversarial perturbation strategy produces images and segmentations that align with OOD cases, validating its effectiveness in modeling real covariate shifts.

Progressive update visualization

Our method progressively refines segmentation quality over iterations (left), while achieving better convergence under the same time budget (right).

Citation

@article{zhang2025progressive,
  title={Progressive Test Time Energy Adaptation for Medical Image Segmentation},
  author={Zhang, Xiaoran and Hong, Byung-Woo and Park, Hyoungseob and Pak, Daniel H and Rickmann, Anne-Marie and Staib, Lawrence H and Duncan, James S and Wong, Alex},
  journal={arXiv preprint arXiv:2503.16616},
  year={2025}
}