RST-UNet: Medical Image Segmentation Transformer Effectively Combining Superpixels

Figure: RST-UNet model architecture.

Abstract

Medical image segmentation has advanced with models such as UCTransNet, TransUNet, and TransClaw U-Net, which integrate Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). However, these models are limited by the locality of convolutions and the computational demands of Transformers. To overcome these challenges, we introduce RST-UNet, an encoder-decoder network that balances segmentation effectiveness with computational efficiency. RST-UNet features two key innovations: the Compact Representation Block (CRB) and the Compact Dependency Modeling Block (CDMB). The CRB uses superpixel pooling to capture long-range dependencies while reducing parameter count and computation time. The CDMB combines superpixel unpooling with attention mechanisms and Rotary Position Embedding (RoPE), emphasizing critical regions and modeling dependencies across the entire image. Experiments on the publicly available Synapse dataset demonstrate RST-UNet's strong performance, particularly in segmenting small organs such as the gallbladder, right kidney, and pancreas. Notably, RST-UNet achieves these results without pre-training, underscoring its adaptability to diverse medical image segmentation tasks. This work is a step toward efficient and effective algorithms for medical image analysis.
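Since the paper's implementation details are not given here, the sketch below is a minimal PyTorch illustration (not the authors' code) of the two mechanisms the abstract describes: superpixel pooling/unpooling, which compresses pixels into a small set of compact tokens and later restores pixel resolution, and single-head attention over those tokens with RoPE applied to queries and keys. All function names, tensor shapes, and the half-split RoPE variant used here are illustrative assumptions.

```python
# Conceptual sketch of superpixel pooling/unpooling + RoPE attention.
# All names and shapes are assumptions for illustration, not the paper's API.
import torch
import torch.nn.functional as F

def superpixel_pool(feats, labels, num_superpixels):
    """Average-pool per-pixel features into one token per superpixel.

    feats:  (N, C) flattened pixel features
    labels: (N,)   superpixel id of each pixel, in [0, num_superpixels)
    """
    C = feats.shape[1]
    pooled = torch.zeros(num_superpixels, C, device=feats.device)
    pooled.index_add_(0, labels, feats)                       # sum per superpixel
    counts = torch.bincount(labels, minlength=num_superpixels).clamp(min=1)
    return pooled / counts.unsqueeze(1)                       # (S, C) compact tokens

def superpixel_unpool(tokens, labels):
    """Scatter superpixel tokens back to pixel resolution: (S, C) -> (N, C)."""
    return tokens[labels]

def rope(x, base=10000.0):
    """Apply a 1-D Rotary Position Embedding (half-split variant) to (S, C) tokens."""
    S, C = x.shape
    half = C // 2
    freqs = base ** (-torch.arange(half, device=x.device) / half)
    angles = torch.arange(S, device=x.device).unsqueeze(1) * freqs  # (S, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def rope_attention(q, k, v):
    """Single-head attention over superpixel tokens, RoPE on queries and keys."""
    q, k = rope(q), rope(k)
    attn = F.softmax(q @ k.t() / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v

# Toy usage: 6 pixels, 4 channels, 3 superpixels.
feats = torch.randn(6, 4)
labels = torch.tensor([0, 0, 1, 1, 2, 2])
tokens = superpixel_pool(feats, labels, 3)        # (3, 4) compact tokens
tokens = rope_attention(tokens, tokens, tokens)   # long-range mixing among tokens
pixels = superpixel_unpool(tokens, labels)        # back to (6, 4) pixel resolution
```

The efficiency intuition is that attention over a few hundred superpixel tokens is far cheaper than attention over every pixel, while unpooling restores full resolution for the decoder.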

Publication
International Conference on Neural Information Processing, 2024 [CCF-C]
Xuhang Chen
Lecturer, Huizhou University