ROSAL: Semi-supervised Active Learning with Representation Aggregation and Outlier for Endoscopy Image Classification

Model Architecture

Abstract

Classifying endoscopy images is vital for the early detection and prevention of colorectal cancer (CRC), but manually annotating these images is expensive. Semi-supervised Active Learning (SAL) can reduce annotation costs, yet two problems remain: inaccurate pseudo-labels and a tendency to over-select outliers. To address these problems, we introduce ROSAL, a new SAL framework featuring Representational Correlation-based Pseudo-label Training (RCPT) and Outlier-based Hybrid Querying (OHQ). RCPT employs a pseudo-label contrastive loss to enhance agreement among unlabeled data representations and reduce discord. The pseudo-label generator in RCPT leverages this correlation for more precise labeling. OHQ introduces a distance factor into a hybrid querying strategy to reduce outlier selection. Experimental results demonstrate that ROSAL outperforms other active learning methods, achieving 71.46% accuracy on a publicly available endoscopic dataset and 90.79% accuracy on a publicly available natural image dataset, using only 40% and 20% of the labeled data, respectively.
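The two ideas named in the abstract can be illustrated with a small NumPy sketch. This is not the paper's implementation: the abstract gives no formulas, so the loss below is a generic supervised-contrastive-style loss that treats pseudo-labels as class labels, and the query score is an assumed combination of predictive entropy with an exponential penalty on distance from the pool centroid. The function names, `temperature`, and `alpha` are all hypothetical.

```python
import numpy as np


def pseudo_label_contrastive_loss(z, pseudo_labels, temperature=0.5):
    """Contrastive loss that pulls together samples sharing a pseudo-label.

    Assumed form (supervised-contrastive style), not the paper's exact loss.
    z: (n, d) embeddings, pseudo_labels: (n,) integer labels, n >= 2.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    self_mask = np.eye(len(z), dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)  # exclude self-pairs

    # Numerically stable log-softmax over all other samples.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = row_max + np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_norm

    # Positives: other samples with the same pseudo-label.
    pos = (pseudo_labels[:, None] == pseudo_labels[None, :]) & ~self_mask
    pos_count = pos.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0
    pos_log_prob = np.where(pos, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())


def hybrid_query(probs, feats, budget, alpha=1.0):
    """Select `budget` pool indices by uncertainty, down-weighting outliers.

    Assumed scoring rule: entropy * exp(-alpha * normalized distance to the
    pool centroid). The paper's actual distance factor may differ.
    """
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    dist = np.linalg.norm(feats - feats.mean(axis=0), axis=1)
    dist = dist / (dist.max() + 1e-12)
    score = entropy * np.exp(-alpha * dist)
    return np.argsort(-score)[:budget]
```

With `alpha=0` the score reduces to plain entropy-based uncertainty sampling; increasing `alpha` makes the selection favor samples closer to the bulk of the pool, which is the behavior the abstract attributes to the distance factor.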

Publication
International Conference on Neural Information Processing, 2024 [CCF-C]
Xuhang Chen
Lecturer at Huizhou University