I am currently a lecturer at the School of Computer Science and Engineering at Huizhou University. I received the B.Sc. degree in electronic information science and technology from Sun Yat-sen University, Guangzhou, China, and the B.Eng. degree in electronic engineering from the Chinese University of Hong Kong, Hong Kong, China, both in 2016, and the M.Eng. degree in electrical engineering and the M.Sc. degree in computer and information technology from the University of Pennsylvania, Philadelphia, USA, in 2019. I am also a Ph.D. candidate in computer science at the IPPRLab, University of Macau, Macao, China, and the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, co-supervised by Prof. Chi-Man Pun and Prof. Shuqiang Wang.
Film, a classic image style, is culturally significant to the whole photographic industry since it marks the birth of photography. However, film photography is time-consuming and expensive, necessitating a more efficient method for collecting film-style photographs. The numerous datasets that have emerged in the field of image enhancement so far are not film-specific. To facilitate film-based image stylization research, we construct FilmSet, a large-scale and high-quality film style dataset. Our dataset includes three different film types and more than 5000 in-the-wild high-resolution images. Inspired by the characteristics of FilmSet images, we propose a novel framework called FilmNet, based on the Laplacian pyramid, for stylizing images across frequency bands and achieving film-style results. Experiments reveal that the performance of our model is superior to state-of-the-art techniques. Our dataset and code are available at https://github.com/CXH-Research/FilmNet.
@inproceedings{Li:2023,author={Li, Zinuo and Chen, Xuhang and Wang, Shuqiang and Pun, Chi-Man},booktitle={Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)},pages={1160-1168},title={A Large-Scale Film Style Dataset for Learning Multi-frequency Driven Film Enhancement},year={2023},publisher={International Joint Conferences on Artificial Intelligence Organization},address={Macao, China},doi={10.24963/IJCAI.2023/129},}
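To make the multi-frequency idea concrete, below is a minimal PyTorch sketch of Laplacian pyramid decomposition and reconstruction, the operation FilmNet builds its band-wise stylization on. The blur kernel, pyramid depth, and function names are illustrative assumptions rather than the paper's implementation; in FilmNet each band would be processed by a learned sub-network before the pyramid is collapsed.

```python
import torch
import torch.nn.functional as F

def _blur(x):
    # Depthwise 5x5 binomial (Gaussian-like) blur, one filter per channel.
    k = torch.tensor([1., 4., 6., 4., 1.], device=x.device)
    k = torch.outer(k, k)
    k = (k / k.sum()).view(1, 1, 5, 5).repeat(x.shape[1], 1, 1, 1)
    return F.conv2d(x, k, padding=2, groups=x.shape[1])

def laplacian_pyramid(img, levels=3):
    """Split an image into band-pass detail layers plus a low-frequency residual."""
    bands, current = [], img
    for _ in range(levels):
        down = F.interpolate(_blur(current), scale_factor=0.5,
                             mode="bilinear", align_corners=False)
        up = F.interpolate(down, size=current.shape[-2:],
                           mode="bilinear", align_corners=False)
        bands.append(current - _blur(up))   # high/mid-frequency band at this scale
        current = down
    bands.append(current)                   # coarsest low-frequency residual
    return bands

def reconstruct(bands):
    """Collapse the pyramid; band-wise stylization would happen before this step."""
    img = bands[-1]
    for band in reversed(bands[:-1]):
        img = F.interpolate(img, size=band.shape[-2:],
                            mode="bilinear", align_corners=False)
        img = _blur(img) + band
    return img

x = torch.rand(1, 3, 256, 256)
# Decomposition followed by reconstruction is lossless up to floating point error.
assert torch.allclose(reconstruct(laplacian_pyramid(x)), x, atol=1e-5)
```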
Vignetting commonly occurs as an image degradation resulting from factors such as lens design, improper lens hood usage, and limitations in camera sensors. This degradation affects image details and color accuracy and poses challenges for computational photography. Existing vignetting removal algorithms predominantly rely on idealized physical assumptions and hand-crafted parameters, resulting in ineffective removal of irregular vignetting and suboptimal results. Moreover, the substantial lack of real-world vignetting datasets hinders the objective and comprehensive evaluation of vignetting removal. To address these challenges, we present VigSet, a pioneering dataset for vignetting removal. VigSet includes 983 pairs of vignetting and vignetting-free high-resolution (over 4K) real-world images captured under various conditions. In addition, we introduce DeVigNet, a novel frequency-aware Transformer architecture designed for vignetting removal. Through Laplacian pyramid decomposition, we propose the Dual Aggregated Fusion Transformer to handle global features and remove vignetting in the low-frequency domain. Additionally, we propose the Adaptive Channel Expansion Module to enhance details in the high-frequency domain. The experiments demonstrate that the proposed model outperforms existing state-of-the-art methods. The code, models, and dataset are available at https://github.com/CXH-Research/DeVigNet.
@inproceedings{Luo:2024,title={Devignet: High-Resolution Vignetting Removal via a Dual Aggregated Fusion Transformer with Adaptive Channel Expansion},booktitle={Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)},author={Luo, Shenghong and Chen, Xuhang and Chen, Weiwen and Li, Zinuo and Wang, Shuqiang and Pun, Chi-Man},year={2024},publisher={Association for the Advancement of Artificial Intelligence},address={Vancouver, Canada},pages={4000-4008},doi={10.1609/AAAI.V38I5.28193},}
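The sketch below illustrates, under assumptions, one way an adaptive channel-expansion block could enhance a high-frequency band: expand the channels, re-weight them with squeeze-and-excitation style channel attention, and project back through a residual connection. The class name, expansion ratio, and layer choices are hypothetical and only meant to convey the idea, not DeVigNet's actual module.

```python
import torch
import torch.nn as nn

class ChannelExpansionBlock(nn.Module):
    """Illustrative sketch: expand channels, gate them with channel attention,
    then project back. The exact DeVigNet module may differ."""
    def __init__(self, channels, expansion=4):
        super().__init__()
        hidden = channels * expansion
        self.expand = nn.Conv2d(channels, hidden, kernel_size=1)
        self.depthwise = nn.Conv2d(hidden, hidden, kernel_size=3,
                                   padding=1, groups=hidden)
        # Squeeze-and-excitation style attention over the expanded channels.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(hidden, hidden // 4, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(hidden // 4, hidden, kernel_size=1),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, x):
        y = self.depthwise(self.expand(x))
        y = y * self.attn(y)          # re-weight the expanded channels
        return x + self.project(y)    # residual connection preserves input detail

block = ChannelExpansionBlock(channels=32)
high_freq = torch.rand(1, 32, 128, 128)   # e.g. a high-frequency Laplacian band
print(block(high_freq).shape)             # torch.Size([1, 32, 128, 128])
```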
Shadows often occur when we capture documents with casual equipment, which degrades the visual quality and readability of the digital copies. Unlike algorithms for natural shadow removal, algorithms for document shadow removal need to preserve the details of fonts and figures in high-resolution input. Previous works ignore this problem and remove shadows via approximate attention and small datasets, which might not work in real-world situations. We handle high-resolution document shadow removal directly via a larger-scale real-world dataset and a carefully designed frequency-aware network. For the dataset, we acquire over 7k pairs of high-resolution (2462 × 3699) real-world document images with various samples under different lighting conditions, which is 10 times larger than existing datasets. For the network design, we decouple the high-resolution images in the frequency domain, where the low-frequency details and high-frequency boundaries can be effectively learned via the carefully designed network structure. Powered by our network and dataset, the proposed method clearly outperforms previous methods in terms of visual quality and numerical results. The code, models, and dataset are available at https://github.com/CXH-Research/DocShadow-SD7K.
@inproceedings{Li:2024,author={Li, Zinuo and Chen, Xuhang and Pun, Chi-Man and Cun, Xiaodong},booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},pages={12449-12458},title={High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net},year={2023},publisher={IEEE},address={Paris, France},doi={10.1109/ICCV51070.2023.01144},}
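A minimal sketch of the frequency decoupling idea, assuming a simple bilinear low/high split and placeholder convolutional branches (the paper's actual frequency-aware network is more elaborate): shadows and illumination are corrected on a heavily downsampled low-frequency image, while fonts and boundaries are refined at full resolution from the high-frequency residual.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def small_cnn(in_ch, out_ch, width=32):
    # Placeholder backbone; stands in for the carefully designed sub-networks.
    return nn.Sequential(
        nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(width, out_ch, 3, padding=1),
    )

class FrequencyDecoupledRemoval(nn.Module):
    """Illustrative two-branch design: shadow/illumination correction on a
    downsampled low-frequency image, detail refinement at full resolution."""
    def __init__(self, scale=4):
        super().__init__()
        self.scale = scale
        self.low_branch = small_cnn(3, 3)    # global illumination / color correction
        self.high_branch = small_cnn(6, 3)   # sees the residual and the coarse result

    def forward(self, x):
        h, w = x.shape[-2:]
        low = F.interpolate(x, scale_factor=1 / self.scale,
                            mode="bilinear", align_corners=False)
        high = x - F.interpolate(low, size=(h, w),
                                 mode="bilinear", align_corners=False)
        low_out = self.low_branch(low)       # shadow removed at low resolution
        coarse = F.interpolate(low_out, size=(h, w),
                               mode="bilinear", align_corners=False)
        return coarse + self.high_branch(torch.cat([high, coarse], dim=1))

model = FrequencyDecoupledRemoval()
doc = torch.rand(1, 3, 512, 384)   # a (downscaled) document photo with shadows
print(model(doc).shape)            # torch.Size([1, 3, 512, 384])
```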
Specular highlights are a common issue in images captured under direct light sources. They are caused by the reflection of light sources on the surface of objects, which can lead to overexposure and loss of detail. Existing methods for specular highlight removal often rely on hand-crafted features and heuristics, which limits their effectiveness. In this paper, we propose a dual-hybrid attention network for specular highlight removal. The network consists of two branches: a spatial attention branch and a channel attention branch. The spatial attention branch focuses on the spatial distribution of specular highlights, while the channel attention branch emphasizes the importance of different channels. The two branches are combined to form a dual-hybrid attention network, which effectively removes specular highlights while preserving image details. Experimental results show that the proposed network outperforms state-of-the-art methods in terms of both visual quality and quantitative metrics.
@inproceedings{Guo:2024,title={Dual-Hybrid Attention Network for Specular Highlight Removal},author={Guo, Xiaojiao and Chen, Xuhang and Luo, Shenghong and Wang, Shuqiang and Pun, Chi-Man},booktitle={Proceedings of the ACM International Conference on Multimedia (MM)},year={2024},address={Melbourne, VIC, Australia},publisher={ACM},pages={10173--10181},doi={10.1145/3664647.3680745},}
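Below is a hedged PyTorch sketch of a dual attention block with parallel spatial and channel branches, as the abstract describes; the pooling choices, kernel sizes, and fusion layer are CBAM-style illustrative assumptions rather than the paper's exact dual-hybrid design.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Predicts a per-pixel gate, e.g. to focus on highlight regions."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

class ChannelAttention(nn.Module):
    """Predicts a per-channel gate emphasising informative feature maps."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.mlp(x)

class DualAttentionBlock(nn.Module):
    """Illustrative fusion of the two branches with a 1x1 convolution."""
    def __init__(self, channels):
        super().__init__()
        self.spatial = SpatialAttention()
        self.channel = ChannelAttention(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):
        return x + self.fuse(torch.cat([self.spatial(x), self.channel(x)], dim=1))

feat = torch.rand(1, 64, 64, 64)
print(DualAttentionBlock(64)(feat).shape)   # torch.Size([1, 64, 64, 64])
```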
Document images are often degraded by various stains, significantly impacting their readability and hindering downstream applications such as document digitization and analysis. The absence of a comprehensive stained document dataset has limited the effectiveness of existing document enhancement methods in removing stains while preserving fine-grained details. To address this challenge, we construct StainDoc, the first large-scale, high-resolution (2145 × 2245) dataset specifically designed for document stain removal. StainDoc comprises over 5,000 pairs of stained and clean document images across multiple scenes. The dataset encompasses a diverse range of stain types, severities, and document backgrounds, facilitating robust training and evaluation of document stain removal algorithms. Furthermore, we propose StainRestorer, a Transformer-based document stain removal approach. StainRestorer employs a memory-augmented Transformer architecture that captures hierarchical stain representations at the part, instance, and semantic levels via the DocMemory module. The Stain Removal Transformer (SRTransformer) leverages these feature representations through a dual attention mechanism: an enhanced spatial attention with an expanded receptive field, and a channel attention that captures channel-wise feature importance. This combination enables precise stain removal while preserving document content integrity. Extensive experiments demonstrate StainRestorer's superior performance over state-of-the-art methods on the StainDoc dataset and its variants StainDoc_Mark and StainDoc_Seal, establishing a new benchmark for document stain removal. Our work highlights the potential of memory-augmented Transformers for this task and contributes a valuable dataset to advance future research.
@inproceedings{Li:2025,title={High-Fidelity Document Stain Removal via A Large-Scale Real-World Dataset and A Memory-Augmented Transformer},author={Li, Mingxian and Sun, Hao and Lei, Yingtie and Zhang, Xiaofeng and Dong, Yihang and Zhou, Yilin and Li, Zimeng and Chen, Xuhang},booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},year={2025},publisher={IEEE},address={Tucson, AZ, USA},}
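The following is a minimal sketch of memory-augmented attention, assuming a single flat learnable memory bank queried by image tokens via cross-attention; the paper's DocMemory is hierarchical (part, instance, and semantic levels), and the dimensions and class name here are arbitrary illustrations.

```python
import torch
import torch.nn as nn

class MemoryAugmentedAttention(nn.Module):
    """Illustrative sketch: image tokens query a learned memory of stain
    prototypes through cross-attention, then the retrieved features are
    added back residually."""
    def __init__(self, dim=64, memory_slots=256, heads=4):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(memory_slots, dim))  # learned prototypes
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens):                 # tokens: (B, N, dim) flattened patches
        mem = self.memory.unsqueeze(0).expand(tokens.size(0), -1, -1)
        out, _ = self.attn(query=tokens, key=mem, value=mem)
        return self.norm(tokens + out)         # retrieved memory refines the features

layer = MemoryAugmentedAttention()
patches = torch.rand(2, 32 * 32, 64)           # a 32x32 patch grid with 64-dim embeddings
print(layer(patches).shape)                    # torch.Size([2, 1024, 64])
```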