Advancing Comic Image Inpainting: A Novel Dual-Stream Fusion Approach with Texture Enhancements

Model Architecture

Abstract

In comic localization, a crucial step is filling in the pixels left blank after dialogue boxes or sound-effect text are removed. Comic inpainting is more challenging than natural image inpainting. On one hand, the structure and texture of comic images are highly abstract, which complicates semantic interpretation and content synthesis. On the other hand, the high-frequency information specific to comic images (such as lines and dots) is crucial for visual representation. This paper proposes the Texture-Structure Fusion Network (TSF-Net) with a dual-stream encoder, introducing the Dual-stream Space-Gated Fusion (DSSGF) module for effective feature interaction. Additionally, a Multi-scale Histogram Texture Enhancement (MHTE) module is designed to dynamically enhance the aggregation of texture information. Visual comparisons and quantitative experiments demonstrate the effectiveness of the method and its superiority over existing techniques in comic inpainting. The implementation and dataset can be obtained from here.
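
As a rough illustration of the general idea of gated fusion between a texture stream and a structure stream, the sketch below blends two small feature maps with a per-position sigmoid gate. It is not the authors' DSSGF module, whose details are not given in the abstract. The function name `gated_fusion`, the scalar weights `w_t` and `w_s`, the `bias`, and the toy 2x2 inputs are all assumptions made for illustration; a real network would use learned convolutions in place of the scalars.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(texture, structure, w_t=1.0, w_s=1.0, bias=0.0):
    """Blend two same-sized 2D feature maps with a per-position gate.

    Hypothetical sketch: the gate is a sigmoid of a weighted sum of both
    inputs, and the output is a convex combination of the two streams.
    """
    same_rows = len(texture) == len(structure)
    if not same_rows or any(len(a) != len(b) for a, b in zip(texture, structure)):
        raise ValueError("feature maps must have the same shape")

    fused = []
    for t_row, s_row in zip(texture, structure):
        row = []
        for t, s in zip(t_row, s_row):
            gate = sigmoid(w_t * t + w_s * s + bias)  # gate value in (0, 1)
            row.append(gate * t + (1.0 - gate) * s)   # favor texture when gate is high
        fused.append(row)
    return fused


# Toy inputs (assumed values, not data from the paper).
texture = [[1.0, 0.0], [0.5, 2.0]]
structure = [[0.0, 1.0], [0.5, -2.0]]
fused = gated_fusion(texture, structure)
```

Because the gate lies strictly between 0 and 1, each fused value falls between the corresponding texture and structure values, so neither stream can be amplified beyond its input range by the fusion step alone.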

Publication
International Conference on Neural Information Processing, 2024 [CCF-C]
Xuhang Chen
Lecturer at Huizhou University