Thangka Super-Resolution Diffusion Model Based on DCT-Upsample and High-Frequency Focused Attention
Author(s): Xin Chen, Liqi Ji, Zhen Wang, Yunbo Yang, Xinyang Zhang, Nianyi Wang
Published In: Research Square
Cite: Xin Chen, Liqi Ji, Zhen Wang et al. Thangka Super-Resolution Diffusion Model Based on DCT-Upsample and High-Frequency Focused Attention, 30 August 2024, PREPRINT (Version 1) available at Research Square [https://doi.org/10.21203/rs.3.rs-4771384/v1]
Thangka is a traditional Tibetan form of painting with profound cultural significance and a distinctive artistic style. As an effective means of digital preservation and restoration, image super-resolution plays an important role in maintaining the integrity and heritage of Thangka art. However, existing image super-resolution methods do not perform well on Thangka images, for the following reasons:
- (1) Thangka images are large in size and rich in content, making it difficult for existing models to restore the original textures of degraded Thangka images.
- (2) Because Thangka images have intricate textures, the high-resolution images reconstructed by existing methods may score well on objective metrics yet look poor in terms of human visual perception (HVP).
To overcome these challenges, we propose a Frequency-Domain Enhanced Diffusion Super-Resolution model (FDEDiff). The work consists of three parts:
- (1) a High-Frequency Focused Cross Attention Mechanism (HFC-Attention), which separates the high-frequency features of an image and uses them to guide the attention mechanism, improving the reconstruction of high-frequency details in the diffusion model;
- (2) a DCT Domain Padding Upsample (DCT-Upsample), which performs upsampling in the Discrete Cosine Transform (DCT) domain and improves the reconstruction of dense line areas in Thangka images by fully utilizing global information;
- (3) a Thangka image super-resolution dataset, the first of its kind, containing 82,688 pairs of 512 × 512 images.
Qualitative and quantitative experiments demonstrate that the proposed method achieves satisfactory high-resolution reconstruction results on Thangka images. The dataset is available at https://github.com/cvlabdatasets/ThangkaDatasets.
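The idea behind DCT-Upsample, upsampling by padding in the DCT domain, can be illustrated with classic DCT zero-padding interpolation. The sketch below is a minimal single-channel NumPy version built on our own assumptions: an orthonormal DCT-II, zeros appended at the high-frequency end of the spectrum, and hypothetical helper names (dct_matrix, dct2, idct2, dct_pad_upsample). It is not the authors' DCT-Upsample module, which works inside a diffusion network and may differ in detail.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C, so that X = C @ x and x = C.T @ X."""
    k = np.arange(n)[:, None]          # frequency index (rows)
    i = np.arange(n)[None, :]          # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)            # DC row gets the 1/sqrt(n) weight
    return c


def dct2(x: np.ndarray) -> np.ndarray:
    """2D orthonormal DCT-II of an (H, W) array."""
    h, w = x.shape
    return dct_matrix(h) @ x @ dct_matrix(w).T


def idct2(coeffs: np.ndarray) -> np.ndarray:
    """Inverse of dct2 (the transpose, since the basis is orthonormal)."""
    h, w = coeffs.shape
    return dct_matrix(h).T @ coeffs @ dct_matrix(w)


def dct_pad_upsample(x: np.ndarray, scale: int = 2) -> np.ndarray:
    """Hypothetical sketch: upsample by zero-padding the DCT spectrum."""
    x = np.asarray(x, dtype=float)
    h, w = x.shape
    coeffs = dct2(x)
    padded = np.zeros((h * scale, w * scale))
    padded[:h, :w] = coeffs            # keep every original frequency
    # Indices beyond the original size are the new, zeroed high frequencies.
    # With an orthonormal DCT, each enlarged axis shrinks amplitudes by
    # sqrt(scale); multiplying by scale (= sqrt(scale) ** 2) restores them.
    return idct2(padded) * scale


lr = np.random.default_rng(0).random((8, 8))
hr = dct_pad_upsample(lr, scale=2)
print(hr.shape)  # (16, 16)
```

Because every DCT coefficient depends on all pixels of the input, each output pixel of this interpolation draws on the whole image rather than a local neighbourhood, which is one plausible reading of the "global information" mentioned in item (2).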
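The abstract states only that HFC-Attention separates high-frequency features and uses them to guide attention. One hypothetical way to realise this, sketched below in NumPy, is to high-pass each feature channel by zeroing its lowest DCT frequencies and then run single-head cross attention with queries from the full features and keys and values from the high-frequency part. The DCT mask, the cutoff parameter, the query/key assignment and all function names are our assumptions, not the authors' design.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix (rows are cosine basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def high_pass(feat: np.ndarray, cutoff: int) -> np.ndarray:
    """Zero the lowest cutoff x cutoff DCT frequencies of each (H, W) channel."""
    _, h, w = feat.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    coeffs = ch @ feat @ cw.T                # per-channel 2D DCT (broadcast over C)
    coeffs[:, :cutoff, :cutoff] = 0.0        # hypothetical low-frequency mask
    return ch.T @ coeffs @ cw                # back to the spatial domain


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)    # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def hf_cross_attention(feat, wq, wk, wv, cutoff=2):
    """Hypothetical sketch: queries from the full (C, H, W) feature map,
    keys and values from its high-frequency component."""
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T                         # (N, C)
    hf_tokens = high_pass(feat, cutoff).reshape(c, h * w).T   # (N, C)
    q, k, v = tokens @ wq, hf_tokens @ wk, hf_tokens @ wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))         # (N, N)
    return weights @ v                                        # (N, d)


rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))
wq, wk, wv = (rng.standard_normal((4, 4)) for _ in range(3))
out = hf_cross_attention(feat, wq, wk, wv, cutoff=2)
print(out.shape)  # (64, 4)
```

Drawing keys and values from the high-pass branch means the attention weights are computed against edge and texture content, which is the kind of guidance item (1) describes; a real diffusion U-Net would use learned multi-head projections and a residual connection instead of these random matrices.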