Figure 2

CMMF-Net: a generative network based on CLIP-guided multi-modal feature fusion for thermal infrared image colorization

Figure 2. The CI module. The text feature serves as the query (Q), and the image feature or the current layer's output feature serves as the key (K) and value (V) in a multi-head attention operation, which establishes the correspondence between the text and image modalities. CI: Cross-modal interaction.
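The attention pattern described in the caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the projection matrices (`w_q`, `w_k`, `w_v`, `w_o`), the feature dimension, the token counts, and the number of heads are all hypothetical choices made for illustration. The only detail taken from the caption is that the text feature provides Q while the image (or current layer output) feature provides K and V.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(text_feat, image_feat, w_q, w_k, w_v, w_o, num_heads):
    """Multi-head cross-attention with text as Q and image as K and V.

    text_feat:  (n_text, d) array of text token features (assumed layout)
    image_feat: (n_img, d) array of image or layer-output features (assumed layout)
    """
    n_text, d = text_feat.shape
    n_img = image_feat.shape[0]
    head_dim = d // num_heads

    # Linear projections: queries from text, keys and values from the image.
    q = text_feat @ w_q
    k = image_feat @ w_k
    v = image_feat @ w_v

    # Split the feature dimension into heads: (num_heads, n, head_dim).
    q = q.reshape(n_text, num_heads, head_dim).transpose(1, 0, 2)
    k = k.reshape(n_img, num_heads, head_dim).transpose(1, 0, 2)
    v = v.reshape(n_img, num_heads, head_dim).transpose(1, 0, 2)

    # Scaled dot-product attention: each text token attends over image positions.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (h, n_text, n_img)
    weights = softmax(scores, axis=-1)
    out = weights @ v  # (h, n_text, head_dim)

    # Merge heads and apply the output projection.
    out = out.transpose(1, 0, 2).reshape(n_text, d)
    return out @ w_o, weights


# Hypothetical sizes: 4 text tokens, 16 image positions, 8 channels, 2 heads.
rng = np.random.default_rng(0)
d, num_heads = 8, 2
text_feat = rng.standard_normal((4, d))
image_feat = rng.standard_normal((16, d))
w_q, w_k, w_v, w_o = (rng.standard_normal((d, d)) for _ in range(4))

fused, attn = cross_modal_attention(
    text_feat, image_feat, w_q, w_k, w_v, w_o, num_heads
)
```

In this sketch the output has one row per text token, because the queries come from the text feature; each row of `attn` is a distribution over image positions, which is one way to read the "corresponding relationship" the caption mentions.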

Intelligence & Robotics
ISSN 2770-3541 (Online)

Portico

All published articles are preserved here permanently:

https://www.portico.org/publishers/oae/
