Figure 7

Retrieve-then-compare mitigates visual hallucination in multi-modal large language models

Figure 7. RCD generalizes robustly to counterintuitive images, thereby improving the accuracy of MLLMs’ visual understanding. Correct and hallucinated content is highlighted in green and red, respectively. RCD: Retrieval contrastive decoding.

Intelligence & Robotics
ISSN 2770-3541 (Online)
Portico

All published articles are preserved here permanently:

https://www.portico.org/publishers/oae/
