Figure 10

Retrieve-then-compare mitigates visual hallucination in multi-modal large language models

Figure 10. Qualitative results on our proposed image captioning benchmark. Our method, RCD, improves the accuracy of image descriptions, reduces various types of visual hallucinations, and makes the descriptions more specific. Incorrect content is highlighted in red and correct content in green. RCD: Retrieval contrastive decoding.

Intelligence & Robotics
ISSN 2770-3541 (Online)

Portico

All published articles are preserved here permanently:

https://www.portico.org/publishers/oae/
