Retrieve-then-compare mitigates visual hallucination in multi-modal large language models

Figure 2. Experiment pipeline for investigating the impact of the visual and textual input modalities on hallucinatory output. At each decoding step $t$, the test image $\boldsymbol{v}^{\tau}$ is replaced with alternative images $\boldsymbol{v}'$ while the textual prefix is kept constant. We then assess the difference in output confidence scores (i.e., logits) between $y_t$ and $\hat{y}_t$ to demonstrate the impact of the visual input. The test image is taken from the OpenImages validation set [46]; similar images are retrieved from the COCO Caption dataset [47].
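To make the probe concrete, the comparison step can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a HuggingFace-style multimodal model whose forward pass accepts `input_ids` and `pixel_values` and returns per-position logits, takes the set of retrieved images as given, and reads the caption's comparison as the gap between the top-token logits under the test image and under each alternative image. The function name `logit_gap_at_step` and all argument names are hypothetical.

```python
import torch

@torch.no_grad()
def logit_gap_at_step(model, input_ids, test_image, alt_images):
    """Compare next-token confidence under the test image v^tau vs. each
    alternative image v', holding the textual prefix (input_ids) fixed.

    Assumes a multimodal causal LM with a HuggingFace-style interface and
    batch size 1; both are assumptions, not the paper's setup.
    """
    # Next-token logits at step t with the original test image v^tau.
    logits = model(input_ids=input_ids, pixel_values=test_image).logits[:, -1, :]
    y_t = logits.argmax(dim=-1)  # greedy token y_t under v^tau

    gaps = []
    for v_prime in alt_images:
        # Same textual prefix, alternative image v'.
        alt_logits = model(input_ids=input_ids, pixel_values=v_prime).logits[:, -1, :]
        y_hat_t = alt_logits.argmax(dim=-1)  # greedy token \hat{y}_t under v'

        # One plausible reading of the caption: the difference between the
        # logit of y_t (original image) and the logit of \hat{y}_t (replaced
        # image). A large gap indicates the prediction is driven by the image
        # rather than by the textual prefix alone.
        gap = (logits.gather(-1, y_t[:, None]) -
               alt_logits.gather(-1, y_hat_t[:, None])).item()
        gaps.append(gap)
    return gaps
```

A small gap across many replaced images would suggest the token is being generated from the textual prefix regardless of the visual evidence, which is the hallucination signature the pipeline is designed to expose.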

Intelligence & Robotics
ISSN 2770-3541 (Online)
Follow Us

Portico

All published articles are preserved here permanently:

https://www.portico.org/publishers/oae/

Portico

All published articles are preserved here permanently:

https://www.portico.org/publishers/oae/