MSNav: Zero-Shot Vision-and-Language Navigation with Dynamic Memory and Feature Enhancement

Under review, 2025

Traditional intelligent agents passively take in all available information; they cannot actively select the information relevant to a task and consolidate it into a task memory. We propose MSNav, a modular framework that combines dynamic topological memory with spatial capabilities to address this limitation in zero-shot vision-and-language navigation.

Key Contributions:

  • Proposed MSNav, a modular framework with dynamic topological memory and spatial capabilities
  • Addressed the limitation of traditional agents in active information filtering
  • Achieved significant improvements in zero-shot navigation performance

Personal Contribution: Co-first author, responsible for the full experimental implementation and part of the paper writing.

Download paper here

Recommended citation: Chenghao Liu*, Zhimu Zhou*, Jiachen Zhang, Minghao Zhang, Songfang Huang, Huiling Duan. (2025). "MSNav: Zero-Shot Vision-and-Language Navigation with Dynamic Memory and Feature Enhancement." Under Review. (*Co-first Author, randomly ordered by dice rolling)