MSNav: Zero-Shot Vision-and-Language Navigation

MSNav is a framework for zero-shot vision-and-language navigation (VLN) that addresses limitations of conventional navigation agents. It combines a dynamic topological memory with spatial reasoning, so the agent actively selects the information relevant to its current task instead of passively taking in everything it observes.
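To make the idea of a topological memory that filters for task relevance more concrete, here is a minimal, hypothetical sketch. It is not MSNav's implementation. The class and method names (`TopologicalMemory`, `observe`, `relevant_nodes`, `path`) are illustrative assumptions. Visited viewpoints become graph nodes that carry observed object labels, and moves between viewpoints become edges. Plain keyword overlap with the instruction stands in for whatever relevance model MSNav actually uses.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """A visited viewpoint and the object labels seen there (hypothetical schema)."""
    node_id: str
    labels: set[str] = field(default_factory=set)


class TopologicalMemory:
    """Hypothetical graph memory: nodes are viewpoints, edges are traversable links."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.edges: dict[str, set[str]] = {}

    def observe(self, node_id: str, labels: set[str], came_from: str | None = None) -> None:
        # Add a new node, or merge fresh observations into an existing one,
        # so the memory grows and updates as the agent explores.
        node = self.nodes.setdefault(node_id, Node(node_id))
        node.labels |= labels
        self.edges.setdefault(node_id, set())
        if came_from is not None:
            # Record an undirected edge between consecutive viewpoints.
            self.edges[node_id].add(came_from)
            self.edges.setdefault(came_from, set()).add(node_id)

    def relevant_nodes(self, instruction: str, k: int = 2) -> list[str]:
        # Assumed relevance score: number of instruction words matching a node's labels.
        # Nodes with no match are filtered out; the top-k remaining are returned.
        words = set(instruction.lower().split())
        scored = [(len(n.labels & words), n.node_id) for n in self.nodes.values()]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [node_id for _, node_id in scored[:k]]

    def path(self, start: str, goal: str) -> list[str] | None:
        # Breadth-first search for a path with the fewest hops through the graph.
        parents: dict[str, str | None] = {start: None}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if current == goal:
                route = []
                step: str | None = current
                while step is not None:
                    route.append(step)
                    step = parents[step]
                return route[::-1]
            for nxt in sorted(self.edges.get(current, ())):
                if nxt not in parents:
                    parents[nxt] = current
                    queue.append(nxt)
        return None


memory = TopologicalMemory()
memory.observe("hall", {"door", "rug"})
memory.observe("kitchen", {"sink", "fridge"}, came_from="hall")
memory.observe("bedroom", {"bed", "lamp"}, came_from="hall")
print(memory.relevant_nodes("walk to the fridge next to the sink"))  # ['kitchen']
print(memory.path("bedroom", "kitchen"))  # ['bedroom', 'hall', 'kitchen']
```

Here the agent keeps only the nodes that match the instruction and then plans a route over the graph to reach them. A real system would replace the keyword score with learned vision-language features.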

Key Features:

  • Dynamic topological memory system
  • Spatial reasoning capabilities
  • Modular framework design
  • Zero-shot navigation performance

Technologies: Python, PyTorch, Computer Vision, Natural Language Processing

Status: Under Review, Preprint Available