VAST Data, the AI Operating System company, has announced a breakthrough inference architecture that powers the NVIDIA Inference Context Memory Storage Platform, marking a new era for long-lived, agentic AI. The platform introduces a class of AI-native storage infrastructure designed for gigascale inference, built on NVIDIA BlueField-4 DPUs and Spectrum-X Ethernet networking. It accelerates access to AI-native key-value (KV) caches, enables high-speed context sharing across nodes, and significantly improves power efficiency.
As AI inference evolves from single-prompt tasks to persistent, multi-turn reasoning across agents, the assumption that context remains local is no longer valid. Performance now hinges on how efficiently inference history can be stored, restored, reused, extended, and shared under sustained load, rather than simply on raw GPU compute power.
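To make that shift concrete, the sketch below illustrates why restorable inference history matters. It is a minimal, self-contained toy, not a VAST or NVIDIA interface; the names ContextStore, prefill, and next_turn are invented for this example. The point is that a turn whose prefix KV cache can be restored from a store skips the prefill recompute, so time-to-first-token tracks restore bandwidth rather than prompt length.

```python
# Illustrative sketch only: all names are hypothetical, not a real API.
import hashlib
import time

class ContextStore:
    """Toy KV-cache store keyed by a hash of the token prefix."""
    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    @staticmethod
    def _key(tokens: list[int]) -> str:
        return hashlib.sha256(repr(tokens).encode()).hexdigest()

    def save(self, tokens: list[int], kv_blob: bytes) -> None:
        self._blobs[self._key(tokens)] = kv_blob

    def load(self, tokens: list[int]) -> bytes | None:
        return self._blobs.get(self._key(tokens))

def prefill(tokens: list[int]) -> bytes:
    """Stand-in for full attention prefill; cost grows with prompt length."""
    time.sleep(0.001 * len(tokens))      # simulate compute proportional to tokens
    return b"kv:" + repr(tokens).encode()

def next_turn(store: ContextStore, prefix: list[int], turn: list[int]) -> bytes:
    kv = store.load(prefix) or prefill(prefix)   # restoring beats recomputing
    kv += b"|" + repr(turn).encode()             # extend the cache with this turn
    store.save(prefix + turn, kv)
    return kv

store = ContextStore()
history: list[int] = []
for turn in ([1, 2, 3], [4, 5], [6]):            # three turns of one conversation
    next_turn(store, history, turn)              # turns 2 and 3 hit the store
    history += turn
```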
To address this, VAST is running its AI Operating System (AI OS) software natively on NVIDIA BlueField-4 DPUs. This embeds critical data services directly into GPU servers where inference occurs, as well as in dedicated data nodes. The architecture eliminates classic client-server contention and unnecessary data copies, reducing time-to-first-token (TTFT) even under high concurrency. Combined with VAST's parallel Disaggregated Shared-Everything (DASE) design, each host can access a shared, globally coherent context namespace without bottlenecks, enabling seamless access from GPU memory to persistent NVMe storage over RDMA fabrics.
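As a rough illustration of the shared-namespace idea, the sketch below shows a KV cache written by one GPU host being read by another without recomputation. The class names are invented for this example, and an in-process dictionary stands in for what the real system would serve over RDMA; this is a conceptual model, not the DASE implementation.

```python
# Conceptual sketch, assuming a local tier in front of one shared namespace.
class SharedNamespace:
    """Stand-in for a globally coherent context namespace (in-process stub)."""
    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, blob: bytes) -> None:
        self._data[key] = blob

    def get(self, key: str) -> bytes | None:
        return self._data.get(key)

class GpuHost:
    """A GPU server with a small local cache in front of the shared tier."""
    def __init__(self, name: str, ns: SharedNamespace) -> None:
        self.name, self.ns = name, ns
        self.local: dict[str, bytes] = {}    # stands in for GPU/host memory

    def write_context(self, key: str, blob: bytes) -> None:
        self.local[key] = blob
        self.ns.put(key, blob)               # publish to the shared namespace

    def read_context(self, key: str) -> bytes | None:
        if key in self.local:                # fast path: local memory hit
            return self.local[key]
        blob = self.ns.get(key)              # remote tier (RDMA in the real system)
        if blob is not None:
            self.local[key] = blob           # promote for subsequent turns
        return blob

ns = SharedNamespace()
a, b = GpuHost("host-a", ns), GpuHost("host-b", ns)
a.write_context("conv-42", b"kv-cache-bytes")
assert b.read_context("conv-42") == b"kv-cache-bytes"  # no recompute on host-b
```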