We're introducing MatMamba, an extension of the Mamba2 architecture with Matryoshka-style training and adaptive inference. This framework trains a single elastic model that can yield hundreds of nested submodels, efficiently and without retraining.
MatMamba’s core innovation: optimize multiple nested models of varying sizes within the same weight space.
By slicing the model along hidden dimensions (e.g., 2048, 1024, 512, 256), MatMamba jointly optimizes four nested models. Because the smallest dimensions are shared across all the models, they capture the most salient representations. Hundreds of submodels can then be extracted to match deployment constraints (latency, compute, power, etc.), making MatMamba a versatile model for adaptive inference.
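To make the joint optimization concrete, here is a toy sketch in NumPy (an illustration of the Matryoshka-style objective, not the paper's exact training code): one shared weight vector is trained so that every nested prefix slice is itself an accurate model, by summing the loss over all widths so that gradients accumulate in the shared dimensions.

```python
import numpy as np

# Toy Matryoshka-style joint training (illustrative; names and the cost
# function are assumptions, not the actual MatMamba implementation).
rng = np.random.default_rng(0)
n, d = 128, 64
X = rng.standard_normal((n, d))
true_w = np.zeros((d, 1))
true_w[:16] = rng.standard_normal((16, 1))  # signal lives in the first dims
y = X @ true_w

W = np.zeros((d, 1))          # one shared weight space for all submodels
widths = [64, 32, 16]         # nested submodel sizes sharing the first dims
losses = []

for step in range(300):
    grad = np.zeros_like(W)
    total = 0.0
    for m in widths:
        err = X[:, :m] @ W[:m] - y              # width-m submodel's error
        total += float(np.mean(err ** 2))
        grad[:m] += (2 / n) * X[:, :m].T @ err  # shared dims get gradient
                                                # from every width
    W -= 0.01 * grad
    losses.append(total)

print(losses[0], losses[-1])
```

Because the smallest slice receives a gradient from every nested objective, the earliest dimensions are pushed to carry the most useful information, which is what makes prefix slicing meaningful at inference time.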
Like Mamba2 and Transformers, MatMamba serves as a general-purpose sequence processing architecture. Its ability to process arbitrary inputs and outputs that are (batch, sequence length, dimension) shaped tensors makes it suitable for a broad range of tasks. We demonstrate MatMamba’s versatility with both language (MatMamba-LM) and vision (MatMamba-Vision) models, showing that it can adapt to diverse applications.
MatMamba-LM
We trained MatMamba-LM models spanning various sizes, from 130 million to 1.4 billion parameters. These models scale as efficiently as Mamba2 baselines, with the added advantage of accurate smaller submodels at no extra training cost. At inference time, a submodel can be selected dynamically based on available compute.
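A minimal sketch of that dynamic selection, under an assumed toy cost model (the function and FLOP estimate below are illustrative, not the released API): pick the largest nested width whose per-token cost fits the compute currently available.

```python
# Hypothetical inference-time width selection. The nested widths come from
# training; the quadratic cost model is a simplifying assumption.
widths = [256, 512, 1024, 2048]

def pick_width(budget_flops):
    # Toy cost model: per-token FLOPs grow roughly with width squared.
    feasible = [m for m in widths if 2 * m * m <= budget_flops]
    # Fall back to the smallest submodel if nothing fits the budget.
    return max(feasible) if feasible else min(widths)

print(pick_width(3_000_000))   # 2 * 1024^2 fits; 2 * 2048^2 does not
print(pick_width(10_000_000))  # enough budget for the full model
```

The same trained weights serve every choice, so the selection can change per request without reloading a different model.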
MatMamba-Vision
On the vision front, MatMamba-Vision models (base models of 35 million and 135 million parameters) excel on ImageNet. Notably, they offer free submodels along the accuracy-compute tradeoff curve, extracted as Mix'N'Matched combinations of nested dimensions per layer, allowing users to flexibly adjust performance to their requirements without additional training.
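Mix'N'Match can be sketched as a small search problem (the cost model below is an assumption for illustration, not the released extraction code): each layer independently picks one of the nested widths, and we keep the most capable combination that fits a deployment budget, with no retraining needed because every slice was optimized jointly.

```python
from itertools import product

# Hypothetical Mix'N'Match extraction over a 3-layer toy model.
widths = [256, 512, 1024]   # nested sizes available at every layer
n_layers = 3
budget = 2_000_000          # assumed parameter budget

def cost(cfg):
    # Toy assumption: a layer's parameter count scales with width squared.
    return sum(m * m for m in cfg)

# Enumerate every per-layer width combination under the budget and keep
# the highest-cost (most capable) one.
feasible = [c for c in product(widths, repeat=n_layers) if cost(c) <= budget]
best = max(feasible, key=cost)
print(len(feasible), best, cost(best))
```

Even this tiny example yields 20 valid configurations; with more layers and more nested widths, the combinatorics produce the hundreds of submodels mentioned above.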
MatMamba’s nested flexibility is particularly promising for long-form and high-resolution visual tasks, because of the efficiency characteristics of Mamba2 and the adaptivity of Matryoshka representations.
Mamba2 models are known for their speed, especially at longer context lengths. MatMamba retains this speed while introducing new adaptive capabilities. This is especially beneficial for applications such as adaptive image retrieval, where the largest model can be used for embedding datasets, while smaller submodels can be leveraged for quick and efficient queries.
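The retrieval pattern can be sketched as follows (an illustrative toy, not the paper's pipeline: the data, dimensions, and scoring here are assumptions). Documents are indexed once with the full-width embedding; queries are then scored against only a nested prefix of those same vectors, which is cheaper because the prefix dimensions carry the most salient information.

```python
import numpy as np

# Toy adaptive retrieval with nested (Matryoshka-style) embeddings.
rng = np.random.default_rng(2)
d_full, d_query = 512, 128          # full index width vs. cheap query width
docs = rng.standard_normal((100, d_full))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)  # unit-norm "embeddings"

# A query that is a noisy version of document 42.
query = docs[42] + 0.02 * rng.standard_normal(d_full)

def top1(query_vec, index, d):
    # Score only the first d dimensions: the nested prefix of each embedding.
    scores = index[:, :d] @ query_vec[:d]
    return int(np.argmax(scores))

print(top1(query, docs, d_full), top1(query, docs, d_query))
```

Scoring with the 128-dimension prefix touches a quarter of the index data per query, while the full-width embedding remains available whenever maximum accuracy is needed.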
Envisioning more effective, real-time AI systems that can adapt
We're excited about the potential impact of MatMamba's adaptive efficiency on fields like robotics. With the ability to adjust models dynamically based on available compute, we envision more effective, real-time AI systems that can adapt to varying computational constraints. With MatMamba, we aim to merge the best of both worlds: the adaptability of Matryoshka-style learning and the speed of state-space models like Mamba2. We hope this work will inspire further innovation in efficient, adaptive AI systems across industries.
For more details, check out our paper and code.