Inference Parameters
Temperature: 1.0
Top_p: 0.95
Optimal for creativity and diversity while maintaining logical coherence.
Complete technical specifications, architecture details, and performance benchmarks of the world's first open-source hybrid attention reasoning model.
We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism.
The model is developed based on our previous MiniMax-Text-01 model, which contains a total of 456 billion parameters with 45.9 billion parameters activated per token. Consistent with MiniMax-Text-01, the M1 model natively supports a context length of 1 million tokens, 8x the context size of DeepSeek R1.
Furthermore, the lightning attention mechanism in MiniMax-M1 enables efficient scaling of test-time compute – For example, compared to DeepSeek R1, M1 consumes 25% of the FLOPs at a generation length of 100K tokens. These properties make M1 particularly suitable for complex tasks that require processing long inputs and thinking extensively.
Comprehensive evaluation results across multiple categories
Category | Task | MiniMax-M1-80K | MiniMax-M1-40K | DeepSeek-R1 | Claude 4 Opus | OpenAI-o3 |
---|---|---|---|---|---|---|
Mathematics | AIME 2024 | 86.0 | 83.3 | 79.8 | 76.0 | 91.6 |
AIME 2025 | 76.9 | 74.6 | 70.0 | 75.5 | 88.9 | |
MATH-500 | 96.8 | 96.0 | 97.3 | 98.2 | 98.1 | |
Coding | LiveCodeBench | 65.0 | 62.3 | 55.9 | 56.6 | 75.8 |
FullStackBench | 68.3 | 67.6 | 70.1 | 70.3 | 69.3 | |
Reasoning | GPQA Diamond | 70.0 | 69.2 | 71.5 | 79.6 | 83.3 |
ZebraLogic | 86.8 | 80.1 | 78.7 | 95.1 | 95.8 | |
MMLU-Pro | 81.1 | 80.6 | 84.0 | 85.0 | 85.0 | |
SWE-bench Verified | 56.0 | 55.6 | 49.2 | 72.5 | 69.1 | |
Long Context | OpenAI-MRCR (128k) | 73.4 | 76.1 | 35.8 | 48.9 | 56.5 |
OpenAI-MRCR (1M) | 56.2 | 58.6 | -- | -- | -- | |
LongBench-v2 | 61.5 | 61.0 | 58.3 | 55.6 | 58.8 |
Optimal settings for different scenarios
Temperature: 1.0
Top_p: 0.95
Optimal for creativity and diversity while maintaining logical coherence.
System Prompt:
"You are a helpful assistant."
For summarization, translation, Q&A, creative writing.
System Prompt:
"Please reason step by step, and put your final answer within \boxed{}."
For calculation and logical deduction problems.
Advanced capabilities for tool integration
The MiniMax-M1 model supports function calling capabilities, enabling the model to identify when external functions need to be called and output function call parameters in a structured format.
For general use and evaluation, we provide online services and development tools.
@misc{minimax2025minimaxm1scalingtesttimecompute,
title={MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention},
author={MiniMax},
year={2025},
eprint={2506.13585},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2506.13585},
}