MiniMax-M1 Model Details

Complete technical specifications, architecture details, and performance benchmarks of the world's first open-source hybrid attention reasoning model.

📖 Research Paper 🤗 HuggingFace

Model Overview

We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism.

The model is developed based on our previous MiniMax-Text-01 model, which contains a total of 456 billion parameters with 45.9 billion parameters activated per token. Consistent with MiniMax-Text-01, the M1 model natively supports a context length of 1 million tokens, 8x the context size of DeepSeek R1.

Furthermore, the lightning attention mechanism in MiniMax-M1 enables efficient scaling of test-time compute – For example, compared to DeepSeek R1, M1 consumes 25% of the FLOPs at a generation length of 100K tokens. These properties make M1 particularly suitable for complex tasks that require processing long inputs and thinking extensively.

Detailed Performance Benchmarks

Comprehensive evaluation results across multiple categories

Category	Task	MiniMax-M1-80K	MiniMax-M1-40K	DeepSeek-R1	Claude 4 Opus	OpenAI-o3
Mathematics	AIME 2024	86.0	83.3	79.8	76.0	91.6
	AIME 2025	76.9	74.6	70.0	75.5	88.9
	MATH-500	96.8	96.0	97.3	98.2	98.1
Coding	LiveCodeBench	65.0	62.3	55.9	56.6	75.8
Coding	FullStackBench	68.3	67.6	70.1	70.3	69.3
Reasoning	GPQA Diamond	70.0	69.2	71.5	79.6	83.3
	ZebraLogic	86.8	80.1	78.7	95.1	95.8
	MMLU-Pro	81.1	80.6	84.0	85.0	85.0
	SWE-bench Verified	56.0	55.6	49.2	72.5	69.1
Long Context	OpenAI-MRCR (128k)	73.4	76.1	35.8	48.9	56.5
	OpenAI-MRCR (1M)	56.2	58.6	--	--	--
	LongBench-v2	61.5	61.0	58.3	55.6	58.8

Usage Recommendations

Optimal settings for different scenarios

⚙️

Inference Parameters

Temperature: 1.0

Top_p: 0.95

Optimal for creativity and diversity while maintaining logical coherence.

💬

General Purpose

System Prompt:

"You are a helpful assistant."

For summarization, translation, Q&A, creative writing.

🔢

Mathematical Tasks

System Prompt:

"Please reason step by step, and put your final answer within \boxed{}."

For calculation and logical deduction problems.

Function Calling

Advanced capabilities for tool integration

Function Calling Support

The MiniMax-M1 model supports function calling capabilities, enabling the model to identify when external functions need to be called and output function call parameters in a structured format.

Tool Integration Structured Output Agentic Applications

API & Chatbot

For general use and evaluation, we provide online services and development tools.

Online Chat Developer API MCP Server

Chatbot API Platform

Citation

@misc{minimax2025minimaxm1scalingtesttimecompute,
      title={MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention}, 
      author={MiniMax},
      year={2025},
      eprint={2506.13585},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.13585}, 
}