[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"$fnLNZXGU1hje-_5mK6VN9USzrFRTGzgDtzzdkzcVZWPs":3},{"code":4,"msg":5,"data":6},200,"操作成功",{"id":7,"title":8,"content":9,"digest":10,"source":10,"coverPath":11,"thumbsCoverPath":12,"isTop":13,"isShow":14,"baseClick":13,"clickCount":15,"createTime":16,"typeId":17,"isNewest":18,"newsInfoTypeRespVo":19,"voiceUrl":22,"voiceSize":23,"taskId":24,"releaseTime":25,"titleEn":26,"contentEn":27,"voiceUrlEn":28,"taskIdEn":29,"voiceSizeEn":30},1340,"​微软推出新型 AI Agent 模型 rStar2-Agent，以 140 亿参数挑战大规模模型","\u003Cp>\u003Cstrong style=\"font-size: 18px; color: rgb(255, 153, 0);\" class=\"ql-lineHeight-1-75\">微软最近在 AI 领域取得了显著突破，开源了一款名为 rStar2-Agent 的 AI Agent 推理模型。这款模型采用了创新的智能体强化学习方法，令人惊讶的是，尽管其参数仅有140亿，但在 AIME24数学推理测试中，准确率高达80.6%，成功超越了拥有6710亿参数的 DeepSeek-R1（79.8%）。这样的表现让人们重新思考模型的参数规模与性能之间的关系。\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cimg alt=\"undefined\" src=\"https:\u002F\u002Fimage.51xinwei.com\u002F2025\u002F09\u002F977a44fca6dd44ca9fbbcd2e34a06ad7\u002F6389291842136459577091583.png\" width=\"undefined\" height=\"undefined\" style=\"display: block; margin: auto;\">\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">除了数学推理任务的优秀成绩，rStar2-Agent 在其他领域的表现同样引人注目。在 GPQA-Diamond 科学推理基准测试中，该模型的准确率为60.9%，超越了 DeepSeek-V3的59.1%;在 BFCL v3智能体工具使用任务中，其任务完成率达到60.8%，同样高于 DeepSeek-V3的57.6%。这些数据表明，rStar2-Agent 在各类任务中展现出了强大的泛化能力。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">为了实现这一突破，微软在训练基础设施、算法和训练流程上进行了三大创新。首先，在基础设施方面，微软构建了一个高效的隔离式代码执行服务，能够快速处理大量的训练请求，支持每训练步骤高达4.5万次的并发工具调用，平均延迟仅为0.3秒。其次，微软提出了新的 GRPO-RoC 算法，通过有效的奖励机制和算法优化，使得模型在推理过程中更加准确和高效。最后，rStar2-Agent 设计了 “非推理微调 + 多阶段强化学习” 的高效训练流程，以确保模型在各个阶段都能稳步提升能力。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">这一系列的技术突破使得 rStar2-Agent 在 AI Agent 
领域崭露头角，也为未来的智能体研究和应用开辟了新的方向。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"color: rgb(187, 187, 187);\" class=\"ql-lineHeight-1-75\">【新闻来源】AIbase基地 \u003C\u002Fspan>\u003Ca href=\"https:\u002F\u002Fwww.aibase.com\u002Fzh\u002Fnews\u002F21096\" rel=\"noopener noreferrer\" target=\"_blank\" style=\"color: rgb(187, 187, 187);\" class=\"ql-lineHeight-1-75\">https:\u002F\u002Fwww.aibase.com\u002Fzh\u002Fnews\u002F21096\u003C\u002Fa>\u003C\u002Fp>\u003Cp class=\"ql-align-justify\">\u003Cspan style=\"color: rgb(187, 187, 187);\" class=\"ql-lineHeight-1-75\">（本网转发此文章，旨在为读者提供更多的信息资讯，所涉内容不构成投资、消费建议。文章事实如有疑问，请与有关方核实，文章观点非本网观点，仅供读者参考。）\u003C\u002Fspan>\u003C\u002Fp>","","https:\u002F\u002Fimage.51xinwei.com\u002F2025\u002F09\u002F3846a1a86bff4e4888634174df1af543\u002FAI领域.jpg","https:\u002F\u002Fimage.51xinwei.com\u002F2025\u002F09\u002Fthumbs\u002F3846a1a86bff4e4888634174df1af543\u002FAI领域.jpg",0,1,40,"2025-09-09 16:19",2,false,{"id":17,"name":20,"enName":21},"芯位视野","Xinwei Vision","https:\u002F\u002Fxinwei-dev-test.oss-cn-shenzhen.aliyuncs.com\u002Fintelligent\u002Faudio%3Ab7fed058-316a-4d29-b783-67c57c621f22%3A0.wav?Expires=1757410040&OSSAccessKeyId=LTAI5tNvY2RkKjZw4LLWsrPK&Signature=1zmvnhDuNKyn0g2A5fmFdSJF7OQ%3D",3699764,"b7fed058-316a-4d29-b783-67c57c621f22","2025-09-09 16:13","Microsoft launches rStar2-Agent, a 14-billion-parameter AI agent model that challenges far larger models","\u003Cp>\u003Cstrong style=\"font-size: 18px; color: rgb(255, 153, 0);\" class=\"ql-lineHeight-1-75\">Microsoft has recently made a notable advance in AI by open-sourcing rStar2-Agent, an AI agent reasoning model. The model is trained with a novel agentic reinforcement learning method. 
Notably, despite having only 14 billion parameters, it reached 80.6% accuracy on the AIME24 mathematical reasoning benchmark, edging past the 671-billion-parameter DeepSeek-R1 (79.8%). The result invites a second look at the relationship between parameter count and performance.\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cimg alt=\"undefined\" src=\"https:\u002F\u002Fimage.51xinwei.com\u002F2025\u002F09\u002F977a44fca6dd44ca9fbbcd2e34a06ad7\u002F6389291842136459577091583.png\" width=\"undefined\" height=\"undefined\" style=\"display: block; margin: auto;\">\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Beyond mathematical reasoning, rStar2-Agent also performs well in other domains. On the GPQA-Diamond science reasoning benchmark, the model scored 60.9% accuracy, ahead of DeepSeek-V3's 59.1%; on the BFCL v3 agentic tool-use benchmark, its task completion rate reached 60.8%, again above DeepSeek-V3's 57.6%. These results suggest that rStar2-Agent generalizes well across different kinds of tasks.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">To achieve this, Microsoft introduced three innovations spanning training infrastructure, algorithms, and the training process. First, on the infrastructure side, Microsoft built an efficient, isolated code execution service that handles large volumes of training requests, supporting up to 45,000 concurrent tool calls per training step with an average latency of only 0.3 seconds. Second, Microsoft proposed a new algorithm, GRPO-RoC, which uses an effective reward mechanism and algorithmic optimizations to make the model's reasoning more accurate and efficient. 
Finally, rStar2-Agent uses an efficient training process of \"non-reasoning fine-tuning + multi-stage reinforcement learning\" so that the model's capabilities improve steadily at each stage.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Together, these technical advances have made rStar2-Agent stand out in the AI agent field and point to new directions for future agent research and applications.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"color: rgb(187, 187, 187);\" class=\"ql-lineHeight-1-75\">[News source] AIbase \u003C\u002Fspan>\u003Ca href=\"https:\u002F\u002Fwww.aibase.com\u002Fzh\u002Fnews\u002F21096\" rel=\"noopener noreferrer\" target=\"_blank\" style=\"color: rgb(187, 187, 187);\" class=\"ql-lineHeight-1-75\">https:\u002F\u002Fwww.aibase.com\u002Fzh\u002Fnews\u002F21096\u003C\u002Fa>\u003C\u002Fp>\u003Cp class=\"ql-align-justify\">\u003Cspan style=\"color: rgb(187, 187, 187);\" class=\"ql-lineHeight-1-75\">(This site has reposted this article to give readers more information. Its content does not constitute investment or consumer advice. If you have questions about the facts in the article, please verify them with the relevant parties. The views expressed are not those of this site and are provided for reference only.)\u003C\u002Fspan>\u003C\u002Fp>","https:\u002F\u002Fxinwei-dev-test.oss-cn-shenzhen.aliyuncs.com\u002Fintelligent\u002Faudio%3A05c6de62-23f5-45da-af27-4cdf02ab39b1%3A0.wav?Expires=1774838474&OSSAccessKeyId=LTAI5tNvY2RkKjZw4LLWsrPK&Signature=pL9%2B1hrCGLKWQYrMtvE0Kzp3s9M%3D","05c6de62-23f5-45da-af27-4cdf02ab39b1",4544102]