[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"$f2IcEY3LeXxc6_676Qzgfp9SNIh0FGyuSr72SE3x5NJg":3},{"code":4,"msg":5,"data":6},200,"操作成功",{"id":7,"title":8,"content":9,"digest":10,"source":10,"coverPath":11,"thumbsCoverPath":12,"isTop":13,"isShow":14,"baseClick":13,"clickCount":15,"createTime":16,"typeId":17,"isNewest":18,"newsInfoTypeRespVo":19,"voiceUrl":22,"voiceSize":23,"taskId":24,"releaseTime":25,"titleEn":26,"contentEn":27,"voiceUrlEn":28,"taskIdEn":29,"voiceSizeEn":30},1218,"Salesforce与南加州大学推出CoAct-1:用代码+GUI混合方法，将AI代理自动化推向新高度","\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px; color: rgb(255, 153, 0);\">Salesforce与南加州大学的研究人员共同开发了一项名为&nbsp;\tCoAct-1&nbsp;的突破性技术，旨在通过结合编码和图形用户界面（GUI）操作的优势，显著提升AI代理在计算机上执行复杂任务的能力。这一混合方法旨在克服传统GUI代理的脆弱性，为更强大、可扩展的自动化铺平道路。\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong style=\"font-size: 18px; color: rgb(255, 153, 0);\" class=\"ql-lineHeight-1-75\">传统AI代理的痛点:长任务与误点击\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">现有的计算机AI代理通常依赖视觉语言模型（VLM）来感知屏幕并模拟鼠标键盘操作。虽然这类“点击式”代理能执行各种任务，但在面对办公生产力套件等具有密集菜单和复杂工作流程的应用时，它们往往表现不佳。研究人员指出，在这些场景中，单一的误点击或对UI元素的误解，都可能导致整个任务失败。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">为了应对这一挑战，研究人员曾尝试利用高级规划器来增强GUI代理，但这种方法依然无法解决那些通过几行代码就能更直接、更可靠地完成的操作。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cimg alt=\"undefined\" src=\"https:\u002F\u002Fimage.51xinwei.com\u002F2025\u002F08\u002Fa6f11b4796284524bc2dc4204dfd04ba\u002F6389067902437090791722001.png\" width=\"undefined\" height=\"undefined\" style=\"display: block; margin: auto;\">\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong style=\"font-size: 18px; color: rgb(255, 153, 0);\" 
class=\"ql-lineHeight-1-75\">CoAct-1:一个多智能体协作的混合系统\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">为解决这些限制，CoAct-1系统应运而生。其核心理念是“将GUI操作的直观优势与通过代码直接进行系统交互的精确性、可靠性和效率相结合”。该系统由一个由三个专门代理组成的团队协作完成任务:\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cul>\u003Cli>\u003Cstrong style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">编排器（Orchestrator）\u003C\u002Fstrong>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">:作为中央规划器，它负责将用户的总体目标分解为子任务，并分配给最合适的代理。\u003C\u002Fspan>\u003C\u002Fli>\u003C\u002Ful>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cul>\u003Cli>\u003Cstrong style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">程序员（Programmer）\u003C\u002Fstrong>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">:负责编写和执行Python或Bash脚本，处理文件管理或数据处理等后端操作。\u003C\u002Fspan>\u003C\u002Fli>\u003C\u002Ful>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cul>\u003Cli>\u003Cstrong style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">GUI 操作员（GUI Operator）\u003C\u002Fstrong>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">:基于VLM，专门处理需要点击按钮或导航界面的前端任务。\u003C\u002Fspan>\u003C\u002Fli>\u003C\u002Ful>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">这种动态委托机制使得CoAct-1能够策略性地绕过低效的GUI操作，转而采用更稳健、更高效的代码执行，同时保留视觉交互的必要性。整个工作流程是迭代的，每个代理完成子任务后都会向编排器汇报，由其决定下一步行动。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cimg alt=\"undefined\" src=\"https:\u002F\u002Fimage.51xinwei.com\u002F2025\u002F08\u002Fa2567742e4d34229afdd30a572aa7c0a\u002F6389067905177100076417100.png\" width=\"undefined\" height=\"undefined\" style=\"display: block; margin: auto;\">\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong style=\"font-size: 18px; color: rgb(255, 153, 0);\" class=\"ql-lineHeight-1-75\">性能飞跃:更快、更高效\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" 
class=\"ql-lineHeight-1-75\">研究人员在&nbsp;\u003C\u002Fspan>\u003Cstrong style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">OSWorld\u003C\u002Fstrong>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">&nbsp;基准测试上对CoAct-1进行了测试，该基准包含了369个跨浏览器、IDE和办公应用程序的实际任务。结果显示，CoAct-1取得了&nbsp;\u003C\u002Fspan>\u003Cstrong style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">60.76%的成功率\u003C\u002Fstrong>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">，树立了新的最高水平。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">尤其是在操作系统级任务和多应用程序工作流中，CoAct-1的性能提升最为显著。更重要的是，该系统的效率也大幅提高，平均只需&nbsp;\u003C\u002Fspan>\u003Cstrong style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">10.15步\u003C\u002Fstrong>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">&nbsp;即可完成任务，远少于其他领先的纯GUI代理所需的15.22步。研究人员指出，更少的步骤不仅能加快任务完成速度，还能最大限度地减少出错的机会，从而实现更高效、更可靠的自动化。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong style=\"font-size: 18px; color: rgb(255, 153, 0);\" class=\"ql-lineHeight-1-75\">从实验室走向企业:潜在的应用与挑战\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">这项技术拥有巨大的企业应用潜力。Salesforce应用AI研究总监&nbsp;\u003C\u002Fspan>\u003Cstrong style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Ran Xu\u003C\u002Fstrong>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">&nbsp;指出，客户支持、销售勘探、自动化簿记和营销活动管理等领域都是完美的用例。在这些场景中，企业需要处理有API和无API的多种工具，而CoAct-1能够灵活利用代码和屏幕，提供全面的自动化解决方案。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">然而，将CoAct-1从实验室推向企业环境也面临挑战，包括应对遗留软件、确保安全性和人工监督的必要性。徐强调，需要通过在沙盒环境中训练来提高代理的适应性，并建立强大的访问控制和安全护栏，以防止恶意代码执行。最终，在可预见的未来，\u003C\u002Fspan>\u003Cstrong style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">“人在环”（human-in-the-loop）\u003C\u002Fstrong>\u003Cspan style=\"font-size: 18px;\" 
class=\"ql-lineHeight-1-75\">&nbsp;的模式将是确保代理安全、可靠运行的关键。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"color: rgb(187, 187, 187);\">【新闻来源】AIbase基地 \u003C\u002Fspan>\u003Ca href=\"https:\u002F\u002Fwww.aibase.com\u002Fzh\u002Fnews\u002F20470\" rel=\"noopener noreferrer\" target=\"_blank\" style=\"color: rgb(187, 187, 187);\">https:\u002F\u002Fwww.aibase.com\u002Fzh\u002Fnews\u002F20470\u003C\u002Fa>\u003C\u002Fp>\u003Cp class=\"ql-align-justify\">\u003Cspan style=\"color: rgb(187, 187, 187);\">（本网转发此文章，旨在为读者提供更多的信息资讯，所涉内容不构成投资、消费建议。文章事实如有疑问，请与有关方核实，文章观点非本网观点，仅供读者参考。）\u003C\u002Fspan>\u003C\u002Fp>","","https:\u002F\u002Fimage.51xinwei.com\u002F2025\u002F08\u002Ff363f4e0e774440895c12b50e585e2a1\u002FAI领域.jpg","https:\u002F\u002Fimage.51xinwei.com\u002F2025\u002F08\u002Fthumbs\u002Ff363f4e0e774440895c12b50e585e2a1\u002FAI领域.jpg",0,1,218,"2025-08-13 18:41",2,false,{"id":17,"name":20,"enName":21},"芯位视野","Xinwei Vision","https:\u002F\u002Fxinwei-dev-test.oss-cn-shenzhen.aliyuncs.com\u002Fintelligent\u002Faudio%3A388557d6-e0a4-4a16-9c8d-4f43e7585c22%3A0.wav?Expires=1770035541&OSSAccessKeyId=LTAI5tNvY2RkKjZw4LLWsrPK&Signature=qGPGnNPEc0ekHFQrWuagrLzy2uM%3D",6805646,"388557d6-e0a4-4a16-9c8d-4f43e7585c22","2025-08-13 18:38","Salesforce and the University of Southern California launch CoAct-1: A hybrid approach combining code and GUI to push AI agent automation to new heights","\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px; color: rgb(255, 153, 0);\">Salesforce and researchers from the University of Southern California have developed a breakthrough technology called CoAct-1, which aims to significantly enhance the ability of AI agents to perform complex tasks on computers by combining the 
advantages of coding and graphical user interface (GUI) operations. This hybrid approach is designed to overcome the fragility of traditional GUI agents, paving the way for more powerful and scalable automation.\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong style=\"font-size: 18px; color: rgb(255, 153, 0);\" class=\"ql-lineHeight-1-75\">Pain points of traditional AI agents: long tasks and misclicks\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Existing computer-use AI agents typically rely on vision-language models (VLMs) to perceive the screen and simulate mouse and keyboard operations. Although these \"click-based\" agents can perform a variety of tasks, they often struggle with applications such as office productivity suites, which have dense menus and complex workflows. The researchers point out that in these scenarios a single misclick or misreading of a UI element can cause the entire task to fail.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">To address this challenge, researchers have tried augmenting GUI agents with high-level planners, but this approach still falls short for operations that could be completed more directly and reliably with a few lines of code.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cimg alt=\"undefined\" src=\"https:\u002F\u002Fimage.51xinwei.com\u002F2025\u002F08\u002Fa6f11b4796284524bc2dc4204dfd04ba\u002F6389067902437090791722001.png\" width=\"undefined\" height=\"undefined\" style=\"display: block; margin: auto;\">\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong style=\"font-size: 18px; color: rgb(255, 153, 0);\" class=\"ql-lineHeight-1-75\">CoAct-1: A hybrid system with multi-agent 
collaboration\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">CoAct-1 was developed to address these limitations. Its core idea is to \"combine the intuitive advantages of GUI operations with the precision, reliability, and efficiency of direct system interaction through code.\" Tasks are carried out by a team of three specialized agents working together:\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cul>\u003Cli>\u003Cstrong style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Orchestrator\u003C\u002Fstrong>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">: The central planner. It breaks the user's overall goal down into subtasks and assigns each one to the most suitable agent.\u003C\u002Fspan>\u003C\u002Fli>\u003C\u002Ful>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cul>\u003Cli>\u003Cstrong style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Programmer\u003C\u002Fstrong>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">: Writes and executes Python or Bash scripts to handle back-end operations such as file management or data processing.\u003C\u002Fspan>\u003C\u002Fli>\u003C\u002Ful>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cul>\u003Cli>\u003Cstrong style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">GUI Operator\u003C\u002Fstrong>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">: Built on a VLM, it handles front-end tasks that require clicking buttons or navigating interfaces.\u003C\u002Fspan>\u003C\u002Fli>\u003C\u002Ful>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">This dynamic delegation mechanism lets CoAct-1 strategically bypass inefficient GUI operations in favor of more robust and efficient code execution, while still using visual interaction where it is needed. 
The entire workflow is iterative: each agent reports back to the orchestrator after completing a subtask, and the orchestrator then decides the next step.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cimg alt=\"undefined\" src=\"https:\u002F\u002Fimage.51xinwei.com\u002F2025\u002F08\u002Fa2567742e4d34229afdd30a572aa7c0a\u002F6389067905177100076417100.png\" width=\"undefined\" height=\"undefined\" style=\"display: block; margin: auto;\">\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong style=\"font-size: 18px; color: rgb(255, 153, 0);\" class=\"ql-lineHeight-1-75\">Performance leap: faster and more efficient\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">The researchers tested CoAct-1 on the&nbsp;\u003C\u002Fspan>\u003Cstrong style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">OSWorld\u003C\u002Fstrong>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">&nbsp;benchmark, which comprises 369 real-world tasks across browsers, IDEs, and office applications. CoAct-1 achieved a&nbsp;\u003C\u002Fspan>\u003Cstrong style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">60.76% success rate\u003C\u002Fstrong>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">, setting a new state of the art.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">The gains were most pronounced in operating-system-level tasks and multi-application workflows. 
More importantly, the system is also far more efficient, needing an average of only&nbsp;\u003C\u002Fspan>\u003Cstrong style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">10.15 steps\u003C\u002Fstrong>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">&nbsp;to complete a task, compared with the 15.22 steps required by other leading GUI-only agents. The researchers point out that fewer steps not only speed up task completion but also leave fewer opportunities for error, making automation both more efficient and more reliable.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong style=\"font-size: 18px; color: rgb(255, 153, 0);\" class=\"ql-lineHeight-1-75\">From lab to enterprise: potential applications and challenges\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">This technology has great potential for enterprise applications. Ran Xu, Director of Applied AI Research at Salesforce, points out that customer support, sales prospecting, automated bookkeeping, and marketing campaign management are all ideal use cases. In these scenarios, enterprises work with a mix of tools, some with APIs and some without, and CoAct-1 can flexibly use both code and the screen to provide comprehensive automation.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">However, moving CoAct-1 from the lab into enterprise environments poses challenges, including handling legacy software, ensuring security, and the need for human oversight. Xu emphasized that agents need to be trained in sandbox environments to improve their adaptability, and that strong access controls and safety guardrails must be established to prevent malicious code execution. 
Ultimately, in the foreseeable future, the \"human-in-the-loop\" model will be key to ensuring the safe and reliable operation of the agent.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"color: rgb(187, 187, 187);\">[News Source] AIbase Base \u003C\u002Fspan>\u003Ca href=\"https:\u002F\u002Fwww.aibase.com\u002Fzh\u002Fnews\u002F20470\" rel=\"noopener noreferrer\" target=\"_blank\" style=\"color: rgb(187, 187, 187);\">https:\u002F\u002Fwww.aibase.com\u002Fzh\u002Fnews\u002F20470\u003C\u002Fa>\u003C\u002Fp>\u003Cp class=\"ql-align-justify\">\u003Cspan style=\"color: rgb(187, 187, 187);\">（This article is reprinted by this website to provide readers with more information. The content does not constitute investment or consumption advice. If there are any questions about the facts in the article, please verify with the relevant parties. The views expressed in the article are not the views of this website and are for reference only.）\u003C\u002Fspan>\u003C\u002Fp>","https:\u002F\u002Fxinwei-dev-test.oss-cn-shenzhen.aliyuncs.com\u002Fintelligent\u002Faudio%3Acf164cf6-0332-4f2a-bd2b-646616bd11b4%3A0.wav?Expires=1774838497&OSSAccessKeyId=LTAI5tNvY2RkKjZw4LLWsrPK&Signature=3FfYUgTR4%2Ba%2FgslUCj7Jn%2FS6jP4%3D","cf164cf6-0332-4f2a-bd2b-646616bd11b4",8380590]