[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"$fOMyBvbem-GvkKIHnXQSNUJv25JdPD1RO64gCx3dQsVo":3},{"code":4,"msg":5,"data":6},200,"操作成功",{"id":7,"title":8,"content":9,"digest":10,"source":10,"coverPath":11,"thumbsCoverPath":12,"isTop":13,"isShow":14,"baseClick":13,"clickCount":15,"createTime":16,"typeId":17,"isNewest":18,"newsInfoTypeRespVo":19,"voiceUrl":22,"voiceSize":23,"taskId":24,"releaseTime":25,"titleEn":26,"contentEn":27,"voiceUrlEn":28,"taskIdEn":29,"voiceSizeEn":30},1256,"UC Berkeley团队突破AI内存瓶颈：让大模型推理快7倍的神奇方法","\u003Cimg alt=\"undefined\" src=\"https:\u002F\u002Fimage.51xinwei.com\u002F2025\u002F08\u002F1eabae3f74f14bbbb2ac90b59d1a80bb\u002FAA1KNioN.png\" width=\"undefined\" height=\"undefined\" style=\"display: block; margin: auto;\">\u003Cp class=\"ql-align-center\">\u003Cspan style=\"color: rgb(187, 187, 187);\">这项突破性研究来自加州大学伯克利分校、FuriosaAI、国际计算机科学研究所以及劳伦斯伯克利国家实验室的联合团队，由Aditya Tomar、Coleman Hooper等研究人员共同完成，于2025年8月14日发表在arXiv预印本平台上，论文编号为arXiv:2508.10395v1。有兴趣深入了解的读者可以通过该编号在arXiv官网上访问完整论文。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px; color: rgb(255, 153, 0);\">当你打开手机里的ChatGPT或其他AI助手时，有没有想过为什么有时候它们反应会变慢？特别是在处理长篇对话或复杂任务时，这些原本聪明的AI似乎突然变得迟钝起来。背后的原因其实很简单：就像一个人试图在极其狭小的工作台上处理大量文件一样，AI的\"工作台\"——也就是内存空间——实在太小了。\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">这个问题在AI领域被称为\"内存墙\"困境。随着AI模型变得越来越强大，它们需要记住的信息也越来越多，但计算机硬件的内存增长速度远远跟不上AI的胃口。就好比你有一台超级跑车的引擎，但油箱却只有摩托车那么大，再强劲的动力也发挥不出来。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px;\">伯克利团队的这项研究提出了一个巧妙的解决方案——XQUANT。\u003C\u002Fstrong>\u003Cspan style=\"font-size: 18px;\" 
class=\"ql-lineHeight-1-75\">这就像是给AI配备了一套高效的\"文件压缩和快速还原系统\"。传统方法会把AI需要记住的所有信息都原封不动地存储起来，占用大量宝贵的内存空间。而XQUANT采用了一种更聪明的策略：它选择存储更容易压缩的\"原始材料\"，然后在需要时快速\"重新制作\"出所需的信息。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">具体来说，当AI处理文本时，它会产生两种重要的中间信息：Keys（键值）和Values（数值），这些就像是理解文本含义的\"密码本\"。传统方法会把这两套密码本都存储起来，但XQUANT发现了一个窍门：与其存储这两套复杂的密码本，不如存储制作它们的\"原料\"——也就是输入激活X。这种原料不仅占用空间更小，压缩起来也更容易，就像存储面粉和鸡蛋比存储做好的蛋糕更节省冰箱空间一样。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">研究团队在测试中发现了一个有趣的现象：AI模型的不同层之间，这些\"原料\"竟然非常相似。这就像连续几天的天气预报，虽然每天都有细微差别，但整体趋势是相近的。基于这个发现，他们开发出了XQUANT-CL（跨层版本），这个升级版本能够识别并利用这种相似性，进一步压缩存储需求。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">在实际测试中，XQUANT的表现令人印象深刻。在不同的AI模型上，包括广受欢迎的Llama系列和Mistral模型，这种方法能够将内存使用量减少到原来的1\u002F7.7，同时几乎不影响AI的回答质量。更令人惊喜的是，XQUANT-CL版本甚至能实现高达12.5倍的内存节省，而AI的表现质量只下降了微不足道的0.1个百分点。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">研究团队特别考虑到了现代AI模型的特殊结构。许多最新的模型使用了一种叫做\"分组查询注意力\"（GQA）的技术，这就像是让几个人共享同一份笔记来提高效率。针对这种结构，研究团队开发了专门的优化方案，使用数学中的奇异值分解技术来进一步压缩信息，确保即使在这种复杂结构下，XQUANT也能发挥出色的效果。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">为了验证这种方法的实用性，研究团队进行了详细的性能分析。他们考虑了一个重要问题：虽然XQUANT节省了内存，但它需要在使用时重新计算一些信息，这会不会反而拖慢整体速度？答案是否定的。现代GPU的计算能力增长速度远超内存带宽的提升，就像有一个动力十足的厨师但厨房的储物空间有限，这种情况下用时间换空间反而是更明智的选择。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">以NVIDIA H100这样的高端GPU为例，研究团队计算出，对于长度达到2300个词汇的文本处理任务，使用XQUANT不会成为计算瓶颈。而对于新一代的Llama-3.1-8B模型，这个数字更是高达40600个词汇，足以处理一本中等长度的小说。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan 
style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">在实验验证方面，研究团队在多个标准测试集上进行了全面评估。无论是传统的文本理解任务还是长篇文档问答，XQUANT都表现出了优异的性能。特别值得一提的是，在一些复杂的推理任务中，XQUANT甚至略微超越了传统方法的表现，这说明适度的信息压缩有时反而能帮助AI更好地抓住重点。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">这项研究的意义远不止于技术层面的突破。随着AI应用越来越普及，从手机助手到自动驾驶汽车，内存效率的提升意味着更多设备能够运行更强大的AI模型，而不需要昂贵的硬件升级。对于普通用户而言，这可能意味着更快的响应速度、更长的对话记忆，以及在移动设备上也能享受到高质量的AI服务。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">更重要的是，这种方法为未来AI技术的发展指明了一个新方向。传统上，提升AI性能往往需要更多的计算资源和存储空间，但XQUANT证明了通过巧妙的算法设计，我们可以在有限的资源下实现更好的性能。这种\"用智慧替代蛮力\"的思路，对于推动AI技术的可持续发展具有重要意义。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">研究团队也坦承了这种方法的局限性。由于需要实时重新计算某些信息，XQUANT在某些特定的硬件配置下可能不是最优选择。此外，XQUANT-CL版本虽然节省了更多内存，但也需要额外的计算和存储操作来管理累积器，这在某些内存极度受限的场景下可能成为考虑因素。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">不过，考虑到计算硬件发展的总体趋势——计算能力的增长持续超越内存容量和带宽的提升——XQUANT代表了一种面向未来的解决方案。它不是简单地要求更多的硬件资源，而是通过算法创新来更有效地利用现有资源。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">这项研究还揭示了一个有趣的技术哲学问题：在追求AI性能的道路上，我们是应该不断堆砌更强大的硬件，还是应该更多地依靠算法的巧思？XQUANT的成功表明，后者可能是一条更可持续、更有前景的道路。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">从更广阔的视角来看，这项研究反映了整个AI领域正在经历的一个重要转变：从粗放式的资源消耗向精细化的效率优化转变。就像工业革命后期，人们开始关注能源效率和环境影响一样，AI领域也在思考如何在有限的计算资源下实现最大的价值创造。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" 
class=\"ql-lineHeight-1-75\">对于那些关心AI技术发展但又担心其环境影响的人来说，XQUANT提供了一个令人鼓舞的例子：技术创新可以同时实现性能提升和资源节约。这种双赢的解决方案正是我们在面对全球计算资源日益紧张的今天最需要的。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">总的来说，伯克利团队的这项研究不仅解决了一个重要的技术问题，更为AI技术的未来发展提供了新的思路。它告诉我们，在追求更强大AI的道路上，聪明的算法设计往往比简单的硬件堆砌更有价值，而这种智慧最终会让更多人受益于AI技术的进步。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px;\">Q&amp;A\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Q1：XQUANT是什么？它是如何节省AI内存的？\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">A：XQUANT是加州大学伯克利分校开发的AI内存优化技术。它不直接存储AI处理过程中产生的Keys和Values信息，而是存储更容易压缩的原始输入数据X，然后在需要时重新计算出Keys和Values。这就像存储制作蛋糕的原料而不是成品蛋糕，能节省50%以上的存储空间。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Q2：XQUANT会不会影响AI的回答质量？\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">A：几乎不会。在测试中，XQUANT将内存使用量减少到1\u002F7.7的同时，AI的性能质量只下降了不到0.1个百分点。升级版的XQUANT-CL甚至能实现12.5倍的内存节省，质量下降仍然微不足道，有时甚至略有提升。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Q3：这项技术什么时候能应用到我们日常使用的AI产品中？\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" 
class=\"ql-lineHeight-1-75\">A：这项技术已经在学术层面得到验证，正在向产业化推进。考虑到现代GPU硬件的发展趋势（计算能力增长超过内存增长），XQUANT特别适合未来几代的AI硬件。预计在不久的将来，我们就能在手机和其他设备上体验到更快、更高效的AI服务。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"color: rgb(187, 187, 187);\">【新闻来源】科技行者 \u003C\u002Fspan>\u003Ca href=\"https:\u002F\u002Fwww.msn.cn\u002Fzh-cn\u002Fnews\u002Fother\u002Fuc-berkeley%E5%9B%A2%E9%98%9F%E7%AA%81%E7%A0%B4ai%E5%86%85%E5%AD%98%E7%93%B6%E9%A2%88-%E8%AE%A9%E5%A4%A7%E6%A8%A1%E5%9E%8B%E6%8E%A8%E7%90%86%E5%BF%AB7%E5%80%8D%E7%9A%84%E7%A5%9E%E5%A5%87%E6%96%B9%E6%B3%95\u002Far-AA1KN6r1?oci\" rel=\"noopener noreferrer\" target=\"_blank\" style=\"color: rgb(187, 187, 187);\">http:\u002F\u002Fu5a.cn\u002FAeDS8\u003C\u002Fa>\u003C\u002Fp>\u003Cp class=\"ql-align-justify\">\u003Cspan style=\"color: rgb(187, 187, 187);\">（本网转发此文章，旨在为读者提供更多的信息资讯，所涉内容不构成投资、消费建议。文章事实如有疑问，请与有关方核实，文章观点非本网观点，仅供读者参考。）\u003C\u002Fspan>\u003C\u002Fp>","","https:\u002F\u002Fimage.51xinwei.com\u002F2025\u002F08\u002F6d47d7fdecbd44c5a3c705d61448d07d\u002FAI领域.jpg","https:\u002F\u002Fimage.51xinwei.com\u002F2025\u002F08\u002Fthumbs\u002F6d47d7fdecbd44c5a3c705d61448d07d\u002FAI领域.jpg",0,1,222,"2025-08-21 18:39",2,false,{"id":17,"name":20,"enName":21},"芯位视野","Xinwei Vision","https:\u002F\u002Fxinwei-dev-test.oss-cn-shenzhen.aliyuncs.com\u002Fintelligent\u002Faudio%3A4993798c-86ee-43e3-a221-355b02b073aa%3A0.wav?Expires=1755780152&OSSAccessKeyId=LTAI5tNvY2RkKjZw4LLWsrPK&Signature=bwn6oD8upAxcjw40FJW%2BFb%2B2B0c%3D",15173366,"4993798c-86ee-43e3-a221-355b02b073aa","2025-08-21 18:34","UC Berkeley team breaks AI memory bottleneck: A magical method that makes large model reasoning 7 times faster","\u003Cimg alt=\"undefined\" 
src=\"https:\u002F\u002Fimage.51xinwei.com\u002F2025\u002F08\u002F1eabae3f74f14bbbb2ac90b59d1a80bb\u002FAA1KNioN.png\" width=\"undefined\" height=\"undefined\" style=\"display: block; margin: auto;\">\u003Cp class=\"ql-align-center\">\u003Cspan style=\"color: rgb(187, 187, 187);\">This groundbreaking research comes from a joint team at the University of California, Berkeley, FuriosaAI, the International Computer Science Institute, and Lawrence Berkeley National Laboratory. It was carried out by researchers including Aditya Tomar and Coleman Hooper and was posted to the arXiv preprint platform on August 14, 2025, as arXiv:2508.10395v1. Readers interested in learning more can find the full paper on the arXiv website using this number.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px; color: rgb(255, 153, 0);\">When you open ChatGPT or another AI assistant on your phone, have you ever wondered why it sometimes responds slowly? Especially in long conversations or complex tasks, these normally sharp assistants can suddenly seem sluggish. The reason is actually simple: like a person trying to handle a large stack of documents on an extremely small workbench, the AI's \"workbench\", its memory space, is simply too small.\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">In the AI field, this problem is known as the \"memory wall\" dilemma. As AI models become more powerful, they need to remember more information, but hardware memory has grown far more slowly than AI's appetite for it.
It's like having a supercar engine but a motorcycle-sized fuel tank: however powerful the engine, it cannot be put to full use.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px;\">The Berkeley team's research proposes a clever solution: XQUANT.\u003C\u002Fstrong>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">It is like equipping AI with an efficient \"file compression and fast restoration system.\" Traditional methods store everything the AI needs to remember as-is, which takes up a lot of precious memory space. XQUANT adopts a smarter strategy: it stores the \"raw materials\", which are easier to compress, and then quickly \"re-creates\" the required information when it is needed.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Specifically, when AI processes text, it generates two important kinds of intermediate information, Keys and Values, which act like \"codebooks\" for understanding the meaning of the text. Traditional methods store both codebooks, but XQUANT uses a trick: instead of storing these two complex codebooks, it stores the \"raw material\" they are made from, the input activation X. This raw material not only takes up less space but is also easier to compress, just as storing flour and eggs takes up less refrigerator space than storing a finished cake.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">In testing, the research team observed an interesting phenomenon: these \"raw materials\" are surprisingly similar across the different layers of an AI model. It is like weather forecasts for consecutive days: each day differs slightly, but the overall trend is similar.
Based on this finding, they developed XQUANT-CL (the cross-layer version), an upgraded version that identifies and exploits this similarity to reduce storage requirements further.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">In practical testing, XQUANT performed impressively. Across different AI models, including the popular Llama series and Mistral models, the method reduced memory usage to 1\u002F7.7 of the original with almost no effect on the quality of the AI's answers. More surprisingly, the XQUANT-CL version achieved memory savings of up to 12.5 times, with quality dropping by a negligible 0.1 percentage points.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">The research team also took the particular structure of modern AI models into account. Many of the latest models use a technique called \"grouped query attention\" (GQA), which is like having several people share the same notes to improve efficiency. For this structure, the team developed a dedicated optimization that uses singular value decomposition to compress the information further, so that XQUANT remains effective even with this more complex structure.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">To verify that the method is practical, the research team carried out a detailed performance analysis. They considered an important question: XQUANT saves memory, but it has to recompute some information at the time of use, so could it end up slowing things down overall? The answer is no.
Modern GPU computing power is growing much faster than memory bandwidth. It is like having a very capable chef in a kitchen with limited storage space: in that situation, trading time for space is the wiser choice.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Taking a high-end GPU such as the NVIDIA H100 as an example, the research team calculated that for text processing tasks of up to 2300 words, using XQUANT would not become a computational bottleneck. For the newer Llama-3.1-8B model, this number reaches as high as 40600 words, enough to process a medium-length novel.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">For experimental validation, the research team ran comprehensive evaluations on multiple standard test sets. On both traditional text comprehension tasks and long-document Q&amp;A, XQUANT performed well. Notably, on some complex reasoning tasks XQUANT even slightly outperformed traditional methods, which suggests that moderate information compression can sometimes help AI focus on the key points.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">The significance of this research goes beyond the technical breakthrough. As AI applications spread, from mobile assistants to self-driving cars, better memory efficiency means more devices can run more powerful AI models without expensive hardware upgrades.
For ordinary users, this may mean faster responses, longer conversation memory, and high-quality AI services even on mobile devices.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">More importantly, this method points to a new direction for the future development of AI technology. Traditionally, improving AI performance has required more computing resources and storage space, but XQUANT shows that clever algorithm design can deliver better performance within limited resources. This \"wisdom over brute force\" approach matters for the sustainable development of AI technology.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">The research team also acknowledged the method's limitations. Because some information has to be recomputed in real time, XQUANT may not be the best choice on certain hardware configurations. In addition, although the XQUANT-CL version saves more memory, it needs extra computation and storage operations to manage accumulators, which may be a consideration in scenarios where memory is extremely limited.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">However, given the overall trend in computing hardware, in which computing power keeps growing faster than memory capacity and bandwidth, XQUANT represents a future-oriented solution.
It does not simply demand more hardware resources; it uses algorithmic innovation to make better use of the resources that already exist.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">This research also raises an interesting question of technological philosophy: in pursuing AI performance, should we keep piling up more powerful hardware, or should we rely more on algorithmic ingenuity? The success of XQUANT suggests that the latter may be the more sustainable and promising path.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">From a broader perspective, this research reflects an important shift under way across the AI field: from heavy resource consumption to fine-grained efficiency optimization. Just as people began to pay attention to energy efficiency and environmental impact in the later stages of the Industrial Revolution, the AI field is now thinking about how to create the most value with limited computing resources.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">For those who care about AI development but worry about its environmental impact, XQUANT offers an encouraging example: technological innovation can deliver both better performance and resource savings. This win-win solution is exactly what is needed today, as global computing resources become increasingly tight.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">In summary, the Berkeley team's research not only solves an important technical problem but also offers new ideas for the future development of AI technology.
It tells us that in the pursuit of more powerful AI, clever algorithm design is often more valuable than simply stacking hardware, and that this ingenuity will ultimately let more people benefit from advances in AI technology.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px;\">Q&amp;A\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Q1: What is XQUANT? How does it save AI memory?\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">A: XQUANT is an AI memory optimization technique developed at the University of California, Berkeley. Instead of directly storing the Keys and Values generated during AI processing, it stores the original input data X, which is easier to compress, and recomputes the Keys and Values when they are needed. This is like storing the ingredients for a cake instead of the finished cake, and it saves more than 50% of storage space.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Q2: Does XQUANT affect the quality of AI answers?\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">A: Hardly at all. In testing, XQUANT reduced memory usage to 1\u002F7.7 of the original while the AI's quality dropped by less than 0.1 percentage points.
The upgraded version, XQUANT-CL, can even achieve 12.5 times memory savings; the quality drop remains negligible, and quality sometimes even improves slightly.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Q3: When will this technology reach the AI products we use every day?\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">A: This technology has been validated at the academic level and is moving towards industrial adoption. Given the trend in modern GPU hardware (computing power growing faster than memory), XQUANT is particularly well suited to the next few generations of AI hardware. In the near future, we can expect to experience faster and more efficient AI services on our phones and other devices.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"color: rgb(187, 187, 187);\">【News source】Tech Player \u003C\u002Fspan>\u003Ca href=\"https:\u002F\u002Fwww.msn.cn\u002Fzh-cn\u002Fnews\u002Fother\u002Fuc-berkeley%E5%9B%A2%E9%98%9F%E7%AA%81%E7%A0%B4ai%E5%86%85%E5%AD%98%E7%93%B6%E9%A2%88-%E8%AE%A9%E5%A4%A7%E6%A8%A1%E5%9E%8B%E6%8E%A8%E7%90%86%E5%BF%AB7%E5%80%8D%E7%9A%84%E7%A5%9E%E5%A5%87%E6%96%B9%E6%B3%95\u002Far-AA1KN6r1?oci\" rel=\"noopener noreferrer\" target=\"_blank\" style=\"color: rgb(187, 187, 187);\">http:\u002F\u002Fu5a.cn\u002FAeDS8\u003C\u002Fa>\u003C\u002Fp>\u003Cp class=\"ql-align-justify\">\u003Cspan style=\"color: rgb(187, 187, 187);\">(This article is reprinted by this site to provide readers with more information and news.
The content involved does not constitute investment or consumption advice. If there are any questions about the facts of the article, please verify with the relevant parties. The views of the article are not the views of this site and are for reference only.)\u003C\u002Fspan>\u003C\u002Fp>","https:\u002F\u002Fxinwei-dev-test.oss-cn-shenzhen.aliyuncs.com\u002Fintelligent\u002Faudio%3A7066bd1d-6bec-45bd-8920-0ba99464ccbe%3A0.wav?Expires=1774838490&OSSAccessKeyId=LTAI5tNvY2RkKjZw4LLWsrPK&Signature=X6AViikDJ8fsP2kw0VNrlywK1ns%3D","7066bd1d-6bec-45bd-8920-0ba99464ccbe",17244090]