[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"$fALOxQqihgWvCVxF61rhn9N19ImeMiKbGOUPCzaBOkI4":3},{"code":4,"msg":5,"data":6},200,"操作成功",{"id":7,"title":8,"content":9,"digest":10,"source":10,"coverPath":11,"thumbsCoverPath":12,"isTop":13,"isShow":14,"baseClick":13,"clickCount":15,"createTime":16,"typeId":17,"isNewest":18,"newsInfoTypeRespVo":19,"voiceUrl":22,"voiceSize":23,"taskId":24,"releaseTime":25,"titleEn":26,"contentEn":27,"voiceUrlEn":28,"taskIdEn":29,"voiceSizeEn":30},1209,"华中科大团队破解AI网页设计痛点：让机器像人类一样\"分块思考\"生成代码","\u003Cimg alt=\"undefined\" src=\"https:\u002F\u002Fimage.51xinwei.com\u002F2025\u002F08\u002F8eff09645ca84c5cb607d9a4548e80d4\u002FAA1Khud1.png\" width=\"undefined\" height=\"undefined\" style=\"display: block; margin: auto;\">\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">这项由华中科技大学计算机科学与技术学院的桂艺、李振、张仲毅、王国豪等十三位研究者组成的团队完成的研究，发表于2025年8月3-7日的第31届ACM SIGKDD知识发现与数据挖掘大会(KDD 2025)。有兴趣深入了解技术细节的读者可以通过DOI链接https:\u002F\u002Fdoi.org\u002F10.1145\u002F3711896.3737016获取完整论文。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">当你在Instagram上看到一个精美的页面设计，能否让AI自动生成对应的网页代码？这听起来很简单，但实际上却是困扰程序员和AI研究者很久的难题。目前，超过75.8%的前端开发者都在使用AI工具来提高效率，但现有的AI在处理网页设计转换时，总是会\"丢失\"一些重要的布局信息，就像一个健忘的建筑师在建房时忘记了某些房间的具体位置。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px;\">研究团队发现了一个有趣的现象：当前最先进的多模态大语言模型，比如GPT-4V、Gemini等，在看到一张网页截图时，虽然能够生成相应的代码，但经常会把原本应该水平排列的元素误排成垂直排列，或者完全搞错了元素的相对位置。这就好比让人看着一张房间的照片来画平面图，结果把本该在客厅旁边的卧室画到了厨房后面。\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px;\">为了解决这个问题，华中科大的研究团队想到了一个巧妙的方法：既然AI在处理整张复杂图片时容易\"迷路\"，那为什么不先把图片切成小块，让AI一块一块地处理，然后再把结果拼接起来呢？这就像拼图游戏一样，先完成每个小区域，最后组成完整的图案。\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan 
style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">他们把这种方法命名为LaTCoder，其中的\"LaT\"代表\"Layout-as-Thought\"（布局即思维），借鉴了人工智能领域著名的\"Chain-of-Thought\"（思维链）概念。就像人类解决复杂问题时会把大问题分解成小问题逐一解决一样，LaTCoder把复杂的网页设计分解成多个简单的图像块来处理。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px; color: rgb(255, 153, 0);\">一、神奇的图片切割术：让AI看得更清楚\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">LaTCoder的第一步工作就像一个精明的裁缝，需要把一张完整的网页设计图精准地裁剪成若干个有意义的小块。这个过程听起来简单，实际上却需要相当的技巧。研究团队设计了一个专门的算法来寻找网页中的\"分割线\"。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">这些分割线必须满足几个条件：首先，它们必须是单一颜色的直线，就像用尺子画出的标准线条；其次，相邻分割线之间的距离不能太近，避免把图片切得过于琐碎；最重要的是，这些分割线不能穿过任何文字区域，否则就会把一个完整的句子从中间劈开。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">为了确保文字的完整性，研究团队还整合了光学字符识别（OCR）技术。这就好比给算法装上了一双能够识别文字的眼睛，确保切割过程不会破坏任何文本内容。同时，为了提高效率，他们采用了网格采样技术，不需要逐个像素地扫描，而是按照固定间隔进行检查，就像走路时不需要测量每一步的精确距离，大步流星地前进即可。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">在实际操作中，算法会忽略图片边缘的几个像素点，因为边缘区域往往包含一些干扰信息。通过这种方法，一张复杂的网页设计图就被巧妙地分割成了多个独立的图像块，每个块都有明确的位置坐标（专业术语叫\"边界框\"或BBox），就像给每个房间标注了详细的地址。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">为了避免产生过小的无意义碎片，算法还会自动合并那些面积小于预设阈值的小块。这就像整理房间时，会把一些小物件归类到相邻的大容器中，保持整体的整洁和实用性。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px; color: rgb(255, 153, 0);\">二、逐块生成：化整为零的编程魔法\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" 
class=\"ql-lineHeight-1-75\">完成图片分割后，LaTCoder开始了第二阶段的工作：逐个处理每个图像块。这个过程就像请一位经验丰富的程序员，针对每个小区域单独编写代码。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">研究团队精心设计了一套提示词系统，指导AI如何处理每个图像块。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">这套提示词系统遵循几个重要原则。首先是使用统一的网页模板，就像所有的房间都使用相同的建筑标准，确保最终拼接时不会出现风格冲突。其次是优先考虑外观和布局的一致性，然后才关注内容的准确性，这就像装修房子时先确定整体风格，再添置具体的家具。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">最关键的是，这个过程采用了\"思维链\"的方法，让AI按步骤进行思考：先分析图像块的内容和结构，然后生成初始的HTML和CSS代码，接着检查文字内容、颜色搭配、背景样式等细节，最后进行整体优化。这个过程就像一个细心的手工艺人，先打草稿，再精雕细琢，最后进行最终的润色。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">研究团队还针对不同能力的AI模型提供了不同版本的提示词。对于上下文长度较短的较弱模型，比如DeepSeek-VL2，他们提供了简化版本的提示，避免超出模型的处理能力。这就像给不同年龄的学生提供不同难度的作业，确保每个人都能在自己的能力范围内完成任务。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px; color: rgb(255, 153, 0);\">三、巧妙拼接：两种策略的完美结合\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">生成了所有图像块的代码后，LaTCoder面临最后一个挑战：如何把这些代码片段组装成完整的网页？研究团队开发了两种不同的拼接策略，就像准备了两种不同的拼图方法。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">第一种策略叫做\"绝对定位组装\"，这种方法就像在一张大画布上，根据每个图像块的原始坐标位置，精确地放置每个代码片段。这种方法的优点是位置绝对准确，不会出现偏差，特别适合那些上下文处理能力较弱的AI模型。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" 
class=\"ql-lineHeight-1-75\">第二种策略叫做\"基于AI的组装\"，这种方法更加灵活，让AI模型根据原始设计图和每个图像块的位置信息，智能地决定如何最好地组合这些代码片段。虽然这种方法对AI模型的能力要求更高，但往往能产生更加美观和自然的结果。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">为了选择最佳的组装结果，研究团队还开发了一个\"动态选择器\"。这个选择器就像一位经验丰富的评委，能够比较两种组装方法的结果，选择更好的那一个。评判标准结合了像素级别的准确性（通过平均绝对误差MAE测量）和语义级别的相似性（通过CLIP模型测量），确保选择的结果既在细节上准确，又在整体感观上协调。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">有趣的是，研究团队最初尝试让AI模型直接担任评委，但发现即使设计了精心的提示词，AI模型在图像相似性判断方面仍然不够可靠。因此，他们最终选择了传统的自动化评估指标，这个决定体现了科学研究中务实和严谨的态度。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px; color: rgb(255, 153, 0);\">四、严格测试：新数据集的挑战\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">为了验证LaTCoder的效果，研究团队进行了全面的实验评估。他们不仅在现有的公开数据集Design2Code-HARD上进行了测试，还专门创建了一个更具挑战性的新数据集CC-HARD。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Design2Code-HARD虽然被认为是相对困难的测试集，但研究团队发现它的复杂性主要体现在文本长度上，而在布局和结构方面仍然相对简单。为了更好地测试AI模型在处理复杂网页布局方面的能力，他们从Common Crawl数据集中精心挑选了128个具有复杂布局的网页样本，构建了CC-HARD数据集。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">通过统计分析可以发现，CC-HARD数据集在布局复杂度方面确实更具挑战性。虽然两个数据集的总体长度相似，但CC-HARD中的代码标签数量更多（平均274个对比251个），DOM树深度更深（平均16层对比10层），独特标签类型也更丰富（平均27种对比23种）。这些数字表明CC-HARD更像是现实世界中复杂网站的真实写照。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" 
class=\"ql-lineHeight-1-75\">实验结果令人印象深刻。在Design2Code-HARD数据集上，当使用GPT-4o作为基础模型时，LaTCoder在TreeBLEU指标上提升了17.65%，在CLIP相似度上提升了1.27%，在视觉评分上提升了3.8%，平均绝对误差降低了37.41%。在更具挑战性的CC-HARD数据集上，改善幅度更加明显：TreeBLEU提升了60%，CLIP相似度提升了2.53%，视觉评分提升了2.56%，平均绝对误差降低了43.23%。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">特别值得注意的是，LaTCoder对较弱的AI模型帮助更大。当使用DeepSeek-VL2模型时，在CC-HARD数据集上的TreeBLEU指标提升了66.67%，平均绝对误差降低了38%。这说明LaTCoder的\"分而治之\"策略确实能够有效减轻AI模型的处理负担，让它们在力所能及的范围内发挥更好的效果。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px; color: rgb(255, 153, 0);\">五、人类评判：真实用户的反馈\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">除了自动化的评估指标，研究团队还进行了人类评判实验。他们邀请了六位标注人员，让他们比较LaTCoder生成的网页与其他基准方法的结果，并投票选择更接近原始设计的版本。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">评判过程采用了配对比较的方式，每次向评判者展示原始设计图和两个不同方法生成的网页，询问\"哪一个更接近设计图并且质量更高？\"为了减少主观性的影响，研究团队采用了多数投票的机制来确定最终结果。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">人类评判的结果进一步验证了LaTCoder的有效性。在与各个基准方法的比较中，人类评判者在至少60%的情况下更偏好LaTCoder的结果。特别是在与DCGen方法的比较中，LaTCoder的胜率达到了79.7%，这个差距相当明显。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">这种人类评判实验的价值在于，它反映了真实用户的主观感受，而不仅仅是数字化的客观指标。毕竟，网页设计的最终目标是让人类用户满意，因此人类评判者的偏好具有重要的参考价值。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px; color: rgb(255, 153, 0);\">六、深入分析：成功的秘密与局限性\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">研究团队还进行了详细的消融研究（ablation 
study），试图理解LaTCoder成功的关键因素。他们发现，\"思维链\"式的提示设计对最终效果有重要影响。当使用简化的提示词时，各项性能指标都出现了明显下降，这说明让AI按步骤思考确实有助于提高代码生成的质量。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">关于两种组装策略的比较，研究发现它们各有优势。绝对定位组装在位置准确性方面表现更好，特别是在平均绝对误差这个指标上优势明显。这是因为绝对定位严格保持了每个图像块的原始位置，避免了任何可能的偏移。而基于AI的组装虽然在位置精度上稍逊一筹，但往往能产生更加自然和美观的整体效果，各个部分之间的过渡更加平滑。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">研究团队还测试了不同规模AI模型的表现。有趣的是，LaTCoder对较小规模的模型帮助更加显著。比如对于DeepSeek-VL2-tiny模型，性能提升幅度达到了175%，而对于更大的模型，虽然绝对性能更高，但相对提升幅度较小。这个发现很有实际意义，因为它意味着即使使用计算资源有限的小模型，通过LaTCoder也能获得不错的效果。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">当然，LaTCoder也有一些局限性。研究团队诚实地指出了两个主要问题。首先是布局错误，即使分块处理，AI模型有时仍会错误地安排某个块内部元素的位置，比如把应该在顶部的内容放到了底部。其次是\"偷懒\"问题，某些AI模型（特别是Gemini）在组装代码时有时会省略一些代码片段，导致最终网页缺少某些区域。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px; color: rgb(255, 153, 0);\">七、技术创新的深层意义\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">LaTCoder的创新不仅仅在于解决了一个具体的技术问题，更重要的是它提供了一种新的思维方式来处理AI的局限性。当我们发现AI在处理复杂任务时容易出错，与其试图让AI变得更强大，不如巧妙地分解任务，让AI在其能力范围内发挥最佳效果。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">这种\"分而治之\"的策略在很多领域都有应用前景。比如在自动化软件测试中，可以把复杂的系统分解成多个模块分别测试；在自然语言处理中，可以把长文档分段处理后再组合；在图像处理中，可以把大图片分块处理以提高精度。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">从更宏观的角度看，LaTCoder体现了人工智能发展的一个重要趋势：不是简单地追求更大更强的模型，而是通过巧妙的工程设计来充分发挥现有模型的潜力。这种方法不仅更加经济高效，而且更容易在实际应用中部署和维护。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 
18px;\" class=\"ql-lineHeight-1-75\">研究团队还创建的CC-HARD数据集为整个研究社区提供了一个更具挑战性的测试平台。这个数据集的公开发布将有助于推动整个领域的进步，让更多研究者能够在更接近真实世界复杂度的环境中测试他们的方法。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">LaTCoder的成功也为前端开发行业带来了实际价值。随着这种技术的成熟和普及，设计师和开发者之间的协作将变得更加高效。设计师可以专注于创意和用户体验，而繁重的代码实现工作可以更多地交给AI助手。这不会完全取代程序员，但会让他们能够把更多精力投入到更有创造性的工作中。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">说到底，LaTCoder解决的是一个看似简单实则复杂的问题：如何让机器更好地理解和重现人类的设计意图。通过借鉴人类解决复杂问题的思维方式——分步骤、分区域、逐个击破，这项研究为AI与创意设计的结合开辟了新的可能性。虽然目前的技术还不能完全替代人类设计师和程序员，但它已经展现出了作为强大辅助工具的潜力。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">对于普通用户而言，这意味着在不久的将来，创建专业网页可能会变得像使用PPT一样简单。你只需要画出或者找到一个喜欢的网页设计，AI就能帮你生成相应的代码，大大降低了网页开发的门槛。这种技术民主化的趋势，最终将让更多人能够参与到数字内容的创造中来。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px;\">Q&amp;A\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Q1：LaTCoder是什么？它是怎么工作的？\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">A：LaTCoder是华中科技大学研究团队开发的一种AI网页代码生成方法。它的工作原理就像拼图一样：先把完整的网页设计图切割成多个小块，让AI分别为每个小块生成代码，最后再把所有代码片段拼接成完整的网页。这种\"分而治之\"的方法能够显著提高AI生成代码的准确性，特别是在保持原始设计布局方面。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Q2：LaTCoder比现有的AI代码生成方法好在哪里？\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" 
class=\"ql-lineHeight-1-75\">A：LaTCoder的主要优势是能够更好地保持网页的原始布局。传统方法让AI一次性处理整张设计图时，经常会搞错元素的位置关系，比如把应该水平排列的元素排成垂直的。LaTCoder通过分块处理避免了这个问题，实验显示它在多个评估指标上都有显著提升，特别是在复杂布局的处理上表现更佳。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Q3：普通用户什么时候能使用LaTCoder技术？\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">A：目前LaTCoder还处于研究阶段，研究团队已经在GitHub上开源了相关代码和数据集。虽然普通用户暂时还不能直接使用成熟的产品，但这项技术为未来的网页开发工具奠定了基础。预计随着技术的进一步完善，类似的功能将会集成到各种设计和开发工具中，让网页制作变得更加简单易用。\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"color: rgb(187, 187, 187);\">【新闻来源】科技行者 \u003C\u002Fspan>\u003Ca href=\"https:\u002F\u002Fwww.msn.cn\u002Fzh-cn\u002Fnews\u002Fother\u002F%E5%8D%8E%E4%B8%AD%E7%A7%91%E5%A4%A7%E5%9B%A2%E9%98%9F%E7%A0%B4%E8%A7%A3ai%E7%BD%91%E9%A1%B5%E8%AE%BE%E8%AE%A1%E7%97%9B%E7%82%B9-%E8%AE%A9%E6%9C%BA%E5%99%A8%E5%83%8F%E4%BA%BA%E7%B1%BB%E4%B8%80%E6%A0%B7-%E5%88%86%E5%9D%97%E6%80%9D%E8%80%83-%E7%94%9F%E6%88%90%E4%BB%A3%E7%A0%81\u002Far-AA1Khud4?ocid=msedgntphdr&amp;cvid=67764a46dd014287b627707a75cbb518&amp;ei=50\" rel=\"noopener noreferrer\" target=\"_blank\" style=\"color: rgb(187, 187, 187);\">http:\u002F\u002Fu5a.cn\u002FWey8v\u003C\u002Fa>\u003C\u002Fp>\u003Cp class=\"ql-align-justify\">\u003Cspan style=\"color: rgb(187, 187, 187);\">（本网转发此文章，旨在为读者提供更多的信息资讯，所涉内容不构成投资、消费建议。文章事实如有疑问，请与有关方核实，文章观点非本网观点，仅供读者参考。）\u003C\u002Fspan>\u003C\u002Fp>","","https:\u002F\u002Fimage.51xinwei.com\u002F2025\u002F08\u002F7e0505be41744ac6a2ed5ac02e59f12f\u002FAI领域.jpg","https:\u002F\u002Fimage.51xinwei.com\u002F2025\u002F08\u002Fthumbs\u002F7e0505be41744ac6a2ed5ac02e59f12f\u002FAI领域.jpg",0,1,217,"2025-08-12 18:03",2,false,{"id":17,"name":20,"enName":21},"芯位视野","Xinwei 
Vision","https:\u002F\u002Fxinwei-dev-test.oss-cn-shenzhen.aliyuncs.com\u002Fintelligent\u002Faudio%3Ae6b86096-707f-44c6-b9e3-0917089b60f3%3A0.wav?Expires=1754998852&OSSAccessKeyId=LTAI5tNvY2RkKjZw4LLWsrPK&Signature=G5XzmsBO0TSrE71nwt%2FcpdModoo%3D",27383024,"e6b86096-707f-44c6-b9e3-0917089b60f3","2025-08-12 17:50","Huazhong University of Science and Technology team tackles a pain point in AI web design: letting machines \"think in blocks\" like humans to generate code","\u003Cimg alt=\"undefined\" src=\"https:\u002F\u002Fimage.51xinwei.com\u002F2025\u002F08\u002F8eff09645ca84c5cb607d9a4548e80d4\u002FAA1Khud1.png\" width=\"undefined\" height=\"undefined\" style=\"display: block; margin: auto;\">\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">This research was conducted by a team of thirteen researchers, including Gui Yi, Li Zhen, Zhang Zhongyi, and Wang Guohao, from the School of Computer Science and Technology at Huazhong University of Science and Technology, and was published at the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2025), held August 3-7, 2025. Readers interested in the technical details can access the full paper via the DOI link https:\u002F\u002Fdoi.org\u002F10.1145\u002F3711896.3737016.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">When you see a beautifully designed page on Instagram, could AI automatically generate the corresponding web code? It sounds simple, but it has long been a hard problem for programmers and AI researchers. 
Currently, more than 75.8% of front-end developers use AI tools to work more efficiently, yet existing AI tends to \"lose\" important layout information when converting web designs into code, like a forgetful architect who loses track of where certain rooms belong while building a house.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px;\">The research team observed an interesting phenomenon: current state-of-the-art multimodal large language models, such as GPT-4V and Gemini, can generate code from a webpage screenshot, but they often stack elements vertically that should sit side by side, or get the elements' relative positions entirely wrong. It is like asking someone to draw a floor plan from a photo of a room, only to have them place the bedroom, which belongs next to the living room, behind the kitchen.\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px;\">To solve this problem, the Huazhong University of Science and Technology team came up with a clever approach: since AI tends to get lost when processing a complex image as a whole, why not cut the image into small pieces, let the AI handle them one at a time, and then stitch the results together? It works like a jigsaw puzzle: complete each small region first, then assemble the full picture.\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">They named the method LaTCoder, where \"LaT\" stands for \"Layout-as-Thought,\" a name inspired by the well-known \"Chain-of-Thought\" concept in artificial intelligence. 
Just as people solve a complex problem by breaking it into smaller ones and tackling them one by one, LaTCoder breaks a complex web design into multiple simpler image blocks and processes them separately.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px; color: rgb(255, 153, 0);\">1. The Magic of Image Cutting: Letting AI See More Clearly\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">In its first step, LaTCoder works like a shrewd tailor, precisely cutting a complete web design image into several meaningful blocks. This sounds simple, but it actually takes considerable skill. The research team designed a dedicated algorithm to find \"split lines\" in the web page.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">These split lines must meet several conditions. First, each must be a straight line of a single color, like a line drawn with a ruler. Second, adjacent split lines must not be too close together, so the image is not cut into overly small fragments. Most importantly, a split line must not pass through any text region; otherwise it would cut a sentence in half.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">To keep text intact, the research team also integrated optical character recognition (OCR), which gives the algorithm a pair of eyes that can read and ensures that cutting never breaks up any text. 
At the same time, to improve efficiency, they used grid sampling: rather than scanning every pixel, the algorithm checks at fixed intervals, much like striding forward without measuring the exact length of each step.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">In practice, the algorithm ignores a few pixels along the image border, because edge regions often contain distracting noise. With this method, a complex web design image is divided into multiple independent image blocks, each with explicit position coordinates (a \"bounding box,\" or BBox), like labeling every room with a detailed address.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">To avoid tiny, meaningless fragments, the algorithm also automatically merges blocks whose area falls below a preset threshold. This is like tidying a room by putting small items into nearby larger containers, keeping the whole neat and practical.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px; color: rgb(255, 153, 0);\">2. Block-by-Block Generation: The Programming Magic of Breaking the Whole into Parts\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Once the image has been segmented, LaTCoder enters its second phase: processing each image block individually. 
It is like hiring an experienced programmer to write code for each small region separately.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">The research team carefully designed a set of prompts that guide the AI in handling each image block.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">These prompts follow several important principles. First, every block uses the same web page template, much as every room in a building follows the same construction standards, so no style conflicts arise during final assembly. Second, they put consistency of appearance and layout first and only then focus on content accuracy, much like settling on a home's overall style before choosing specific furniture.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Most importantly, the process uses a \"chain-of-thought\" approach that has the AI reason step by step: first analyze the content and structure of the image block, then generate initial HTML and CSS code, next check details such as text content, colors, and background styles, and finally refine the result as a whole. It resembles a careful artisan who first sketches, then refines, and finally polishes.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">The research team also provided different versions of the prompts for models of different capabilities. For weaker models with shorter context windows, such as DeepSeek-VL2, they used simplified prompts to avoid exceeding the model's capacity. 
This is like assigning homework of different difficulty to students of different ages, so that everyone can finish the task within their abilities.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px; color: rgb(255, 153, 0);\">3. Clever Assembly: A Perfect Combination of Two Strategies\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">With code generated for every image block, LaTCoder faces one final challenge: assembling these code snippets into a complete web page. The research team developed two assembly strategies, like two different ways of putting a puzzle together.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">The first strategy, \"absolute positioning assembly,\" is like placing each code snippet on a large canvas at the original coordinates of its image block. Its advantage is that every block stays exactly where it was in the design, with no drift, which makes it especially suitable for AI models with weaker context-handling capabilities.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">The second strategy, \"AI-based assembly,\" is more flexible: the AI model decides how best to combine the code snippets based on the original design and the position of each image block. 
Although this method demands a more capable model, it often produces more polished and natural-looking results.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">To pick the better assembly result, the research team also built a \"dynamic selector.\" Like an experienced judge, it compares the outputs of the two assembly methods and chooses the better one. Its scoring combines pixel-level accuracy (measured by mean absolute error, MAE) with semantic-level similarity (measured with a CLIP model), so the selected result is both accurate in detail and coherent as a whole.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Interestingly, the team first tried having an AI model act as the judge, but found that even with carefully designed prompts, the model was not reliable enough at judging image similarity. They therefore settled on conventional automated metrics, a pragmatic and rigorous choice.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px; color: rgb(255, 153, 0);\">4. Rigorous Testing: The Challenge of the New Dataset\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">To evaluate LaTCoder, the research team ran a comprehensive set of experiments. 
Besides testing on the existing public dataset Design2Code-HARD, they built a new and more challenging dataset, CC-HARD.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Although Design2Code-HARD is considered a relatively difficult benchmark, the team found that its difficulty comes mainly from text length, while its layouts and structures remain fairly simple. To better test how AI models handle complex web layouts, they carefully selected 128 web pages with complex layouts from the Common Crawl dataset to build CC-HARD.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Statistics confirm that CC-HARD is more challenging in terms of layout complexity. Although the two datasets have similar overall length, CC-HARD has more tags (274 vs. 251 on average), deeper DOM trees (16 vs. 10 levels on average), and more unique tag types (27 vs. 23 on average). These numbers suggest that CC-HARD more closely reflects complex real-world websites.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">The experimental results were impressive. On Design2Code-HARD with GPT-4o as the base model, LaTCoder improved TreeBLEU by 17.65%, CLIP similarity by 1.27%, and the visual score by 3.8%, while reducing mean absolute error (MAE) by 37.41%. 
On the more challenging CC-HARD dataset, the gains were even larger: TreeBLEU rose by 60%, CLIP similarity by 2.53%, and the visual score by 2.56%, while mean absolute error fell by 43.23%.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Notably, LaTCoder helps weaker AI models even more. With DeepSeek-VL2, TreeBLEU on CC-HARD improved by 66.67% and mean absolute error fell by 38%. This suggests that LaTCoder's \"divide and conquer\" strategy effectively lightens the model's processing load, letting it perform better within its capabilities.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px; color: rgb(255, 153, 0);\">5. Human Evaluation: Feedback from Real Users\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Beyond automated metrics, the research team also ran a human evaluation. 
They invited six annotators to compare web pages generated by LaTCoder with those from baseline methods and vote for the version closer to the original design.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">The evaluation used pairwise comparison: each annotator saw the original design alongside web pages generated by two different methods and was asked, \"Which one is closer to the design and of higher quality?\" To reduce the influence of individual subjectivity, the team determined the final outcome by majority vote.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">The human evaluation further confirmed LaTCoder's effectiveness. Against every baseline, annotators preferred LaTCoder's output in at least 60% of cases. Against DCGen in particular, LaTCoder's win rate reached 79.7%, a clear margin.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">The value of human evaluation is that it captures the subjective impressions of real users rather than relying on numerical metrics alone. After all, the ultimate goal of web design is to satisfy human users, so human preferences are an important reference.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px; color: rgb(255, 153, 0);\">6. 
In-depth Analysis: Why It Works and Where It Falls Short\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">The research team also ran detailed ablation studies to identify the key factors behind LaTCoder's success. They found that the \"chain-of-thought\"-style prompt design had a significant effect on the results: with simplified prompts, every metric declined noticeably, suggesting that guiding the model to reason step by step improves the quality of the generated code.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Comparing the two assembly strategies, the study found that each has its own strengths. Absolute-positioning assembly placed elements more accurately, particularly on the mean absolute error metric, because it puts every image block exactly at its original coordinates and so cannot drift. AI-based assembly is slightly less precise in placement, but it often produces a more natural, visually pleasing page with smoother transitions between parts.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">The team also evaluated models of different sizes. Interestingly, LaTCoder helped smaller models more. For the DeepSeek-VL2-tiny model, for example, the performance improvement reached 175%, whereas larger models, though they achieved higher absolute scores, saw smaller relative gains. 
This has practical significance: even with limited computational resources, small models can achieve good results with LaTCoder.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">LaTCoder does have limitations, and the authors openly identify two main ones. The first is layout errors: even within a single block, the model sometimes misarranges the elements, for example placing content that belongs at the top at the bottom. The second is \"laziness\": some models (Gemini in particular) occasionally omit code segments during assembly, leaving regions of the final page missing.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px; color: rgb(255, 153, 0);\">7. The Broader Significance of the Innovation\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">LaTCoder's contribution goes beyond solving one technical problem: it offers a new way to work around the limitations of AI. When a model is error-prone on complex tasks, rather than trying to make the model stronger, we can decompose the task so that the model only handles pieces it can do well.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">This \"divide and conquer\" strategy has potential applications in many fields. 
For example, in automated software testing, a complex system can be split into modules that are tested individually; in natural language processing, long documents can be processed in segments and the results combined; in image processing, large images can be processed tile by tile for higher precision.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">More broadly, LaTCoder reflects an important trend in AI development: rather than simply pursuing ever larger and stronger models, using careful engineering design to get the most out of existing ones. This approach is more economical and efficient, and it is also easier to deploy and maintain in practice.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">The research team also built the CC-HARD dataset, giving the research community a more challenging benchmark. Its public release should help advance the field by letting researchers test their methods on pages closer to real-world complexity.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">LaTCoder's results are also of practical value to the front-end development industry. As the technology matures and spreads, collaboration between designers and developers could become more efficient: designers can focus on creativity and user experience, while more of the tedious implementation work is handled by AI assistants. 
This will not replace programmers outright; rather, it will let them spend more time on creative work.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Ultimately, LaTCoder tackles a problem that looks simple but is genuinely hard: how to make machines better understand and reproduce human design intent. By borrowing the way people approach complex problems (step by step, region by region, one issue at a time), the research opens up new possibilities for combining AI with creative design. Current technology cannot fully replace human designers and programmers, but it already shows promise as a powerful assistant.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">For ordinary users, this could eventually make building a professional website as simple as using PowerPoint: sketch or pick a website design you like, and AI generates the corresponding code, greatly lowering the barrier to web development. This democratization of technology should let more people take part in creating digital content.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cstrong class=\"ql-lineHeight-1-75\" style=\"font-size: 18px;\">Q&amp;A\u003C\u002Fstrong>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Q1: What is LaTCoder? How does it work?\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">A: LaTCoder is an AI method for generating web page code, developed by a research team at Huazhong University of Science and Technology. 
It works like a jigsaw puzzle: the full web design image is first cut into small blocks, the AI then generates code for each block separately, and finally all the code snippets are assembled into a complete web page. This \"divide and conquer\" approach significantly improves the accuracy of AI-generated code, especially in preserving the original design layout.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Q2: What makes LaTCoder better than existing AI code generation methods?\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">A: Its main advantage is that it preserves the original page layout better. Conventional methods have the AI process the entire design image at once, which often garbles the spatial relationships between elements, for example stacking vertically elements that should sit side by side. LaTCoder avoids this by working block by block. Experiments show significant improvements on multiple evaluation metrics, especially for complex layouts.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">Q3: When will ordinary users be able to use LaTCoder?\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"font-size: 18px;\" class=\"ql-lineHeight-1-75\">A: LaTCoder is still at the research stage, although the team has already open-sourced the related code and dataset on GitHub. There is no polished product for ordinary users yet, but the technology lays the groundwork for future web development tools. 
As the technology improves, similar features are likely to be built into a range of design and development tools, making web creation simpler and easier.\u003C\u002Fspan>\u003C\u002Fp>\u003Cp>\u003Cbr>\u003C\u002Fp>\u003Cp>\u003Cspan style=\"color: rgb(187, 187, 187);\">【News Source】 Tech Player \u003C\u002Fspan>\u003Ca href=\"https:\u002F\u002Fwww.msn.cn\u002Fzh-cn\u002Fnews\u002Fother\u002F%E5%8D%8E%E4%B8%AD%E7%A7%91%E5%A4%A7%E5%9B%A2%E9%98%9F%E7%A0%B4%E8%A7%A3ai%E7%BD%91%E9%A1%B5%E8%AE%BE%E8%AE%A1%E7%97%9B%E7%82%B9-%E8%AE%A9%E6%9C%BA%E5%99%A8%E5%83%8F%E4%BA%BA%E7%B1%BB%E4%B8%80%E6%A0%B7-%E5%88%86%E5%9D%97%E6%80%95%E8%80%83-%E7%94%9F%E6%88%90%E4%BB%A3%E7%A0%81\u002Far-AA1Khud4?ocid=msedgntphdr&amp;cvid=67764a46dd014287b627707a75cbb518&amp;ei=50\" rel=\"noopener noreferrer\" target=\"_blank\" style=\"color: rgb(187, 187, 187);\">http:\u002F\u002Fu5a.cn\u002FWey8v\u003C\u002Fa>\u003C\u002Fp>\u003Cp class=\"ql-align-justify\">\u003Cspan style=\"color: rgb(187, 187, 187);\">（This article is reprinted from the internet to provide readers with more information. Its content does not constitute investment or consumption advice. If you have doubts about any facts in the article, please verify them with the relevant parties. The views expressed are not those of this site and are for reference only.）\u003C\u002Fspan>\u003C\u002Fp>","https:\u002F\u002Fxinwei-dev-test.oss-cn-shenzhen.aliyuncs.com\u002Fintelligent\u002Faudio%3Aa1fc1f62-a82b-4498-82c5-bb06c3e318c7%3A0.wav?Expires=1774838498&OSSAccessKeyId=LTAI5tNvY2RkKjZw4LLWsrPK&Signature=mnSNW%2FaC7P4RnXexCvEcPUPzAiQ%3D","a1fc1f62-a82b-4498-82c5-bb06c3e318c7",17856162]