China's large-model makers compete on technology: DeepSeek releases NSA, and Kimi immediately follows with MoBA

Wallstreetcn (华尔街见闻)

Feb 19, 2025

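On Tuesday, while the world's attention was fixed on the massive GPU cluster behind Musk's Grok-3, China's large-model companies were quietly accelerating their technical innovation. First, DeepSeek proposed the Native Sparse Attention (NSA) mechanism. The research, in which Liang Wenfeng (梁文锋) personally took part, combines algorithmic innovation with hardware optimization and aims to remove the computational bottleneck in long-context modeling. NSA not only speeds up large language models' processing of long text at 64k length by up to 11.6 times, but on general benchmarks it has also achieved, against conventional...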

Source Link: https://wallstreetcn.com/articles/3741396


On general benchmarks, NSA even outperformed conventional full-attention models. The result suggests that co-designing the algorithm and the hardware implementation can substantially improve long-text processing efficiency without sacrificing model performance.

Hot on DeepSeek's heels, Kimi quickly released its own sparse attention technique: MoBA (Mixture of Block Attention).

According to the technical report "MOBA: MIXTURE OF BLOCK ATTENTION FOR LONG-CONTEXT LLMS", published jointly by researchers from Moonshot AI (月之暗面), Tsinghua University, and Zhejiang University, MoBA splits the full context into blocks, and each query token learns to attend to the most relevant key-value (KV) blocks, which makes long sequences efficient to process.

Just as DeepSeek founder Liang Wenfeng co-authored the NSA paper, Moonshot AI founder Yang Zhilin (杨植麟) appears in the author list of this paper.

According to the paper, across a range of long-text tasks, models using MoBA maintain comparable performance while significantly reducing the time and memory cost of attention computation. In tests at 1M tokens, MoBA was 6.5 times faster than full attention. On ultra-long text (for example, 10 million tokens) the advantage is more pronounced, with a speedup of more than 16 times.

MoBA has already been deployed to serve Kimi's long-context requests and marks significant progress in efficient attention computation for large language models. More notably, MoBA can be integrated into existing LLMs easily, without requiring extensive training.

MoBA: block-based sparse attention

To reach artificial general intelligence (AGI), LLMs need to handle long text sequences, which is essential for tasks such as historical data analysis, complex reasoning, and decision-making.

The computational cost of conventional self-attention, however, grows quadratically with sequence length, which limits LLMs' ability to process long text. Existing solutions either impose strongly biased structures (such as sliding-window attention) or approximate the attention mechanism linearly, and how well these approaches perform on complex reasoning tasks has not been fully validated.

The core idea of MoBA is to turn the global attention of the conventional Transformer into block-based sparse attention. Specifically, MoBA splits the input sequence into blocks and, for each query token, dynamically selects the few most relevant blocks to attend to, instead of computing attention over all tokens as the conventional method does.

This approach keeps the expressive power of the original Transformer while significantly reducing computational complexity, which makes it particularly suitable for ultra-long text inputs.

MoBA's core innovations include:

Trainable block-sparse attention: the full context is split into blocks, and each query token learns to attend to the most relevant KV blocks, enabling efficient processing of long sequences.

Parameter-free gating: a novel parameter-free top-k gating mechanism selects the most relevant blocks for each query token, so the model attends only to the most informative parts.

Seamless switching between full and sparse attention: MoBA is designed as a flexible substitute for full attention and allows seamless switching between full-attention and sparse-attention modes.

MoBA achieves a speedup of more than 16 times on ultra-long text

The Kimi team validated MoBA experimentally in several respects:

Scaling law experiments: although MoBA's attention pattern is as much as 81.25% sparse, its language-model loss is on par with full attention.

Long-context scalability: increasing the sequence length to 32K raises MoBA's sparsity further, to 95.31%. The experiments show that the performance gap between MoBA and full attention gradually narrows on long text.

Ablation study on fine-grained block segmentation: finer-grained block segmentation further improves MoBA's performance.

Hybrid of MoBA and full attention: training with a mix of MoBA and full attention strikes a balance between training efficiency and model performance.

Large language modeling evaluation: on several real-world downstream tasks, MoBA performs on par with full-attention models and is slightly ahead on some tasks.

Efficiency and scalability: MoBA is more efficient than full attention on long sequences, with sub-quadratic computational complexity. In tests at 1M tokens, MoBA was 6.5 times faster than full attention, and on sequences of 10 million tokens MoBA cut attention computation time by a factor of 16.
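The block selection the article describes (split the context into blocks, score each block for a query, keep the top-k blocks, attend only within them) can be illustrated with a minimal single-query sketch in plain Python. Several details here are assumptions rather than anything the article states: the gate scores a block by the dot product between the query and the mean of that block's keys, causal masking is left out, and the function names `moba_attention` and `sparsity` are hypothetical. The `sparsity` helper uses an assumed block size of 512 and top-k of 3, chosen only because they reproduce the reported 81.25% and 95.31% figures if the first setting is an 8K sequence.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def moba_attention(q, keys, values, block_size, top_k):
    """Block-sparse attention for one query (illustrative sketch only).

    q: query vector; keys/values: one vector per context token.
    Returns (output vector, sorted indices of the selected blocks).
    """
    n, d = len(keys), len(q)
    # 1. Split the context into contiguous blocks of `block_size` tokens.
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    # 2. Parameter-free gate (assumed form): score = q . mean(keys in block).
    scores = []
    for blk in blocks:
        mean_key = [sum(keys[i][j] for i in blk) / len(blk) for j in range(d)]
        scores.append(dot(q, mean_key))
    # 3. Keep only the top_k highest-scoring blocks.
    ranked = sorted(range(len(blocks)), key=lambda b: scores[b], reverse=True)
    chosen = sorted(ranked[:top_k])
    token_ids = [i for b in chosen for i in blocks[b]]
    # 4. Ordinary softmax attention, restricted to the selected tokens.
    logits = [dot(q, keys[i]) / math.sqrt(d) for i in token_ids]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    out = [
        sum(w * values[i][j] for w, i in zip(weights, token_ids)) / total
        for j in range(len(values[0]))
    ]
    return out, chosen


def sparsity(seq_len, block_size, top_k):
    """Fraction of the context a query does NOT attend to."""
    return 1 - (top_k * block_size) / seq_len


# Toy context: 6 tokens in 3 blocks of 2; the middle block matches the query.
q = [1.0, 0.0]
keys = [[0.0, 0.0], [0.0, 0.0], [5.0, 0.0], [5.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
values = [[1.0], [1.0], [2.0], [2.0], [3.0], [3.0]]
out, chosen = moba_attention(q, keys, values, block_size=2, top_k=1)
# chosen == [1] and out == [2.0]: only the middle block is attended to.

# Assumed settings (not from the article) that match the reported figures:
# sparsity(8192, 512, 3)  == 0.8125    (81.25%)
# sparsity(32768, 512, 3) == 0.953125  (95.31%)
```

Setting `top_k` to the number of blocks makes the function select every token, so it reduces to ordinary full attention, which is one simple way to picture the switching between sparse and full modes that the article mentions.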