周二,当全球目光聚焦于马斯克Grok-3的庞大GPU集群时,中国大模型公司正在技术创新的道路上默默加速。 先是DeepSeek提出了原生稀疏注意力(Native Sparse Attention, NSA)机制。这项梁文锋亲自参与的研究成果,结合了算法创新和硬件优化,旨在解决长上下文建模中的计算瓶颈。 NSA不仅能将大语言模型处理64k长文本的速度最高提升11.6倍,更在通用基准测试中实现了对传统...
Source LinkDisclaimer: Investing carries risk. This is not financial advice. The above content should not be regarded as an offer, recommendation, or solicitation on acquiring or disposing of any financial products, any associated discussions, comments, or posts by author or other users should not be considered as such either. It is solely for general information purpose only, which does not consider your own investment objectives, financial situations or needs. TTM assumes no responsibility or warranty for the accuracy and completeness of the information, investors should do their own research and may seek professional advice before investing.