Kuaishou-W Rises Over 3% in Early Trading After Releasing and Open-Sourcing SRPO, a New Large-Model Training Method

Sina Hong Kong Stocks

Apr 25, 2025

Kuaishou-W (01024) rose 3.47% in early trading to HK$52.20, on turnover of HK$768 million.

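On April 23, Kuaishou's Kwaipilot team released a new large-model training method, SRPO, and announced that it would be open-sourced. At one-tenth the training cost of GRPO, the method scored 50 on AIME 2024 and 41.6 on LiveCodeBench, benchmarks covering mathematics and code respectively. According to the team, it is the first method in the industry to reproduce DeepSeek-R1-Zero's performance in both domains at once.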

In its latest paper, "SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM", the Kwaipilot team proposes a new reinforcement learning framework, two-Staged history-Resampling Policy Optimization (SRPO). This is the method behind the cross-domain reproduction of DeepSeek-R1-Zero's performance in math and code.

Starting from the same base model as DeepSeek (Qwen2.5-32B) and training with reinforcement learning alone, SRPO scored 50 on AIME 2024 and 41.6 on LiveCodeBench, surpassing DeepSeek-R1-Zero-32B. Notably, SRPO reached this level using only one-tenth of R1-Zero's training steps.

Editor: Lu Yujun
