10美元成功复现DeepSeek顿悟时刻,3B模型爆发超强推理!微软论文实锤涌现

华尔街见闻
22 Feb

复刻DeepSeek的神话,还在继续。 之前,UC伯克利的博士只用30美元,就复刻了DeepSeek中的顿悟时刻,震惊圈内。 这一次,来自荷兰阿姆斯特丹的研究人员Raz,再次打破纪录,把复刻成本降到了史上最低—— 只要10美元,就能复现DeepSeek顿悟时刻! Raz本人也表示,自己惊讶极了。 即使是一个非常简单的强化学习设置,并没有太多RL算法的复杂性(比如PPO、TRPO、GRPO等),也...

Source Link

Disclaimer: Investing carries risk. This is not financial advice. The above content should not be regarded as an offer, recommendation, or solicitation on acquiring or disposing of any financial products, any associated discussions, comments, or posts by author or other users should not be considered as such either. It is solely for general information purpose only, which does not consider your own investment objectives, financial situations or needs. TTM assumes no responsibility or warranty for the accuracy and completeness of the information, investors should do their own research and may seek professional advice before investing.

Most Discussed

  1. 1
     
     
     
     
  2. 2
     
     
     
     
  3. 3
     
     
     
     
  4. 4
     
     
     
     
  5. 5
     
     
     
     
  6. 6
     
     
     
     
  7. 7
     
     
     
     
  8. 8
     
     
     
     
  9. 9
     
     
     
     
  10. 10