Resource Info
Paper: https://arxiv.org/abs/2409.12183
Code & Data: https://github.com/Zayne-sprague/To-CoT-or-not-to-CoT
Public: ICLR
Date: 2025.06.10
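The paper examines whether CoT improves model performance across all types of tasks. It finds that CoT yields large gains only on math and logic tasks, while gains on other task types are small.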
Findings:
Where does zero-shot CoT improve over direct prompts? On datasets that require math (MATH, GSM8K) or formal logic (ContextHub, MuSR to a lesser degree) to answer the problem.
Does the answer format impact where CoT will help? Not much. Free response capabilities required for BigGen Bench may not benefit from pre-planning.
Are the gains in Knowledge, Soft Reasoning, and Commonsense significant? Mostly no, except for MMLU, StrategyQA, and MuSR.
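To make the comparison behind these findings concrete, here is a minimal sketch of how a per-dataset "CoT gain" could be computed: the same questions are asked once with a direct prompt and once with a zero-shot CoT prompt, and the accuracies are subtracted. The prompt suffixes and the function names (`build_prompt`, `accuracy`, `cot_gain`) are illustrative assumptions, not the paper's actual prompts or evaluation code.

```python
from typing import Sequence

# Hypothetical prompt suffixes; the paper's exact wording may differ.
DIRECT_SUFFIX = "Answer with only the final answer."
COT_SUFFIX = "Let's think step by step, then give the final answer."


def build_prompt(question: str, use_cot: bool) -> str:
    """Append either the direct or the zero-shot CoT instruction to a question."""
    suffix = COT_SUFFIX if use_cot else DIRECT_SUFFIX
    return f"{question}\n{suffix}"


def accuracy(correct: Sequence[bool]) -> float:
    """Fraction of examples answered correctly."""
    if not correct:
        raise ValueError("need at least one example")
    return sum(correct) / len(correct)


def cot_gain(direct_correct: Sequence[bool], cot_correct: Sequence[bool]) -> float:
    """Accuracy with CoT minus accuracy with direct prompting, on the same examples."""
    if len(direct_correct) != len(cot_correct):
        raise ValueError("both runs must cover the same examples")
    return accuracy(cot_correct) - accuracy(direct_correct)
```

A positive `cot_gain` on a dataset means CoT helped there. Per the findings above, one would expect the value to be clearly positive on math and formal-logic datasets and near zero on most others.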
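Author: Geaming
Post link:
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting!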