2025-06-10
Paper Reading

Contents

Summary Overview
Main Content
Others
ResourceInfo
Paper: https://arxiv.org/abs/2409.12183
Code & Data: https://github.com/Zayne-sprague/To-CoT-or-not-to-CoT
Publication: ICLR
Date: 2025.06.10

Summary Overview

The paper examines whether CoT improves model performance across all task types, and finds that CoT brings large gains only on math- and logic-related tasks; other kinds of tasks see little benefit.


Main Content

Findings:

  1. CoT only helps substantially on problems requiring mathematical, logical, or algorithmic reasoning.
  2. CoT primarily helps with the execution step that performs computation and symbolic manipulation, but falls short of what LLMs with tool augmentation can do (see the sketch after this list).
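
The "tool augmentation" setting in finding 2 separates planning from execution: the model only writes a symbolic plan, and an external solver runs it. Below is a minimal sketch of that idea; `generate_plan` and its hard-coded output are hypothetical stand-ins for an LLM call, not the paper's released code.

```python
# Minimal sketch of the "plan + tool execution" setting that CoT is compared against:
# the model writes a symbolic plan, and an external tool executes it.
# `generate_plan` and its hard-coded output are hypothetical, for illustration only.

def generate_plan(question: str) -> str:
    # Stand-in for an LLM call; a real system would prompt a model here.
    return "(12 * 7) + 5"

def solve_with_tool(question: str) -> int:
    plan = generate_plan(question)
    # The tool, not the model, performs the computation and symbolic manipulation.
    return eval(plan, {"__builtins__": {}}, {})  # restricted eval of an arithmetic plan

if __name__ == "__main__":
    q = "A box holds 12 rows of 7 apples plus 5 loose apples. How many apples in total?"
    print(solve_with_tool(q))  # 89
```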


Where does zero-shot CoT improve over direct prompts? On datasets that require math (MATH, GSM8K) or formal logic (ContextHub, MuSR to a lesser degree) to answer the problem.
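
For context, the zero-shot CoT vs. direct comparison comes down to a difference in prompt templates plus answer extraction. The sketch below shows one common way to set this up; the templates, the extraction regex, and the example question are illustrative assumptions, not the paper's exact prompts.

```python
import re

# One common way to build the two prompt variants being compared.
# These templates are illustrative assumptions, not the paper's exact prompts.

def direct_prompt(question: str) -> str:
    # Direct answering: ask for the final answer with no intermediate reasoning.
    return f"{question}\nGive only the final answer."

def zero_shot_cot_prompt(question: str) -> str:
    # Zero-shot CoT: elicit step-by-step reasoning before the final answer.
    return f"{question}\nLet's think step by step, then end with 'Answer: <final answer>'."

def extract_answer(model_output: str) -> str:
    # Simple heuristic for pulling the final answer out of a CoT response.
    match = re.search(r"Answer:\s*(.+)", model_output)
    return match.group(1).strip() if match else model_output.strip()

if __name__ == "__main__":
    q = "If a train travels 60 km in 45 minutes, what is its speed in km/h?"
    print(direct_prompt(q))
    print(zero_shot_cot_prompt(q))
    print(extract_answer("... so 60 / 0.75 = 80. Answer: 80 km/h"))  # -> "80 km/h"
```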

Does the answer format impact where CoT will help? Not much; the free-response generation that BigGen Bench requires may simply not benefit from pre-planning.

Are the gains in Knowledge, Soft Reasoning, and Commonsense significant? Mostly no, except for MMLU, StrategyQA, and MuSR.
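
Significance here means whether the CoT-minus-direct accuracy gap survives a statistical test over the per-question results. The sketch below shows one way such a check can be run with a paired bootstrap; the procedure and the toy data are assumptions for illustration, not the paper's exact test.

```python
import random

# One way to check whether a CoT-vs-direct accuracy gap is significant:
# a paired bootstrap over per-question correctness. The paper's exact statistical
# procedure may differ; the data below is invented for illustration.

def paired_bootstrap(direct_correct, cot_correct, n_resamples=10_000, seed=0):
    """Return the observed accuracy gain of CoT and a one-sided bootstrap p-value."""
    rng = random.Random(seed)
    n = len(direct_correct)
    observed_gain = (sum(cot_correct) - sum(direct_correct)) / n
    at_or_below_zero = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]      # resample questions with replacement
        gain = sum(cot_correct[i] - direct_correct[i] for i in idx) / n
        if gain <= 0:                                   # resampled gain vanishes or reverses
            at_or_below_zero += 1
    return observed_gain, at_or_below_zero / n_resamples

if __name__ == "__main__":
    # Toy per-question correctness (1 = correct) for 100 questions.
    direct = [1] * 60 + [0] * 40
    cot = [1] * 66 + [0] * 34
    gain, p = paired_bootstrap(direct, cot)
    print(f"gain = {gain:.2f}, bootstrap p ≈ {p:.4f}")
```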


Others
