Paper: https://arxiv.org/abs/2310.12931
Code & Data: https://github.com/eureka-research/Eureka
Publication: ICLR
Date: 2025.06.30
Large Language Models (LLMs) have excelled as high-level semantic planners for sequential decision-making tasks. However, harnessing them to learn complex low-level manipulation tasks, such as dexterous pen spinning, remains an open problem. We bridge this fundamental gap and present EUREKA, a human-level reward design algorithm powered by LLMs.
Evolution-driven Universal REward Kit for Agent (EUREKA) is a novel reward design algorithm powered by coding LLMs.
Problem Setting and Definitions
The goal of reward design is to produce a shaped reward function that stands in for a ground-truth reward function which may be difficult to optimize directly (e.g., sparse rewards); the designer may access this ground-truth reward only through queries.
Definition 2.1. (Reward Design Problem (Singh et al., 2010)) A reward design problem (RDP) is a tuple $P = \langle M, \mathcal{R}, \pi_M, F \rangle$, where $M = (S, A, T)$ is the world model with state space $S$, action space $A$, and transition function $T$. $\mathcal{R}$ is the space of reward functions. $\mathcal{A}_M(\cdot): \mathcal{R} \to \Pi$ is a learning algorithm that outputs a policy $\pi: S \to \Delta(A)$ that optimizes reward $R \in \mathcal{R}$ in the resulting Markov Decision Process (MDP) $(M, R)$. $F: \Pi \to \mathbb{R}$ is the fitness function that produces a scalar evaluation of any policy, which may only be accessed via policy queries (i.e., evaluating the policy against the ground-truth reward function).
In an RDP, the goal is to output a reward function $R \in \mathcal{R}$ such that the policy $\pi := \mathcal{A}_M(R)$ that optimizes $R$ achieves the highest fitness score $F(\pi)$.
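The definition above can be made concrete with a small sketch. All names here (`WorldModel`, `RewardDesignProblem`, `design_goal`) are our own illustrative choices, not identifiers from the paper or the Eureka codebase; the policy is deterministic for brevity, whereas the definition allows stochastic policies $\pi: S \to \Delta(A)$.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Illustrative encoding of Definition 2.1 (names are ours, not the paper's).
Policy = Callable[[Any], Any]            # pi: S -> A (deterministic, for brevity)
RewardFn = Callable[[Any, Any], float]   # R: S x A -> float

@dataclass
class WorldModel:
    states: Any                              # state space S
    actions: Any                             # action space A
    transition: Callable[[Any, Any], Any]    # transition function T: S x A -> S

@dataclass
class RewardDesignProblem:
    world: WorldModel
    learn: Callable[[WorldModel, RewardFn], Policy]  # A_M: R -> Pi
    fitness: Callable[[Policy], float]               # F: Pi -> R (scalar evaluation)

def design_goal(rdp: RewardDesignProblem, candidate_rewards):
    # The RDP objective: pick the reward R whose trained policy
    # pi := A_M(R) achieves the highest fitness F(pi).
    return max(candidate_rewards,
               key=lambda R: rdp.fitness(rdp.learn(rdp.world, R)))
```

In this framing, the reward designer never sees $F$ directly; it can only rank candidate rewards by training a policy on each and querying the fitness of the result.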
Reward Generation Problem. In our problem setting, every component within an RDP is specified via code. Then, given a string $l$ that specifies the task, the objective of the reward generation problem is to output a reward function code $R$ such that $F(\mathcal{A}_M(R))$ is maximized.
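EUREKA tackles this reward generation problem with evolutionary search: an LLM samples candidate reward code, each candidate is scored by training a policy and querying its fitness, and textual feedback on the best candidate seeds the next round. The sketch below shows only the loop structure; `llm_generate_rewards`, `train_policy`, and `fitness` are placeholder stubs standing in for the LLM call, the RL training run, and the fitness function $F$, and none of them come from the actual Eureka codebase.

```python
import random

def llm_generate_rewards(task_desc, env_source, feedback, n):
    # Stand-in: the real system prompts a coding LLM with the environment
    # source code, the task description, and feedback from the last round.
    return [f"def reward(state, action):  # candidate {i}\n    return 0.0"
            for i in range(n)]

def train_policy(reward_code):
    # Stand-in for an RL training run using the generated reward code.
    return {"policy": None, "stats": {}}

def fitness(policy):
    # Stand-in for the task fitness function F (ground-truth success metric).
    return random.random()

def eureka_search(task_desc, env_source, iterations=3, samples=4):
    best_code, best_score, feedback = None, float("-inf"), ""
    for _ in range(iterations):
        # Sample several candidate reward functions in each round.
        candidates = llm_generate_rewards(task_desc, env_source, feedback, samples)
        scored = [(fitness(train_policy(c)["policy"]), c) for c in candidates]
        score, code = max(scored)
        if score > best_score:
            best_score, best_code = score, code
        # "Reward reflection": summarize the outcome as textual feedback
        # for the next round of in-context mutation.
        feedback = f"best fitness so far: {best_score:.3f}"
    return best_code, best_score
```

The key design choice this sketch captures is that the search state between rounds is plain text (the feedback string), so the LLM can mutate reward code in-context without any gradient access to $F$.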
EUREKA outperforms human rewards.
EUREKA consistently improves over time.
Novelty and unique contributions of the paper:
Problems in the paper and suggestions for improvement:
Innovations or research directions proposed based on the paper's content and findings:
Research plans for the new research directions:
Research Plan 1: Task description and reward generation with vision-language models
Research Plan 2: Multi-task reward generation
Research Plan 3: Extension to real-robot environments
Author: Geaming