Stanford CS329A | Self-Improving AI Agents

#	Date	Description	Paper Readings*	Deadlines
1	Mon Jan 6	Course Overview
2	Fri Jan 10	Test-time Compute Scaling	Large Language Monkeys: Scaling Inference Compute with Repeated Sampling (Brown et al. 2024); Archon: An Architecture Search Framework for Inference-Time Techniques (Saad-Falcon et al. 2024); Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Snell et al. 2024)
3	Mon Jan 13	Self-Improvement Techniques with Verifiers	Training Verifiers to Solve Math Word Problems (Cobbe et al. 2021); Let's Verify Step by Step (Lightman et al. 2023); Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations (Wang et al. 2023)
4	Fri Jan 17	Self-Improvement Techniques with RL	Constitutional AI: Harmlessness from AI Feedback (Bai et al. 2022); STaR: Bootstrapping Reasoning With Reasoning (Zelikman et al. 2022); Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models (Singh et al. 2024)
	Mon Jan 20	MLK Day - No classes
5	Fri Jan 24	Self-Improvement Techniques with Search	Thinking Fast and Slow with Deep Learning and Tree Search (Anthony et al. 2017); Competition-Level Code Generation with AlphaCode (Li et al. 2022); AlphaCode 2 Technical Report (2023)	Project proposal due @10pm; Homework 1 released
6	Mon Jan 27	Open-ended Agent Learning in the Era of Foundation Models (Guest Lecture: Prof. Jeff Clune, UBC/Google DeepMind)	The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (Lu et al. 2024); Automated Design of Agentic Systems (Hu et al. 2024)
7	Fri Jan 31	Augmenting LLMs with Tool Use/Actions	ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al. 2022); Toolformer: Language Models Can Teach Themselves to Use Tools (Schick et al. 2023); RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
8	Mon Feb 3	Planning and Multi-Step Reasoning	Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models (Zhou et al. 2023); LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench (Valmeekam et al. 2024); ADaPT: As-Needed Decomposition and Planning with Language Models (Prasad et al. 2024)	Homework 1 due February 4 @10pm; Homework 2 released on February 5
9	Fri Feb 7	Reasoning across Modalities (incl. invited talk on Gemini Multimodal)	Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents; Developing a computer use model; The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use
10	Mon Feb 10	Benchmarks & Challenges in Evaluating Agents	SWE-bench: Can Language Models Resolve Real-World GitHub Issues?; KernelBench: Can LLMs Write GPU Kernels?; RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts; MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering; τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains; GAIA: A Benchmark for General AI Assistants
11	Fri Feb 14	AI Coding Agents (Guest Lecture: Michele Catasta, Replit)	SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering (Yang et al. 2024); SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement (Antoniades et al. 2024); SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?	Homework 2 due on Feb 18 @11:59pm
	Mon Feb 17	Presidents' Day - No classes
12	Fri Feb 21	Midterm Progress Presentations
13	Mon Feb 24	Midterm Progress Presentations		Midterm Progress Presentation submission to Gradescope
14	Fri Feb 28	Agent Orchestration Frameworks (Guest Lecture: Chi Wang, AutoGen)	AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation (Wu et al. 2023)
15	Mon Mar 3	Augmenting LLMs with Retrieval/Memory	Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? (Lee et al. 2024); Contextual Retrieval (Anthropic 2024); MemGPT: Towards LLMs as Operating Systems (Packer et al. 2023)
16	Fri Mar 7	Guest Lecture: Lukasz Kaiser (OpenAI)
17	Mon Mar 10	Multimodal AI Agents (Guest Lecture: Prof. Ruslan Salakhutdinov, CMU/Meta)
18	Fri Mar 14	Multi-agent Systems & Future Research Areas	Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains; Mixture-of-Agents Enhances Large Language Model Capabilities
19	Wed Mar 19	Final Project Poster Presentation		Project Final Report due Mar 21 @10pm

*Paper readings may be updated closer to the class date.
