Agent Basics: From Conversation to Autonomous Action

What Is an Agent

Strip away the mystical vocabulary and the essence of an Agent is very simple:

Agent = LLM + tools + a loop

That is all there is to it.

Pseudocode:

async function agent(task: string) {
  const messages = [
    { role: 'system', content: 'You are an assistant that uses tools to complete user tasks' },
    { role: 'user', content: task },
  ]

  while (true) {
    const res = await llm({ messages, tools })
    messages.push(res.message)

    if (!res.message.tool_calls?.length) {
      // No tool calls (missing or empty): the model considers the task done, so return the final answer
      return res.message.content
    }

    // The model wants more tool calls: run them and append the results
    for (const call of res.message.tool_calls) {
      const result = await executeTool(call)
      messages.push({ role: 'tool', tool_call_id: call.id, content: result })
    }
  }
}

The Function Calling from the previous post runs once; an Agent runs it repeatedly in a loop. That is the only difference.
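
To watch the loop run without an API key, one option is to swap the model for a scripted stand-in. The sketch below is a simplified, synchronous variant of the pseudocode above: the `ToolCall` and `Message` shapes, `scriptedLlm`, and `runAgent` are all hypothetical names for illustration, not part of any real SDK.

```typescript
// Hypothetical shapes modeled on the pseudocode; not a real SDK's types.
type ToolCall = { id: string; name: string; args: Record<string, string> }
type Message = {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string
  tool_calls?: ToolCall[]
  tool_call_id?: string
}
type Llm = (messages: Message[]) => Message

// A scripted stand-in for the model: it replays canned replies in order.
function scriptedLlm(replies: Message[]): Llm {
  let i = 0
  return () => replies[Math.min(i++, replies.length - 1)]
}

// The same loop as the pseudocode, minus async and the system prompt.
function runAgent(task: string, llm: Llm, executeTool: (call: ToolCall) => string): string {
  const messages: Message[] = [{ role: 'user', content: task }]
  while (true) {
    const reply = llm(messages)
    messages.push(reply)
    // No tool calls means the model is done.
    if (!reply.tool_calls?.length) return reply.content
    for (const call of reply.tool_calls) {
      messages.push({ role: 'tool', tool_call_id: call.id, content: executeTool(call) })
    }
  }
}

const fakeLlm = scriptedLlm([
  { role: 'assistant', content: '', tool_calls: [{ id: 'c1', name: 'list_files', args: { dir: 'src' } }] },
  { role: 'assistant', content: 'src contains a.js' },
])
const answer = runAgent('What is in src?', fakeLlm, () => '["a.js"]')
console.log(answer)
```

The first scripted reply asks for a tool, the second carries no tool calls, so the loop exits after two model turns.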

An Agent That Can Do Real Work

const tools = [
  {
    type: 'function',
    function: {
      name: 'read_file',
      description: 'Read the contents of a file',
      parameters: {
        type: 'object',
        properties: { path: { type: 'string' } },
        required: ['path'],
      },
    },
  },
  {
    type: 'function',
    function: {
      name: 'write_file',
      description: 'Write content to a file',
      parameters: {
        type: 'object',
        properties: {
          path: { type: 'string' },
          content: { type: 'string' },
        },
        required: ['path', 'content'],
      },
    },
  },
  {
    type: 'function',
    function: {
      name: 'list_files',
      description: 'List the files in a directory',
      parameters: {
        type: 'object',
        properties: { dir: { type: 'string' } },
        required: ['dir'],
      },
    },
  },
]

// Give it a task: rename every .js file under src to .ts and fix the import paths
await agent('Rename every .js file under src to .ts and update the import paths')

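The model decomposes the task on its own: list the directory → read a file → modify its content → write it back → repeat.

This is the underlying principle of Cursor, Claude Code, and GitHub Copilot Agent.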
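
The tool definitions only describe the tools to the model; something still has to execute each call. Here is a minimal sketch of that dispatcher, assuming tool arguments arrive as a JSON string and using an in-memory object in place of a real file system so it runs anywhere. `createExecutor` and the `ToolCall` shape are hypothetical names for this illustration.

```typescript
// Hypothetical call shape: the tool name plus its arguments as a JSON string.
type ToolCall = { name: string; arguments: string }

// Builds an executor over an in-memory "file system" (path -> content).
function createExecutor(files: Record<string, string>) {
  return function executeTool(call: ToolCall): string {
    const args = JSON.parse(call.arguments) as Record<string, string>
    switch (call.name) {
      case 'read_file': {
        const content = files[args.path]
        if (content === undefined) throw new Error(`No such file: ${args.path}`)
        return content
      }
      case 'write_file':
        files[args.path] = args.content
        return 'ok'
      case 'list_files': {
        // Treat "src" and "src/" the same way.
        const prefix = args.dir.endsWith('/') ? args.dir : args.dir + '/'
        return JSON.stringify(Object.keys(files).filter((p) => p.startsWith(prefix)))
      }
      default:
        throw new Error(`Unknown tool: ${call.name}`)
    }
  }
}

const files: Record<string, string> = { 'src/a.js': "import b from './b.js'" }
const executeTool = createExecutor(files)
const listing = executeTool({ name: 'list_files', arguments: '{"dir":"src"}' })
executeTool({
  name: 'write_file',
  arguments: JSON.stringify({ path: 'src/a.ts', content: "import b from './b'" }),
})
```

A real version would replace the object with file system calls and restrict paths to the project directory.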

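Agent Architecture Patterns

1. ReAct (Reasoning + Acting)

The most classic paradigm. At each step the model outputs two parts, a Thought and an Action, and the tool result comes back as an Observation: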

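Thought: I first need to know which files are in the directory
Action: list_files(dir="src")
Observation: [a.js, b.js, c.js]
Thought: Now I need to read the contents of a.js
Action: read_file(path="src/a.js")
...
Thought: All files have been converted, so the task is done
Final Answer: Converted 3 .js files to .ts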

Pros: the reasoning is visible and easy to debug. Modern Function Calling is essentially a simplified version of ReAct.
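
With text-based ReAct, the host program has to parse each Action line before it can run a tool. The sketch below handles only the exact `name(key="value")` shape used in the trace above; `parseAction` is a hypothetical helper, and real model output is usually messier than this.

```typescript
type ParsedAction = { tool: string; args: Record<string, string> }

// Parses a line such as: Action: read_file(path="src/a.js")
// Returns null when the line is not an Action line in this exact shape.
function parseAction(line: string): ParsedAction | null {
  const match = /^Action:\s*(\w+)\((.*)\)\s*$/.exec(line.trim())
  if (!match) return null
  const args: Record<string, string> = {}
  const argPattern = /(\w+)="([^"]*)"/g
  let m: RegExpExecArray | null
  // Collect every key="value" pair inside the parentheses.
  while ((m = argPattern.exec(match[2])) !== null) {
    args[m[1]] = m[2]
  }
  return { tool: match[1], args }
}

const action = parseAction('Action: read_file(path="src/a.js")')
console.log(action)
```

Function Calling removes this parsing step because the tool name and arguments arrive as structured data.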

2. Plan-and-Execute

Have the model generate a complete plan first, then execute it step by step. Well suited to complex multi-step tasks.

Plan:
  1. List all .js files
  2. For each file:
     a. Read the content
     b. Analyze the imports
     c. Rewrite
  3. Delete the old files

Execute step 1 → step 2a → step 2b → ...
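
The control flow of this pattern can be sketched in a few lines. Here `planner` and `executor` are hypothetical stand-ins for two separate model calls (one that writes the plan, one that carries out a single step), stubbed with plain functions so the example runs.

```typescript
type Step = { id: string; description: string }
type StepResult = { id: string; output: string }

function planAndExecute(
  task: string,
  planner: (task: string) => Step[],
  executor: (step: Step, previous: StepResult[]) => string,
): StepResult[] {
  // One up-front call produces the whole plan.
  const plan = planner(task)
  const results: StepResult[] = []
  for (const step of plan) {
    // Each step can see earlier outputs, but the plan itself stays fixed.
    results.push({ id: step.id, output: executor(step, results) })
  }
  return results
}

const results = planAndExecute(
  'convert src/*.js to .ts',
  () => [
    { id: '1', description: 'list all .js files' },
    { id: '2', description: 'rewrite each file' },
  ],
  (step, previous) => `${step.description} (after ${previous.length} earlier steps)`,
)
```

The difference from the plain loop is that the model commits to the step list before any tool runs; this sketch does not replan when a step fails.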

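3. Reflexion (Self-Reflection)

After each round, have the model evaluate how well it did; if it got something wrong, roll back, improve, and retry.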
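
A rough sketch of the evaluate-and-retry part, assuming the critique is fed back into the next attempt as plain text. `runWithReflection`, `attempt`, and `evaluate` are hypothetical names; in practice both callbacks would be model calls, and rolling back side effects is left out here.

```typescript
type Verdict = { ok: boolean; feedback: string }

function runWithReflection(
  attempt: (feedback: string[]) => string,
  evaluate: (output: string) => Verdict,
  maxAttempts = 3,
): { output: string; attempts: number; ok: boolean } {
  const feedback: string[] = []
  let output = ''
  for (let n = 1; n <= maxAttempts; n++) {
    output = attempt(feedback)
    const verdict = evaluate(output)
    if (verdict.ok) return { output, attempts: n, ok: true }
    // Keep the critique so the next attempt can take it into account.
    feedback.push(verdict.feedback)
  }
  return { output, attempts: maxAttempts, ok: false }
}

const outcome = runWithReflection(
  // Stub "model": fixes the import only after receiving feedback.
  (feedback) => (feedback.length === 0 ? "import b from './b.js'" : "import b from './b'"),
  // Stub "critic": rejects any output that still mentions .js.
  (output) =>
    output.includes('.js')
      ? { ok: false, feedback: 'The import still points at a .js file' }
      : { ok: true, feedback: '' },
)
```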

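4. Multi-Agent Collaboration

Several specialized Agents divide the work (Planner, Coder, Reviewer, Tester, and so on) and call one another. Representative frameworks: AutoGen, CrewAI.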

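Agents in Real Products

Cursor / Claude Code / Windsurf: coding Agents

Tools: read and write files, run shell commands, run tests, grep, git

Perplexity / Kimi Explorer Edition (探索版): research Agents

Tools: search, open web pages, summarize, cross-check

Devin: software engineer Agent

Tools: a full development environment (browser, VS Code, terminal)

Operator / Computer Use: computer-operation Agents

Tools: screenshots, clicks, keyboard input

Manus: general-purpose task Agent

What they share: the richer the tools and the closer they are to a human workflow, the more capable the Agent.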

Pitfalls When Building Your Own Agent

1. Infinite Loops

The model may call the same tool over and over. Add a guard:

const MAX_STEPS = 20
for (let step = 0; step < MAX_STEPS; step++) {
  // main loop body
}
throw new Error('Agent exceeded the maximum number of steps')
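
A step cap only stops the loop after the budget is spent. A complementary guard, sketched here under the assumption that a stuck model repeats the exact same name and arguments, is to count identical calls and refuse them past a threshold. `createRepeatGuard` is a hypothetical helper.

```typescript
type ToolCall = { name: string; arguments: string }

// Returns a function that answers "is this call still allowed?".
function createRepeatGuard(maxRepeats = 3) {
  const counts = new Map<string, number>()
  return function allow(call: ToolCall): boolean {
    // Identical name + raw arguments count as the same call.
    const key = `${call.name}:${call.arguments}`
    const seen = (counts.get(key) ?? 0) + 1
    counts.set(key, seen)
    return seen <= maxRepeats
  }
}

const allow = createRepeatGuard(2)
const sameCall = { name: 'list_files', arguments: '{"dir":"src"}' }
const decisions = [allow(sameCall), allow(sameCall), allow(sameCall)]
console.log(decisions)
```

When a call is refused, returning a tool result that tells the model it is repeating itself tends to be more useful than aborting outright.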

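2. Context Explosion

Every step appends to messages, so you quickly hit the token limit.

Solutions:

Periodically summarize older conversation turns
Keep only the tool results from the last N rounds
For large files, pass only a summary and reference the original content by ID
Use Prompt Caching to reduce cost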
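
As one illustration of the "keep only recent tool results" idea, the sketch below keeps the newest tool results verbatim and replaces older ones with a short placeholder. It keeps the messages themselves so each tool call still has a matching result. `trimToolResults` and the `Message` shape are assumptions for this example, and it counts individual results rather than rounds.

```typescript
type Message = {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string
  tool_call_id?: string
}

// Keeps the newest `keep` tool results; stubs out the content of older ones.
function trimToolResults(messages: Message[], keep: number): Message[] {
  let remaining = keep
  const trimmed: Message[] = []
  // Walk backwards so the most recent tool results are counted first.
  for (let i = messages.length - 1; i >= 0; i--) {
    const msg = messages[i]
    if (msg.role !== 'tool') {
      trimmed.unshift(msg)
    } else if (remaining > 0) {
      remaining--
      trimmed.unshift(msg)
    } else {
      trimmed.unshift({ ...msg, content: '[older tool result omitted]' })
    }
  }
  return trimmed
}

const messages: Message[] = [
  { role: 'user', content: 'convert the files' },
  { role: 'tool', tool_call_id: 't1', content: 'A' },
  { role: 'tool', tool_call_id: 't2', content: 'B' },
  { role: 'tool', tool_call_id: 't3', content: 'C' },
]
const trimmed = trimToolResults(messages, 2)
```

The function returns a new array and leaves the original messages untouched, so the full log can still be stored elsewhere.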

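3. Tool Call Failures

Network errors, bad arguments, insufficient permissions. Always return the error message to the model as well, so it has a chance to correct itself.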

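let result
try {
  result = await executeTool(call)
} catch (e) {
  // Do not rethrow: hand the error back to the model as a tool result
  result = { error: e instanceof Error ? e.message : String(e) }
}
messages.push({ role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) })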

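4. Runaway Costs

Every step burns tokens. In production you must:

Cap the budget per user and per task
Monitor token consumption at every step
Use a small model for small tasks and reserve the flagship model for key decisions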
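
A per-task budget can be as simple as a running counter checked after every model call. This is a minimal sketch: `createBudget` and the `Usage` field names are hypothetical, and the real numbers would come from the usage data your provider returns with each response.

```typescript
// Hypothetical usage shape; map your provider's usage fields onto it.
type Usage = { inputTokens: number; outputTokens: number }

function createBudget(maxTokens: number) {
  let used = 0
  return {
    // Records one step's usage; returns false once the budget is exceeded.
    record(usage: Usage): boolean {
      used += usage.inputTokens + usage.outputTokens
      return used <= maxTokens
    },
    get used() {
      return used
    },
  }
}

const budget = createBudget(1000)
const firstStepOk = budget.record({ inputTokens: 400, outputTokens: 100 })
const secondStepOk = budget.record({ inputTokens: 500, outputTokens: 100 })
console.log(firstStepOk, secondStepOk, budget.used)
```

In the agent loop, a `false` return would be the signal to stop and report a partial result instead of starting another step.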

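5. Safety

An Agent can act on the real world, which means it can do real damage.

Put an approval gate (human-in-the-loop) in front of tools that write
Give tools that touch production systems a read-only account or a sandbox
Do not let the Agent run shell / eval directly unless you know exactly what you are doing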
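
One way to sketch the approval gate is a wrapper around the tool executor that asks a reviewer before any write tool runs. `withApproval`, the `WRITE_TOOLS` list, and the synchronous `approve` callback are assumptions for this illustration; a real gate would usually wait on a UI prompt.

```typescript
type ToolCall = { name: string; arguments: string }
type Executor = (call: ToolCall) => string

// Tools that change state; this list is an assumption for the sketch.
const WRITE_TOOLS = new Set(['write_file'])

function withApproval(execute: Executor, approve: (call: ToolCall) => boolean): Executor {
  return (call) => {
    if (WRITE_TOOLS.has(call.name) && !approve(call)) {
      // Report the refusal as a tool result so the model can adapt its plan.
      return JSON.stringify({ error: 'Rejected by human reviewer' })
    }
    return execute(call)
  }
}

const executed: string[] = []
const guarded = withApproval(
  (call) => {
    executed.push(call.name)
    return 'ok'
  },
  () => false, // a reviewer who rejects every write
)
const readResult = guarded({ name: 'read_file', arguments: '{"path":"src/a.js"}' })
const writeResult = guarded({ name: 'write_file', arguments: '{"path":"src/a.js","content":""}' })
```

Read-only tools pass straight through, so the reviewer is only interrupted for calls that can change something.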

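What the Frontend Can Do

Agents have many frontend applications:

- In-browser AI assistants: run LangChain.js / Vercel AI SDK in the browser to carry out user tasks
- Figma/Notion/Chrome plugins: use the plugin APIs as tools
- Agent UI: every reasoning step and tool call of the Agent needs to be presented clearly
- Low-code/no-code platforms: the user describes the requirement and the Agent generates the UI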

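Choosing a Framework

While learning: write it by hand. Skip the frameworks and call only the native SDK.

Building a product:

Frontend / full stack → Vercel AI SDK
Most complete ecosystem → LangChain.js
Multi-Agent / complex workflows → LangGraph or Mastra
Lightweight → AI SDK + your own loop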

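Hands-on Exercise

Write a "research Agent":

Tools: search(query) (using the Tavily / Serper API), fetch_page(url), summarize(text)
Task: the user gives a topic; the Agent searches, reads pages, and synthesizes on its own, then outputs a research report with citations

Advanced: add an ask_user(question) tool so the Agent can ask the user when it is unsure.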
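
As a possible starting point, the four tools can be declared in the same schema format as the file tools earlier in this post. The tool and parameter names come from the exercise; the descriptions and the `singleStringTool` helper are assumptions, and the implementations are left to you.

```typescript
// Helper for this sketch: every tool here takes exactly one string parameter.
function singleStringTool(name: string, description: string, param: string) {
  return {
    type: 'function' as const,
    function: {
      name,
      description,
      parameters: {
        type: 'object' as const,
        properties: { [param]: { type: 'string' as const } },
        required: [param],
      },
    },
  }
}

const researchTools = [
  singleStringTool('search', 'Search the web and return a list of result URLs', 'query'),
  singleStringTool('fetch_page', 'Fetch the text content of a web page', 'url'),
  singleStringTool('summarize', 'Summarize a long piece of text', 'text'),
  // Advanced part of the exercise: let the Agent ask for clarification.
  singleStringTool('ask_user', 'Ask the user a clarifying question', 'question'),
]
console.log(researchTools.map((t) => t.function.name))
```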

References

Anthropic: Building Effective Agents (one of the most important Agent design articles of 2024)
ReAct paper
LangChain: Agent Types
Vercel AI SDK: Agents Guide
OpenAI Agents SDK (OpenAI's official minimalist Agent framework)

(Licensed under CC BY-NC-SA 4.0)

Link to this post: https://www.sshipanoo.com/blog/ai/ai-for-frontend/07-Agent入门/