
BoxAgnts Agent: An In-Depth Review of Multi-Turn Conversations and Tool/Skill Invocation

2026-05-31


Author: 菜鸟AI编辑部

Abstract

If you have only ever chatted with ChatGPT, you might think an AI Agent is just "send the prompt to the API and display the reply".

BoxAgnts Introduction (6): Agent Multi-Turn Conversations and Tool and Skill Invocation

The reality is far more complex. Below is a complete Agent interaction flow in BoxAgnts:


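User input: "Read config.toml for me and change port to 9090"
1. The user message is added to the conversation history
2. Build the system prompt (tool list + skill list + AGENTS.md + Agent role definition)
3. Call the LLM API → receive the response as a stream
4. The AI decides to call a tool: tool_use("read", {path: "config.toml"})
5. Run the read tool (inside the WASM sandbox)
6. Inject the tool result into the conversation history
7. Call the API again → the AI analyzes the configuration
8. The AI decides to call a tool: tool_use("edit", {path: "config.toml", old: "port = 8080", new: "port = 9090"})
9. Run the edit tool
10. Inject the tool result into the conversation
11. Call the API again → the AI replies: "Changed the port from 8080 to 9090"
12. end_turn → the conversation ends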

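This process involves 3 API calls, 2 tool executions, streaming delivery, and context management. This article breaks down the design and implementation of each stage.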
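To make the call counts concrete, here is a minimal sketch of the same flow with a scripted stand-in for the LLM. The types `Reply` and `Message` and the function `run_scripted_loop` are hypothetical simplifications for illustration, not BoxAgnts' actual definitions.

```rust
// Hypothetical, simplified types; not the real BoxAgnts definitions.
#[derive(Debug, Clone, PartialEq)]
enum Reply {
    ToolUse { name: String, input: String },
    EndTurn(String),
}

#[derive(Debug, Clone, PartialEq)]
enum Message {
    User(String),
    Assistant(Reply),
    ToolResult(String),
}

/// Replays a scripted sequence of model replies and returns
/// (api_calls, tool_runs, final_history).
fn run_scripted_loop(
    user_input: &str,
    script: Vec<Reply>,
    run_tool: impl Fn(&str, &str) -> String,
) -> (usize, usize, Vec<Message>) {
    let mut history = vec![Message::User(user_input.to_string())];
    let mut api_calls = 0;
    let mut tool_runs = 0;
    for reply in script {
        api_calls += 1; // each scripted reply stands for one API call
        history.push(Message::Assistant(reply.clone()));
        match reply {
            Reply::ToolUse { name, input } => {
                tool_runs += 1;
                // The tool result goes back into the history before the next call.
                history.push(Message::ToolResult(run_tool(name.as_str(), input.as_str())));
            }
            Reply::EndTurn(_) => break, // end_turn stops the loop
        }
    }
    (api_calls, tool_runs, history)
}
```

Feeding it a script of two `ToolUse` replies (read, then edit) followed by one `EndTurn` yields 3 API calls, 2 tool runs, and a six-message history.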

Agent Definition: Giving the Agent an "Identity"

Before the inference loop starts, the Agent's "role" must be defined. BoxAgnts ships with three preset Agents:


// boxagnts-workspace/src/config.rs
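pub struct AgentDefinition {
    pub description: Option<String>,  // description
    pub model: Option<String>,        // model override
    pub temperature: Option<f32>,     // temperature override (float width assumed)
    pub prompt: Option<String>,       // system prompt prefix
    pub access: String,               // access level: full / read-only / search-only
    pub visible: bool,                // whether it appears in @agent autocompletion
    pub max_turns: Option<u32>,       // max-turns override (integer type assumed)
    pub color: Option<String>,        // terminal display color
}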

The three preset Agent roles:

Agent	Access	Prompt excerpt	Use cases
build	full	"You are the build agent. Focus on implementing..."	Coding, modifying files
plan	read-only	"You are the plan agent. You can read files and analyze..."	Code analysis, architecture design
explore	search-only	"Fast search-only agent for code exploration"	Quick search, locating files

How the Agent prompt is injected

When the query loop starts, the prompt field of the Agent definition is injected at the very front of the system prompt:


// boxagnts-query/src/query.rs
if let Some(ref agent) = config.agent_definition {
    if let Some(ref agent_prompt) = agent.prompt {
        patched.system_prompt = Some(match &config.system_prompt {
            Some(existing) => format!("{}\n\n{}", agent_prompt, existing),
            None => agent_prompt.clone(),
        });
    }
}

The Agent can also override the model and the maximum number of turns:


let effective_model = if let Some(ref agent) = config.agent_definition {
    agent.model.clone().unwrap_or_else(|| config.model.clone())
} else {
    config.model.clone()
};
let effective_max_turns = config.agent_definition
    .as_ref()
    .and_then(|a| a.max_turns)
    .unwrap_or(config.max_turns);

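This means a user can use Agent definitions to apply different models and roles at different stages of the same session: for example, a read-only, slow-thinking model for the planning stage and a full-access, fast model for the execution stage.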
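The two override snippets can be combined into one self-contained sketch. The struct layouts below are trimmed-down assumptions (only the fields needed here, with `u32` assumed for the turn count), and `resolve` is a hypothetical helper name.

```rust
// Trimmed-down stand-ins for the real structs; field types are assumptions.
struct AgentDefinition {
    model: Option<String>,
    max_turns: Option<u32>,
    prompt: Option<String>,
}

struct QueryConfig {
    model: String,
    max_turns: u32,
    system_prompt: Option<String>,
    agent_definition: Option<AgentDefinition>,
}

/// Returns (effective_model, effective_max_turns, effective_system_prompt).
fn resolve(config: &QueryConfig) -> (String, u32, Option<String>) {
    let agent = config.agent_definition.as_ref();
    // Agent values win; otherwise fall back to the session-level config.
    let model = agent
        .and_then(|a| a.model.clone())
        .unwrap_or_else(|| config.model.clone());
    let max_turns = agent.and_then(|a| a.max_turns).unwrap_or(config.max_turns);
    // The agent prompt is prepended to any existing system prompt.
    let system_prompt = match (agent.and_then(|a| a.prompt.as_ref()), &config.system_prompt) {
        (Some(p), Some(existing)) => Some(format!("{}\n\n{}", p, existing)),
        (Some(p), None) => Some(p.clone()),
        (None, existing) => existing.clone(),
    };
    (model, max_turns, system_prompt)
}
```

An agent that sets only `model` and `prompt` therefore inherits `max_turns` from the session config, which is the behavior the two snippets above imply.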

run_query_loop: The Heart of the Agent

run_query_loop() is the most central function in BoxAgnts and lives in the boxagnts-query crate:


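pub async fn run_query_loop(
    client: &AnthropicClient,                     // API client
    messages: &mut Vec<Message>,                  // conversation history (mutable reference; element type name assumed)
    tools: &[Box<dyn Tool>],                      // tool collection
    tool_ctx: &ToolContext,                       // tool execution context
    config: &QueryConfig,                         // loop configuration
    cost_tracker: Arc<CostTracker>,               // cost tracking (type name assumed)
    event_tx: Option<Sender<QueryEvent>>,         // event push (sender type assumed)
    cancel_token: CancellationToken,              // cancellation signal
    pending_messages: Option<&mut Vec<Message>>,  // pending message queue (element type assumed)
) -> QueryOutcome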

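The function signature itself reads like an architecture document. Every parameter is a design decision:

Parameter	Design intent
`client`	A single entry point that can switch among 20+ models internally via ProviderRegistry
`messages: &mut Vec<Message>`	Modifies the conversation history directly; each iteration appends content
`tools: &[Box<dyn Tool>]`	A type-erased tool collection; the AI calls tools by name
`tool_ctx`	Carries sandbox configuration such as work_dir and allowed_hosts
`event_tx`	Pushes the state of each turn to the Dashboard / TUI in real time
`cancel_token`	Lets the user interrupt the loop at any time
`pending_messages`	Commands inserted mid-execution (for example, the user sends a new message while a tool is running)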

The Five-Step Beat of the Main Loop


┌────────────────────────────────────────────────────┐
│ loop {                                             │
│                                                    │
│ ① Check termination conditions                     │
│    · turn > max_turns ?  → EndTurn                 │
│    · cancel_token ?      → Cancelled               │
│    · budget exceeded ?   → BudgetExceeded          │
│                                                    │
│ ② Preprocess messages                              │
│    · drain pending_messages queue                  │
│    · apply_tool_result_budget (trim old results)   │
│    · auto_compact (context compaction)             │
│                                                    │
│ ③ Build system prompt + call the LLM API           │
│    · inject Agent definition / AGENTS.md           │
│    · build CreateMessageRequest                    │
│    · receive StreamEvent as a stream               │
│    · accumulate text / thinking / tool_use blocks  │
│                                                    │
│ ④ Handle the response                              │
│    · end_turn   → return                           │
│    · tool_use   → run tools in parallel,           │
│                   inject results, continue         │
│    · max_tokens → resume output, continue          │
│                                                    │
│ ⑤ Error recovery                                   │
│    · overloaded   → switch to fallback model       │
│    · stream stall → retry (at most 2 times)        │
│                                                    │
│ }                                                  │
└────────────────────────────────────────────────────┘
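Step ① of the loop can be sketched as a small pure function. The outcome names come from the diagram. The `LoopState` struct, its fields, and the use of a dollar budget are assumptions made for illustration only.

```rust
#[derive(Debug, PartialEq)]
enum QueryOutcome {
    EndTurn,
    Cancelled,
    BudgetExceeded,
}

// Hypothetical snapshot of the loop state checked at the top of each iteration.
#[derive(Clone, Copy)]
struct LoopState {
    turn: u32,
    max_turns: u32,
    cancelled: bool,
    spent_usd: f64,
    budget_usd: Option<f64>, // None means no budget limit
}

/// Returns Some(outcome) when the loop must stop, None when it may continue.
fn check_termination(s: &LoopState) -> Option<QueryOutcome> {
    if s.turn > s.max_turns {
        return Some(QueryOutcome::EndTurn);
    }
    if s.cancelled {
        return Some(QueryOutcome::Cancelled);
    }
    if let Some(budget) = s.budget_usd {
        if s.spent_usd > budget {
            return Some(QueryOutcome::BudgetExceeded);
        }
    }
    None
}
```

Keeping the checks in one function at the top of the loop means every later step can assume the turn is still allowed to run.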

System Prompt Construction: The Agent's "Worldview"

Before every API call, BoxAgnts builds the complete system prompt:


fn build_system_prompt(config: &QueryConfig) -> SystemPrompt {
    let opts = SystemPromptOptions {
        custom_system_prompt: config.system_prompt.clone(),        // user-defined prompt
        append_system_prompt: config.append_system_prompt.clone(), // appended content
        output_style: config.output_style,                         // output style
        custom_output_style_prompt: config.output_style_prompt.clone(),
        working_directory: config.working_directory.clone(),       // current working directory
        ..Default::default()
    };
    let text = boxagnts_core::system_prompt::build_system_prompt(&opts);
    SystemPrompt::Text(text)
}

The system prompt has a layered structure:


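┌──────────────────────────────────────────────┐
│ Agent role definition (build/plan/explore)   │ ← AgentDefinition.prompt
├──────────────────────────────────────────────┤
│ Core capability declaration                  │
│ · Available tool list (16+ tools)            │ ← generated dynamically from the tools parameter
│ · Skill list                                 │ ← discovered by SkillTool
│ · Output format requirements                 │
│ · Safety boundaries                          │
├──────────────────────────────────────────────┤
│ AGENTS.md content                            │ ← the user's project-level behavior rules
├──────────────────────────────────────────────┤
│ Dynamic boundary marker                      │
│ --- cached above, not cached below ---       │
├──────────────────────────────────────────────┤
│ Session-specific information                 │ ← current working directory, time, etc.
└──────────────────────────────────────────────┘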

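The "--- cached above, not cached below ---" divider is a clever design: the Anthropic API supports prompt caching, so caching everything above the divider can significantly reduce the token cost of each API call.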
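The property that makes the divider useful is that everything above it stays byte-for-byte identical across calls. A minimal sketch of that split follows. `PromptParts`, `split_prompt`, `render`, and the exact section labels are hypothetical, and this does not show how BoxAgnts actually marks cache breakpoints in the API request.

```rust
const CACHE_BOUNDARY: &str = "--- cached above, not cached below ---";

// Hypothetical inputs to prompt assembly.
struct PromptParts {
    agent_prompt: Option<String>,
    tool_names: Vec<String>,
    skill_names: Vec<String>,
    agents_md: Option<String>,
    working_directory: String,
}

/// Splits the prompt into (stable prefix, per-session suffix).
fn split_prompt(p: &PromptParts) -> (String, String) {
    let mut stable: Vec<String> = Vec::new();
    if let Some(agent) = &p.agent_prompt {
        stable.push(agent.clone()); // agent role comes first
    }
    stable.push(format!("Tools: {}", p.tool_names.join(", ")));
    stable.push(format!("Skills: {}", p.skill_names.join(", ")));
    if let Some(md) = &p.agents_md {
        stable.push(md.clone());
    }
    // Anything that changes between sessions stays below the boundary.
    let dynamic = format!("Working directory: {}", p.working_directory);
    (stable.join("\n\n"), dynamic)
}

/// Renders the full prompt text with the boundary marker in between.
fn render(p: &PromptParts) -> String {
    let (stable, dynamic) = split_prompt(p);
    format!("{}\n{}\n{}", stable, CACHE_BOUNDARY, dynamic)
}
```

Changing only `working_directory` leaves the first element of the tuple untouched, which is exactly the condition a prefix cache needs.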

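max_tokens Recovery: The Agent's "Resume from Breakpoint"

When the AI's reply hits the max_tokens limit, the model's output is cut off midway. An ordinary API call would end there, but an Agent cannot stop.

BoxAgnts handles this neatly: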


// boxagnts-query/src/query.rs
const MAX_TOKENS_RECOVERY_LIMIT: u32 = 3;

// The trailing backslashes join the lines without embedding newlines or indentation.
const MAX_TOKENS_RECOVERY_MSG: &str =
    "Output token limit hit. Resume directly — no apology, no recap of what \
     you were doing. Pick up mid-thought if that is where the cut happened. \
     Break remaining work into smaller pieces.";

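When stop_reason == "max_tokens" is detected:

1. The partial reply is added to the conversation as an assistant message
2. A special user message (MAX_TOKENS_RECOVERY_MSG) is appended
3. The loop continues, and the model resumes generating from where it was cut off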

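One detail in the prompt deserves attention: "no apology, no recap". After being interrupted, an LLM tends to respond with something like "Sorry, I was cut off just now, let me start over...", which produces useless output. The prompt explicitly forbids this pattern.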
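The three recovery steps, plus the retry cap, can be sketched as follows. The `Message` and `Next` types and the `on_max_tokens` function are hypothetical, the recovery text is shortened to a stand-in, and what BoxAgnts does once the limit of 3 is exhausted is not stated in the source, so stopping is an assumption here.

```rust
const MAX_TOKENS_RECOVERY_LIMIT: u32 = 3;
// Shortened stand-in for the real recovery message.
const RECOVERY_MSG: &str = "Output token limit hit. Resume directly.";

#[derive(Debug, Clone, PartialEq)]
enum Message {
    User(String),
    Assistant(String),
}

#[derive(Debug, PartialEq)]
enum Next {
    Continue,
    Stop,
}

/// Handles one `max_tokens` stop: keeps the partial reply, then either
/// asks the model to resume or gives up once the limit is reached.
fn on_max_tokens(history: &mut Vec<Message>, partial: &str, recoveries: &mut u32) -> Next {
    // Step 1: the truncated output is kept as an assistant message.
    history.push(Message::Assistant(partial.to_string()));
    if *recoveries >= MAX_TOKENS_RECOVERY_LIMIT {
        return Next::Stop; // assumed behavior once the limit is exhausted
    }
    *recoveries += 1;
    // Step 2: a user message tells the model to pick up where it left off.
    history.push(Message::User(RECOVERY_MSG.to_string()));
    // Step 3: the caller loops and calls the API again.
    Next::Continue
}
```

The counter matters: without it, a model that keeps hitting the limit would loop indefinitely while the history grows by two messages per attempt.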

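auto_compact: When the Context Gets Too Long

An LLM's context window is finite. As the conversation grows and tool results pile up, at some point it no longer fits.

BoxAgnts responds with automatic compaction, triggered when the estimated token count reaches 90% of the context window: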


// boxagnts-query/src/compact.rs
const AUTOCOMPACT_TRIGGER_FRACTION: f64 = 0.90;
const WARNING_PCT: f64 = 0.80;   // warn at 80%
const CRITICAL_PCT: f64 = 0.95;  // critical warning at 95%
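A small sketch of how the three constants might partition context usage is shown below. The `ContextPressure` enum and the `classify` function are hypothetical, and the ordering of the checks (critical first, then compaction, then warning) is an assumption, since the source lists the thresholds but not how they interact.

```rust
const AUTOCOMPACT_TRIGGER_FRACTION: f64 = 0.90;
const WARNING_PCT: f64 = 0.80;
const CRITICAL_PCT: f64 = 0.95;

#[derive(Debug, PartialEq)]
enum ContextPressure {
    Normal,
    Warning,
    Compact,
    Critical,
}

/// Classifies usage as a fraction of the context window.
/// Assumes `context_window` is greater than zero.
fn classify(estimated_tokens: u64, context_window: u64) -> ContextPressure {
    let used = estimated_tokens as f64 / context_window as f64;
    if used >= CRITICAL_PCT {
        ContextPressure::Critical
    } else if used >= AUTOCOMPACT_TRIGGER_FRACTION {
        ContextPressure::Compact
    } else if used >= WARNING_PCT {
        ContextPressure::Warning
    } else {
        ContextPressure::Normal
    }
}
```

With a 200,000-token window, 170,000 estimated tokens (85%) lands in the warning band and 185,000 (92.5%) crosses the compaction trigger.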

The core of the compaction strategy is to call another LLM to "summarize" the conversation history:


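Original conversation (possibly thousands of messages)
     │
     ▼
Compaction prompt (NO_TOOLS_PREAMBLE → forces summary mode)
     │
     ▼
LLM generates a structured summary:
    · Primary Request and Intent    (the user's original request)
    · Key Technical Concepts
    · Files and Code Sections       (files and code involved)
    · Errors and fixes              (errors encountered and their fixes)
    · Pending Tasks                 (tasks still to be done)
    · Current Work                  (current progress)
     │
     ▼
The summary replaces the earlier conversation history; the 10 most recent messages are kept verbatim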
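The final step of the diagram, replacing old history while keeping the most recent 10 messages, can be sketched with messages reduced to plain strings. The function name `compact`, the `KEEP_RECENT` constant name, and the `summarize` callback (standing in for the LLM call) are hypothetical.

```rust
const KEEP_RECENT: usize = 10;

/// Replaces everything except the last `KEEP_RECENT` messages with one summary.
/// `summarize` stands in for the LLM summarization call.
fn compact(history: Vec<String>, summarize: impl Fn(&[String]) -> String) -> Vec<String> {
    if history.len() <= KEEP_RECENT {
        return history; // nothing old enough to summarize
    }
    let split = history.len() - KEEP_RECENT;
    let summary = summarize(&history[..split]);
    let mut compacted = Vec::with_capacity(KEEP_RECENT + 1);
    compacted.push(summary); // the summary takes the place of the old messages
    compacted.extend_from_slice(&history[split..]); // recent messages stay verbatim
    compacted
}
```

A 25-message history therefore shrinks to 11 entries: one summary covering the first 15 messages, followed by the last 10 unchanged.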

The compaction prompt contains one key design element, NO_TOOLS_PREAMBLE:


CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.
- Do NOT use Read, Bash, Grep, Glob, Edit, Write, or ANY other tool.
- You already have all the context you need in the conversation above.
- Tool calls will be REJECTED and will waste your only turn.

If the compacting LLM tried to call tools, the whole compaction would be wasted. The preamble prevents this kind of meta-recursion.

Tool Execution: From the AI's Decision to the Result

When the LLM returns stop_reason == "tool_use", the conversation enters the tool execution phase:


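┌──────────────────────────────────────────────────┐
│ Phase 1: sequential PreToolUse preprocessing     │
│ (each tool block is handled in order and can     │
│  be blocked from running)                        │
├──────────────────────────────────────────────────┤
│ Phase 2: run non-blocked tools in parallel       │
│ join_all(futures) → all tools run concurrently   │
│ (blocked tools return a precomputed error        │
│  result)                                         │
└──────────────────────────────────────────────────┘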

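Key design point: tool results are injected as user messages. This leverages the role semantics of LLM messages: the Assistant initiated the tool call, and the User (that is, the system acting on the user's behalf) returned the tool result. The model interprets this as "the user answered your request" and naturally proceeds to the next round of reasoning.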
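The two phases can be sketched with standard-library scoped threads in place of the async `join_all` the diagram mentions. This is a swapped-in technique so the example needs no async runtime. `ToolCall`, `ToolOutput`, and `run_tool_calls` are hypothetical names, and `pre_check` stands in for the PreToolUse step.

```rust
use std::thread;

struct ToolCall {
    id: String,
    name: String,
}

#[derive(Debug, PartialEq)]
struct ToolOutput {
    id: String,
    content: String,
    is_error: bool,
}

fn run_tool_calls(
    calls: &[ToolCall],
    pre_check: impl Fn(&ToolCall) -> Result<(), String>,
    run: impl Fn(&ToolCall) -> String + Sync,
) -> Vec<ToolOutput> {
    // Phase 1: pre-checks run one by one, in order.
    let verdicts: Vec<Result<(), String>> = calls.iter().map(|c| pre_check(c)).collect();
    let run = &run;
    // Phase 2: every allowed call gets its own thread; blocked calls keep their error.
    thread::scope(|s| {
        let pending: Vec<_> = calls
            .iter()
            .zip(verdicts)
            .map(|(call, verdict)| match verdict {
                Ok(()) => Ok(s.spawn(move || run(call))),
                Err(reason) => Err(reason),
            })
            .collect();
        // Results are collected in the original call order.
        calls
            .iter()
            .zip(pending)
            .map(|(call, slot)| match slot {
                Ok(handle) => ToolOutput {
                    id: call.id.clone(),
                    content: handle.join().expect("tool thread panicked"),
                    is_error: false,
                },
                Err(reason) => ToolOutput { id: call.id.clone(), content: reason, is_error: true },
            })
            .collect()
    })
}
```

Each `ToolOutput` keeps the originating call's `id`, so the results can be packed into a single user-role message that the model can match back to its tool_use blocks.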

execute_tool: The Core of Tool Dispatch


// boxagnts-query/src/lib.rs
async fn execute_tool(
    name: &str,
    input: &Value,
    tools: &[Box<dyn Tool>],
    ctx: &ToolContext,
) -> ToolResult {
    let tool = tools.iter().find(|t| t.name() == name);

    match tool {
        Some(tool) => {
            debug!(tool = name, "Executing tool");
            tool.execute(input.clone(), ctx).await
        }
        None => {
            warn!(tool = name, "Unknown tool requested");
            ToolResult::error(format!("Unknown tool: {}", name))
        }
    }
}

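The implementation is extremely simple: a linear search. The tools vector usually holds only a dozen or so elements, so the cost of a linear search is negligible. Simplicity is more reliable than complexity.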

Managed Agent Mode: The Manager-Executor Architecture

When a task's complexity exceeds what a single Agent can handle, BoxAgnts offers a managed Agent mode:


                 ┌──────────────────┐
                 │   Manager Agent  │
                 │ (strong model,   │
                 │  e.g. Opus)      │
                 │ plans & delegates│
                 └────────┬─────────┘
                          │
             ┌────────────┼──────────────┐
             ▼            ▼              ▼
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │ Executor │ │ Executor │ │ Executor │
        │ (Sonnet) │ │ (Sonnet) │ │ (Sonnet) │
        │ Subtask A│ │ Subtask B│ │ Subtask C│
        └──────────┘ └──────────┘ └──────────┘
         Run in parallel, each with its own context

Managed-mode instructions are injected into the Manager's system prompt:


pub fn managed_agent_system_prompt(config: &ManagedAgentConfig) -> String {
    format!(r#"
## Managed Agent Mode
You are the MANAGER in a manager-executor architecture.
### Your Role
- You coordinate work but do NOT execute tasks directly.
- Delegate all implementation work to executor agents.
- Each executor uses model `{executor_model}` with up to {max_turns} turns.
- You may run up to {max_concurrent} executors in parallel.

### Workflow
1. Analyze the user's request and break into sub-tasks.
2. Spawn executors using the Agent tool.
3. Review results. If insufficient, spawn follow-up executors.
4. Synthesize all results into a coherent response."#, ...)
}

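The Manager does not execute tools itself; it only plans, delegates, and synthesizes results. Each Executor is an ordinary Agent instance with the full tool set. Separating "thinking" from "execution" avoids the context bloat of a single Agent and enables genuinely parallel processing.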
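The fan-out with a concurrency cap can be sketched with scoped threads. `run_executors` is a hypothetical function, the `executor` callback stands in for a full Executor Agent run, and processing subtasks in fixed batches is a simplification chosen for brevity, not a description of how BoxAgnts schedules executors.

```rust
use std::thread;

/// Runs every subtask through `executor`, at most `max_concurrent` at a time,
/// and returns the results in subtask order.
fn run_executors(
    subtasks: &[String],
    max_concurrent: usize,
    executor: impl Fn(&str) -> String + Sync,
) -> Vec<String> {
    let executor = &executor;
    let mut results = Vec::with_capacity(subtasks.len());
    // Simplification: fixed-size batches; a batch must finish before the next starts.
    for batch in subtasks.chunks(max_concurrent.max(1)) {
        let batch_results: Vec<String> = thread::scope(|s| {
            let handles: Vec<_> = batch
                .iter()
                .map(|task| s.spawn(move || executor(task.as_str())))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("executor panicked"))
                .collect()
        });
        results.extend(batch_results);
    }
    results
}
```

Each executor sees only its own subtask string, which mirrors the "independent context" property in the diagram: the manager receives the results and never the executors' intermediate history.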

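The Skill System: Teaching the Agent "Professional Skills"

Tools are the Agent's "hands": reading files, writing files, running commands. Skills are the Agent's "expertise": a code review methodology, a CSS refactoring guide, front-end component templates.

The Skill file format

A Skill is simply a SKILL.md file: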


app/extensions/skills/
├── code-review/SKILL.md
├── css-refactor-advisor/SKILL.md
├── current-weather/SKILL.md
├── weather-forecast/SKILL.md
└── front-component-generator/SKILL.md

The SkillTool implementation


pub struct SkillTool;

#[async_trait]
impl Tool for SkillTool {
    fn name(&self) -> &str { "skill-tool" }

    async fn execute(&self, input: Value, ctx: &ToolContext) -> ToolResult {
        let params: SkillInput = serde_json::from_value(input)?;
        // Workspace skills directory first, then the app extensions directory
        let dirs = skill_search_dirs(ctx).await;

        // "skill": "list" → list all available skills
        if params.skill == "list" {
            return list_skills(&dirs).await;
        }

        // Find and read SKILL.md
        let (skill_path, raw) = find_and_read_skill(&params.skill, &dirs).await?;

        // Strip the YAML frontmatter
        let content = strip_frontmatter(&raw);

        // Substitute the $ARGUMENTS placeholder (empty string when no args are given)
        let prompt = if let Some(args) = &params.args {
            content.replace("$ARGUMENTS", args)
        } else {
            content.replace("$ARGUMENTS", "")
        };

        ToolResult::success(prompt)
    }
}

The two-level Skill search path

Skill lookup checks the workspace directory first, and only then the application extensions directory:


async fn skill_search_dirs(ctx: &ToolContext) -> Vec<PathBuf> {
    let mut dirs = vec![
        ctx.get_workspace_extensions_dir().await.join("skills") // project level
    ];
    dirs.push(ctx.get_app_extensions_dir().await.join("skills")); // global level
    dirs
}

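This means you can define project-specific Skills in the project directory (such as "understand this project's build system") while also using global Skills (such as "general code review standards"). Project-level Skills take precedence over global ones.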
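The precedence rule falls out of search order: the first directory that contains the skill wins. Here is a minimal sketch of that lookup. `find_skill` is a hypothetical helper (the source's `find_and_read_skill` is not shown), and the `exists` callback is injected so the logic can be exercised without touching the file system.

```rust
use std::path::{Path, PathBuf};

/// Returns the first `<dir>/<name>/SKILL.md` that exists, scanning
/// `search_dirs` in order (project-level first, global second).
fn find_skill(
    name: &str,
    search_dirs: &[PathBuf],
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    search_dirs
        .iter()
        .map(|dir| dir.join(name).join("SKILL.md"))
        .find(|candidate| exists(candidate.as_path()))
}
```

In real use the callback could simply be `|p| p.is_file()`. A project-level `code-review/SKILL.md` then shadows the global one with the same name, while skills that exist only globally are still found.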

The $ARGUMENTS placeholder

The most important mechanism in a Skill template is $ARGUMENTS:


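# Code review Skill template

Please review: $ARGUMENTS

Checklist:
1. Are any functions too long (>50 lines)?
2. Are there unhandled Result/Option values?
3. Are there unnecessary .clone() calls?
4. Does naming follow Rust conventions?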

When the AI passes args: "src/main.rs", $ARGUMENTS is replaced with src/main.rs. This turns a Skill from "static knowledge" into a "parameterized tool".
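The frontmatter stripping and placeholder substitution can be sketched with the standard library alone. The source does not show `strip_frontmatter`, so the version below is an assumed implementation that handles only the simple case of a leading `---` line closed by a later `---` line. `render_skill` is a hypothetical wrapper.

```rust
/// Removes a leading YAML frontmatter block delimited by `---` lines.
/// Returns the input unchanged when no complete block is found.
fn strip_frontmatter(raw: &str) -> &str {
    let rest = match raw.strip_prefix("---\n") {
        Some(r) => r,
        None => return raw,
    };
    let mut offset = 0;
    for line in rest.split_inclusive('\n') {
        if line.trim_end() == "---" {
            // Skip past the closing delimiter line.
            return &rest[offset + line.len()..];
        }
        offset += line.len();
    }
    raw // no closing delimiter: treat the whole file as content
}

/// Produces the prompt text the tool would return for a skill file.
fn render_skill(raw: &str, args: Option<&str>) -> String {
    strip_frontmatter(raw)
        .trim_start()
        .replace("$ARGUMENTS", args.unwrap_or(""))
}
```

Replacing the placeholder with an empty string when no args are passed matches the else branch of the SkillTool code shown earlier.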

Streaming Events: Letting the User Watch the Agent "Think"

The entire query loop pushes status in real time through the event_tx channel:


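pub enum QueryEvent {
    Token { text: String },                                              // pushed token by token
    ToolStart { tool_name: String, tool_id: String, input: Value },      // tool started (field types assumed)
    ToolEnd { tool_name: String, tool_id: String, result: ToolResult },  // tool finished (field types assumed)
    Status(String),                                                      // status message
}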

These events are pushed to the Dashboard front end over WebSocket in real time, so the user can see every decision the Agent makes instead of facing a black box.
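The producer/consumer shape of the event channel can be sketched with `std::sync::mpsc`. The source does not say which channel type BoxAgnts uses, so the standard-library channel here is an assumption, the enum is trimmed to fewer fields, and `emit` and `demo_turn` are hypothetical helpers.

```rust
use std::sync::mpsc;
use std::thread;

// Trimmed-down version of the event enum, with fewer fields.
#[derive(Debug, PartialEq)]
enum QueryEvent {
    Token { text: String },
    ToolStart { tool_name: String },
    ToolEnd { tool_name: String },
    Status(String),
}

/// Sends an event if a listener is attached; a missing or closed
/// channel is ignored so the loop never fails because of the UI.
fn emit(tx: Option<&mpsc::Sender<QueryEvent>>, event: QueryEvent) {
    if let Some(tx) = tx {
        let _ = tx.send(event);
    }
}

/// Simulates one turn on a worker thread and collects what a UI would receive.
fn demo_turn() -> Vec<QueryEvent> {
    let (tx, rx) = mpsc::channel();
    let producer = thread::spawn(move || {
        emit(Some(&tx), QueryEvent::Status("calling model".to_string()));
        emit(Some(&tx), QueryEvent::ToolStart { tool_name: "read".to_string() });
        emit(Some(&tx), QueryEvent::ToolEnd { tool_name: "read".to_string() });
        emit(Some(&tx), QueryEvent::Token { text: "Done".to_string() });
        // `tx` is dropped here, which closes the channel.
    });
    let events: Vec<QueryEvent> = rx.iter().collect(); // ends once the sender is dropped
    producer.join().expect("producer panicked");
    events
}
```

Making the sender optional, as the `event_tx: Option<...>` parameter suggests, lets the same loop run headless with no listener attached.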

Summary

Multi-turn conversation in an AI Agent is a complex control system:


System Prompt → API call → stream parse → detect tool → run tool → inject result → call again
   ↑                                                                                    │
   └─────────────────────────────── loop until end_turn ────────────────────────────────┘

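The robustness of this loop depends on:

Mechanism	Problem it solves
Agent definition system	Switching between roles and models
System prompt construction	Agent worldview + prompt caching
max_tokens recovery	Long output being truncated
auto_compact (structured summary)	Context exceeding the window
tool_result_budget	Tool results piling up
fallback_model	Primary model overloaded
Managed Agent mode	Decomposing very complex tasks
Skill system	Parameterized injection of expertise
Parallel tool execution	Speeding up multi-step operations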

Each mechanism maps to a real production problem. Getting them right is what takes an Agent from "it runs" to "reliable".