diff --git a/README.md b/README.md
index 4c9ba7a..6ac2a54 100644
--- a/README.md
+++ b/README.md
@@ -62,13 +62,45 @@ git clone https://github.com/memovai/mimiclaw.git
 cd mimiclaw

 idf.py set-target esp32s3
+```
+
+### Configure
+
+**Option A: Config file (recommended)** — fill in once, baked into firmware at build time:
+
+```bash
+cp main/mimi_secrets.h.example main/mimi_secrets.h
+```
+
+Edit `main/mimi_secrets.h`:
+
+```c
+#define MIMI_SECRET_WIFI_SSID "YourWiFiName"
+#define MIMI_SECRET_WIFI_PASS "YourWiFiPassword"
+#define MIMI_SECRET_TG_TOKEN "123456:ABC-DEF1234ghIkl-zyx57W2v1u123ew11"
+#define MIMI_SECRET_API_KEY "sk-ant-api03-xxxxx"
+#define MIMI_SECRET_SEARCH_KEY "" // optional: Brave Search API key
+#define MIMI_SECRET_PROXY_HOST "" // optional: e.g. "10.0.0.1"
+#define MIMI_SECRET_PROXY_PORT "" // optional: e.g. "7897"
+```
+
+Then build and flash:
+
+```bash
 idf.py build
 idf.py -p /dev/ttyACM0 flash monitor
 ```

-### Set Up
+Config file values have the **highest priority** — they override anything set via CLI.

-After flashing, a serial console appears. Type these commands:
+> **Note:** After editing `mimi_secrets.h`, run `touch main/mimi_config.h` before `idf.py build` to force recompilation.
+
+**Option B: Serial CLI** — configure at runtime after flashing:
+
+```bash
+idf.py build
+idf.py -p /dev/ttyACM0 flash monitor
+```

 ```
 mimi> wifi_set YourWiFiName YourWiFiPassword
@@ -77,20 +109,24 @@ mimi> set_api_key sk-ant-api03-xxxxx
 mimi> restart
 ```

-That's it. After restart, find your bot on Telegram and start chatting.
+CLI values are stored in NVS (persistent flash) and used when no config file value is set.

-### More Commands
+### CLI Commands

 ```
-mimi> wifi_status # am I connected?
-mimi> set_model claude-opus-4-6 # use a different model
-mimi> set_proxy 10.0.0.1 7897 # optional: route through HTTP proxy
-mimi> clear_proxy # optional: remove proxy, connect directly
-mimi> memory_read # see what the bot remembers
-mimi> heap_info # how much RAM is free?
-mimi> session_list # list all chat sessions
-mimi> session_clear 12345 # wipe a conversation
-mimi> restart # reboot
+mimi> wifi_set # set WiFi credentials
+mimi> wifi_status # am I connected?
+mimi> set_tg_token # set Telegram bot token
+mimi> set_api_key # set Anthropic API key
+mimi> set_model claude-opus-4-6 # use a different model
+mimi> set_search_key # set Brave Search API key (for web_search tool)
+mimi> set_proxy 10.0.0.1 7897 # route through HTTP proxy
+mimi> clear_proxy # remove proxy, connect directly
+mimi> memory_read # see what the bot remembers
+mimi> heap_info # how much RAM is free?
+mimi> session_list # list all chat sessions
+mimi> session_clear 12345 # wipe a conversation
+mimi> restart # reboot
 ```

 ## Memory
@@ -105,12 +141,23 @@ MimiClaw stores everything as plain text files you can read and edit:
 | `2026-02-05.md` | Daily notes — what happened today |
 | `tg_12345.jsonl` | Chat history — your conversation with the bot |

+## Tools
+
+MimiClaw uses Anthropic's tool use protocol — Claude can call tools during a conversation and loop until the task is done (ReAct pattern).
+
+| Tool | Description |
+|------|-------------|
+| `web_search` | Search the web via Brave Search API for current information |
+
+To enable web search, set a [Brave Search API key](https://brave.com/search/api/) in your config file or via CLI (`set_search_key`).
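
As a rough illustration of the tool mechanism this section describes, the sketch below shows one way a firmware-side registry could map a tool name requested by Claude to a C handler. All names here (`tool_entry_t`, `tool_dispatch`, `demo_web_search`) are hypothetical and are not taken from the MimiClaw sources, and the handler only echoes its input instead of calling the Brave Search API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical handler signature: reads the tool input, writes a result. */
typedef int (*tool_handler_fn)(const char *input, char *out, size_t out_len);

/* Hypothetical registry entry: a tool name plus its handler. */
typedef struct {
    const char *name;
    tool_handler_fn handler;
} tool_entry_t;

/* Placeholder handler; a real one would query the Brave Search API. */
int demo_web_search(const char *input, char *out, size_t out_len)
{
    snprintf(out, out_len, "results for: %s", input);
    return 0;
}

/* Assumed static table; the real registry may be built at runtime. */
static const tool_entry_t TOOLS[] = {
    {"web_search", demo_web_search},
};

/* Looks up a tool by name and runs it; returns -1 for unknown tools. */
int tool_dispatch(const char *name, const char *input,
                  char *out, size_t out_len)
{
    for (size_t i = 0; i < sizeof(TOOLS) / sizeof(TOOLS[0]); i++) {
        if (strcmp(TOOLS[i].name, name) == 0) {
            return TOOLS[i].handler(input, out, out_len);
        }
    }
    snprintf(out, out_len, "unknown tool: %s", name);
    return -1;
}
```

Returning an error string for an unknown tool, rather than aborting, lets the result still be reported back to the model as a tool result.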
+
 ## Also Included

 - **WebSocket gateway** on port 18789 — connect from your LAN with any WebSocket client
 - **OTA updates** — flash new firmware over WiFi, no USB needed
 - **Dual-core** — network I/O and AI processing run on separate CPU cores
 - **HTTP proxy** — CONNECT tunnel support for restricted networks
+- **Tool use** — ReAct agent loop with Anthropic tool use protocol

 ## For Developers

diff --git a/README_CN.md b/README_CN.md
index d921599..921147a 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -62,22 +62,54 @@ git clone https://github.com/memovai/mimiclaw.git
 cd mimiclaw

 idf.py set-target esp32s3
+```
+
+### 配置
+
+**方式 A:配置文件(推荐)** — 填一次,编译时写入固件:
+
+```bash
+cp main/mimi_secrets.h.example main/mimi_secrets.h
+```
+
+编辑 `main/mimi_secrets.h`:
+
+```c
+#define MIMI_SECRET_WIFI_SSID "你的WiFi名"
+#define MIMI_SECRET_WIFI_PASS "你的WiFi密码"
+#define MIMI_SECRET_TG_TOKEN "123456:ABC-DEF1234ghIkl-zyx57W2v1u123ew11"
+#define MIMI_SECRET_API_KEY "sk-ant-api03-xxxxx"
+#define MIMI_SECRET_SEARCH_KEY "" // 可选:Brave Search API key
+#define MIMI_SECRET_PROXY_HOST "10.0.0.1" // 可选:代理地址
+#define MIMI_SECRET_PROXY_PORT "7897" // 可选:代理端口
+```
+
+然后编译烧录:
+
+```bash
 idf.py build
 idf.py -p /dev/ttyACM0 flash monitor
 ```

-### 设置
+配置文件的值**优先级最高** — 会覆盖 CLI 设置的值。

-烧录后会出现串口终端,输入以下命令:
+> **注意**:修改 `mimi_secrets.h` 后,需要先执行 `touch main/mimi_config.h` 再 `idf.py build`,否则不会重新编译。
+
+**方式 B:串口命令行** — 烧录后在运行时配置:
+
+```bash
+idf.py build
+idf.py -p /dev/ttyACM0 flash monitor
+```

 ```
-mimi> wifi_set YourWiFiName YourWiFiPassword
+mimi> wifi_set 你的WiFi名 你的WiFi密码
 mimi> set_tg_token 123456:ABC-DEF1234ghIkl-zyx57W2v1u123ew11
 mimi> set_api_key sk-ant-api03-xxxxx
 mimi> restart
 ```

-就这样。重启后在 Telegram 找到你的 Bot,开始聊天。
+CLI 设置的值存在 NVS(持久 Flash)中,仅在配置文件未设置对应值时生效。

 ### 代理配置(国内用户)

@@ -85,15 +117,14 @@ mimi> restart

 **前提**:局域网内有一个支持 HTTP CONNECT 的代理(Clash Verge、V2Ray 等),并开启了「允许局域网连接」。

+推荐直接在 `mimi_secrets.h` 中配置代理(见上方方式 A),也可以用命令行:
+
 ```
 mimi> set_proxy 10.0.0.1 7897
 mimi> restart
 ```

-- `10.0.0.1` — 代理机器的局域网 IP
-- `7897` — 代理的 HTTP 端口(不是 SOCKS 端口)
-
-设置后所有 HTTPS 请求通过 CONNECT 隧道发出,TLS 证书正常验证。清除代理恢复直连:
+清除代理恢复直连:

 ```
 mimi> clear_proxy
@@ -102,18 +133,22 @@ mimi> restart

 > **提示**:确保 ESP32-S3 和代理机器在同一局域网。Clash Verge 在「设置 → 允许局域网」中开启。

-### 更多命令
+### 所有命令

 ```
-mimi> wifi_status # 连上了吗?
-mimi> set_model claude-opus-4-6 # 换个模型
-mimi> set_proxy 10.0.0.1 7897 # 可选:通过 HTTP 代理
-mimi> clear_proxy # 可选:清除代理,直连
-mimi> memory_read # 看看它记住了什么
-mimi> heap_info # 还剩多少内存?
-mimi> session_list # 列出所有会话
-mimi> session_clear 12345 # 删除一个会话
-mimi> restart # 重启
+mimi> wifi_set # 设置 WiFi
+mimi> wifi_status # 连上了吗?
+mimi> set_tg_token # 设置 Telegram Bot Token
+mimi> set_api_key # 设置 Anthropic API Key
+mimi> set_model claude-opus-4-6 # 换个模型
+mimi> set_search_key # 设置 Brave Search API Key(web_search 工具用)
+mimi> set_proxy 10.0.0.1 7897 # 通过 HTTP 代理
+mimi> clear_proxy # 清除代理,直连
+mimi> memory_read # 看看它记住了什么
+mimi> heap_info # 还剩多少内存?
+mimi> session_list # 列出所有会话
+mimi> session_clear 12345 # 删除一个会话
+mimi> restart # 重启
 ```

 ## 记忆
@@ -128,12 +163,23 @@ MimiClaw 把所有数据存为纯文本文件,可以直接读取和编辑:
 | `2026-02-05.md` | 每日笔记 — 今天发生了什么 |
 | `tg_12345.jsonl` | 聊天记录 — 你和它的对话 |

+## 工具
+
+MimiClaw 使用 Anthropic 的 tool use 协议 — Claude 在对话中可以调用工具,循环执行直到任务完成(ReAct 模式)。
+
+| 工具 | 说明 |
+|------|------|
+| `web_search` | 通过 Brave Search API 搜索网页,获取实时信息 |
+
+启用网页搜索需要设置 [Brave Search API key](https://brave.com/search/api/),在配置文件或 CLI(`set_search_key`)中设置。
+
 ## 其他功能

 - **WebSocket 网关** — 端口 18789,局域网内用任意 WebSocket 客户端连接
 - **OTA 更新** — WiFi 远程刷固件,无需 USB
 - **双核** — 网络 I/O 和 AI 处理分别跑在不同 CPU 核心
 - **HTTP 代理** — CONNECT 隧道,适配受限网络
+- **工具调用** — ReAct Agent 循环,Anthropic tool use 协议

 ## 开发者

diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index b85b072..8e613bf 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -20,41 +20,46 @@ Telegram App (User)
 │ │ Poller │ └────────┬─────────┘ │
 │ │ (Core 0) │ │ │
 │ └─────────────┘ ▼ │
-│ ┌──────────────┐ │
-│ ┌─────────────┐ │ Agent Loop │ │
-│ │ WebSocket │──────▶│ (Core 1) │ │
-│ │ Server │ │ │ │
-│ │ (:18789) │ │ Context ──▶ LLM Proxy │
-│ └─────────────┘ │ Builder (HTTPS) │
-│ └──────┬───────┘ │
-│ ┌─────────────┐ │ │
-│ │ Serial CLI │ ▼ │
-│ │ (Core 0) │ ┌──────────────┐ │
-│ └─────────────┘ │ Outbound Queue│ │
-│ └──────┬───────┘ │
-│ │ │
-│ ┌──────▼───────┐ │
-│ │ Outbound │ │
-│ │ Dispatch │ │
-│ │ (Core 0) │ │
-│ └──┬────────┬──┘ │
-│ │ │ │
-│ Telegram WebSocket │
-│ sendMessage send │
-│ │
-│ ┌──────────────────────────────────────────┐ │
-│ │ SPIFFS (12 MB) │ │
-│ │ /spiffs/config/ SOUL.md, USER.md │ │
-│ │ /spiffs/memory/ MEMORY.md, YYYY-MM-DD │ │
-│ │ /spiffs/sessions/ tg_.jsonl │ │
-│ └──────────────────────────────────────────┘ │
-└──────────────────────────────────────────────────┘
+│ ┌────────────────────────┐ │
+│ ┌─────────────┐ │ Agent Loop │ │
+│ │ WebSocket │─▶│ (Core 1) │ │
+│ │ Server │ │ │ │
+│ │ (:18789) │ │ Context ──▶ LLM Proxy │ │
+│ └─────────────┘ │ Builder (HTTPS) │ │
+│ │ ▲ │ │ │
+│ ┌─────────────┐ │ │ tool_use? │ │
+│ │ Serial CLI │ │ │ ▼ │ │
+│ │ (Core 0) │ │ Tool Results ◀─ Tools │ │
+│ └─────────────┘ │ (web_search)│ │
+│ └──────────┬─────────────┘ │
+│ │ │
+│ ┌──────▼───────┐ │
+│ │ Outbound Queue│ │
+│ └──────┬───────┘ │
+│ │ │
+│ ┌──────▼───────┐ │
+│ │ Outbound │ │
+│ │ Dispatch │ │
+│ │ (Core 0) │ │
+│ └──┬────────┬──┘ │
+│ │ │ │
+│ Telegram WebSocket │
+│ sendMessage send │
+│ │
+│ ┌──────────────────────────────────────────┐ │
+│ │ SPIFFS (12 MB) │ │
+│ │ /spiffs/config/ SOUL.md, USER.md │ │
+│ │ /spiffs/memory/ MEMORY.md, YYYY-MM-DD │ │
+│ │ /spiffs/sessions/ tg_.jsonl │ │
+│ └──────────────────────────────────────────┘ │
+└───────────────────────────────────────────────────┘
 │
- │ Anthropic Messages API (HTTPS + SSE)
+ │ Anthropic Messages API (HTTPS)
+ │ + Brave Search API (HTTPS)
 ▼
- ┌───────────┐
- │ Claude API │
- └───────────┘
+ ┌───────────┐ ┌──────────────┐
+ │ Claude API │ │ Brave Search │
+ └───────────┘ └──────────────┘
 ```

 ---
@@ -67,12 +72,18 @@ Telegram App (User)
 3. Message pushed to Inbound Queue (FreeRTOS xQueue)
 4. Agent Loop (Core 1) pops message:
    a. Load session history from SPIFFS (JSONL)
-   b. Build system prompt (SOUL.md + USER.md + MEMORY.md + recent notes)
-   c. Build messages array (history + current message)
-   d. Call Claude API via HTTPS (SSE streaming)
-   e. Accumulate streamed response tokens
-   f. Save user + assistant messages to session file
-   g. Push response to Outbound Queue
+   b. Build system prompt (SOUL.md + USER.md + MEMORY.md + recent notes + tool guidance)
+   c. Build cJSON messages array (history + current message)
+   d. ReAct loop (max 10 iterations):
+      i. Call Claude API via HTTPS (non-streaming, with tools array)
+      ii. Parse JSON response → text blocks + tool_use blocks
+      iii. If stop_reason == "tool_use":
+         - Execute each tool (e.g. web_search → Brave Search API)
+         - Append assistant content + tool_result to messages
+         - Continue loop
+      iv. If stop_reason == "end_turn": break with final text
+   e. Save user message + final assistant text to session file
+   f. Push response to Outbound Queue
 5. Outbound Dispatch (Core 0) pops response:
    a. Route by channel field ("telegram" → sendMessage, "websocket" → WS frame)
 6. User receives reply
@@ -85,7 +96,9 @@ Telegram App (User)
 ```
 main/
 ├── mimi.c Entry point — app_main() orchestrates init + startup
-├── mimi_config.h All compile-time constants in one place
+├── mimi_config.h All compile-time constants + build-time secrets include
+├── mimi_secrets.h Build-time credentials (gitignored, highest priority)
+├── mimi_secrets.h.example Template for mimi_secrets.h
 │
 ├── bus/
 │ ├── message_bus.h mimi_msg_t struct, queue API
@@ -100,14 +113,20 @@ main/
 │ └── telegram_bot.c Long polling loop, JSON parsing, message splitting
 │
 ├── llm/
-│ ├── llm_proxy.h llm_chat() API
-│ └── llm_proxy.c Anthropic Messages API, SSE stream parser
+│ ├── llm_proxy.h llm_chat() + llm_chat_tools() API, tool_use types
+│ └── llm_proxy.c Anthropic Messages API (non-streaming), tool_use parsing
 │
 ├── agent/
 │ ├── agent_loop.h Agent task init/start
-│ ├── agent_loop.c Main processing loop: inbound → context → LLM → outbound
+│ ├── agent_loop.c ReAct loop: LLM call → tool execution → repeat
 │ ├── context_builder.h System prompt + messages builder API
-│ └── context_builder.c Reads bootstrap files + memory, assembles prompt
+│ └── context_builder.c Reads bootstrap files + memory + tool guidance
+│
+├── tools/
+│ ├── tool_registry.h Tool definition struct, register/dispatch API
+│ ├── tool_registry.c Tool registration, JSON schema builder, dispatch by name
+│ ├── tool_web_search.h Web search tool API
+│ └── tool_web_search.c Brave Search API via HTTPS (direct + proxy)
 │
 ├── memory/
 │ ├── memory_store.h Long-term + daily memory API
@@ -125,7 +144,7 @@ main/
 │
 ├── cli/
 │ ├── serial_cli.h CLI init API
-│ └── serial_cli.c esp_console REPL with 14 commands
+│ └── serial_cli.c esp_console REPL with 15 commands
 │
 └── ota/
     ├── ota_manager.h OTA update API
@@ -206,17 +225,20 @@ Session files are JSONL (one JSON object per line):

 ## NVS Configuration

-| Namespace | Key | Description |
-|---------------|--------------|-----------------------------------------|
-| `wifi_config` | `ssid` | WiFi SSID |
-| `wifi_config` | `password` | WiFi password |
-| `tg_config` | `bot_token` | Telegram Bot API token |
-| `llm_config` | `api_key` | Anthropic API key |
-| `llm_config` | `model` | Model ID (default: claude-opus-4-6) |
-| `proxy_config`| `host` | HTTP proxy hostname/IP |
-| `proxy_config`| `port` | HTTP proxy port |
+| Namespace | Key | Description |
+|-----------------|--------------|-----------------------------------------|
+| `wifi_config` | `ssid` | WiFi SSID |
+| `wifi_config` | `password` | WiFi password |
+| `tg_config` | `bot_token` | Telegram Bot API token |
+| `llm_config` | `api_key` | Anthropic API key |
+| `llm_config` | `model` | Model ID (default: claude-opus-4-6) |
+| `proxy_config` | `host` | HTTP proxy hostname/IP |
+| `proxy_config` | `port` | HTTP proxy port |
+| `search_config` | `api_key` | Brave Search API key |

-All configured via Serial CLI commands: `wifi_set`, `set_tg_token`, `set_api_key`, `set_model`, `set_proxy`, `clear_proxy`.
+**Configuration priority**: `mimi_secrets.h` (build-time) > NVS (CLI-set) > defaults.
+
+All configurable via Serial CLI or build-time config file (`mimi_secrets.h`).
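
The priority rule stated above could be implemented along the following lines. This is a minimal sketch, assuming each setting is available as a possibly empty string from each source; `resolve_setting` is a hypothetical helper, not a function from the MimiClaw code base.

```c
#include <stddef.h>

/* Hypothetical helper: returns the first non-empty candidate in priority
 * order (build-time secret, then NVS value, then compiled-in default). */
const char *resolve_setting(const char *build_time,
                            const char *nvs_value,
                            const char *fallback)
{
    if (build_time != NULL && build_time[0] != '\0') {
        return build_time;   /* mimi_secrets.h value wins when set */
    }
    if (nvs_value != NULL && nvs_value[0] != '\0') {
        return nvs_value;    /* otherwise use the CLI-set NVS value */
    }
    return fallback;         /* otherwise fall back to the default */
}
```

Treating an empty string as "not set" matches the template in the README, where optional secrets are defined as `""`.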
 ---

@@ -260,33 +282,50 @@ Client `chat_id` is auto-assigned on connection (`ws_`) but can be overridde

 Endpoint: `POST https://api.anthropic.com/v1/messages`

-Request format (Anthropic-native, not OpenAI):
+Request format (Anthropic-native, non-streaming, with tools):
 ```json
 {
   "model": "claude-opus-4-6",
   "max_tokens": 4096,
-  "stream": true,
   "system": "",
+  "tools": [
+    {
+      "name": "web_search",
+      "description": "Search the web for current information.",
+      "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}
+    }
+  ],
   "messages": [
     {"role": "user", "content": "Hello"},
     {"role": "assistant", "content": "Hi!"},
-    {"role": "user", "content": "How are you?"}
+    {"role": "user", "content": "What's the weather today?"}
   ]
 }
 ```

 Key difference from OpenAI: `system` is a top-level field, not inside the `messages` array.

-SSE streaming response events:
-```
-event: content_block_delta
-data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}
-
-event: message_stop
-data: {"type":"message_stop"}
+Non-streaming JSON response:
+```json
+{
+  "id": "msg_xxx",
+  "type": "message",
+  "role": "assistant",
+  "content": [
+    {"type": "text", "text": "Let me search for that."},
+    {"type": "tool_use", "id": "toolu_xxx", "name": "web_search", "input": {"query": "weather today"}}
+  ],
+  "stop_reason": "tool_use"
+}
 ```

-The SSE parser in `llm_proxy.c` accumulates `text_delta` tokens into a response buffer.
+When `stop_reason` is `"tool_use"`, the agent loop executes each tool and sends results back:
+```json
+{"role": "assistant", "content": []}
+{"role": "user", "content": [{"type": "tool_result", "tool_use_id": "toolu_xxx", "content": "..."}]}
+```
+
+The loop repeats until `stop_reason` is `"end_turn"` (max 10 iterations).
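
The request/response cycle described above can be sketched as a bounded loop. The code below is only an illustration under stated assumptions: `llm_reply_t`, `react_loop`, and the two callback types are hypothetical stand-ins for the parsed API response, the HTTPS call, and the tool dispatcher, and the message-array bookkeeping (which the document says uses cJSON) is reduced to passing the last tool result back in. The `fake_llm` and `fake_tool` stubs exist only to make the sketch runnable.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TOOL_ITERATIONS 10  /* the "max 10 iterations" limit */

/* Simplified, hypothetical view of one parsed API response. */
typedef struct {
    const char *stop_reason;  /* "tool_use" or "end_turn" */
    const char *text;         /* text content, may be NULL */
    const char *tool_name;    /* set when stop_reason is "tool_use" */
    const char *tool_input;   /* e.g. the search query */
} llm_reply_t;

/* Hypothetical callbacks standing in for the API call and tool dispatch. */
typedef llm_reply_t (*llm_call_fn)(int turn, const char *last_tool_result);
typedef void (*tool_exec_fn)(const char *name, const char *input,
                             char *out, size_t out_len);

/* Calls the LLM until it stops requesting tools or the limit is hit.
 * Copies the latest text into final_text; returns the number of calls. */
int react_loop(llm_call_fn call_llm, tool_exec_fn run_tool,
               char *final_text, size_t final_len)
{
    char tool_result[256] = "";
    int calls = 0;

    if (final_len == 0) {
        return 0;
    }
    final_text[0] = '\0';
    while (calls < MAX_TOOL_ITERATIONS) {
        llm_reply_t reply = call_llm(calls, tool_result);
        calls++;
        snprintf(final_text, final_len, "%s", reply.text ? reply.text : "");
        if (reply.stop_reason == NULL ||
            strcmp(reply.stop_reason, "tool_use") != 0) {
            break;  /* "end_turn" or anything unexpected ends the loop */
        }
        /* A full version would append the assistant content and a
         * tool_result block to the messages array at this point. */
        run_tool(reply.tool_name, reply.tool_input,
                 tool_result, sizeof(tool_result));
    }
    return calls;
}

/* Demo stub: requests one web_search, then answers with the tool result. */
llm_reply_t fake_llm(int turn, const char *last_tool_result)
{
    llm_reply_t reply = {"end_turn", last_tool_result, NULL, NULL};
    if (turn == 0) {
        reply.stop_reason = "tool_use";
        reply.text = "Let me search for that.";
        reply.tool_name = "web_search";
        reply.tool_input = "weather today";
    }
    return reply;
}

/* Demo stub: fabricates a result instead of calling a search API. */
void fake_tool(const char *name, const char *input, char *out, size_t out_len)
{
    snprintf(out, out_len, "%s(%s): sunny", name, input);
}
```

Calling `react_loop(fake_llm, fake_tool, buf, sizeof(buf))` makes two LLM calls: the first triggers the tool, the second ends the turn. The iteration cap keeps a model that never stops requesting tools from looping indefinitely.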
 ---

@@ -301,13 +340,14 @@ app_main()
   ├── memory_store_init() Verify SPIFFS paths
   ├── session_mgr_init()
   ├── wifi_manager_init() Init WiFi STA mode + event handlers
-  ├── http_proxy_init() Load proxy config from NVS
-  ├── telegram_bot_init() Load bot token from NVS
-  ├── llm_proxy_init() Load API key + model from NVS
+  ├── http_proxy_init() Load proxy config (secrets > NVS)
+  ├── telegram_bot_init() Load bot token (secrets > NVS)
+  ├── llm_proxy_init() Load API key + model (secrets > NVS)
+  ├── tool_registry_init() Register tools, build tools JSON
   ├── agent_loop_init()
   ├── serial_cli_init() Start REPL (works without WiFi)
   │
-  ├── wifi_manager_start() Connect using NVS credentials
+  ├── wifi_manager_start() Connect (secrets > NVS credentials)
   │ └── wifi_manager_wait_connected(30s)
   │
   └── [if WiFi connected]
@@ -330,6 +370,7 @@ If WiFi credentials are missing or connection times out, the CLI remains availab
 | `set_tg_token ` | Save Telegram bot token |
 | `set_api_key ` | Save Anthropic API key |
 | `set_model ` | Set LLM model identifier |
+| `set_search_key ` | Save Brave Search API key |
 | `set_proxy ` | Set HTTP CONNECT proxy |
 | `clear_proxy` | Remove proxy, use direct connection |
 | `memory_read` | Print MEMORY.md contents |
@@ -340,22 +381,24 @@ If WiFi credentials are missing or connection times out, the CLI remains availab
 | `restart` | Reboot the device |
 | `help` | List all available commands |

+> **Note**: CLI-set values are stored in NVS but are overridden by `mimi_secrets.h` build-time values if set.
+
 ---

 ## Nanobot Reference Mapping

 | Nanobot Module | MimiClaw Equivalent | Notes |
 |-----------------------------|--------------------------------|------------------------------|
-| `agent/loop.py` | `agent/agent_loop.c` | Simplified: no tool use loop |
-| `agent/context.py` | `agent/context_builder.c` | Loads SOUL.md + USER.md + memory |
+| `agent/loop.py` | `agent/agent_loop.c` | ReAct loop with tool use |
+| `agent/context.py` | `agent/context_builder.c` | Loads SOUL.md + USER.md + memory + tool guidance |
 | `agent/memory.py` | `memory/memory_store.c` | MEMORY.md + daily notes |
 | `session/manager.py` | `memory/session_mgr.c` | JSONL per chat, ring buffer |
 | `channels/telegram.py` | `telegram/telegram_bot.c` | Raw HTTP, no python-telegram-bot |
 | `bus/events.py` + `queue.py`| `bus/message_bus.c` | FreeRTOS queues vs asyncio |
 | `providers/litellm_provider.py` | `llm/llm_proxy.c` | Direct Anthropic API only |
-| `config/schema.py` | `mimi_config.h` + NVS | Compile-time + NVS storage |
+| `config/schema.py` | `mimi_config.h` + `mimi_secrets.h` + NVS | Build-time secrets > NVS |
 | `cli/commands.py` | `cli/serial_cli.c` | esp_console REPL |
-| `agent/tools/*` | *(not yet implemented)* | See TODO.md |
+| `agent/tools/*` | `tools/tool_registry.c` + `tool_web_search.c` | web_search via Brave API |
 | `agent/subagent.py` | *(not yet implemented)* | See TODO.md |
 | `agent/skills.py` | *(not yet implemented)* | See TODO.md |
 | `cron/service.py` | *(not yet implemented)* | See TODO.md |
diff --git a/docs/TODO.md b/docs/TODO.md
index 0bba9c8..92026a3 100644
--- a/docs/TODO.md
+++ b/docs/TODO.md
@@ -7,11 +7,8 @@

 ## P0 — Core Agent Capabilities

-### [ ] Tool Use Loop (multi-turn agent iteration)
-- **nanobot**: `loop.py` L167-210 — while loop calls LLM, checks `response.has_tool_calls`, executes tools, feeds results back into messages, repeats until LLM stops calling tools (max 20 iterations)
-- **MimiClaw**: `agent_loop.c` only makes a single LLM call (one-shot), cannot use any tools
-- **Scope**: Need to parse Anthropic API `tool_use` content blocks, implement tool execution loop
-- **Note**: Anthropic tool_use format differs from OpenAI — uses content blocks, not function_call
+### [x] ~~Tool Use Loop (multi-turn agent iteration)~~
+- Implemented: `agent_loop.c` ReAct loop with `llm_chat_tools()`, max 10 iterations, non-streaming JSON parsing

 ### [ ] Memory Write via Tool Use (agent-driven memory persistence)
 - **openclaw**: Agent uses standard `write`/`edit` tools to write `MEMORY.md` and `memory/YYYY-MM-DD.md`; system prompt instructs agent to persist important information; pre-compaction memory flush triggers a silent agent turn to save durable memories before context window limit
@@ -19,20 +16,13 @@
 - **Scope**: Expose `memory_write` and `memory_append_today` as tool_use tools for Claude; add system prompt guidance on when to persist memory; optionally add pre-compaction flush (trigger memory save when session history nears `MIMI_SESSION_MAX_MSGS`)
 - **Depends on**: Tool Use Loop

-### [ ] Tool Registry + Built-in Tools
-- **nanobot**: `tools/registry.py` — dynamic tool registration/execution, `tools/base.py` defines abstract Tool base class
-- **nanobot built-in tools**:
-  - `read_file` — read files (`tools/filesystem.py`)
-  - `write_file` — write files
-  - `edit_file` — edit files
-  - `list_dir` — list directory
-  - `exec` — execute shell commands (`tools/shell.py`)
-  - `web_search` — web search (`tools/web.py`)
-  - `web_fetch` — fetch web pages
-  - `message` — send message to user (`tools/message.py`)
-  - `spawn` — launch subagent (`tools/spawn.py`)
-- **MimiClaw**: No tool system at all
-- **Recommendation**: Reasonable tool subset for ESP32: `read_file`, `write_file`, `list_dir` (SPIFFS), `message`. Shell/web not suitable for MCU
+### [x] ~~Tool Registry + web_search Tool~~
+- Implemented: `tools/tool_registry.c` — tool registration, JSON schema builder, dispatch by name
+- Implemented: `tools/tool_web_search.c` — Brave Search API via HTTPS (direct + proxy support)
+
+### [ ] More Built-in Tools
+- **nanobot built-in tools** not yet ported: `read_file`, `write_file`, `edit_file`, `list_dir`, `message`
+- **Recommendation**: Reasonable tool subset for ESP32: `read_file`, `write_file`, `list_dir` (SPIFFS), `message`, `memory_write`

 ### [ ] Subagent / Spawn Background Tasks
 - **nanobot**: `subagent.py` — SubagentManager spawns independent agent instances with isolated tool sets and system prompts, announces results back to main agent via system channel
@@ -77,10 +67,8 @@
 - **MimiClaw**: `context_builder.c` only reads last 3 days
 - **Recommendation**: Make configurable, but mind token budget

-### [ ] System Prompt Tool Guidance
-- **nanobot**: `context.py` L74-101 — includes current time, workspace path, tool usage instructions
-- **MimiClaw**: Has current time, but lacks tool usage guide and workspace description
-- **Depends on**: Tool Use implementation
+### [x] ~~System Prompt Tool Guidance~~
+- Implemented: `context_builder.c` includes tool usage guidance in system prompt

 ### [ ] Message Metadata (media, reply_to, metadata)
 - **nanobot**: `bus/events.py` — InboundMessage has media, metadata fields; OutboundMessage has reply_to
@@ -116,10 +104,9 @@
 - **MimiClaw**: Not implemented
 - **Recommendation**: Requires extra HTTPS request to Whisper API: download Telegram voice -> forward -> get text

-### [ ] YAML Config File System
-- **nanobot**: `config/loader.py` + `config/schema.py` — Pydantic config validation, YAML config support
-- **MimiClaw**: All configuration via NVS key-value storage
-- **Recommendation**: Current NVS approach is suitable for MCU, no change needed
+### [x] ~~Build-time Config File~~
+- Implemented: `mimi_secrets.h` — build-time credentials with highest priority over NVS/CLI
+- Replaces need for YAML config; suitable for MCU workflow

 ### [ ] WebSocket Gateway Protocol Enhancement
 - **nanobot**: Gateway port 18790 + richer protocol
@@ -150,32 +137,34 @@

 - [x] Telegram Bot long polling (getUpdates)
 - [x] Message Bus (inbound/outbound queues)
-- [x] Agent Loop basic flow (single LLM call)
-- [x] Claude API (Anthropic Messages API + SSE streaming)
-- [x] Context Builder (system prompt + bootstrap files + memory)
+- [x] Agent Loop with ReAct tool use (multi-turn, max 10 iterations)
+- [x] Claude API (Anthropic Messages API, non-streaming, tool_use protocol)
+- [x] Tool Registry + web_search tool (Brave Search API)
+- [x] Context Builder (system prompt + bootstrap files + memory + tool guidance)
 - [x] Memory Store (MEMORY.md + daily notes)
 - [x] Session Manager (JSONL per chat_id, ring buffer history)
 - [x] WebSocket Gateway (port 18789, JSON protocol)
-- [x] Serial CLI (esp_console, 14 commands)
-- [x] HTTP CONNECT Proxy (Telegram + Claude API via proxy tunnel)
+- [x] Serial CLI (esp_console, 15 commands)
+- [x] HTTP CONNECT Proxy (Telegram + Claude API + Brave Search via proxy tunnel)
 - [x] OTA Update
 - [x] WiFi Manager (NVS credentials, exponential backoff)
 - [x] SPIFFS storage
-- [x] NVS configuration (token, API key, model)
+- [x] Build-time config (`mimi_secrets.h`, highest priority over NVS)
+- [x] NVS configuration (token, API key, model, search key)

 ---

 ## Suggested Implementation Order

 ```
-1. Tool Use Loop + Tool Registry <- this determines whether the agent is truly "intelligent"
+1. [done] Tool Use Loop + Tool Registry + web_search
 2. Memory Write via Tool Use <- makes the agent actually remember
 3. Built-in Tools (read_file, write_file, message)
-3. Telegram Allowlist (allow_from) <- security essential
-4. Bootstrap File Completion (AGENTS.md, TOOLS.md)
-5. Subagent (simplified)
-6. Telegram Markdown -> HTML
-7. Media Handling
-8. Cron / Heartbeat
-9. Other enhancements
+4. Telegram Allowlist (allow_from) <- security essential
+5. Bootstrap File Completion (AGENTS.md, TOOLS.md)
+6. Subagent (simplified)
+7. Telegram Markdown -> HTML
+8. Media Handling
+9. Cron / Heartbeat
+10. Other enhancements
 ```