From bf772cb8925585e7addd036b15d9f009cbbfcca1 Mon Sep 17 00:00:00 2001
From: crispyberry
Date: Thu, 5 Feb 2026 18:56:14 +0800
Subject: [PATCH] docs: add architecture design and feature gap tracker

ARCHITECTURE.md covers system diagram, data flow, module map, task layout,
memory budget, flash partitions, NVS config, protocols, and nanobot
reference mapping. TODO.md tracks unimplemented features vs nanobot
(P0/P1/P2).

Co-Authored-By: Claude Opus 4.5
---
 docs/ARCHITECTURE.md | 353 +++++++++++++++++++++++++++++++++++++++++++
 docs/TODO.md         | 175 +++++++++++++++++++++
 2 files changed, 528 insertions(+)
 create mode 100644 docs/ARCHITECTURE.md
 create mode 100644 docs/TODO.md

diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
new file mode 100644
index 0000000..b8668d1
--- /dev/null
+++ b/docs/ARCHITECTURE.md
@@ -0,0 +1,353 @@
+# MimiClaw Architecture
+
+> ESP32-S3 AI Agent firmware — a C/FreeRTOS reimplementation of [nanobot](../nanobot/)'s core agent capabilities.
+
+---
+
+## System Overview
+
+```
+Telegram App (User)
+    │
+    │  HTTPS Long Polling
+    │
+    ▼
+┌──────────────────────────────────────────────────┐
+│               ESP32-S3 (MimiClaw)                │
+│                                                  │
+│  ┌─────────────┐       ┌──────────────────┐      │
+│  │  Telegram   │──────▶│  Inbound Queue   │      │
+│  │  Poller     │       └────────┬─────────┘      │
+│  │  (Core 0)   │                │                │
+│  └─────────────┘                ▼                │
+│                        ┌──────────────┐          │
+│  ┌─────────────┐       │  Agent Loop  │          │
+│  │  WebSocket  │──────▶│  (Core 1)    │          │
+│  │  Server     │       │              │          │
+│  │  (:18789)   │       │  Context ──▶ LLM Proxy  │
+│  └─────────────┘       │  Builder     (HTTPS)    │
+│                        └──────┬───────┘          │
+│  ┌─────────────┐              │                  │
+│  │  Serial CLI │              ▼                  │
+│  │  (Core 0)   │       ┌──────────────┐          │
+│  └─────────────┘       │Outbound Queue│          │
+│                        └──────┬───────┘          │
+│                               │                  │
+│                        ┌──────▼───────┐          │
+│                        │   Outbound   │          │
+│                        │   Dispatch   │          │
+│                        │   (Core 0)   │          │
+│                        └──┬────────┬──┘          │
+│                           │        │             │
+│                     Telegram   WebSocket         │
+│                   sendMessage     send           │
+│                                                  │
+│   ┌──────────────────────────────────────────┐   │
+│   │  SPIFFS (12 MB)                          │   │
+│   │  /spiffs/config/    SOUL.md, USER.md     │   │
+│   │  /spiffs/memory/    MEMORY.md, 
YYYY-MM-DD │   │
+│   │  /spiffs/sessions/   tg_.jsonl           │   │
+│   └──────────────────────────────────────────┘   │
+└──────────────────────────────────────────────────┘
+        │
+        │  Anthropic Messages API (HTTPS + SSE)
+        ▼
+  ┌────────────┐
+  │ Claude API │
+  └────────────┘
+```
+
+---
+
+## Data Flow
+
+```
+1. User sends message on Telegram (or WebSocket)
+2. Channel poller receives message, wraps in mimi_msg_t
+3. Message pushed to Inbound Queue (FreeRTOS xQueue)
+4. Agent Loop (Core 1) pops message:
+   a. Load session history from SPIFFS (JSONL)
+   b. Build system prompt (SOUL.md + USER.md + MEMORY.md + recent notes)
+   c. Build messages array (history + current message)
+   d. Call Claude API via HTTPS (SSE streaming)
+   e. Accumulate streamed response tokens
+   f. Save user + assistant messages to session file
+   g. Push response to Outbound Queue
+5. Outbound Dispatch (Core 0) pops response:
+   a. Route by channel field ("telegram" → sendMessage, "websocket" → WS frame)
+6. User receives reply
+```
+
+---
+
+## Module Map
+
+```
+main/
+├── mimi.c                  Entry point — app_main() orchestrates init + startup
+├── mimi_config.h           All compile-time constants in one place
+│
+├── bus/
+│   ├── message_bus.h       mimi_msg_t struct, queue API
+│   └── message_bus.c       Two FreeRTOS queues: inbound + outbound
+│
+├── wifi/
+│   ├── wifi_manager.h      WiFi STA lifecycle API
+│   └── wifi_manager.c      NVS credentials, event handler, exponential backoff
+│
+├── telegram/
+│   ├── telegram_bot.h      Bot init/start, send_message API
+│   └── telegram_bot.c      Long polling loop, JSON parsing, message splitting
+│
+├── llm/
+│   ├── llm_proxy.h         llm_chat() API
+│   └── llm_proxy.c         Anthropic Messages API, SSE stream parser
+│
+├── agent/
+│   ├── agent_loop.h        Agent task init/start
+│   ├── agent_loop.c        Main processing loop: inbound → context → LLM → outbound
+│   ├── context_builder.h   System prompt + messages builder API
+│   └── context_builder.c   Reads bootstrap files + memory, assembles prompt
+│
+├── memory/
+│   ├── memory_store.h      Long-term + daily memory 
API
+│   ├── memory_store.c      MEMORY.md read/write, daily .md append/read
+│   ├── session_mgr.h       Per-chat session API
+│   └── session_mgr.c       JSONL session files, ring buffer history
+│
+├── gateway/
+│   ├── ws_server.h         WebSocket server API
+│   └── ws_server.c         ESP HTTP server with WS upgrade, client tracking
+│
+├── cli/
+│   ├── serial_cli.h        CLI init API
+│   └── serial_cli.c        esp_console REPL with 12 commands
+│
+└── ota/
+    ├── ota_manager.h       OTA update API
+    └── ota_manager.c       esp_https_ota wrapper
+```
+
+---
+
+## FreeRTOS Task Layout
+
+| Task               | Core | Priority | Stack  | Description                          |
+|--------------------|------|----------|--------|--------------------------------------|
+| `tg_poll`          | 0    | 5        | 8 KB   | Telegram long polling (30s timeout)  |
+| `agent_loop`       | 1    | 6        | 8 KB   | Message processing + Claude API call |
+| `outbound`         | 0    | 5        | 4 KB   | Route responses to Telegram / WS     |
+| `serial_cli`       | 0    | 3        | 4 KB   | USB serial console REPL              |
+| httpd (internal)   | 0    | 5        | —      | WebSocket server (esp_http_server)   |
+| wifi_event (IDF)   | 0    | 8        | —      | WiFi event handling (ESP-IDF)        |
+
+**Core allocation strategy**: Core 0 handles I/O (network, serial, WiFi). Core 1 is dedicated to the agent loop (CPU-bound JSON building + waiting on HTTPS).
+
+---
+
+## Memory Budget
+
+| Purpose                                | Location       | Size     |
+|----------------------------------------|----------------|----------|
+| FreeRTOS task stacks                   | Internal SRAM  | ~40 KB   |
+| WiFi buffers                           | Internal SRAM  | ~30 KB   |
+| TLS connections x2 (Telegram + Claude) | PSRAM          | ~120 KB  |
+| JSON parse buffers                     | PSRAM          | ~32 KB   |
+| Session history cache                  | PSRAM          | ~32 KB   |
+| System prompt buffer                   | PSRAM          | ~16 KB   |
+| LLM response stream buffer             | PSRAM          | ~32 KB   |
+| Remaining available                    | PSRAM          | ~7.7 MB  |
+
+Large buffers (32 KB+) are allocated from PSRAM via `heap_caps_calloc(1, size, MALLOC_CAP_SPIRAM)`.
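
The PSRAM allocation rule above can be sketched as a small helper. This is a minimal sketch under stated assumptions, not the firmware's actual code: `mimi_buf_alloc` and `MIMI_PSRAM_THRESHOLD` are hypothetical names, and the host-side fallback to `calloc` exists only so the snippet compiles off-target. `heap_caps_calloc` and `MALLOC_CAP_SPIRAM` are the ESP-IDF calls named in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef ESP_PLATFORM
#include "esp_heap_caps.h"
#endif

/* Hypothetical threshold: buffers at or above this size go to PSRAM. */
#define MIMI_PSRAM_THRESHOLD (32 * 1024)

/* Hypothetical helper: zero-initialized buffer, PSRAM first for large sizes. */
void *mimi_buf_alloc(size_t size)
{
    assert(size > 0);
#ifdef ESP_PLATFORM
    if (size >= MIMI_PSRAM_THRESHOLD) {
        void *p = heap_caps_calloc(1, size, MALLOC_CAP_SPIRAM);
        if (p != NULL) {
            return p;
        }
        /* PSRAM exhausted or absent: fall through to the default heap. */
    }
#endif
    return calloc(1, size);
}
```

Buffers obtained this way can be released with plain `free()`. Falling back to the default heap is a design choice of this sketch; a firmware that must protect internal SRAM might prefer to fail the allocation instead.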
+
+---
+
+## Flash Partition Layout
+
+```
+Offset      Size      Name        Purpose
+─────────────────────────────────────────────
+0x009000    24 KB     nvs         WiFi creds, TG token, API key, model
+0x00F000    8 KB      otadata     OTA boot state
+0x011000    4 KB      phy_init    WiFi PHY calibration
+0x020000    2 MB      ota_0       Firmware slot A
+0x220000    2 MB      ota_1       Firmware slot B
+0x420000    ~12 MB    spiffs      Markdown memory, sessions, config
+0xFF0000    64 KB     coredump    Crash dump storage
+```
+
+Total: 16 MB flash. The `spiffs` figure is approximate: only 0xBD0000 bytes (about 11.8 MB) lie between its offset 0x420000 and the `coredump` partition at 0xFF0000.
+
+---
+
+## Storage Layout (SPIFFS)
+
+SPIFFS is a flat filesystem — no real directories. Files use path-like names.
+
+```
+/spiffs/config/SOUL.md            AI personality definition
+/spiffs/config/USER.md            User profile
+/spiffs/memory/MEMORY.md          Long-term persistent memory
+/spiffs/memory/2026-02-05.md      Daily notes (one file per day)
+/spiffs/sessions/tg_12345.jsonl   Session history (one file per Telegram chat)
+```
+
+Session files are JSONL (one JSON object per line):
+```json
+{"role":"user","content":"Hello","ts":1738764800}
+{"role":"assistant","content":"Hi there!","ts":1738764802}
+```
+
+---
+
+## NVS Configuration
+
+| Namespace     | Key          | Description                                  |
+|---------------|--------------|----------------------------------------------|
+| `wifi_config` | `ssid`       | WiFi SSID                                    |
+| `wifi_config` | `password`   | WiFi password                                |
+| `tg_config`   | `bot_token`  | Telegram Bot API token                       |
+| `llm_config`  | `api_key`    | Anthropic API key                            |
+| `llm_config`  | `model`      | Model ID (default: claude-opus-4-5-20251101) |
+
+All configured via Serial CLI commands: `wifi_set`, `set_tg_token`, `set_api_key`, `set_model`.
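
Reading one of the NVS keys above with a compile-time fallback might look like the following. This is a sketch under assumptions: `mimi_nvs_get_str`, `mimi_load_model`, and `MIMI_DEFAULT_MODEL` are hypothetical names, an empty stored string is treated as unset, and the host branch simply behaves like an empty NVS partition so the snippet compiles off-target. `nvs_open`, `nvs_get_str`, and `nvs_close` are the standard ESP-IDF NVS calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#ifdef ESP_PLATFORM
#include "nvs.h"
#endif

/* Default from the table above; used when NVS has no "model" key. */
#define MIMI_DEFAULT_MODEL "claude-opus-4-5-20251101"

/* Hypothetical helper: read a string key, returning false when it is absent. */
static bool mimi_nvs_get_str(const char *ns, const char *key, char *out, size_t len)
{
#ifdef ESP_PLATFORM
    nvs_handle_t handle;
    if (nvs_open(ns, NVS_READONLY, &handle) != ESP_OK) {
        return false;
    }
    esp_err_t err = nvs_get_str(handle, key, out, &len);
    nvs_close(handle);
    return err == ESP_OK && strlen(out) > 0;
#else
    /* Host build: behave like an empty NVS partition. */
    (void)ns; (void)key; (void)out; (void)len;
    return false;
#endif
}

/* Hypothetical helper: resolve the model ID, falling back to the default. */
void mimi_load_model(char *out, size_t len)
{
    assert(out != NULL && len > 0);
    if (!mimi_nvs_get_str("llm_config", "model", out, len)) {
        snprintf(out, len, "%s", MIMI_DEFAULT_MODEL);
    }
}
```

With this shape, `set_model` only has to write the key; every reader picks up the default automatically on a freshly erased device.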
+
+---
+
+## Message Bus Protocol
+
+The internal message bus uses two FreeRTOS queues carrying `mimi_msg_t`:
+
+```c
+typedef struct {
+    char channel[16];   // "telegram", "websocket", "cli"
+    char chat_id[32];   // Telegram chat ID or WS client ID
+    char *content;      // Heap-allocated text (ownership transferred)
+} mimi_msg_t;
+```
+
+- **Inbound queue**: channels → agent loop (depth: 8)
+- **Outbound queue**: agent loop → dispatch → channels (depth: 8)
+- Content string ownership is transferred on push; receiver must `free()`.
+
+---
+
+## WebSocket Protocol
+
+Port: **18789**. Max clients: **4**.
+
+**Client → Server:**
+```json
+{"type": "message", "content": "Hello", "chat_id": "ws_client1"}
+```
+
+**Server → Client:**
+```json
+{"type": "response", "content": "Hi there!", "chat_id": "ws_client1"}
+```
+
+Client `chat_id` is auto-assigned on connection (`ws_`) but can be overridden in the first message.
+
+---
+
+## Claude API Integration
+
+Endpoint: `POST https://api.anthropic.com/v1/messages`
+
+Request format (Anthropic-native, not OpenAI):
+```json
+{
+  "model": "claude-opus-4-5-20251101",
+  "max_tokens": 4096,
+  "stream": true,
+  "system": "",
+  "messages": [
+    {"role": "user", "content": "Hello"},
+    {"role": "assistant", "content": "Hi!"},
+    {"role": "user", "content": "How are you?"}
+  ]
+}
+```
+
+Key difference from OpenAI: `system` is a top-level field, not inside the `messages` array.
+
+SSE streaming response events:
+```
+event: content_block_delta
+data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}
+
+event: message_stop
+data: {"type":"message_stop"}
+```
+
+The SSE parser in `llm_proxy.c` accumulates `text_delta` tokens into a response buffer.
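
The accumulation step can be illustrated with a line-oriented sketch. It is not the code in `llm_proxy.c`: `sse_acc_t` and `sse_feed_line` are hypothetical names, the string matching assumes compact JSON exactly as in the events shown above (no whitespace around colons), and `\uXXXX` escapes are not decoded. A production parser would more likely hand each `data:` payload to a JSON library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical accumulator for the streamed reply. */
typedef struct {
    char  *buf;   /* caller-owned buffer, must start as an empty string */
    size_t cap;   /* buffer capacity, including the terminating NUL */
    size_t len;   /* bytes written so far */
    bool   done;  /* set once message_stop has been seen */
} sse_acc_t;

/* Append one byte, silently dropping it if the buffer is full. */
static void sse_putc(sse_acc_t *acc, char c)
{
    if (acc->len + 1 < acc->cap) {
        acc->buf[acc->len++] = c;
        acc->buf[acc->len] = '\0';
    }
}

/* Feed one complete SSE line (without its trailing newline). */
void sse_feed_line(sse_acc_t *acc, const char *line)
{
    assert(acc != NULL && acc->buf != NULL && line != NULL);

    if (strncmp(line, "data:", 5) != 0) {
        return;                                   /* event: lines, blanks */
    }
    if (strstr(line, "\"type\":\"message_stop\"") != NULL) {
        acc->done = true;
        return;
    }
    if (strstr(line, "\"type\":\"text_delta\"") == NULL) {
        return;                                   /* other event payloads */
    }
    const char *p = strstr(line, "\"text\":\"");
    if (p == NULL) {
        return;
    }
    for (p += 8; *p != '\0' && *p != '"'; p++) {
        if (*p == '\\' && p[1] != '\0') {
            p++;                                  /* simple JSON escapes only */
            switch (*p) {
            case 'n': sse_putc(acc, '\n'); break;
            case 't': sse_putc(acc, '\t'); break;
            case 'r': sse_putc(acc, '\r'); break;
            default:  sse_putc(acc, *p);   break; /* \" \\ \/ ; \uXXXX is not decoded */
            }
        } else {
            sse_putc(acc, *p);
        }
    }
}
```

The caller is expected to split the HTTPS stream on newlines, feed each line in order, and stop reading once `done` becomes true.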
+
+---
+
+## Startup Sequence
+
+```
+app_main()
+  ├── init_nvs()                        NVS flash init (erase if corrupted)
+  ├── esp_event_loop_create_default()
+  ├── init_spiffs()                     Mount SPIFFS at /spiffs
+  ├── message_bus_init()                Create inbound + outbound queues
+  ├── memory_store_init()               Verify SPIFFS paths
+  ├── session_mgr_init()
+  ├── wifi_manager_init()               Init WiFi STA mode + event handlers
+  ├── telegram_bot_init()               Load bot token from NVS
+  ├── llm_proxy_init()                  Load API key + model from NVS
+  ├── agent_loop_init()
+  ├── serial_cli_init()                 Start REPL (works without WiFi)
+  │
+  ├── wifi_manager_start()              Connect using NVS credentials
+  │   └── wifi_manager_wait_connected(30s)
+  │
+  └── [if WiFi connected]
+      ├── telegram_bot_start()          Launch tg_poll task (Core 0)
+      ├── agent_loop_start()            Launch agent_loop task (Core 1)
+      ├── ws_server_start()             Start httpd on port 18789
+      └── outbound_dispatch task        Launch outbound task (Core 0)
+```
+
+If WiFi credentials are missing or connection times out, the CLI remains available for configuration.
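
The gating described above (CLI always, network tasks only after WiFi comes up) can be modeled in a few lines of host-runnable C. This is an illustrative sketch, not `app_main()` itself: `mimi_services_t`, `mimi_start_services`, and the two stand-in wait functions are hypothetical, and the real firmware would launch FreeRTOS tasks where this sketch only sets flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of which services were started. */
typedef struct {
    bool serial_cli;
    bool telegram_poll;
    bool agent_loop;
    bool ws_server;
    bool outbound_dispatch;
} mimi_services_t;

/* Hypothetical callback: block until WiFi is up or timeout_ms elapses. */
typedef bool (*wifi_wait_fn)(unsigned timeout_ms);

/* Stand-ins for the WiFi wait, used only to exercise both paths. */
static bool wifi_never_connects(unsigned timeout_ms) { (void)timeout_ms; return false; }
static bool wifi_connects(unsigned timeout_ms)       { (void)timeout_ms; return true; }

mimi_services_t mimi_start_services(wifi_wait_fn wait_connected)
{
    assert(wait_connected != NULL);
    mimi_services_t started = { 0 };

    started.serial_cli = true;            /* REPL starts before any network attempt */

    if (!wait_connected(30000)) {         /* 30 s, as in the sequence above */
        return started;                   /* no WiFi: CLI only */
    }
    started.telegram_poll = true;
    started.agent_loop = true;
    started.ws_server = true;
    started.outbound_dispatch = true;
    return started;
}
```

Calling `mimi_start_services(wifi_never_connects)` leaves only `serial_cli` set, which is the state in which a user can still run `wifi_set` and `restart`.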
+
+---
+
+## Serial CLI Commands
+
+| Command                        | Description                          |
+|--------------------------------|--------------------------------------|
+| `wifi_set `                    | Save WiFi credentials to NVS         |
+| `wifi_status`                  | Show connection status and IP        |
+| `set_tg_token `                | Save Telegram bot token              |
+| `set_api_key `                 | Save Anthropic API key               |
+| `set_model `                   | Set LLM model identifier             |
+| `memory_read`                  | Print MEMORY.md contents             |
+| `memory_write `                | Overwrite MEMORY.md                  |
+| `session_list`                 | List all session files               |
+| `session_clear `               | Delete a session file                |
+| `heap_info`                    | Show internal + PSRAM free bytes     |
+| `restart`                      | Reboot the device                    |
+| `help`                         | List all available commands          |
+
+---
+
+## Nanobot Reference Mapping
+
+| Nanobot Module                  | MimiClaw Equivalent            | Notes                            |
+|---------------------------------|--------------------------------|----------------------------------|
+| `agent/loop.py`                 | `agent/agent_loop.c`           | Simplified: no tool use loop     |
+| `agent/context.py`              | `agent/context_builder.c`      | Loads SOUL.md + USER.md + memory |
+| `agent/memory.py`               | `memory/memory_store.c`        | MEMORY.md + daily notes          |
+| `session/manager.py`            | `memory/session_mgr.c`         | JSONL per chat, ring buffer      |
+| `channels/telegram.py`          | `telegram/telegram_bot.c`      | Raw HTTP, no python-telegram-bot |
+| `bus/events.py` + `queue.py`    | `bus/message_bus.c`            | FreeRTOS queues vs asyncio       |
+| `providers/litellm_provider.py` | `llm/llm_proxy.c`              | Direct Anthropic API only        |
+| `config/schema.py`              | `mimi_config.h` + NVS          | Compile-time + NVS storage       |
+| `cli/commands.py`               | `cli/serial_cli.c`             | esp_console REPL                 |
+| `agent/tools/*`                 | *(not yet implemented)*        | See TODO.md                      |
+| `agent/subagent.py`             | *(not yet implemented)*        | See TODO.md                      |
+| `agent/skills.py`               | *(not yet implemented)*        | See TODO.md                      |
+| `cron/service.py`               | *(not yet implemented)*        | See TODO.md                      |
+| `heartbeat/service.py`          | *(not yet implemented)*        | See TODO.md                      |
diff --git a/docs/TODO.md b/docs/TODO.md
new file mode 100644
index 0000000..611064b
--- 
/dev/null
+++ b/docs/TODO.md
@@ -0,0 +1,175 @@
+# MimiClaw vs Nanobot — Feature Gap Tracker
+
+> Comparing against `nanobot/` reference implementation. Tracks features MimiClaw has not yet aligned with.
+> Priority: P0 = Core missing, P1 = Important enhancement, P2 = Nice to have
+
+---
+
+## P0 — Core Agent Capabilities
+
+### [ ] Tool Use Loop (multi-turn agent iteration)
+- **nanobot**: `loop.py` L167-210 — while loop calls LLM, checks `response.has_tool_calls`, executes tools, feeds results back into messages, repeats until LLM stops calling tools (max 20 iterations)
+- **MimiClaw**: `agent_loop.c` only makes a single LLM call (one-shot), cannot use any tools
+- **Scope**: Need to parse Anthropic API `tool_use` content blocks, implement tool execution loop
+- **Note**: Anthropic tool_use format differs from OpenAI — uses content blocks, not function_call
+
+### [ ] Tool Registry + Built-in Tools
+- **nanobot**: `tools/registry.py` — dynamic tool registration/execution, `tools/base.py` defines abstract Tool base class
+- **nanobot built-in tools**:
+  - `read_file` — read files (`tools/filesystem.py`)
+  - `write_file` — write files
+  - `edit_file` — edit files
+  - `list_dir` — list directory
+  - `exec` — execute shell commands (`tools/shell.py`)
+  - `web_search` — web search (`tools/web.py`)
+  - `web_fetch` — fetch web pages
+  - `message` — send message to user (`tools/message.py`)
+  - `spawn` — launch subagent (`tools/spawn.py`)
+- **MimiClaw**: No tool system at all
+- **Recommendation**: Reasonable tool subset for ESP32: `read_file`, `write_file`, `list_dir` (SPIFFS), `message`. 
Shell/web not suitable for MCU
+
+### [ ] Subagent / Spawn Background Tasks
+- **nanobot**: `subagent.py` — SubagentManager spawns independent agent instances with isolated tool sets and system prompts, announces results back to main agent via system channel
+- **MimiClaw**: Not implemented
+- **Recommendation**: ESP32 memory is limited; simplify to a single background FreeRTOS task for long-running work, inject result into inbound queue on completion
+
+---
+
+## P1 — Important Features
+
+### [ ] Telegram User Allowlist (allow_from)
+- **nanobot**: `channels/base.py` L59-82 — `is_allowed()` checks sender_id against allow_list
+- **MimiClaw**: No authentication; anyone can message the bot and consume API credits
+- **Recommendation**: Store allow_from list in NVS, filter in `process_updates()`
+
+### [ ] Telegram Markdown to HTML Conversion
+- **nanobot**: `channels/telegram.py` L16-76 — `_markdown_to_telegram_html()` full converter: code blocks, inline code, bold, italic, links, strikethrough, lists
+- **MimiClaw**: Uses `parse_mode: Markdown` directly; special characters can cause send failures (has fallback to plain text)
+- **Recommendation**: Implement simplified Markdown-to-HTML converter, or switch to `parse_mode: HTML`
+
+### [ ] Telegram /start Command
+- **nanobot**: `telegram.py` L183-192 — handles `/start` command, replies with welcome message
+- **MimiClaw**: Not handled; /start is sent to Claude as a regular message
+
+### [ ] Telegram Media Handling (photos/voice/files)
+- **nanobot**: `telegram.py` L194-289 — handles photo, voice, audio, document; downloads files; transcribes voice
+- **MimiClaw**: Only processes `message.text`, ignores all media messages
+- **Recommendation**: Images can be base64-encoded for Claude Vision; voice requires Whisper API (extra HTTPS request)
+
+### [ ] Skills System (pluggable capabilities)
+- **nanobot**: `agent/skills.py` — loads skills from SKILL.md files, supports always-loaded and on-demand, frontmatter 
metadata, requirements checking
+- **MimiClaw**: Not implemented
+- **Recommendation**: Simplified version: store SKILL.md files on SPIFFS, load into system prompt via context_builder
+
+### [ ] Full Bootstrap File Alignment
+- **nanobot**: Loads `AGENTS.md`, `SOUL.md`, `USER.md`, `TOOLS.md`, `IDENTITY.md` (5 files)
+- **MimiClaw**: Only loads `SOUL.md` and `USER.md`
+- **Recommendation**: Add AGENTS.md (behavior guidelines) and TOOLS.md (tool documentation)
+
+### [ ] Longer Memory Lookback
+- **nanobot**: `memory.py` L56-80 — `get_recent_memories(days=7)` defaults to 7 days
+- **MimiClaw**: `context_builder.c` only reads last 3 days
+- **Recommendation**: Make configurable, but mind token budget
+
+### [ ] System Prompt Tool Guidance
+- **nanobot**: `context.py` L74-101 — includes current time, workspace path, tool usage instructions
+- **MimiClaw**: Has current time, but lacks tool usage guide and workspace description
+- **Depends on**: Tool Use implementation
+
+### [ ] Message Metadata (media, reply_to, metadata)
+- **nanobot**: `bus/events.py` — InboundMessage has media, metadata fields; OutboundMessage has reply_to
+- **MimiClaw**: `mimi_msg_t` only has channel + chat_id + content
+- **Recommendation**: Extend msg struct, add media_path and metadata fields
+
+### [ ] Outbound Subscription Pattern
+- **nanobot**: `bus/queue.py` L41-49 — supports `subscribe_outbound(channel, callback)` subscription model
+- **MimiClaw**: Hardcoded if-else dispatch
+- **Recommendation**: Current approach is simple and reliable; not worth changing with few channels
+
+---
+
+## P2 — Advanced Features
+
+### [ ] Cron Scheduled Task Service
+- **nanobot**: `cron/service.py` — full cron scheduler supporting at/every/cron expressions, persistent storage, timed agent triggers
+- **MimiClaw**: Not implemented
+- **Recommendation**: Use FreeRTOS timer for simplified version, support "every N minutes" only
+
+### [ ] Heartbeat Service
+- **nanobot**: `heartbeat/service.py` — reads 
HEARTBEAT.md every 30 minutes, triggers agent if tasks are found
+- **MimiClaw**: Not implemented
+- **Recommendation**: Simple FreeRTOS timer that periodically checks HEARTBEAT.md
+
+### [ ] Multi-LLM Provider Support
+- **nanobot**: `providers/litellm_provider.py` — supports OpenRouter, Anthropic, OpenAI, Gemini, DeepSeek, Groq, Zhipu, vLLM via LiteLLM
+- **MimiClaw**: Hardcoded to Anthropic Messages API
+- **Recommendation**: Abstract LLM interface, support OpenAI-compatible API (most providers are compatible)
+
+### [ ] Voice Transcription
+- **nanobot**: `providers/transcription.py` — Groq Whisper API
+- **MimiClaw**: Not implemented
+- **Recommendation**: Requires extra HTTPS request to Whisper API: download Telegram voice -> forward -> get text
+
+### [ ] YAML Config File System
+- **nanobot**: `config/loader.py` + `config/schema.py` — Pydantic config validation, YAML config support
+- **MimiClaw**: All configuration via NVS key-value storage
+- **Recommendation**: Current NVS approach is suitable for MCU, no change needed
+
+### [ ] WebSocket Gateway Protocol Enhancement
+- **nanobot**: Gateway port 18790 + richer protocol
+- **MimiClaw**: Basic JSON protocol, lacks streaming token push
+- **Recommendation**: Add `{"type":"token","content":"..."}` streaming push
+
+### [ ] Multi-Channel Manager
+- **nanobot**: `channels/manager.py` — unified lifecycle management for multiple channels
+- **MimiClaw**: Hardcoded in app_main()
+- **Recommendation**: Not worth abstracting with few channels
+
+### [ ] WhatsApp / Feishu Channels
+- **nanobot**: `channels/whatsapp.py`, `channels/feishu.py`
+- **MimiClaw**: Only Telegram + WebSocket
+- **Recommendation**: Low priority, Telegram is sufficient
+
+### [ ] Telegram Proxy Support (HTTP/SOCKS5)
+- **nanobot**: `config/schema.py` L20 — TelegramConfig supports proxy field
+- **MimiClaw**: No proxy support
+- **Recommendation**: esp_http_client supports proxy, configurable via NVS
+
+### [ ] Session Metadata Persistence
+- 
**nanobot**: `session/manager.py` L136-153 — session file includes metadata line (created_at, updated_at)
+- **MimiClaw**: JSONL only stores role/content/ts, no metadata header
+- **Recommendation**: Low priority
+
+---
+
+## Completed Alignment
+
+- [x] Telegram Bot long polling (getUpdates)
+- [x] Message Bus (inbound/outbound queues)
+- [x] Agent Loop basic flow (single LLM call)
+- [x] Claude API (Anthropic Messages API + SSE streaming)
+- [x] Context Builder (system prompt + bootstrap files + memory)
+- [x] Memory Store (MEMORY.md + daily notes)
+- [x] Session Manager (JSONL per chat_id, ring buffer history)
+- [x] WebSocket Gateway (port 18789, JSON protocol)
+- [x] Serial CLI (esp_console, 12 commands)
+- [x] OTA Update
+- [x] WiFi Manager (NVS credentials, exponential backoff)
+- [x] SPIFFS storage
+- [x] NVS configuration (token, API key, model)
+
+---
+
+## Suggested Implementation Order
+
+```
+1. Tool Use Loop + Tool Registry    <- this determines whether the agent is truly "intelligent"
+2. Built-in Tools (read_file, write_file, message)
+3. Telegram Allowlist (allow_from)  <- security essential
+4. Bootstrap File Completion (AGENTS.md, TOOLS.md)
+5. Subagent (simplified)
+6. Telegram Markdown -> HTML
+7. Media Handling
+8. Cron / Heartbeat
+9. Other enhancements
+```