diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
new file mode 100644
index 0000000..b8668d1
--- /dev/null
+++ b/docs/ARCHITECTURE.md
@@ -0,0 +1,353 @@
+# MimiClaw Architecture
+
+> ESP32-S3 AI Agent firmware — a C/FreeRTOS reimplementation of [nanobot](../nanobot/)'s core agent capabilities.
+
+---
+
+## System Overview
+
+```
+Telegram App (User)
+    │
+    │  HTTPS Long Polling
+    │
+    ▼
+┌──────────────────────────────────────────────────┐
+│               ESP32-S3 (MimiClaw)                │
+│                                                  │
+│  ┌─────────────┐       ┌──────────────────┐      │
+│  │  Telegram   │──────▶│  Inbound Queue   │      │
+│  │   Poller    │       └────────┬─────────┘      │
+│  │  (Core 0)   │                │                │
+│  └─────────────┘                ▼                │
+│                        ┌──────────────┐          │
+│  ┌─────────────┐       │  Agent Loop  │          │
+│  │  WebSocket  │──────▶│   (Core 1)   │          │
+│  │   Server    │       │              │          │
+│  │  (:18789)   │       │  Context ──▶ LLM Proxy  │
+│  └─────────────┘       │  Builder     (HTTPS)    │
+│                        └──────┬───────┘          │
+│  ┌─────────────┐              │                  │
+│  │ Serial CLI  │              ▼                  │
+│  │  (Core 0)   │       ┌──────────────┐          │
+│  └─────────────┘       │Outbound Queue│          │
+│                        └──────┬───────┘          │
+│                               │                  │
+│                        ┌──────▼───────┐          │
+│                        │   Outbound   │          │
+│                        │   Dispatch   │          │
+│                        │   (Core 0)   │          │
+│                        └──┬────────┬──┘          │
+│                           │        │             │
+│                    Telegram      WebSocket       │
+│                    sendMessage   send            │
+│                                                  │
+│  ┌──────────────────────────────────────────┐    │
+│  │ SPIFFS (12 MB)                           │    │
+│  │ /spiffs/config/    SOUL.md, USER.md      │    │
+│  │ /spiffs/memory/    MEMORY.md, YYYY-MM-DD │    │
+│  │ /spiffs/sessions/  tg_.jsonl             │    │
+│  └──────────────────────────────────────────┘    │
+└──────────────────────────────────────────────────┘
+    │
+    │  Anthropic Messages API (HTTPS + SSE)
+    ▼
+  ┌────────────┐
+  │ Claude API │
+  └────────────┘
+```
+
+---
+
+## Data Flow
+
+```
+1. User sends message on Telegram (or WebSocket)
+2. Channel poller receives message, wraps in mimi_msg_t
+3. Message pushed to Inbound Queue (FreeRTOS xQueue)
+4. Agent Loop (Core 1) pops message:
+   a. Load session history from SPIFFS (JSONL)
+   b. Build system prompt (SOUL.md + USER.md + MEMORY.md + recent notes)
+   c. Build messages array (history + current message)
+   d. Call Claude API via HTTPS (SSE streaming)
+   e. Accumulate streamed response tokens
+   f. Save user + assistant messages to session file
+   g. Push response to Outbound Queue
+5. Outbound Dispatch (Core 0) pops response:
+   a. Route by channel field ("telegram" → sendMessage, "websocket" → WS frame)
+6. User receives reply
+```
+
+---
+
+## Module Map
+
+```
+main/
+├── mimi.c                  Entry point — app_main() orchestrates init + startup
+├── mimi_config.h           All compile-time constants in one place
+│
+├── bus/
+│   ├── message_bus.h       mimi_msg_t struct, queue API
+│   └── message_bus.c       Two FreeRTOS queues: inbound + outbound
+│
+├── wifi/
+│   ├── wifi_manager.h      WiFi STA lifecycle API
+│   └── wifi_manager.c      NVS credentials, event handler, exponential backoff
+│
+├── telegram/
+│   ├── telegram_bot.h      Bot init/start, send_message API
+│   └── telegram_bot.c      Long polling loop, JSON parsing, message splitting
+│
+├── llm/
+│   ├── llm_proxy.h         llm_chat() API
+│   └── llm_proxy.c         Anthropic Messages API, SSE stream parser
+│
+├── agent/
+│   ├── agent_loop.h        Agent task init/start
+│   ├── agent_loop.c        Main processing loop: inbound → context → LLM → outbound
+│   ├── context_builder.h   System prompt + messages builder API
+│   └── context_builder.c   Reads bootstrap files + memory, assembles prompt
+│
+├── memory/
+│   ├── memory_store.h      Long-term + daily memory API
+│   ├── memory_store.c      MEMORY.md read/write, daily .md append/read
+│   ├── session_mgr.h       Per-chat session API
+│   └── session_mgr.c       JSONL session files, ring buffer history
+│
+├── gateway/
+│   ├── ws_server.h         WebSocket server API
+│   └── ws_server.c         ESP HTTP server with WS upgrade, client tracking
+│
+├── cli/
+│   ├── serial_cli.h        CLI init API
+│   └── serial_cli.c        esp_console REPL with 12 commands
+│
+└── ota/
+    ├── ota_manager.h       OTA update API
+    └── ota_manager.c       esp_https_ota wrapper
+```
+
+---
+
+## FreeRTOS Task Layout
+
+| Task | Core | Priority | Stack | Description |
+|--------------------|------|----------|--------|--------------------------------------|
+| `tg_poll` | 0 | 5 | 8 KB | Telegram long polling (30s timeout) |
+| `agent_loop` | 1 | 6 | 8 KB | Message processing + Claude API call |
+| `outbound` | 0 | 5 | 4 KB | Route responses to Telegram / WS |
+| `serial_cli` | 0 | 3 | 4 KB | USB serial console REPL |
+| httpd (internal) | 0 | 5 | — | WebSocket server (esp_http_server) |
+| wifi_event (IDF) | 0 | 8 | — | WiFi event handling (ESP-IDF) |
+
+**Core allocation strategy**: Core 0 handles I/O (network, serial, WiFi). Core 1 is dedicated to the agent loop (CPU-bound JSON building + waiting on HTTPS).
+
+---
+
+## Memory Budget
+
+| Purpose | Location | Size |
+|------------------------------------|----------------|----------|
+| FreeRTOS task stacks | Internal SRAM | ~40 KB |
+| WiFi buffers | Internal SRAM | ~30 KB |
+| TLS connections x2 (Telegram + Claude) | PSRAM | ~120 KB |
+| JSON parse buffers | PSRAM | ~32 KB |
+| Session history cache | PSRAM | ~32 KB |
+| System prompt buffer | PSRAM | ~16 KB |
+| LLM response stream buffer | PSRAM | ~32 KB |
+| Remaining available | PSRAM | ~7.7 MB |
+
+Large buffers (32 KB+) are allocated from PSRAM via `heap_caps_calloc(1, size, MALLOC_CAP_SPIRAM)`.
+
+---
+
+## Flash Partition Layout
+
+```
+Offset      Size     Name       Purpose
+─────────────────────────────────────────────
+0x009000    24 KB    nvs        WiFi creds, TG token, API key, model
+0x00F000     8 KB    otadata    OTA boot state
+0x011000     4 KB    phy_init   WiFi PHY calibration
+0x020000     2 MB    ota_0      Firmware slot A
+0x220000     2 MB    ota_1      Firmware slot B
+0x420000    12 MB    spiffs     Markdown memory, sessions, config
+0xFF0000    64 KB    coredump   Crash dump storage
+```
+
+Total: 16 MB flash.
+
+---
+
+## Storage Layout (SPIFFS)
+
+SPIFFS is a flat filesystem — no real directories. Files use path-like names.
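Because there are no real directories, nothing has to be created before a file is opened; a "path" is only a formatted name. A minimal sketch of that idea, assuming a hypothetical helper `session_path()` (not taken from the MimiClaw sources) and the `tg_` prefix used by the session files in this document:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only: build the flat SPIFFS name of
 * the session file for one Telegram chat. Returns 0 on success, or -1 if the
 * name does not fit in `out` (snprintf reports the length it wanted). */
int session_path(char *out, size_t cap, const char *chat_id)
{
    int n = snprintf(out, cap, "/spiffs/sessions/tg_%s.jsonl", chat_id);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

Checking the `snprintf` result matters here because SPIFFS limits object-name length (see `CONFIG_SPIFFS_OBJ_NAME_LEN` in ESP-IDF), so a long chat ID should be rejected rather than silently truncated.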
+
+```
+/spiffs/config/SOUL.md            AI personality definition
+/spiffs/config/USER.md            User profile
+/spiffs/memory/MEMORY.md          Long-term persistent memory
+/spiffs/memory/2026-02-05.md      Daily notes (one file per day)
+/spiffs/sessions/tg_12345.jsonl   Session history (one file per Telegram chat)
+```
+
+Session files are JSONL (one JSON object per line):
+```json
+{"role":"user","content":"Hello","ts":1738764800}
+{"role":"assistant","content":"Hi there!","ts":1738764802}
+```
+
+---
+
+## NVS Configuration
+
+| Namespace | Key | Description |
+|---------------|--------------|-----------------------------------------|
+| `wifi_config` | `ssid` | WiFi SSID |
+| `wifi_config` | `password` | WiFi password |
+| `tg_config` | `bot_token` | Telegram Bot API token |
+| `llm_config` | `api_key` | Anthropic API key |
+| `llm_config` | `model` | Model ID (default: claude-opus-4-5-20251101) |
+
+All configured via Serial CLI commands: `wifi_set`, `set_tg_token`, `set_api_key`, `set_model`.
+
+---
+
+## Message Bus Protocol
+
+The internal message bus uses two FreeRTOS queues carrying `mimi_msg_t`:
+
+```c
+typedef struct {
+    char channel[16];   // "telegram", "websocket", "cli"
+    char chat_id[32];   // Telegram chat ID or WS client ID
+    char *content;      // Heap-allocated text (ownership transferred)
+} mimi_msg_t;
+```
+
+- **Inbound queue**: channels → agent loop (depth: 8)
+- **Outbound queue**: agent loop → dispatch → channels (depth: 8)
+- Content string ownership is transferred on push; receiver must `free()`.
+
+---
+
+## WebSocket Protocol
+
+Port: **18789**. Max clients: **4**.
+
+**Client → Server:**
+```json
+{"type": "message", "content": "Hello", "chat_id": "ws_client1"}
+```
+
+**Server → Client:**
+```json
+{"type": "response", "content": "Hi there!", "chat_id": "ws_client1"}
+```
+
+Client `chat_id` is auto-assigned on connection (`ws_`) but can be overridden in the first message.
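The server-to-client frame above has to carry arbitrary LLM output, so the `content` string needs JSON escaping before it is sent. A minimal sketch of building that frame into a caller-supplied buffer follows; `ws_format_response()` and its helpers are hypothetical names for illustration, and the real `ws_server.c` may well use a JSON library instead:

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

#define WS_FMT_FAIL ((size_t)-1)

/* Append src at dst+off, keeping dst NUL-terminated. Returns the new offset,
 * or WS_FMT_FAIL if it does not fit (failure propagates through later calls). */
static size_t append_raw(char *dst, size_t cap, size_t off, const char *src)
{
    if (off == WS_FMT_FAIL) return WS_FMT_FAIL;
    size_t n = strlen(src);
    if (n >= cap - off) return WS_FMT_FAIL;      /* need room for src + NUL */
    memcpy(dst + off, src, n + 1);
    return off + n;
}

/* Append src as the body of a JSON string: escape quotes, backslashes,
 * newlines and other control characters. */
static size_t append_escaped(char *dst, size_t cap, size_t off, const char *src)
{
    for (const unsigned char *p = (const unsigned char *)src; *p != '\0'; p++) {
        char esc[8];
        if (*p == '"' || *p == '\\') {
            esc[0] = '\\'; esc[1] = (char)*p; esc[2] = '\0';
        } else if (*p == '\n') {
            strcpy(esc, "\\n");
        } else if (*p < 0x20) {
            snprintf(esc, sizeof esc, "\\u%04x", (unsigned)*p);
        } else {
            esc[0] = (char)*p; esc[1] = '\0';
        }
        off = append_raw(dst, cap, off, esc);
    }
    return off;
}

/* Build {"type": "response", "content": ..., "chat_id": ...} in out.
 * Returns the frame length, or -1 if out is too small. */
int ws_format_response(char *out, size_t cap, const char *chat_id, const char *content)
{
    size_t off = 0;
    off = append_raw(out, cap, off, "{\"type\": \"response\", \"content\": \"");
    off = append_escaped(out, cap, off, content);
    off = append_raw(out, cap, off, "\", \"chat_id\": \"");
    off = append_escaped(out, cap, off, chat_id);
    off = append_raw(out, cap, off, "\"}");
    return off == WS_FMT_FAIL ? -1 : (int)off;
}
```

Returning -1 instead of truncating keeps a half-written frame from reaching the client; the caller can retry with a larger buffer.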
+
+---
+
+## Claude API Integration
+
+Endpoint: `POST https://api.anthropic.com/v1/messages`
+
+Request format (Anthropic-native, not OpenAI):
+```json
+{
+  "model": "claude-opus-4-5-20251101",
+  "max_tokens": 4096,
+  "stream": true,
+  "system": "",
+  "messages": [
+    {"role": "user", "content": "Hello"},
+    {"role": "assistant", "content": "Hi!"},
+    {"role": "user", "content": "How are you?"}
+  ]
+}
+```
+
+Key difference from OpenAI: `system` is a top-level field, not inside the `messages` array.
+
+SSE streaming response events:
+```
+event: content_block_delta
+data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}
+
+event: message_stop
+data: {"type":"message_stop"}
+```
+
+The SSE parser in `llm_proxy.c` accumulates `text_delta` tokens into a response buffer.
+
+---
+
+## Startup Sequence
+
+```
+app_main()
+  ├── init_nvs()                      NVS flash init (erase if corrupted)
+  ├── esp_event_loop_create_default()
+  ├── init_spiffs()                   Mount SPIFFS at /spiffs
+  ├── message_bus_init()              Create inbound + outbound queues
+  ├── memory_store_init()             Verify SPIFFS paths
+  ├── session_mgr_init()
+  ├── wifi_manager_init()             Init WiFi STA mode + event handlers
+  ├── telegram_bot_init()             Load bot token from NVS
+  ├── llm_proxy_init()                Load API key + model from NVS
+  ├── agent_loop_init()
+  ├── serial_cli_init()               Start REPL (works without WiFi)
+  │
+  ├── wifi_manager_start()            Connect using NVS credentials
+  │   └── wifi_manager_wait_connected(30s)
+  │
+  └── [if WiFi connected]
+      ├── telegram_bot_start()        Launch tg_poll task (Core 0)
+      ├── agent_loop_start()          Launch agent_loop task (Core 1)
+      ├── ws_server_start()           Start httpd on port 18789
+      └── outbound_dispatch task      Launch outbound task (Core 0)
+```
+
+If WiFi credentials are missing or connection times out, the CLI remains available for configuration.
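The SSE accumulation described under Claude API Integration can be sketched as a line-oriented feeder. This is an illustration under stated assumptions, not the code in `llm_proxy.c`: the type `sse_accum_t`, the function `sse_feed_line()`, and the 256-byte buffer are hypothetical, and the parsing is a plain substring scan rather than a real JSON parser (it does not decode `\uXXXX` escapes).

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    char   buf[256];   /* accumulated response text (size chosen for the sketch) */
    size_t len;        /* bytes used in buf, excluding the NUL */
    int    done;       /* set to 1 once message_stop has been seen */
} sse_accum_t;

/* Feed one SSE line (without the trailing newline).
 * Returns 0 on success, -1 if the text would overflow buf. */
int sse_feed_line(sse_accum_t *acc, const char *line)
{
    if (strncmp(line, "data: ", 6) != 0) return 0;   /* skip "event:" and blank lines */
    const char *json = line + 6;

    if (strstr(json, "\"type\":\"message_stop\"") != NULL) {
        acc->done = 1;
        return 0;
    }
    if (strstr(json, "\"type\":\"text_delta\"") == NULL) return 0;

    const char *p = strstr(json, "\"text\":\"");
    if (p == NULL) return 0;
    p += 8;                                          /* skip past "text":" */

    while (*p != '\0' && *p != '"') {
        char c = *p++;
        if (c == '\\' && *p != '\0') {               /* minimal unescape */
            char e = *p++;
            c = (e == 'n') ? '\n' : (e == 't') ? '\t' : e;
        }
        if (acc->len + 1 >= sizeof acc->buf) return -1;
        acc->buf[acc->len++] = c;
        acc->buf[acc->len] = '\0';
    }
    return 0;
}
```

A caller would zero-initialise the accumulator, feed each received line, and stop reading once `done` is set; on the device the buffer would more plausibly live in PSRAM and be sized per the Memory Budget table.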
+
+---
+
+## Serial CLI Commands
+
+| Command | Description |
+|--------------------------------|--------------------------------------|
+| `wifi_set ` | Save WiFi credentials to NVS |
+| `wifi_status` | Show connection status and IP |
+| `set_tg_token ` | Save Telegram bot token |
+| `set_api_key ` | Save Anthropic API key |
+| `set_model ` | Set LLM model identifier |
+| `memory_read` | Print MEMORY.md contents |
+| `memory_write ` | Overwrite MEMORY.md |
+| `session_list` | List all session files |
+| `session_clear ` | Delete a session file |
+| `heap_info` | Show internal + PSRAM free bytes |
+| `restart` | Reboot the device |
+| `help` | List all available commands |
+
+---
+
+## Nanobot Reference Mapping
+
+| Nanobot Module | MimiClaw Equivalent | Notes |
+|-----------------------------|--------------------------------|------------------------------|
+| `agent/loop.py` | `agent/agent_loop.c` | Simplified: no tool use loop |
+| `agent/context.py` | `agent/context_builder.c` | Loads SOUL.md + USER.md + memory |
+| `agent/memory.py` | `memory/memory_store.c` | MEMORY.md + daily notes |
+| `session/manager.py` | `memory/session_mgr.c` | JSONL per chat, ring buffer |
+| `channels/telegram.py` | `telegram/telegram_bot.c` | Raw HTTP, no python-telegram-bot |
+| `bus/events.py` + `queue.py`| `bus/message_bus.c` | FreeRTOS queues vs asyncio |
+| `providers/litellm_provider.py` | `llm/llm_proxy.c` | Direct Anthropic API only |
+| `config/schema.py` | `mimi_config.h` + NVS | Compile-time + NVS storage |
+| `cli/commands.py` | `cli/serial_cli.c` | esp_console REPL |
+| `agent/tools/*` | *(not yet implemented)* | See TODO.md |
+| `agent/subagent.py` | *(not yet implemented)* | See TODO.md |
+| `agent/skills.py` | *(not yet implemented)* | See TODO.md |
+| `cron/service.py` | *(not yet implemented)* | See TODO.md |
+| `heartbeat/service.py` | *(not yet implemented)* | See TODO.md |
diff --git a/docs/TODO.md b/docs/TODO.md
new file mode 100644
index 0000000..611064b
--- /dev/null
+++ b/docs/TODO.md
@@ -0,0 +1,175 @@
+# MimiClaw vs Nanobot — Feature Gap Tracker
+
+> Comparing against `nanobot/` reference implementation. Tracks features MimiClaw has not yet aligned with.
+> Priority: P0 = Core missing, P1 = Important enhancement, P2 = Nice to have
+
+---
+
+## P0 — Core Agent Capabilities
+
+### [ ] Tool Use Loop (multi-turn agent iteration)
+- **nanobot**: `loop.py` L167-210 — while loop calls LLM, checks `response.has_tool_calls`, executes tools, feeds results back into messages, repeats until LLM stops calling tools (max 20 iterations)
+- **MimiClaw**: `agent_loop.c` only makes a single LLM call (one-shot), cannot use any tools
+- **Scope**: Need to parse Anthropic API `tool_use` content blocks, implement tool execution loop
+- **Note**: Anthropic tool_use format differs from OpenAI — uses content blocks, not function_call
+
+### [ ] Tool Registry + Built-in Tools
+- **nanobot**: `tools/registry.py` — dynamic tool registration/execution, `tools/base.py` defines abstract Tool base class
+- **nanobot built-in tools**:
+  - `read_file` — read files (`tools/filesystem.py`)
+  - `write_file` — write files
+  - `edit_file` — edit files
+  - `list_dir` — list directory
+  - `exec` — execute shell commands (`tools/shell.py`)
+  - `web_search` — web search (`tools/web.py`)
+  - `web_fetch` — fetch web pages
+  - `message` — send message to user (`tools/message.py`)
+  - `spawn` — launch subagent (`tools/spawn.py`)
+- **MimiClaw**: No tool system at all
+- **Recommendation**: Reasonable tool subset for ESP32: `read_file`, `write_file`, `list_dir` (SPIFFS), `message`. Shell/web not suitable for MCU
+
+### [ ] Subagent / Spawn Background Tasks
+- **nanobot**: `subagent.py` — SubagentManager spawns independent agent instances with isolated tool sets and system prompts, announces results back to main agent via system channel
+- **MimiClaw**: Not implemented
+- **Recommendation**: ESP32 memory is limited; simplify to a single background FreeRTOS task for long-running work, inject result into inbound queue on completion
+
+---
+
+## P1 — Important Features
+
+### [ ] Telegram User Allowlist (allow_from)
+- **nanobot**: `channels/base.py` L59-82 — `is_allowed()` checks sender_id against allow_list
+- **MimiClaw**: No authentication; anyone can message the bot and consume API credits
+- **Recommendation**: Store allow_from list in NVS, filter in `process_updates()`
+
+### [ ] Telegram Markdown to HTML Conversion
+- **nanobot**: `channels/telegram.py` L16-76 — `_markdown_to_telegram_html()` full converter: code blocks, inline code, bold, italic, links, strikethrough, lists
+- **MimiClaw**: Uses `parse_mode: Markdown` directly; special characters can cause send failures (has fallback to plain text)
+- **Recommendation**: Implement simplified Markdown-to-HTML converter, or switch to `parse_mode: HTML`
+
+### [ ] Telegram /start Command
+- **nanobot**: `telegram.py` L183-192 — handles `/start` command, replies with welcome message
+- **MimiClaw**: Not handled; /start is sent to Claude as a regular message
+
+### [ ] Telegram Media Handling (photos/voice/files)
+- **nanobot**: `telegram.py` L194-289 — handles photo, voice, audio, document; downloads files; transcribes voice
+- **MimiClaw**: Only processes `message.text`, ignores all media messages
+- **Recommendation**: Images can be base64-encoded for Claude Vision; voice requires Whisper API (extra HTTPS request)
+
+### [ ] Skills System (pluggable capabilities)
+- **nanobot**: `agent/skills.py` — loads skills from SKILL.md files, supports always-loaded and on-demand, frontmatter metadata, requirements checking
+- **MimiClaw**: Not implemented
+- **Recommendation**: Simplified version: store SKILL.md files on SPIFFS, load into system prompt via context_builder
+
+### [ ] Full Bootstrap File Alignment
+- **nanobot**: Loads `AGENTS.md`, `SOUL.md`, `USER.md`, `TOOLS.md`, `IDENTITY.md` (5 files)
+- **MimiClaw**: Only loads `SOUL.md` and `USER.md`
+- **Recommendation**: Add AGENTS.md (behavior guidelines) and TOOLS.md (tool documentation)
+
+### [ ] Longer Memory Lookback
+- **nanobot**: `memory.py` L56-80 — `get_recent_memories(days=7)` defaults to 7 days
+- **MimiClaw**: `context_builder.c` only reads last 3 days
+- **Recommendation**: Make configurable, but mind token budget
+
+### [ ] System Prompt Tool Guidance
+- **nanobot**: `context.py` L74-101 — includes current time, workspace path, tool usage instructions
+- **MimiClaw**: Has current time, but lacks tool usage guide and workspace description
+- **Depends on**: Tool Use implementation
+
+### [ ] Message Metadata (media, reply_to, metadata)
+- **nanobot**: `bus/events.py` — InboundMessage has media, metadata fields; OutboundMessage has reply_to
+- **MimiClaw**: `mimi_msg_t` only has channel + chat_id + content
+- **Recommendation**: Extend msg struct, add media_path and metadata fields
+
+### [ ] Outbound Subscription Pattern
+- **nanobot**: `bus/queue.py` L41-49 — supports `subscribe_outbound(channel, callback)` subscription model
+- **MimiClaw**: Hardcoded if-else dispatch
+- **Recommendation**: Current approach is simple and reliable; not worth changing with few channels
+
+---
+
+## P2 — Advanced Features
+
+### [ ] Cron Scheduled Task Service
+- **nanobot**: `cron/service.py` — full cron scheduler supporting at/every/cron expressions, persistent storage, timed agent triggers
+- **MimiClaw**: Not implemented
+- **Recommendation**: Use FreeRTOS timer for simplified version, support "every N minutes" only
+
+### [ ] Heartbeat Service
+- **nanobot**: `heartbeat/service.py` — reads HEARTBEAT.md every 30 minutes, triggers agent if tasks are found
+- **MimiClaw**: Not implemented
+- **Recommendation**: Simple FreeRTOS timer that periodically checks HEARTBEAT.md
+
+### [ ] Multi-LLM Provider Support
+- **nanobot**: `providers/litellm_provider.py` — supports OpenRouter, Anthropic, OpenAI, Gemini, DeepSeek, Groq, Zhipu, vLLM via LiteLLM
+- **MimiClaw**: Hardcoded to Anthropic Messages API
+- **Recommendation**: Abstract LLM interface, support OpenAI-compatible API (most providers are compatible)
+
+### [ ] Voice Transcription
+- **nanobot**: `providers/transcription.py` — Groq Whisper API
+- **MimiClaw**: Not implemented
+- **Recommendation**: Requires extra HTTPS request to Whisper API: download Telegram voice -> forward -> get text
+
+### [ ] YAML Config File System
+- **nanobot**: `config/loader.py` + `config/schema.py` — Pydantic config validation, YAML config support
+- **MimiClaw**: All configuration via NVS key-value storage
+- **Recommendation**: Current NVS approach is suitable for MCU, no change needed
+
+### [ ] WebSocket Gateway Protocol Enhancement
+- **nanobot**: Gateway port 18790 + richer protocol
+- **MimiClaw**: Basic JSON protocol, lacks streaming token push
+- **Recommendation**: Add `{"type":"token","content":"..."}` streaming push
+
+### [ ] Multi-Channel Manager
+- **nanobot**: `channels/manager.py` — unified lifecycle management for multiple channels
+- **MimiClaw**: Hardcoded in app_main()
+- **Recommendation**: Not worth abstracting with few channels
+
+### [ ] WhatsApp / Feishu Channels
+- **nanobot**: `channels/whatsapp.py`, `channels/feishu.py`
+- **MimiClaw**: Only Telegram + WebSocket
+- **Recommendation**: Low priority, Telegram is sufficient
+
+### [ ] Telegram Proxy Support (HTTP/SOCKS5)
+- **nanobot**: `config/schema.py` L20 — TelegramConfig supports proxy field
+- **MimiClaw**: No proxy support
+- **Recommendation**: esp_http_client supports proxy, configurable via NVS
+
+### [ ] Session Metadata Persistence
+- **nanobot**: `session/manager.py` L136-153 — session file includes metadata line (created_at, updated_at)
+- **MimiClaw**: JSONL only stores role/content/ts, no metadata header
+- **Recommendation**: Low priority
+
+---
+
+## Completed Alignment
+
+- [x] Telegram Bot long polling (getUpdates)
+- [x] Message Bus (inbound/outbound queues)
+- [x] Agent Loop basic flow (single LLM call)
+- [x] Claude API (Anthropic Messages API + SSE streaming)
+- [x] Context Builder (system prompt + bootstrap files + memory)
+- [x] Memory Store (MEMORY.md + daily notes)
+- [x] Session Manager (JSONL per chat_id, ring buffer history)
+- [x] WebSocket Gateway (port 18789, JSON protocol)
+- [x] Serial CLI (esp_console, 12 commands)
+- [x] OTA Update
+- [x] WiFi Manager (NVS credentials, exponential backoff)
+- [x] SPIFFS storage
+- [x] NVS configuration (token, API key, model)
+
+---
+
+## Suggested Implementation Order
+
+```
+1. Tool Use Loop + Tool Registry            <- this determines whether the agent is truly "intelligent"
+2. Built-in Tools (read_file, write_file, message)
+3. Telegram Allowlist (allow_from)          <- security essential
+4. Bootstrap File Completion (AGENTS.md, TOOLS.md)
+5. Subagent (simplified)
+6. Telegram Markdown -> HTML
+7. Media Handling
+8. Cron / Heartbeat
+9. Other enhancements
+```
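Item 3 in the order above, the Telegram allowlist, is small enough to sketch directly. The following is a hedged illustration only: the function name `tg_is_allowed()`, the comma-separated storage format, and the choice to treat an empty list as "allow everyone" are all assumptions made here, not behaviour taken from MimiClaw or confirmed for nanobot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check for the allow_from feature.
 * allow_csv: comma-separated chat/user IDs, e.g. "12345,67890", as it might
 *            be stored in a single NVS string (assumed format).
 * Returns true if sender_id matches one entry exactly. An empty or NULL
 * list allows everyone (an assumption; a stricter default is also defensible). */
bool tg_is_allowed(const char *allow_csv, const char *sender_id)
{
    if (allow_csv == NULL || *allow_csv == '\0') return true;

    size_t id_len = strlen(sender_id);
    if (id_len == 0) return false;

    const char *p = allow_csv;
    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = (end != NULL) ? (size_t)(end - p) : strlen(p);
        /* Compare whole entries so "6789" does not match "67890". */
        if (len == id_len && memcmp(p, sender_id, len) == 0) return true;
        if (end == NULL) break;
        p = end + 1;
    }
    return false;
}
```

Called at the top of update processing, a `false` result would drop the message before it reaches the inbound queue, so rejected senders never trigger a Claude API call.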