Files

crispyberry 9b8121c4ce docs: add Chinese README, update proxy commands and model name

- Add README_CN.md with proxy setup guide for users in China
- Add language switcher to both READMEs
- Add set_proxy/clear_proxy to More Commands section
- Update default model to claude-opus-4-6
- Add Nanobot to acknowledgments
- Update ARCHITECTURE.md: proxy module, stack sizes, NVS config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-06 23:01:05 +08:00

16 KiB

Raw Blame History

MimiClaw Architecture

ESP32-S3 AI Agent firmware — C/FreeRTOS implementation running on bare metal (no Linux).

System Overview

Telegram App (User)
    │
    │  HTTPS Long Polling
    │
    ▼
┌──────────────────────────────────────────────────┐
│               ESP32-S3 (MimiClaw)                │
│                                                  │
│   ┌─────────────┐       ┌──────────────────┐     │
│   │  Telegram    │──────▶│   Inbound Queue  │     │
│   │  Poller      │       └────────┬─────────┘     │
│   │  (Core 0)    │               │                │
│   └─────────────┘               ▼                │
│                          ┌──────────────┐         │
│   ┌─────────────┐       │  Agent Loop   │         │
│   │  WebSocket   │──────▶│  (Core 1)    │         │
│   │  Server      │       │              │         │
│   │  (:18789)    │       │  Context ──▶ LLM Proxy │
│   └─────────────┘       │  Builder     (HTTPS)   │
│                          └──────┬───────┘         │
│   ┌─────────────┐               │                │
│   │  Serial CLI  │               ▼                │
│   │  (Core 0)    │       ┌──────────────┐         │
│   └─────────────┘       │ Outbound Queue│         │
│                          └──────┬───────┘         │
│                                 │                 │
│                          ┌──────▼───────┐         │
│                          │  Outbound    │         │
│                          │  Dispatch    │         │
│                          │  (Core 0)    │         │
│                          └──┬────────┬──┘         │
│                             │        │            │
│                      Telegram    WebSocket        │
│                      sendMessage  send            │
│                                                  │
│   ┌──────────────────────────────────────────┐   │
│   │  SPIFFS (12 MB)                          │   │
│   │  /spiffs/config/  SOUL.md, USER.md       │   │
│   │  /spiffs/memory/  MEMORY.md, YYYY-MM-DD  │   │
│   │  /spiffs/sessions/ tg_<chat_id>.jsonl    │   │
│   └──────────────────────────────────────────┘   │
└──────────────────────────────────────────────────┘
         │
         │  Anthropic Messages API (HTTPS + SSE)
         ▼
   ┌───────────┐
   │ Claude API │
   └───────────┘

Data Flow

1. User sends message on Telegram (or WebSocket)
2. Channel poller receives message, wraps in mimi_msg_t
3. Message pushed to Inbound Queue (FreeRTOS xQueue)
4. Agent Loop (Core 1) pops message:
   a. Load session history from SPIFFS (JSONL)
   b. Build system prompt (SOUL.md + USER.md + MEMORY.md + recent notes)
   c. Build messages array (history + current message)
   d. Call Claude API via HTTPS (SSE streaming)
   e. Accumulate streamed response tokens
   f. Save user + assistant messages to session file
   g. Push response to Outbound Queue
5. Outbound Dispatch (Core 0) pops response:
   a. Route by channel field ("telegram" → sendMessage, "websocket" → WS frame)
6. User receives reply

Module Map

main/
├── mimi.c                  Entry point — app_main() orchestrates init + startup
├── mimi_config.h           All compile-time constants in one place
│
├── bus/
│   ├── message_bus.h       mimi_msg_t struct, queue API
│   └── message_bus.c       Two FreeRTOS queues: inbound + outbound
│
├── wifi/
│   ├── wifi_manager.h      WiFi STA lifecycle API
│   └── wifi_manager.c      NVS credentials, event handler, exponential backoff
│
├── telegram/
│   ├── telegram_bot.h      Bot init/start, send_message API
│   └── telegram_bot.c      Long polling loop, JSON parsing, message splitting
│
├── llm/
│   ├── llm_proxy.h         llm_chat() API
│   └── llm_proxy.c         Anthropic Messages API, SSE stream parser
│
├── agent/
│   ├── agent_loop.h        Agent task init/start
│   ├── agent_loop.c        Main processing loop: inbound → context → LLM → outbound
│   ├── context_builder.h   System prompt + messages builder API
│   └── context_builder.c   Reads bootstrap files + memory, assembles prompt
│
├── memory/
│   ├── memory_store.h      Long-term + daily memory API
│   ├── memory_store.c      MEMORY.md read/write, daily .md append/read
│   ├── session_mgr.h       Per-chat session API
│   └── session_mgr.c       JSONL session files, ring buffer history
│
├── gateway/
│   ├── ws_server.h         WebSocket server API
│   └── ws_server.c         ESP HTTP server with WS upgrade, client tracking
│
├── proxy/
│   ├── http_proxy.h        Proxy connection API
│   └── http_proxy.c        HTTP CONNECT tunnel + TLS via esp_tls
│
├── cli/
│   ├── serial_cli.h        CLI init API
│   └── serial_cli.c        esp_console REPL with 14 commands
│
└── ota/
    ├── ota_manager.h       OTA update API
    └── ota_manager.c       esp_https_ota wrapper

FreeRTOS Task Layout

Task	Core	Priority	Stack	Description
`tg_poll`	0	5	12 KB	Telegram long polling (30s timeout)
`agent_loop`	1	6	12 KB	Message processing + Claude API call
`outbound`	0	5	8 KB	Route responses to Telegram / WS
`serial_cli`	0	3	4 KB	USB serial console REPL
httpd (internal)	0	5	—	WebSocket server (esp_http_server)
wifi_event (IDF)	0	8	—	WiFi event handling (ESP-IDF)

Core allocation strategy: Core 0 handles I/O (network, serial, WiFi). Core 1 is dedicated to the agent loop (CPU-bound JSON building + waiting on HTTPS).

Memory Budget

Purpose	Location	Size
FreeRTOS task stacks	Internal SRAM	~40 KB
WiFi buffers	Internal SRAM	~30 KB
TLS connections x2 (Telegram + Claude)	PSRAM	~120 KB
JSON parse buffers	PSRAM	~32 KB
Session history cache	PSRAM	~32 KB
System prompt buffer	PSRAM	~16 KB
LLM response stream buffer	PSRAM	~32 KB
Remaining available	PSRAM	~7.7 MB

Large buffers (32 KB+) are allocated from PSRAM via heap_caps_calloc(1, size, MALLOC_CAP_SPIRAM).

Flash Partition Layout

Offset      Size      Name        Purpose
─────────────────────────────────────────────
0x009000    24 KB     nvs         WiFi creds, TG token, API key, model
0x00F000     8 KB     otadata     OTA boot state
0x011000     4 KB     phy_init    WiFi PHY calibration
0x020000     2 MB     ota_0       Firmware slot A
0x220000     2 MB     ota_1       Firmware slot B
0x420000    12 MB     spiffs      Markdown memory, sessions, config
0xFF0000    64 KB     coredump    Crash dump storage

Total: 16 MB flash.

Storage Layout (SPIFFS)

SPIFFS is a flat filesystem — no real directories. Files use path-like names.

/spiffs/config/SOUL.md          AI personality definition
/spiffs/config/USER.md          User profile
/spiffs/memory/MEMORY.md        Long-term persistent memory
/spiffs/memory/2026-02-05.md    Daily notes (one file per day)
/spiffs/sessions/tg_12345.jsonl Session history (one file per Telegram chat)

Session files are JSONL (one JSON object per line):

{"role":"user","content":"Hello","ts":1738764800}
{"role":"assistant","content":"Hi there!","ts":1738764802}

NVS Configuration

Namespace	Key	Description
`wifi_config`	`ssid`	WiFi SSID
`wifi_config`	`password`	WiFi password
`tg_config`	`bot_token`	Telegram Bot API token
`llm_config`	`api_key`	Anthropic API key
`llm_config`	`model`	Model ID (default: claude-opus-4-6)
`proxy_config`	`host`	HTTP proxy hostname/IP
`proxy_config`	`port`	HTTP proxy port

All configured via Serial CLI commands: wifi_set, set_tg_token, set_api_key, set_model, set_proxy, clear_proxy.

Message Bus Protocol

The internal message bus uses two FreeRTOS queues carrying mimi_msg_t:

typedef struct {
    char channel[16];   // "telegram", "websocket", "cli"
    char chat_id[32];   // Telegram chat ID or WS client ID
    char *content;      // Heap-allocated text (ownership transferred)
} mimi_msg_t;

Inbound queue: channels → agent loop (depth: 8)
Outbound queue: agent loop → dispatch → channels (depth: 8)
Content string ownership is transferred on push; receiver must free().

WebSocket Protocol

Port: 18789. Max clients: 4.

Client → Server:

{"type": "message", "content": "Hello", "chat_id": "ws_client1"}

Server → Client:

{"type": "response", "content": "Hi there!", "chat_id": "ws_client1"}

Client chat_id is auto-assigned on connection (ws_<fd>) but can be overridden in the first message.

Claude API Integration

Endpoint: POST https://api.anthropic.com/v1/messages

Request format (Anthropic-native, not OpenAI):

{
  "model": "claude-opus-4-6",
  "max_tokens": 4096,
  "stream": true,
  "system": "<system prompt>",
  "messages": [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi!"},
    {"role": "user", "content": "How are you?"}
  ]
}

Key difference from OpenAI: system is a top-level field, not inside the messages array.

SSE streaming response events:

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}

event: message_stop
data: {"type":"message_stop"}

The SSE parser in llm_proxy.c accumulates text_delta tokens into a response buffer.

Startup Sequence

app_main()
  ├── init_nvs()                    NVS flash init (erase if corrupted)
  ├── esp_event_loop_create_default()
  ├── init_spiffs()                 Mount SPIFFS at /spiffs
  ├── message_bus_init()            Create inbound + outbound queues
  ├── memory_store_init()           Verify SPIFFS paths
  ├── session_mgr_init()
  ├── wifi_manager_init()           Init WiFi STA mode + event handlers
  ├── http_proxy_init()             Load proxy config from NVS
  ├── telegram_bot_init()           Load bot token from NVS
  ├── llm_proxy_init()              Load API key + model from NVS
  ├── agent_loop_init()
  ├── serial_cli_init()             Start REPL (works without WiFi)
  │
  ├── wifi_manager_start()          Connect using NVS credentials
  │   └── wifi_manager_wait_connected(30s)
  │
  └── [if WiFi connected]
      ├── telegram_bot_start()      Launch tg_poll task (Core 0)
      ├── agent_loop_start()        Launch agent_loop task (Core 1)
      ├── ws_server_start()         Start httpd on port 18789
      └── outbound_dispatch task    Launch outbound task (Core 0)

If WiFi credentials are missing or connection times out, the CLI remains available for configuration.

Serial CLI Commands

Command	Description
`wifi_set <SSID> <PASSWORD>`	Save WiFi credentials to NVS
`wifi_status`	Show connection status and IP
`set_tg_token <TOKEN>`	Save Telegram bot token
`set_api_key <KEY>`	Save Anthropic API key
`set_model <MODEL_ID>`	Set LLM model identifier
`set_proxy <HOST> <PORT>`	Set HTTP CONNECT proxy
`clear_proxy`	Remove proxy, use direct connection
`memory_read`	Print MEMORY.md contents
`memory_write <CONTENT>`	Overwrite MEMORY.md
`session_list`	List all session files
`session_clear <CHAT_ID>`	Delete a session file
`heap_info`	Show internal + PSRAM free bytes
`restart`	Reboot the device
`help`	List all available commands

Nanobot Reference Mapping

Nanobot Module	MimiClaw Equivalent	Notes
`agent/loop.py`	`agent/agent_loop.c`	Simplified: no tool use loop
`agent/context.py`	`agent/context_builder.c`	Loads SOUL.md + USER.md + memory
`agent/memory.py`	`memory/memory_store.c`	MEMORY.md + daily notes
`session/manager.py`	`memory/session_mgr.c`	JSONL per chat, ring buffer
`channels/telegram.py`	`telegram/telegram_bot.c`	Raw HTTP, no python-telegram-bot
`bus/events.py` + `queue.py`	`bus/message_bus.c`	FreeRTOS queues vs asyncio
`providers/litellm_provider.py`	`llm/llm_proxy.c`	Direct Anthropic API only
`config/schema.py`	`mimi_config.h` + NVS	Compile-time + NVS storage
`cli/commands.py`	`cli/serial_cli.c`	esp_console REPL
`agent/tools/*`	(not yet implemented)	See TODO.md
`agent/subagent.py`	(not yet implemented)	See TODO.md
`agent/skills.py`	(not yet implemented)	See TODO.md
`cron/service.py`	(not yet implemented)	See TODO.md
`heartbeat/service.py`	(not yet implemented)	See TODO.md

16 KiB Raw Blame History