System Architecture
Two interfaces, one backend, one AI brain. No custom routing logic anywhere.
Interface: WhatsApp User
Phone number = identity. Auto-authenticated.
Entry point: POST /webhook (Twilio/Meta)
Interface: Web Browser
Session ID + auth token. Needs OTP.
Entry point: POST /api/chat (JSON)
Backend (single file)
Express Server — chatbot/src/server.js
2,600 lines. converse() function handles both channels identically.
In-memory sessions (Map, 30-min timeout) • Message persistence • Image processing • Rate limiting
↓↓↓ anthropic.messages.create() with tools[] ↓↓↓
AI Brain
Claude Sonnet 4 — Agentic Tool-Use Loop
System prompt (spec.md ~7,500 words) + 18 tools. Claude decides what to do. Loop continues until text reply.
$3/M input tokens • $15/M output tokens • Every call logged to api_costs
↓ tool calls ↓ ↓ tool calls ↓ ↓ vision API ↓
Database
Supabase (Postgres + pgvector)
users, businesses, manufacturers, products, events, customers, services, jobs, messages, flags, api_costs
External APIs
Serper (Google) • BulkSMS • Resend
Web search, SMS verification, email verification
Backend Files
2
server.js (logic) + spec.md (personality)
Tools Available to Claude
18
search, register, verify, create, edit, remove, flag, image, scan, website, web_search, fetch, instructions, admin
Entity Types
8
Business, Manufacturer, Product, Event, Customer, Service, Job, User
API Call Sites
4
Conversation, tool-use loop, scan_image (vision), website_gen
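The in-memory session store listed in the backend card (a Map with a 30-minute timeout) might look roughly like the sketch below. The names `getSession` and `lastSeen`, the session shape, and the injectable `now` parameter are assumptions for illustration, not taken from server.js.

```javascript
// Hypothetical sketch of a Map-backed session store with idle expiry.
const SESSION_TIMEOUT_MS = 30 * 60 * 1000; // 30-minute timeout, per the notes above
const sessions = new Map();

// Returns the live session for `sessionId`, creating a fresh one if none
// exists or the old one has been idle longer than the timeout.
function getSession(sessionId, channel, now = Date.now()) {
  const existing = sessions.get(sessionId);
  if (existing && now - existing.lastSeen <= SESSION_TIMEOUT_MS) {
    existing.lastSeen = now; // activity resets the idle clock
    return existing;
  }
  const fresh = { channel, messages: [], authenticated: false, lastSeen: now };
  sessions.set(sessionId, fresh);
  return fresh;
}
```

Passing `now` explicitly keeps the expiry logic testable without waiting on a real clock. Because the store lives in process memory, a server restart would drop every session in this sketch.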
How the System Prompt is Built
Every Claude API call sends the same base prompt + dynamic auth context. Here's exactly what Claude sees.
Part 1: spec.md (static)
# ZimRoots — AI Directory Assistant
You are ZimRoots, an AI-powered directory assistant for Harare's informal economy...
Personality rules, entity type definitions, anti-fraud warning text, two-tier verification flow, 6 conversation modes (Reader, Poster, Editor, Reporter, Image, Community Contributor), field requirements per entity type, [OPTIONS] button syntax, 10 hard rules for Claude's behavior.
Part 2: Session context (injected dynamically)
[Current user session: +263771234567]
[Auth: logged in as Chipo Mwale, phone: +263771234567, verified]
Varies per user state: "not logged in", "WhatsApp user ... not yet registered", "logged in as NAME, verified"
Part 3: Channel tag (WhatsApp only)
[Channel: WhatsApp — the user is already authenticated by their phone number. Never ask for phone number, never ask for OTP/verification. If they are not registered yet, just ask for their name.]
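As a rough illustration of how the three auth states in Part 2 could be selected, here is a minimal sketch. The function name `buildAuthContext`, the `session` and `user` shapes, and the exact wording of the unregistered and logged-out lines are assumptions; only the logged-in format is copied from the example above.

```javascript
// Hypothetical sketch: pick the auth line for the current session state.
// `user` is the row found in the users table, or null if there is none.
function buildAuthContext(session, user) {
  if (user) {
    const status = user.phone_verified ? 'verified' : 'not verified';
    return `[Auth: logged in as ${user.meta.name}, phone: ${user.phone}, ${status}]`;
  }
  if (session.channel === 'whatsapp') {
    // WhatsApp users are authenticated by phone number but may not be registered yet
    return `[Auth: WhatsApp user ${session.registeredPhone}, authenticated but not yet registered. `
      + 'Ask for their name only, then call register_user.]';
  }
  return '[Auth: not logged in]';
}
```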
Prompt Assembly Code
// server.js line 1900
const systemPrompt = spec
+ `\n\n[Current user session: ${sessionId}]\n${authContext}`
+ (session.channel === 'whatsapp'
? `\n[Channel: WhatsApp — ...]`
: '');
Why this matters for cost: The system prompt is ~9,000 tokens. It's sent with every single API call — initial conversation, every tool-use loop iteration, etc. On a 5-turn conversation with 3 tool calls, that's 8 API calls × 9K tokens = 72K input tokens just for the prompt. At $3/M, that's $0.22 in prompt tokens alone.
Database Tables
CSS mockups of real Supabase tables showing actual column types and sample data.
| id | phone | phone_verified | id_verified | banned | meta (jsonb) | created_at |
| a1b2c3d4-... | +263771234567 | true | false | false | {"name":"Chipo Mwale","auth_token":"...","phone_verified_via":"sms"} | 2026-04-08 10:30 |
| e5f6g7h8-... | +263772555111 | true | true | false | {"name":"Tendai M.","id_image_url":"https://...","verified_via":"whatsapp"} | 2026-04-07 14:15 |
| id | user_id | name | description | category | status | meta (jsonb) |
| b3c4d5e6-... | a1b2c3d4-... | Chipo's Vegetable Stand | Fresh vegetables: tomatoes, onions, cabbage, carrots | Produce/Fresh Foods | verified | {"location":"Mbare","contact":"+263771234567","hours":"6am-6pm","payment_methods":"Cash, EcoCash","images":["https://..."],"delivery_areas":"Mbare","slug":"chipos-vegetable-stand"} |
| f8g9h0i1-... | e5f6g7h8-... | Tendai's Welding Works | Custom gates, burglar bars, window frames | Metal Fabrication | verified | {"location":"Glen View","contact":"+263772555111","verified_owner":true,"images":["https://...","https://..."]} |
The columns + meta pattern: Only fields used in SQL queries (WHERE, ORDER BY, JOIN, ilike search) are real columns. Everything else — location, hours, contact, images, payment methods — lives in the meta JSONB column. Adding a new field = just put it in meta. No migration needed. Claude reads the whole row anyway.
| id | session_id | role | content | image_url | meta (jsonb) | created_at |
| msg-001 | +263771234567 | user | I want to register my business selling vegetables in Mbare | null | {} | 10:30:01 |
| msg-002 | +263771234567 | assistant | Welcome to ZimRoots! / Mauya kuZimRoots! I'd love to help you list your vegetable business... | null | {"tool_calls":[{"name":"register_user","input":{"phone":"+263771234567","name":"Chipo"}}]} | 10:30:03 |
| session_id | user_phone | phase | model | input | output | cost_usd | content | tool_calls | created_at |
| +263771234567 | +263771234567 | conversation | claude-sonnet-4-20250514 | 9,847 | 186 | $0.0323 | null (tool_use, no text) | [{"name":"register_user"}] | 10:30:02 |
| +263771234567 | +263771234567 | tool_call | claude-sonnet-4-20250514 | 10,234 | 312 | $0.0354 | Welcome to ZimRoots! / Mauya kuZimRoots!... | null | 10:30:03 |
| +263771234567 | +263771234567 | website_gen | claude-sonnet-4-20250514 | 1,247 | 6,892 | $0.1071 | <!DOCTYPE html><html lang="en">... | null | 10:35:12 |
The Agentic Loop
Claude doesn't just answer — it acts. The server runs a loop: call Claude, execute tools, call Claude again, until it produces a final text reply.
// The core loop (server.js)
// Cacheable system prompt: static spec cached (90% discount on repeat calls)
const cachedSystem = [{ type: 'text', text: systemPrompt,
  cache_control: { type: 'ephemeral' } }]; // <-- $0.30/M vs $3/M
const cachedTools = tools.map((t, i) =>
  i === tools.length - 1 ? { ...t, cache_control: { type: 'ephemeral' } } : t);
// Request params shared by the initial call and every loop iteration
const baseParams = {
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  system: cachedSystem, // spec.md + auth context (~9K tokens, CACHED)
  tools: cachedTools, // 18 tool definitions (last one CACHED)
};
let response = await anthropic.messages.create({
  ...baseParams,
  messages: session.messages, // conversation history (max 30 msgs)
});
logApiCost(sessionId, userPhone, response, 'conversation'); // $$$
const MAX_TOOL_ROUNDS = 8; // Safety cap: prevents runaway costs
let toolRound = 0;
while (response.stop_reason === 'tool_use' && toolRound < MAX_TOOL_ROUNDS) {
  toolRound++;
  // 1. Record Claude's turn, then execute every tool call it contains
  session.messages.push({ role: 'assistant', content: response.content });
  const toolNames = [];
  const toolResults = [];
  for (const block of response.content) {
    if (block.type === 'tool_use') {
      toolNames.push(block.name);
      const result = await handleTool(block.name, block.input, ...); // Execute!
      // Each result must reference the id of the tool_use block it answers
      toolResults.push({ type: 'tool_result', tool_use_id: block.id,
        content: JSON.stringify(result) });
    }
  }
  // 2. Feed tool results back to Claude
  session.messages.push({ role: 'user', content: toolResults });
  // 3. Call Claude again with updated context (prompt hits cache!)
  response = await anthropic.messages.create({ ...baseParams, messages: session.messages });
  logApiCost(sessionId, userPhone, response, 'tool_call', toolNames); // $$$
}
Cost with caching: The first API call pays full price for the ~9K token prompt ($0.027). Subsequent calls in the loop hit the cache at 90% discount ($0.003 each). A 3-round conversation goes from ~$0.09 to ~$0.03 in prompt costs. The MAX_TOOL_ROUNDS = 8 cap prevents unbounded loops — normal flows use 1-3 rounds.
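The caching arithmetic can be checked with a few lines of code. This is a worked sketch using only the prices quoted in this walkthrough; the function name `promptCost` is made up for illustration, it counts only the system prompt portion of each call, and it ignores any cache-write surcharge.

```javascript
// Worked arithmetic for the system-prompt share of input cost.
const PRICE_PER_M_INPUT = 3.0;  // USD per million uncached input tokens
const PRICE_PER_M_CACHED = 0.3; // USD per million cache-read tokens

function promptCost(promptTokens, apiCalls, cached) {
  const full = (promptTokens * PRICE_PER_M_INPUT) / 1e6;
  const read = (promptTokens * PRICE_PER_M_CACHED) / 1e6;
  if (!cached) return full * apiCalls;
  return full + read * (apiCalls - 1); // first call misses the cache, the rest hit
}

console.log(promptCost(9000, 3, false).toFixed(3)); // three calls, no caching
console.log(promptCost(9000, 3, true).toFixed(3));  // three calls, prompt cached
```

For a 9,000-token prompt and three API calls this gives about $0.081 uncached and $0.032 cached, in the same range as the estimate above.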
Workflow: User Registration (WhatsApp)
Trace every step when a new WhatsApp user sends their first message.
WhatsApp
User sends: "Hi, I want to list my vegetable business"
Arrives via Twilio/Meta webhook. Phone: +263771234567
Server
converse("+263771234567", message, null, "whatsapp")
Session auto-created. WhatsApp auto-auth kicks in:
session.registeredPhone = "+263771234567"
session.authenticated = true
Lookup user in DB → not found (new user). Will need name.
Server
Build system prompt
spec.md + [Auth: WhatsApp user +263771234567 — authenticated but not yet registered. Ask for their name only, then call register_user.] + [Channel: WhatsApp]
Claude API Call #1
Initial conversation call
Model: claude-sonnet-4-20250514 | Max tokens: 1024
Messages: [{role: "user", content: "Hi, I want to list my vegetable business"}]
Tools: all 18 available
~9,900 input • ~180 output • ~$0.032
Cost Logged
api_costs INSERT
phase: "conversation", model: "claude-sonnet-4-20250514", tokens logged, cost calculated
Claude Responds
stop_reason: "end_turn" (no tool calls)
Claude asks for the user's name (as instructed by auth context). No tools needed yet.
User
Hi, I want to list my vegetable business
ZimRoots
Mhoro! / Welcome to ZimRoots! I'd love to help you get your vegetable business listed in our directory.
First, what's your name so I can set up your account?
WhatsApp
User sends: "Chipo Mwale"
Claude API Call #2
Claude sees name, calls register_user tool
stop_reason: "tool_use"
~10,200 input • ~120 output • ~$0.032
Database
INSERT INTO users
{
"id": "a1b2c3d4-e5f6-...",
"phone": "+263771234567",
"phone_verified": true, // WhatsApp = auto-verified
"meta": { "name": "Chipo Mwale" }
}
Claude API Call #3
Tool-use continuation — Claude sees registration result
stop_reason: "end_turn" (produces greeting + anti-fraud warning)
~10,600 input • ~280 output • ~$0.036
Cost Logged
api_costs INSERT
phase: "tool_call", tool_names: ["register_user"]
Messages Persisted
2 rows inserted into messages table (user msg + assistant reply). Fire-and-forget.
ZimRoots
Welcome, Chipo! You're all set up.
Before we create your listing, I need to be upfront:
ZimRoots takes accuracy very seriously. Every listing you submit is tied to your verified phone number. If we find false or misleading information (fake businesses, wrong addresses, impersonating someone else's business), your account will be permanently banned with no option to appeal.
Are you happy to proceed?
Yes, I agree
No, I'd rather not
Total API Calls (this interaction)
3
2 conversation calls (the second ends in tool_use) + 1 tool_call continuation
Estimated Cost
$0.10
~30,700 input tokens + ~580 output tokens
DB Operations
6
1 user lookup, 1 INSERT user, 2 INSERT messages, 2 INSERT api_costs
Workflow: Create a Business Listing
After registration + anti-fraud warning. Claude collects info conversationally, then creates the entity.
User
"Yes, I agree" (anti-fraud confirmation)
Claude
Asks: "What's the name of your business?"
No tool call — just conversation. Claude knows to collect: name, description, category, location, contact.
~$0.034
User
"I sell tomatoes, onions, cabbage at Mbare Market. Open 6am-6pm daily."
User provides multiple fields at once. Claude parses all of them.
Claude API Call
Claude calls create_entity with extracted data
stop_reason: "tool_use"
Server: splitEntityInput()
Separates columns from meta
// Real columns (used in SQL queries):
{ "name": "Chipo's Vegetable Stand", "description": "...", "category": "Produce/Fresh Foods" }
// Everything else → meta JSONB:
{ "location": "Mbare Market", "contact": "+263771234567", "hours": "6am-6pm daily" }
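A minimal sketch of how such a split could work is shown below. The list of column fields is an assumption for illustration; the real set used by server.js is not shown in this walkthrough.

```javascript
// Hypothetical sketch of the columns + meta split described above.
// COLUMN_FIELDS is an assumed list of fields that exist as real SQL columns.
const COLUMN_FIELDS = new Set(['name', 'description', 'category', 'status']);

function splitEntityInput(input) {
  const columns = {};
  const meta = {};
  for (const [key, value] of Object.entries(input)) {
    if (value === undefined || value === null) continue; // skip empty fields
    if (COLUMN_FIELDS.has(key)) columns[key] = value;
    else meta[key] = value; // anything unrecognised lands in the JSONB column
  }
  return { columns, meta };
}
```

With this shape, adding a new field such as `payment_methods` needs no code change: it falls through to `meta` automatically, which matches the "no migration needed" property of the pattern.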
Database
INSERT INTO businesses
{
"id": "b3c4d5e6-f7g8-...",
"user_id": "a1b2c3d4-...",
"name": "Chipo's Vegetable Stand",
"description": "Fresh vegetables: tomatoes, onions, cabbage, carrots",
"category": "Produce/Fresh Foods",
"status": "verified", // phone_verified = true → live immediately
"embedding": null, // not generated yet
"meta": {
"location": "Mbare Market",
"contact": "+263771234567",
"hours": "6am-6pm daily",
"verified_owner": false, // no ID photo yet
"images": []
}
}
Claude Continuation
Sees result, offers enrichment + photo + website
~$0.038
User
I sell tomatoes, onions, cabbage at Mbare Market. Open 6am-6pm daily.
ZimRoots
Your listing "Chipo's Vegetable Stand" is now live in the ZimRoots directory!
Want to make it stand out? I can ask a few quick questions to add more details, or you can upload a photo of your stand.
Add more details
Upload a photo
Create a free website
I'm done for now
Workflow: Search the Directory
How a search query flows from user to database and back. No authentication required.
User
"Where can I buy fresh vegetables in Mbare?"
Claude API Call
Claude calls search_directory tool
stop_reason: "tool_use"
// Tool call:
{
"name": "search_directory",
"input": {
"query": "fresh vegetables",
"entity_type": "businesses",
"location": "Mbare"
}
}
Step 1: Semantic Search (primary)
Generate embedding for query via OpenAI text-embedding-3-small, then call search_all_entities pgvector RPC.
-- Postgres RPC: search_all_entities(query_embedding, match_count)
SELECT * FROM (
  (SELECT 'business' AS entity_type, id, name, description, category,
          1 - (embedding <=> query_embedding) AS similarity -- <=> is cosine distance
   FROM businesses WHERE embedding IS NOT NULL AND status = 'verified')
  UNION ALL
  -- ... 6 more entity tables ...
) AS results -- subquery alias, required by Postgres before version 16
ORDER BY similarity DESC LIMIT 10;
Cosine similarity: 1.0 = identical, 0.0 = unrelated. Results above ~0.3 are relevant.
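To make the `1 - (embedding <=> query_embedding)` expression concrete, here is a small sketch of the same computation on plain arrays. The toy vectors are for illustration only; real embeddings from text-embedding-3-small have 1,536 dimensions.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('vectors must have equal length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// pgvector's <=> operator returns cosine distance, which is 1 - similarity.
const cosineDistance = (a, b) => 1 - cosineSimilarity(a, b);
```

Vectors pointing the same way score 1, orthogonal vectors score 0, and opposite vectors score -1, so in general the value can go below zero even though the relevant results here sit between roughly 0.3 and 1.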
Step 1b: ilike Fallback
If semantic search returns nothing (no embeddings yet, or OpenAI API error), falls back to keyword matching:
SELECT * FROM businesses
WHERE status = 'verified'
AND (name ILIKE '%vegetable%' OR description ILIKE '%vegetable%' ...)
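The fallback decision itself can be sketched as a small wrapper. `searchWithFallback` and its two injected functions are hypothetical names; the real control flow in server.js may be organised differently.

```javascript
// Hypothetical sketch: try semantic search first, fall back to keyword search.
// `semanticSearch` and `keywordSearch` are async functions supplied by the caller.
async function searchWithFallback(query, semanticSearch, keywordSearch) {
  try {
    const hits = await semanticSearch(query);
    if (hits.length > 0) return { source: 'semantic', hits };
  } catch (err) {
    // For example an embedding API error: log it and fall through to ilike
    console.warn('semantic search failed, using keyword fallback:', err.message);
  }
  return { source: 'ilike', hits: await keywordSearch(query) };
}
```

Returning the `source` alongside the hits makes it easy to log how often the fallback path is taken.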
Step 2: getNearbySuburbs("Mbare")
Reads nearby.json (148 Harare suburbs, 3km radius adjacency map)
// Returns:
["Mbare", "Mbare South", "National Sports Stadium", "Ardbennie", "Southerton"]
Step 3: Location filter
Server-side: filter results where meta.location contains any of the expanded suburbs ("Mbare", "Mbare South", etc.)
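Steps 2 and 3 together might look like the sketch below. The `NEARBY` object stands in for nearby.json and contains only the Mbare entry from the example; `filterByLocation` is a hypothetical helper name, and case-insensitive substring matching is an assumption about how "contains" is implemented.

```javascript
// Hypothetical sketch of suburb expansion plus the server-side location filter.
const NEARBY = {
  Mbare: ['Mbare South', 'National Sports Stadium', 'Ardbennie', 'Southerton'],
};

// Returns the suburb itself followed by its neighbours (if any are known).
function getNearbySuburbs(suburb) {
  return [suburb, ...(NEARBY[suburb] || [])];
}

// Keeps rows whose meta.location mentions any of the expanded suburbs.
function filterByLocation(rows, suburb) {
  const needles = getNearbySuburbs(suburb).map((s) => s.toLowerCase());
  return rows.filter((row) => {
    const location = (row.meta?.location || '').toLowerCase();
    return needles.some((s) => location.includes(s));
  });
}
```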
Why semantic search matters: In Harare's informal economy, people describe things in wildly different ways. "I fix cars" vs "motor vehicle mechanic" vs "panel beater" — keyword matching misses most connections. Semantic search finds them all. Real test results: "someone to fix my car" → Willowvale Auto Mechanics Hub (sim: 0.418). ilike would have returned nothing.
Claude Formats Response
Presents results in a readable format
User
Where can I buy fresh vegetables in Mbare?
ZimRoots
I found 1 result in the Mbare area:
Chipo's Vegetable Stand
Fresh vegetables: tomatoes, onions, cabbage, carrots
Location: Mbare Market
Hours: 6am-6pm daily
Contact: +263771234567
I also searched nearby areas: Mbare South, Ardbennie, Southerton.
Search again
See all produce
Workflow: Website Generation
The most expensive operation. A separate Claude API call with 8,192 max tokens generates a full HTML page.
User
"Create a free website for my business"
Claude API Call #1 (Conversation)
Claude calls generate_website tool
~$0.034
Database Reads
1. SELECT business by id
2. SELECT related products (WHERE business_id = ...)
3. SELECT related events (WHERE business_id = ...)
4. Check slug uniqueness
Claude API Call #2 (Website Generation)
SEPARATE API call — different from the conversation loop
Model: claude-sonnet-4-20250514
Max tokens: 8,192 (8x normal)
No system prompt. No tools. Just a single user message with the entity data + design requirements.
Response: complete, self-contained HTML document.
~1,200 input • ~6,500 output • ~$0.101
Cost Logged (website_gen)
api_costs INSERT with entity_id and entity_type
phase: "website_gen" — the most expensive phase per call due to high output tokens
Database Write
UPDATE businesses SET slug, website_html
slug: "chipos-vegetable-stand"
website_html: "..." (~15-25KB of HTML)
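The slug shown above could be derived with a helper along these lines. Both functions are hypothetical; the actual slug rules and the uniqueness check in server.js are not shown in this walkthrough.

```javascript
// Hypothetical slug helper; the real rules in server.js may differ.
function slugify(name) {
  return name
    .toLowerCase()
    .replace(/['’]/g, '')        // drop apostrophes: "chipo's" -> "chipos"
    .replace(/[^a-z0-9]+/g, '-') // any other run of non-alphanumerics -> hyphen
    .replace(/^-+|-+$/g, '');    // trim leading and trailing hyphens
}

// Appends -2, -3, ... until the slug is not in `taken` (a Set of existing slugs).
function uniqueSlug(name, taken) {
  const base = slugify(name);
  let candidate = base;
  for (let n = 2; taken.has(candidate); n++) candidate = `${base}-${n}`;
  return candidate;
}
```

In this sketch `taken` is an in-memory Set; against a real database the uniqueness check would be a query on the slug column, ideally backed by a unique constraint.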
Claude API Call #3 (Continuation)
Claude receives tool result, presents URL to user
~$0.038
Website generation is the cost outlier. At ~$0.10 per generation (mostly output tokens), it costs 3x more than a typical conversation turn. But it produces a real, deployable HTML page that would take a human designer hours. The ROI is enormous — but worth monitoring.
Total API Calls
3
1 conversation + 1 website_gen + 1 continuation
Total Cost
~$0.17
Website gen dominates (~60% of cost)
Output Size
~20KB
Complete HTML with embedded CSS, responsive, branded
Performance Optimizations
Three architectural improvements that reduce costs and improve search quality.
Prompt Caching
-78%
Input token costs on repeat calls. System prompt + tools cached via Anthropic API (cache_control: ephemeral). Cached tokens: $0.30/M vs $3/M. The 5-min TTL is refreshed on every cache hit, so it covers a tool-use loop and any conversation without a pause longer than 5 minutes.
Loop Safety Cap
8 max
Tool-use rounds per message. Normal: 1-3. Prevents runaway costs from stuck tool cycles. If hit, logs a warning and returns whatever text Claude last produced.
Semantic Search
pgvector
OpenAI text-embedding-3-small (1536 dims). Embeddings generated fire-and-forget on create/edit. search_all_entities RPC for cosine similarity. ilike as fallback.
Semantic Search: Before vs After
| User Query | Semantic Search (Top Result) | Similarity | ilike Would Find |
| "someone to fix my car" | Willowvale Auto Mechanics Hub | 0.418 | Nothing |
| "where can I get my hair done" | African Crown Hair Salon | 0.455 | Nothing |
| "live music this weekend" | Jamtree Easter Music Festival | 0.425 | Nothing |
| "handmade crafts and art" | Patch Moekoe / National Handicraft Centre | 0.476 | Only if "craft" in name |
| "fresh vegetables market" | Maasdorp Farmers Market | 0.515 | Only exact keyword matches |
Embedding Generation Flow
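// Fire-and-forget: called after createEntity() and editEntity()
function embedEntity(table, entityId, name, description, category, meta) {
  const parts = [name, description, category, meta?.location,
    meta?.what_they_make, meta?.hours].filter(Boolean);
  const text = parts.join(' — ');
  getEmbedding(text)
    // Return the query so it actually runs: supabase-js query builders are
    // lazy and only execute once awaited or chained with .then()
    .then(embedding => supabase.from(table).update({ embedding }).eq('id', entityId))
    .then(({ error }) => { if (error) console.error('embedEntity:', error.message); })
    .catch(err => console.error('embedEntity:', err.message));
  // Not awaited by the caller, so it never blocks the response
}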
Cost: Embeddings are nearly free. OpenAI text-embedding-3-small costs $0.02/M tokens. The entire 68-entity backfill cost less than $0.01, and each new entity costs ~$0.00001 to embed. The only recurring cost is the search query embedding (~$0.00001 per search). Compared to $0.03-0.10 per Claude API call, embeddings are a rounding error.
Cost Tracking Architecture
How every dollar is tracked from API call to database to dashboard.
What's Logged Per API Call
Identity
session_id + user_phone
Who triggered the call (phone for WhatsApp, session ID for web)
Tokens
input + output
From response.usage. Cache tokens tracked in meta.
Cost
Pre-calculated USD
(input * $3/M) + (output * $15/M). Frozen at insert time.
Content
Text + tool calls
What Claude actually said/did. Full audit trail.
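The cost formula above can be expressed directly in code. This is a minimal sketch: `calcCostUsd` and `buildCostRow` are hypothetical names, the row uses the api_costs column names shown earlier, and rounding to four decimals is an assumption based on the sample rows. The `input_tokens` and `output_tokens` fields are the ones the Anthropic API reports in `response.usage`.

```javascript
// Sketch of the per-call cost formula: (input * $3/M) + (output * $15/M).
const INPUT_USD_PER_M = 3;
const OUTPUT_USD_PER_M = 15;

function calcCostUsd(inputTokens, outputTokens) {
  return (inputTokens * INPUT_USD_PER_M + outputTokens * OUTPUT_USD_PER_M) / 1e6;
}

// Hypothetical row builder: freezes the cost at insert time.
function buildCostRow(sessionId, userPhone, phase, model, usage) {
  return {
    session_id: sessionId,
    user_phone: userPhone,
    phase,
    model,
    input: usage.input_tokens,
    output: usage.output_tokens,
    cost_usd: Number(calcCostUsd(usage.input_tokens, usage.output_tokens).toFixed(4)),
  };
}
```

Storing the computed figure rather than recomputing it later means historical rows stay correct even if token prices change. Note that this formula bills every input token at the full rate, so on cache hits it would overstate the real charge unless cache-read tokens are priced separately.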
The 4 Instrumented Call Sites
| Phase | Trigger | Max Tokens | Has System Prompt? | Has Tools? | Typical Cost |
| conversation | Every user message (initial call) | 1,024 | Yes (spec + auth + channel) | Yes (all 18) | $0.030-0.040 |
| tool_call | After tool execution (loop continuation) | 1,024 | Yes (same prompt) | Yes (all 18) | $0.032-0.045 |
| scan_image | scan_business_image tool (vision) | 1,024 | No | No | $0.008-0.015 |
| website_gen | generate_website tool (HTML generation) | 8,192 | No | No | $0.080-0.120 |
Estimated Cost Per Workflow
| User Action | API Calls | Est. Input Tokens | Est. Output Tokens | Est. Cost |
| Simple greeting | 1 | ~9,800 | ~200 | $0.032 |
| Search for a business | 2-3 | ~20,000 | ~500 | $0.068 |
| Register (WhatsApp) | 2-3 | ~20,000 | ~500 | $0.068 |
| Register (Web + OTP) | 6-8 | ~60,000 | ~1,200 | $0.20 |
| Create business listing | 3-5 | ~35,000 | ~800 | $0.12 |
| Enrich listing (deep discovery) | 4-6 | ~45,000 | ~1,000 | $0.15 |
| Scan business card (vision) | 1 (+ loop) | ~1,500 | ~300 | $0.009 |
| Generate website | 1 (+ loop) | ~1,200 | ~6,500 | $0.101 |
| Full onboarding (register + list + enrich + website) | 12-18 | ~120,000 | ~9,000 | ~$0.25 with caching (~$0.50 without) |
The economics (with caching): A full user onboarding costs roughly $0.25 with prompt caching (down from ~$0.50 without). At scale: 100 users/day = ~$25/day = ~$750/month. The /costs dashboard shows exact spending, and the cache_read_input_tokens metric in each cost row confirms caching is working.
Dashboard
The costs dashboard is live at /costs on the Railway deployment. It queries the api_costs table and shows:
- Total spend over configurable period (7/30/90/365 days)
- Daily spend bar chart
- Breakdown by phase (conversation vs tool_call vs scan_image vs website_gen)
- Top users by cost (who's using the most API?)
- Recent 20 calls with content preview
ZimRoots System Walkthrough — Last updated 2026-04-08