boyoAI — API Documentation

Endpoint paths in the examples are shown relative to the API host. The /health and /ws/iva examples use the host ai.izitechnologies.com.

Authentication

POST /auth/login

Authenticate and receive a session cookie. Public endpoint (no authentication required).

Parameter	Type	Required	Description
email	string	required	User email
password	string	required	User password

curl -X POST /auth/login -H "Content-Type: application/json" \
  -d '{"email":"user@example.com","password":"secret"}'

# Response: {"user": {"id":1,"email":"...","role":"customer",...}}
# Sets session cookie
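
The same login flow can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: `BASE_URL` is an assumption taken from the host in the /health example, and `build_login_request` and `login` are hypothetical helper names.

```python
import json
import urllib.request
from http.cookiejar import CookieJar

# Assumption: the host shown in the /health example serves every endpoint.
BASE_URL = "https://ai.izitechnologies.com"


def build_login_request(email: str, password: str) -> urllib.request.Request:
    """Build the JSON POST for /auth/login without sending it."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def login(email: str, password: str) -> urllib.request.OpenerDirector:
    """Log in and return an opener that replays the session cookie."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(build_login_request(email, password)) as resp:
        json.load(resp)  # {"user": {...}}; the cookie is now stored in `jar`
    return opener
```

The returned opener can then be reused for cookie-authenticated calls, for example `login("user@example.com", "secret").open(BASE_URL + "/api/me")`.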

POST /auth/logout

Clear the session cookie.

GET /api/me

Get the authenticated user's profile.

# Response: {"id":1,"email":"...","name":"...","role":"customer","rate_limit":60,...}

GET /health

Service health check. Public endpoint (no authentication required).

curl https://ai.izitechnologies.com/health
# {"status":"ok","services":{"ollama":true,"chatterbox":true,"whisper":true}}
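
A monitoring script might reduce the health response to a list of failing services. The sketch below assumes the response shape shown in the example; `unhealthy_services` and `is_healthy` are hypothetical helper names.

```python
def unhealthy_services(health: dict) -> list:
    """Return the sorted names of services reported as not available."""
    services = health.get("services", {})
    return sorted(name for name, ok in services.items() if not ok)


def is_healthy(health: dict) -> bool:
    """True when status is "ok" and every listed service is up."""
    return health.get("status") == "ok" and not unhealthy_services(health)
```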

Chat / LLM

POST /api/chat

Streaming chat completion with tool/function calling. Returns Server-Sent Events (SSE). If the user has a Knowledge Base, relevant context is auto-injected.

Parameter	Type	Required	Description
messages	array	required	Array of message objects with `role` (system/user/assistant/tool) and `content`
tools	array	optional	Tool/function definitions for function calling

curl -N -X POST /api/chat \
  -H "Authorization: Bearer aigw_..." \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get weather for a city",
          "parameters": {"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}
        }
      }
    ]
  }'

SSE Events

# Token streaming
event: token
data: {"content": "Hi", "done": false}

# Tool/function call (LLM wants to invoke a tool)
event: tool_call
data: {"tool_calls": [{"function": {"name": "get_weather", "arguments": {"city": "Paris"}}}]}

# Completion
event: done
data: {"content": "", "done": true, "total_duration": 1234567, "eval_count": 42}

# Error
event: error
data: {"error": "something went wrong"}
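
A client has to split the stream into events before it can act on them. The following is a minimal sketch of an SSE reader for the four event types listed here. It assumes each event carries one JSON `data` payload and ends with a blank line, as in standard SSE; `parse_sse` and `collect_reply` are hypothetical names.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, decoded_data) for each SSE event in `lines`."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and data:
            # A blank line terminates the current event.
            yield event, json.loads("\n".join(data))
            event, data = "message", []
    if data:  # stream ended without a trailing blank line
        yield event, json.loads("\n".join(data))


def collect_reply(events: Iterable[Tuple[str, dict]]) -> str:
    """Concatenate token contents until the done event; raise on error."""
    parts = []
    for name, payload in events:
        if name == "token":
            parts.append(payload["content"])
        elif name == "error":
            raise RuntimeError(payload["error"])
        elif name == "done":
            break
    return "".join(parts)
```

`parse_sse` accepts any iterable of text lines, so it can be fed the decoded lines of an HTTP response body as they arrive.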

After receiving a tool_call event, execute the tool on the client side and send the result back in a follow-up /api/chat request as a message with role: "tool".
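
The tool round trip could look like the sketch below. It is illustrative only: the `TOOLS` registry and its `get_weather` implementation are hypothetical, and the exact shape of the tool message (for example, whether the server also expects a tool name or the assistant's original tool_calls message) is an assumption, since only role: "tool" is documented here.

```python
import json
from typing import Callable, Dict, List

# Hypothetical local tool registry; get_weather mirrors the request example.
TOOLS: Dict[str, Callable[..., str]] = {
    "get_weather": lambda city: f"Sunny in {city}",
}


def tool_messages(tool_call_event: dict) -> List[dict]:
    """Run each requested tool and return the role "tool" messages to append."""
    messages = []
    for call in tool_call_event["tool_calls"]:
        fn = call["function"]
        args = fn["arguments"]
        if isinstance(args, str):  # assumption: tolerate JSON-encoded arguments
            args = json.loads(args)
        result = TOOLS[fn["name"]](**args)
        messages.append({"role": "tool", "content": str(result)})
    return messages
```

The returned messages are appended to the conversation's `messages` array before the next POST to /api/chat.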

Text to Speech

POST /api/tts

Synthesize speech from text. Returns audio/wav. Supports 26+ languages, 6 preset voices, and custom cloned voices.

Parameter	Type	Required	Description
text	string	required	Text to synthesize (max ~300 chars recommended)
voice_id	string	optional	Voice ID. Default: `preset_aria`
language	string	optional	ISO language code. Default: `en`
exaggeration	float	optional	Expressiveness 0.25-2.0. Default: 0.5
temperature	float	optional	Variation 0.05-2.0. Default: 0.8
cfg_weight	float	optional	Stability 0.0-1.0. Default: 0.5

curl -X POST /api/tts \
  -H "Authorization: Bearer aigw_..." \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello world!","voice_id":"preset_aria","language":"en"}' \
  -o speech.wav
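
Checking the tuning parameters on the client before sending avoids a wasted request. The sketch below copies the ranges from the parameter table and assumes they are inclusive, which the table does not state; `build_tts_payload` is a hypothetical helper.

```python
import json

# (min, max, default) for each tuning parameter, copied from the table above.
TTS_RANGES = {
    "exaggeration": (0.25, 2.0, 0.5),
    "temperature": (0.05, 2.0, 0.8),
    "cfg_weight": (0.0, 1.0, 0.5),
}


def build_tts_payload(text, voice_id="preset_aria", language="en", **tuning):
    """Return the JSON body for POST /api/tts, rejecting out-of-range values."""
    if not text:
        raise ValueError("text is required")
    payload = {"text": text, "voice_id": voice_id, "language": language}
    for name, value in tuning.items():
        if name not in TTS_RANGES:
            raise ValueError(f"unknown parameter: {name}")
        low, high, _default = TTS_RANGES[name]
        if not low <= value <= high:  # assumption: bounds are inclusive
            raise ValueError(f"{name} must be between {low} and {high}")
        payload[name] = value
    return json.dumps(payload)
```

Parameters that are not passed are left out of the payload so the server applies its own defaults.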

GET /api/voices

List all available voices (preset and cloned).

# Response:
{"voices": [
  {"voice_id":"preset_aria","name":"Aria","type":"preset","preview_url":"..."},
  {"voice_id":"clone_myvoice","name":"myvoice","type":"clone","preview_url":"..."}
]}

GET /api/languages

List all supported TTS languages (26+).

# Response: {"languages": {"en":"English","bn":"Bengali","fr":"French",...}}

POST /api/voices/clone

Create a cloned voice from a 5-30 second audio sample.

Parameter	Type	Required	Description
name	string	required	Voice name
file	file	required	Audio sample (WAV, MP3, WebM)

curl -X POST /api/voices/clone \
  -H "Authorization: Bearer aigw_..." \
  -F "name=My Voice" -F "file=@sample.wav"

# Response: {"voice_id":"clone_my_voice","name":"My Voice","type":"clone"}
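
The file-upload endpoints take multipart/form-data, which curl builds with `-F`. Without a third-party HTTP library, the body can be assembled by hand, roughly as sketched below. `encode_multipart` is a hypothetical helper, and it does not escape quotes or newlines in field names and filenames.

```python
import uuid
from typing import Dict, Tuple


def encode_multipart(fields: Dict[str, str],
                     files: Dict[str, Tuple[str, bytes, str]]) -> Tuple[bytes, str]:
    """Encode form fields and files; return (body, Content-Type header value)."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        head = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        chunks.append(head.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    for name, (filename, content, mime) in files.items():
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        )
        chunks.append(head.encode("utf-8") + content + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"
```

The body and header value can be passed to `urllib.request.Request(url, data=body, headers={"Authorization": "Bearer aigw_...", "Content-Type": content_type}, method="POST")`.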

DELETE /api/voices/{voice_id}

Delete a cloned voice. Preset voices cannot be deleted.

Speech to Text

POST /api/stt

Transcribe audio to text using Whisper large-v3. Supports 90+ languages with auto-detection.

Parameter	Type	Required	Description
file	file	required	Audio file (WAV, MP3, WebM, FLAC, OGG)
language	string	optional	Language hint (ISO code). Auto-detected if omitted.

curl -X POST /api/stt \
  -H "Authorization: Bearer aigw_..." \
  -F "file=@recording.wav" -F "language=en"

# Response:
{
  "text": "Hello world",
  "language": "en",
  "language_probability": 0.99,
  "duration": 3.456,
  "segments": [
    {"start": 0.0, "end": 1.5, "text": "Hello world"}
  ]
}
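
The `segments` array is enough to produce subtitles. The sketch below converts a response into SRT text, assuming `start` and `end` are in seconds (the sample values suggest this, but it is not stated); `format_timestamp` and `segments_to_srt` are hypothetical names.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(response: dict) -> str:
    """Render each segment as a numbered SRT cue."""
    blocks = []
    for index, seg in enumerate(response["segments"], start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```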

Voice Chat (Batch)

POST /api/voicechat

Full pipeline in a single request: Audio in → STT → LLM → TTS → Audio out. Returns audio/wav.

Parameter	Type	Required	Description
file	file	required	Audio recording
history	string	optional	JSON array of previous messages for multi-turn

curl -X POST /api/voicechat \
  -H "Authorization: Bearer aigw_..." \
  -F "file=@recording.wav" -F 'history=[]' -o response.wav

# Response headers:
# X-Transcription: what the user said
# X-LLM-Response: what the AI replied
# Body: audio/wav of the spoken response
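
For multi-turn use, the two response headers can be folded back into the `history` field of the next request. The sketch below assumes history entries use the same role/content shape as /api/chat messages, which is not spelled out here; `next_history` and `history_field` are hypothetical names.

```python
import json
from typing import List, Mapping


def next_history(history: List[dict], headers: Mapping[str, str]) -> List[dict]:
    """Return a new history with this turn's user and assistant messages appended."""
    return history + [
        {"role": "user", "content": headers.get("X-Transcription", "")},
        {"role": "assistant", "content": headers.get("X-LLM-Response", "")},
    ]


def history_field(history: List[dict]) -> str:
    """Serialize the history for the multipart `history` form field."""
    return json.dumps(history, ensure_ascii=False)
```

A plain dict is case-sensitive, so header names must match exactly unless the HTTP library supplies a case-insensitive mapping.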

Vision & OCR

POST /api/ocr

Extract text from images via OCR (Tesseract). Supports 12 languages.

Parameter	Type	Required	Description
file	file	required	Image file (PNG, JPG, TIFF, BMP, WebP)
language	string	optional	Language code (en, bn, fr, es, de, ar, hi, ja, ko, zh, pt, ru). Default: en

curl -X POST /api/ocr \
  -H "Authorization: Bearer aigw_..." \
  -F "file=@document.png" -F "language=en"

# Response: {"text":"extracted text","language":"en","filename":"document.png"}

POST /api/vision/analyze

AI-powered image analysis. Send an image with an optional custom prompt and receive a text analysis.

Parameter	Type	Required	Description
file	file	required	Image file
prompt	string	optional	Analysis prompt. Default: "Describe this image in detail."

curl -X POST /api/vision/analyze \
  -H "Authorization: Bearer aigw_..." \
  -F "file=@photo.jpg" \
  -F "prompt=What objects are in this image? List them."

# Response: {"analysis": "The image contains a laptop, a coffee cup, ..."}

POST /api/vision/analyze/stream

Same as /api/vision/analyze, but streams the response as SSE (same format as /api/chat).

POST /api/vision/scan

Document intelligence. Extracts structured JSON data from identity documents, invoices, receipts, and business cards by combining OCR with AI vision.

Parameter	Type	Required	Description
file	file	required	Document image
type	string	optional	`passport`, `drivers_license`, `id_card`, `invoice`, `receipt`, `business_card`, or `auto`
language	string	optional	Document language for OCR. Default: en

curl -X POST /api/vision/scan \
  -H "Authorization: Bearer aigw_..." \
  -F "file=@passport.jpg" -F "type=passport"

# Response:
{
  "document_type": "passport",
  "raw_text": "OCR extracted text...",
  "ai_analysis": "AI interpretation...",
  "extracted": {
    "full_name": "John Michael Smith",
    "surname": "Smith",
    "given_names": "John Michael",
    "nationality": "United States",
    "date_of_birth": "15 Jan 1990",
    "passport_number": "AB1234567",
    "date_of_issue": "01 Mar 2020",
    "date_of_expiry": "01 Mar 2030",
    "sex": "M",
    "mrz_line_1": "P<USASMITH<<JOHN<MICHAEL...",
    "mrz_line_2": "AB1234567..."
  },
  "filename": "passport.jpg"
}

Passport fields: full_name, surname, given_names, nationality, date_of_birth, sex, place_of_birth, date_of_issue, date_of_expiry, passport_number, issuing_authority, mrz_line_1, mrz_line_2.
Driver's license fields: full_name, date_of_birth, address, license_number, class, issue_date, expiry_date, sex, height, restrictions.
Invoice fields: vendor_name, invoice_number, invoice_date, due_date, line_items, subtotal, tax, total_amount, currency.
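
Date fields arrive as strings, so a client usually has to parse them before comparing. The sketch below handles only the "15 Jan 1990" shape shown in the sample response; other documents may yield other formats. `parse_doc_date` and `is_expired` are hypothetical helpers, and `%b` matches English month names only under the default C locale.

```python
from datetime import date, datetime


def parse_doc_date(value: str) -> date:
    """Parse dates shaped like "15 Jan 1990", as in the sample response."""
    return datetime.strptime(value, "%d %b %Y").date()


def is_expired(extracted: dict, today: date) -> bool:
    """True when the document's date_of_expiry is before `today`."""
    return parse_doc_date(extracted["date_of_expiry"]) < today
```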

Knowledge Base (RAG)

Per-user isolated knowledge bases. Documents are automatically chunked, embedded, and injected into all chat and agent conversations.

GET /api/kb

List your knowledge bases.

# Response: [{"id":1,"name":"Product Docs","doc_count":5,"description":"..."}]

POST /api/kb

Create a knowledge base.

Parameter	Type	Required	Description
name	string	required	KB name
description	string	optional	Description

DELETE /api/kb/{id}

Delete a KB and all its documents and embeddings.

GET /api/kb/{id}/documents

List documents in a knowledge base.

POST /api/kb/{id}/documents

Upload a document. Supports text, PDF, CSV, images (OCR), and URL scraping.

Parameter	Type	Required	Description
file	file	one of file/url	File upload: .txt, .pdf, .csv, .md, .png, .jpg, .tiff
url	string	one of file/url	URL to scrape and ingest
language	string	optional	Language for image OCR

# Upload a file
curl -X POST /api/kb/1/documents -H "Authorization: Bearer aigw_..." \
  -F "file=@pricing.pdf"

# Scrape a URL
curl -X POST /api/kb/1/documents -H "Authorization: Bearer aigw_..." \
  -F "url=https://example.com/faq"

# Upload an image (OCR extracted automatically)
curl -X POST /api/kb/1/documents -H "Authorization: Bearer aigw_..." \
  -F "file=@scanned_doc.png" -F "language=en"

DELETE /api/kb/{id}/documents/{doc_id}

Delete a document and its embeddings.

POST /api/kb/{id}/query

Test semantic search against a knowledge base.

Parameter	Type	Required	Description
query	string	required	Search query

curl -X POST /api/kb/1/query -H "Authorization: Bearer aigw_..." \
  -H "Content-Type: application/json" -d '{"query":"pricing"}'

# Response: {"results":[{"content":"...","source":"pricing.pdf","score":0.89}]}
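
When testing retrieval quality, it can help to keep only the strongest matches. The sketch below assumes a higher `score` means a closer match, which the response example suggests but does not state. `top_passages` and its default threshold are hypothetical.

```python
from typing import List, Tuple


def top_passages(response: dict, min_score: float = 0.5,
                 limit: int = 3) -> List[Tuple[str, str]]:
    """Return (source, content) pairs for the best-scoring results."""
    hits = [r for r in response["results"] if r["score"] >= min_score]
    hits.sort(key=lambda r: r["score"], reverse=True)
    return [(r["source"], r["content"]) for r in hits[:limit]]
```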

Agents

Pre-built and custom AI agents with specialized system prompts and KB access.

GET /api/agents

List agents available to the current user (public + own private agents).

# Response: [{"id":1,"name":"Customer Service","description":"...","system_prompt":"...","icon":"chat","color":"#6366f1","capabilities":"[\"KB\",\"Voice\"]","is_public":true}]
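
In the example response, `capabilities` is a JSON-encoded string rather than a nested array, so it needs a second decode step. A minimal sketch, with `agent_capabilities` as a hypothetical helper that also tolerates an already-decoded list:

```python
import json


def agent_capabilities(agent: dict) -> list:
    """Decode the capabilities field into a Python list of tags."""
    raw = agent.get("capabilities") or "[]"
    return json.loads(raw) if isinstance(raw, str) else list(raw)
```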

GET /api/agents/{id}

Get a single agent's details.

POST /api/agents

Create a custom agent.

Parameter	Type	Required	Description
name	string	required	Agent name
system_prompt	string	required	System instructions for the AI
description	string	optional	Description
icon	string	optional	Icon: bot, chat, mic, chart, pen, code, globe
color	string	optional	Hex color. Default: #6366f1
capabilities	string	optional	JSON-encoded array of capability tags, e.g. `["KB","Voice"]`
is_public	bool	optional	Make visible to all users (admin only)

PUT /api/agents/{id}

Update an agent. Same parameters as create.

DELETE /api/agents/{id}

Delete an agent. Must be owner or admin.

POST /api/agents/{id}/chat

Chat with an agent. SSE streaming (same format as /api/chat). The agent's system prompt is prepended, and RAG context from your KBs is auto-injected.

Parameter	Type	Required	Description
messages	array	required	Conversation messages (same format as /api/chat)

curl -N -X POST /api/agents/1/chat \
  -H "Authorization: Bearer aigw_..." \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is the refund policy?"}]}'

# SSE stream: event: token → event: done (same as /api/chat)

IVA — Real-Time Voice Agent (WebSocket)

Full-duplex streaming voice interaction. Send audio chunks, receive live transcription, LLM token streaming, and TTS audio — all over a single WebSocket.

WSS /ws/iva?token=YOUR_API_KEY

Client → Server

Type	Format	Description
JSON	`{"type":"start","voice_id":"preset_aria","language":"en"}`	Configure session
Binary	Audio bytes (WebM/Opus or PCM 16kHz)	Audio input chunks
JSON	`{"type":"stop"}`	End of utterance — triggers pipeline
JSON	`{"type":"end"}`	Close session

Server → Client

Type	Format	Description
JSON	`{"type":"stt","text":"hello world","partial":false}`	STT transcript
JSON	`{"type":"llm","token":"Hi"}`	LLM streaming token
JSON	`{"type":"llm_done"}`	LLM finished
Binary	WAV audio bytes	TTS audio chunk
JSON	`{"type":"tts_done"}`	TTS finished
JSON	`{"type":"error","message":"..."}`	Error

wscat -c "wss://ai.izitechnologies.com/ws/iva?token=aigw_..."

→ {"type":"start","voice_id":"preset_aria","language":"en"}
→ [binary audio frames...]
→ {"type":"stop"}
← {"type":"stt","text":"hello","partial":false}
← {"type":"llm","token":"Hi"}
← {"type":"llm","token":" there!"}
← {"type":"llm_done"}
← [binary TTS audio]
← {"type":"tts_done"}
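
On the client, the server frames from the session above can be handled by a small accumulator that is independent of the WebSocket library in use. This is a sketch under the message shapes listed in the Server → Client table; `IvaTurn` is a hypothetical class. Whether successive binary chunks can be concatenated into one playable WAV is an assumption.

```python
import json
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class IvaTurn:
    """Accumulates everything the server sends for one utterance."""
    transcript: str = ""
    reply_tokens: List[str] = field(default_factory=list)
    audio: bytearray = field(default_factory=bytearray)
    finished: bool = False

    def handle(self, frame: Union[str, bytes]) -> None:
        """Process one WebSocket frame: bytes are TTS audio, text is JSON."""
        if isinstance(frame, (bytes, bytearray)):
            self.audio.extend(frame)
            return
        msg = json.loads(frame)
        kind = msg["type"]
        if kind == "stt" and not msg.get("partial", False):
            self.transcript = msg["text"]  # keep only final transcripts
        elif kind == "llm":
            self.reply_tokens.append(msg["token"])
        elif kind == "tts_done":
            self.finished = True
        elif kind == "error":
            raise RuntimeError(msg["message"])

    @property
    def reply(self) -> str:
        """The LLM reply assembled from the streamed tokens."""
        return "".join(self.reply_tokens)
```

Call `handle` for every frame received after sending `{"type":"stop"}`, and stop reading once `finished` is true.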

API Keys

GET /api/keys

List your API keys.

POST /api/keys

Create a new API key.

Parameter	Type	Required	Description
name	string	required	Key label (e.g. "My App")

curl -X POST /api/keys -H "Authorization: Bearer aigw_..." \
  -H "Content-Type: application/json" -d '{"name":"Production"}'

# Response: {"id":1,"key":"aigw_abc123...","name":"Production"}
# ⚠️ The key value is only shown once — copy it immediately.

DELETE /api/keys/{id}

Delete an API key permanently.

Admin (requires admin role)

GET /admin/users

List all users.

POST /admin/users

Create a user. Parameters: email, password, name, role (customer/admin), rate_limit.

PUT /admin/users/{id}

Update a user. Parameters: name, role, active, rate_limit.

DELETE /admin/users/{id}

Deactivate a user.

GET /admin/usage

Usage statistics. Query param: period (24h, 7d, 30d).

# Response: [{"user_id":1,"email":"...","endpoint":"/api/chat","count":42}]

GET /admin/apikeys

List all API keys across all users.

POST /admin/apikeys

Generate a key for any user. Parameters: user_id, name.

DELETE /admin/apikeys/{id}

Delete any API key.

GET /admin/kb

List all knowledge bases across all users.