boyoAI API Reference

Base URL: https://ai.izitechnologies.com

All endpoints require Authorization: Bearer YOUR_API_KEY unless marked as public.
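As a rough illustration of the header format above, the following Python sketch builds (but does not send) an authenticated JSON request with the standard library only. The helper name build_request and the placeholder key are hypothetical; only the base URL and the Bearer scheme come from this reference.

```python
import json
import urllib.request

BASE_URL = "https://ai.izitechnologies.com"  # base URL from this reference


def build_request(path, api_key, payload=None, method="POST"):
    """Build an authenticated request object without sending it (hypothetical helper)."""
    data = None
    headers = {"Authorization": f"Bearer {api_key}"}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


req = build_request("/api/keys", "YOUR_API_KEY", {"name": "Production"})
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

Keeping request construction separate from sending makes the header logic easy to check without network access.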

Authentication

POST /auth/login

Authenticate and receive a session cookie. Public (no API key required).

Parameter  Type    Required  Description
email      string  required  User email
password   string  required  User password

curl -X POST https://ai.izitechnologies.com/auth/login -H "Content-Type: application/json" \
  -d '{"email":"user@example.com","password":"secret"}'

# Response: {"user": {"id":1,"email":"...","role":"customer",...}}
# Sets session cookie

POST /auth/logout

Clear session cookie.

GET /api/me

Get the authenticated user's profile.

# Response: {"id":1,"email":"...","name":"...","role":"customer","rate_limit":60,...}

GET /health

Service health check. Public (no API key required).

curl https://ai.izitechnologies.com/health
# {"status":"ok","services":{"ollama":true,"chatterbox":true,"whisper":true}}

Chat / LLM

POST /api/chat

Streaming chat completion with tool/function calling. Returns Server-Sent Events (SSE). If the user has a Knowledge Base, relevant context is auto-injected.

Parameter  Type   Required  Description
messages   array  required  Array of message objects with role (system/user/assistant/tool) and content
tools      array  optional  Tool/function definitions for function calling

curl -N -X POST https://ai.izitechnologies.com/api/chat \
  -H "Authorization: Bearer aigw_..." \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get weather for a city",
          "parameters": {"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}
        }
      }
    ]
  }'

SSE Events

# Token streaming
event: token
data: {"content": "Hi", "done": false}

# Tool/function call (LLM wants to invoke a tool)
event: tool_call
data: {"tool_calls": [{"function": {"name": "get_weather", "arguments": {"city": "Paris"}}}]}

# Completion
event: done
data: {"content": "", "done": true, "total_duration": 1234567, "eval_count": 42}

# Error
event: error
data: {"error": "something went wrong"}
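The event stream above can be consumed with a small line-based parser. The sketch below assumes the server follows the usual SSE framing (an event: line, a data: line carrying JSON, then a blank line); the # lines in the listing above are read as annotations rather than part of the stream. The function name parse_sse is hypothetical.

```python
import json


def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of decoded SSE lines."""
    event, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # SSE comment line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    if data_lines:  # flush a final event that lacks a trailing blank line
        yield event, json.loads("\n".join(data_lines))


sample = [
    "event: token", 'data: {"content": "Hi", "done": false}', "",
    "event: done", 'data: {"content": "", "done": true, "eval_count": 42}', "",
]
reply = ""
for event, data in parse_sse(sample):
    if event == "token":
        reply += data["content"]  # accumulate streamed tokens
    elif event == "error":
        raise RuntimeError(data["error"])
```

In a real client the lines would come from the streaming HTTP response body instead of the sample list.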

After receiving tool_call, execute the tool and send the result back as a message with role: "tool".
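A minimal sketch of that round trip follows. It assumes the tool result is sent as a JSON string in content, and it does not cover whether the assistant's tool_call message must also be echoed back, since this reference does not say. The get_weather implementation, the TOOLS registry, and the tool_messages helper are all hypothetical.

```python
import json


def get_weather(city):
    # Hypothetical local implementation of the tool declared in the request.
    return {"city": city, "forecast": "sunny"}


TOOLS = {"get_weather": get_weather}


def tool_messages(tool_call_event):
    """Run each requested tool and return role "tool" messages with the results."""
    messages = []
    for call in tool_call_event["tool_calls"]:
        fn = call["function"]
        args = fn["arguments"]
        if isinstance(args, str):  # tolerate JSON-encoded arguments
            args = json.loads(args)
        result = TOOLS[fn["name"]](**args)
        messages.append({"role": "tool", "content": json.dumps(result)})
    return messages


event = {"tool_calls": [{"function": {"name": "get_weather", "arguments": {"city": "Paris"}}}]}
history = [{"role": "user", "content": "Weather in Paris?"}]
history += tool_messages(event)  # POST the extended history to /api/chat again
```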

Text to Speech

POST /api/tts

Synthesize speech from text. Returns audio/wav. Supports 26+ languages, 6 preset voices, and custom cloned voices.

Parameter     Type    Required  Description
text          string  required  Text to synthesize (about 300 characters maximum recommended)
voice_id      string  optional  Voice ID. Default: preset_aria
language      string  optional  ISO language code. Default: en
exaggeration  float   optional  Expressiveness, 0.25-2.0. Default: 0.5
temperature   float   optional  Variation, 0.05-2.0. Default: 0.8
cfg_weight    float   optional  Stability, 0.0-1.0. Default: 0.5

curl -X POST https://ai.izitechnologies.com/api/tts \
  -H "Authorization: Bearer aigw_..." \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello world!","voice_id":"preset_aria","language":"en"}' \
  -o speech.wav
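To catch out-of-range tuning values before sending a request, a client might validate them locally. The sketch below copies the ranges from the parameter table; the helper name tts_payload is hypothetical, and whether the server itself rejects out-of-range values is not stated here.

```python
# Ranges copied from the parameter table for POST /api/tts.
RANGES = {
    "exaggeration": (0.25, 2.0),
    "temperature": (0.05, 2.0),
    "cfg_weight": (0.0, 1.0),
}


def tts_payload(text, voice_id="preset_aria", language="en", **tuning):
    """Return a JSON-ready body for POST /api/tts, rejecting out-of-range values."""
    if not text:
        raise ValueError("text is required")
    payload = {"text": text, "voice_id": voice_id, "language": language}
    for name, value in tuning.items():
        if name not in RANGES:
            raise ValueError(f"unknown parameter: {name}")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
        payload[name] = value
    return payload


body = tts_payload("Hello world!", exaggeration=0.7)
```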
GET /api/voices

List all available voices (preset and cloned).

# Response:
{"voices": [
  {"voice_id":"preset_aria","name":"Aria","type":"preset","preview_url":"..."},
  {"voice_id":"clone_myvoice","name":"myvoice","type":"clone","preview_url":"..."}
]}

GET /api/languages

List all supported TTS languages (26+).

# Response: {"languages": {"en":"English","bn":"Bengali","fr":"French",...}}

POST /api/voices/clone

Create a cloned voice from a 5-30 second audio sample.

Parameter  Type    Required  Description
name       string  required  Voice name
file       file    required  Audio sample (WAV, MP3, WebM)

curl -X POST https://ai.izitechnologies.com/api/voices/clone \
  -H "Authorization: Bearer aigw_..." \
  -F "name=My Voice" -F "file=@sample.wav"

# Response: {"voice_id":"clone_my_voice","name":"My Voice","type":"clone"}
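Because the sample should be 5-30 seconds long, it can be worth checking the duration locally before uploading. The sketch below handles uncompressed WAV only, using Python's standard wave module; the helper names are hypothetical, and how the server treats samples outside that window is not documented here.

```python
import io
import wave


def wav_duration_seconds(wav_bytes):
    """Return the duration of an uncompressed WAV file held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_clone_sample(wav_bytes, min_s=5.0, max_s=30.0):
    """Check the 5-30 second window stated for POST /api/voices/clone."""
    return min_s <= wav_duration_seconds(wav_bytes) <= max_s


def silent_wav(seconds, rate=16000):
    """Build a silent mono 16-bit WAV in memory, only to exercise the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


ok = is_valid_clone_sample(silent_wav(6))
```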
DELETE /api/voices/{voice_id}

Delete a cloned voice. Preset voices cannot be deleted.

Speech to Text

POST /api/stt

Transcribe audio to text using Whisper large-v3. Supports 90+ languages with auto-detection.

Parameter  Type    Required  Description
file       file    required  Audio file (WAV, MP3, WebM, FLAC, OGG)
language   string  optional  Language hint (ISO code). Auto-detected if omitted.

curl -X POST https://ai.izitechnologies.com/api/stt \
  -H "Authorization: Bearer aigw_..." \
  -F "file=@recording.wav" -F "language=en"

# Response:
{
  "text": "Hello world",
  "language": "en",
  "language_probability": 0.99,
  "duration": 3.456,
  "segments": [
    {"start": 0.0, "end": 1.5, "text": "Hello world"}
  ]
}
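The segments array makes it straightforward to render a timestamped transcript. The sketch below works on a dictionary shaped like the response above; format_segments is a hypothetical helper.

```python
def format_segments(stt_response):
    """Render each segment as '[start-end] text' using the fields shown above."""
    lines = []
    for seg in stt_response.get("segments", []):
        lines.append(f"[{seg['start']:.2f}s-{seg['end']:.2f}s] {seg['text'].strip()}")
    return lines


response = {
    "text": "Hello world",
    "language": "en",
    "language_probability": 0.99,
    "duration": 3.456,
    "segments": [{"start": 0.0, "end": 1.5, "text": "Hello world"}],
}
print("\n".join(format_segments(response)))  # prints: [0.00s-1.50s] Hello world
```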

Voice Chat (Batch)

POST /api/voicechat

Full pipeline in a single request: Audio in → STT → LLM → TTS → Audio out. Returns audio/wav.

Parameter  Type    Required  Description
file       file    required  Audio recording
history    string  optional  JSON array of previous messages for multi-turn

curl -X POST https://ai.izitechnologies.com/api/voicechat \
  -H "Authorization: Bearer aigw_..." \
  -F "file=@recording.wav" -F 'history=[]' -o response.wav

# Response headers:
# X-Transcription: what the user said
# X-LLM-Response: what the AI replied
# Body: audio/wav of the spoken response
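For multi-turn use, the client has to carry the conversation forward itself. The sketch below assumes history uses the same role/content message objects as /api/chat and that the two response headers hold plain text; neither detail is confirmed by this reference, and both helper names are hypothetical.

```python
import json


def extend_history(history, transcription, reply):
    """Return a new history list with the latest user/assistant turn appended."""
    return history + [
        {"role": "user", "content": transcription},
        {"role": "assistant", "content": reply},
    ]


def history_field(history):
    """Serialize the history for the multipart 'history' form field."""
    return json.dumps(history, ensure_ascii=False)


history = []
# Values that would be read from the X-Transcription / X-LLM-Response headers.
history = extend_history(history, "What time is it?", "I can't see a clock.")
field = history_field(history)  # pass as -F 'history=...' on the next request
```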

Vision & OCR

POST /api/ocr

Extract text from images via OCR (Tesseract). Supports 12 languages.

Parameter  Type    Required  Description
file       file    required  Image file (PNG, JPG, TIFF, BMP, WebP)
language   string  optional  Language code (en, bn, fr, es, de, ar, hi, ja, ko, zh, pt, ru). Default: en

curl -X POST https://ai.izitechnologies.com/api/ocr \
  -H "Authorization: Bearer aigw_..." \
  -F "file=@document.png" -F "language=en"

# Response: {"text":"extracted text","language":"en","filename":"document.png"}

POST /api/vision/analyze

AI-powered image analysis. Send any image with a custom prompt and get an intelligent description.

Parameter  Type    Required  Description
file       file    required  Image file
prompt     string  optional  Analysis prompt. Default: "Describe this image in detail."

curl -X POST https://ai.izitechnologies.com/api/vision/analyze \
  -H "Authorization: Bearer aigw_..." \
  -F "file=@photo.jpg" \
  -F "prompt=What objects are in this image? List them."

# Response: {"analysis": "The image contains a laptop, a coffee cup, ..."}

POST /api/vision/analyze/stream

Same as /api/vision/analyze, but streams the response as SSE (same event format as /api/chat).

POST /api/vision/scan

Document intelligence. Extracts structured JSON data from identity documents, invoices, receipts, and business cards. Uses OCR + AI vision together for maximum accuracy.

Parameter  Type    Required  Description
file       file    required  Document image
type       string  optional  passport, drivers_license, id_card, invoice, receipt, business_card, or auto
language   string  optional  Document language for OCR. Default: en

curl -X POST https://ai.izitechnologies.com/api/vision/scan \
  -H "Authorization: Bearer aigw_..." \
  -F "file=@passport.jpg" -F "type=passport"

# Response:
{
  "document_type": "passport",
  "raw_text": "OCR extracted text...",
  "ai_analysis": "AI interpretation...",
  "extracted": {
    "full_name": "John Michael Smith",
    "surname": "Smith",
    "given_names": "John Michael",
    "nationality": "United States",
    "date_of_birth": "15 Jan 1990",
    "passport_number": "AB1234567",
    "date_of_issue": "01 Mar 2020",
    "date_of_expiry": "01 Mar 2030",
    "sex": "M",
    "mrz_line_1": "P<USASMITH<<JOHN<MICHAEL...",
    "mrz_line_2": "AB1234567..."
  },
  "filename": "passport.jpg"
}

Passport fields: full_name, surname, given_names, nationality, date_of_birth, sex, place_of_birth, date_of_issue, date_of_expiry, passport_number, issuing_authority, mrz_line_1, mrz_line_2.
Driver's license fields: full_name, date_of_birth, address, license_number, class, issue_date, expiry_date, sex, height, restrictions.
Invoice fields: vendor_name, invoice_number, invoice_date, due_date, line_items, subtotal, tax, total_amount, currency.
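Extracted dates arrive as strings, so a client will usually parse them before applying checks such as expiry. The sketch below assumes dates follow the day, abbreviated English month, year shape of the sample response ("01 Mar 2030"); real documents may yield other formats, and both helper names are hypothetical.

```python
from datetime import date, datetime


def parse_doc_date(value):
    """Parse dates shaped like the sample response, e.g. '01 Mar 2030'."""
    return datetime.strptime(value, "%d %b %Y").date()


def is_expired(extracted, today):
    """True if date_of_expiry is before `today`."""
    return parse_doc_date(extracted["date_of_expiry"]) < today


extracted = {"date_of_birth": "15 Jan 1990", "date_of_expiry": "01 Mar 2030"}
expired = is_expired(extracted, date(2025, 1, 1))
```

Passing today explicitly keeps the check deterministic; production code would likely pass date.today().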

Knowledge Base (RAG)

Per-user isolated knowledge bases. Documents are automatically chunked, embedded, and injected into all chat and agent conversations.

GET /api/kb

List your knowledge bases.

# Response: [{"id":1,"name":"Product Docs","doc_count":5,"description":"..."}]

POST /api/kb

Create a knowledge base.

Parameter    Type    Required  Description
name         string  required  KB name
description  string  optional  Description

DELETE /api/kb/{id}

Delete a KB and all its documents and embeddings.

GET /api/kb/{id}/documents

List documents in a knowledge base.

POST /api/kb/{id}/documents

Upload a document. Supports text, PDF, CSV, images (OCR), and URL scraping.

Parameter  Type    Required  Description
file       file    option A  File upload: .txt, .pdf, .csv, .md, .png, .jpg, .tiff
url        string  option B  URL to scrape and ingest
language   string  optional  Language for image OCR

# Upload a file
curl -X POST https://ai.izitechnologies.com/api/kb/1/documents -H "Authorization: Bearer aigw_..." \
  -F "file=@pricing.pdf"

# Scrape a URL
curl -X POST https://ai.izitechnologies.com/api/kb/1/documents -H "Authorization: Bearer aigw_..." \
  -F "url=https://example.com/faq"

# Upload an image (OCR extracted automatically)
curl -X POST https://ai.izitechnologies.com/api/kb/1/documents -H "Authorization: Bearer aigw_..." \
  -F "file=@scanned_doc.png" -F "language=en"
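Since a document comes from either a file or a URL, a client can pick the form field up front and reject unsupported file types early. The sketch below reads "option A" and "option B" as mutually exclusive, which is an interpretation of the table rather than a documented rule; the extension list is copied from the table, and document_source is a hypothetical helper.

```python
import os

# Extensions listed for file uploads in the parameter table.
ALLOWED_EXTENSIONS = {".txt", ".pdf", ".csv", ".md", ".png", ".jpg", ".tiff"}


def document_source(file_path=None, url=None):
    """Return ('file', path) or ('url', url); exactly one must be given."""
    if (file_path is None) == (url is None):
        raise ValueError("provide exactly one of file_path or url")
    if url is not None:
        return ("url", url)
    ext = os.path.splitext(file_path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    return ("file", file_path)


field_name, value = document_source(file_path="pricing.pdf")
```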
DELETE /api/kb/{id}/documents/{doc_id}

Delete a document and its embeddings.

POST /api/kb/{id}/query

Test semantic search against a knowledge base.

Parameter  Type    Required  Description
query      string  required  Search query

curl -X POST https://ai.izitechnologies.com/api/kb/1/query -H "Authorization: Bearer aigw_..." \
  -H "Content-Type: application/json" -d '{"query":"pricing"}'

# Response: {"results":[{"content":"...","source":"pricing.pdf","score":0.89}]}
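When testing retrieval quality, it helps to look only at the strongest matches. The sketch below assumes that a higher score means a closer match, which the 0.89 in the sample suggests but this reference does not state; the threshold, limit, and helper name are arbitrary illustrations.

```python
def top_matches(response, min_score=0.5, limit=3):
    """Keep results at or above min_score, best first."""
    hits = [r for r in response.get("results", []) if r["score"] >= min_score]
    hits.sort(key=lambda r: r["score"], reverse=True)
    return hits[:limit]


response = {
    "results": [
        {"content": "Plans start at ...", "source": "pricing.pdf", "score": 0.89},
        {"content": "Contact us ...", "source": "faq.md", "score": 0.31},
        {"content": "Enterprise tier ...", "source": "pricing.pdf", "score": 0.95},
    ]
}
best = top_matches(response)
```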

Agents

Pre-built and custom AI agents with specialized system prompts and KB access.

GET /api/agents

List agents available to the current user (public + own private agents).

# Response: [{"id":1,"name":"Customer Service","description":"...","system_prompt":"...","icon":"chat","color":"#6366f1","capabilities":"[\"KB\",\"Voice\"]","is_public":true}]

GET /api/agents/{id}

Get a single agent's details.

POST /api/agents

Create a custom agent.

Parameter      Type    Required  Description
name           string  required  Agent name
system_prompt  string  required  System instructions for the AI
description    string  optional  Description
icon           string  optional  Icon: bot, chat, mic, chart, pen, code, globe
color          string  optional  Hex color. Default: #6366f1
capabilities   string  optional  JSON array of capability tags
is_public      bool    optional  Make visible to all users (admin only)
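Note that capabilities is a string holding a JSON array, not a native array, as the GET /api/agents sample response also shows. The sketch below encodes and decodes that field; agent_payload and decode_capabilities are hypothetical helpers.

```python
import json


def agent_payload(name, system_prompt, capabilities=(), **optional):
    """Build a body for POST /api/agents with capabilities JSON-encoded as a string."""
    body = {"name": name, "system_prompt": system_prompt, **optional}
    if capabilities:
        body["capabilities"] = json.dumps(list(capabilities))
    return body


def decode_capabilities(agent):
    """Turn the capabilities string of an agent object back into a list."""
    return json.loads(agent.get("capabilities") or "[]")


payload = agent_payload("Support", "You answer billing questions.", ["KB", "Voice"], icon="chat")
tags = decode_capabilities(payload)
```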
PUT /api/agents/{id}

Update an agent. Same parameters as create.

DELETE /api/agents/{id}

Delete an agent. Must be owner or admin.

POST /api/agents/{id}/chat

Chat with an agent. SSE streaming (same format as /api/chat). The agent's system prompt is prepended, and RAG context from your KBs is auto-injected.

Parameter  Type   Required  Description
messages   array  required  Conversation messages (same format as /api/chat)

curl -N -X POST https://ai.izitechnologies.com/api/agents/1/chat \
  -H "Authorization: Bearer aigw_..." \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is the refund policy?"}]}'

# SSE stream: event: token → event: done (same as /api/chat)

IVA — Real-Time Voice Agent (WebSocket)

Full-duplex streaming voice interaction. Send audio chunks, receive live transcription, LLM token streaming, and TTS audio — all over a single WebSocket.

WSS /ws/iva?token=YOUR_API_KEY

Client → Server

Type    Format                                                     Description
JSON    {"type":"start","voice_id":"preset_aria","language":"en"}  Configure session
Binary  Audio bytes (WebM/Opus or PCM 16 kHz)                      Audio input chunks
JSON    {"type":"stop"}                                            End of utterance; triggers the pipeline
JSON    {"type":"end"}                                             Close session

Server → Client

Type    Format                                               Description
JSON    {"type":"stt","text":"hello world","partial":false}  STT transcript
JSON    {"type":"llm","token":"Hi"}                          LLM streaming token
JSON    {"type":"llm_done"}                                  LLM finished
Binary  WAV audio bytes                                      TTS audio chunk
JSON    {"type":"tts_done"}                                  TTS finished
JSON    {"type":"error","message":"..."}                     Error

wscat -c "wss://ai.izitechnologies.com/ws/iva?token=aigw_..."

→ {"type":"start","voice_id":"preset_aria","language":"en"}
→ [binary audio frames...]
→ {"type":"stop"}
← {"type":"stt","text":"hello","partial":false}
← {"type":"llm","token":"Hi"}
← {"type":"llm","token":" there!"}
← {"type":"llm_done"}
← [binary TTS audio]
← {"type":"tts_done"}
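On the client side, the server frames in the exchange above can be folded into a small state object. The sketch below is independent of any WebSocket library: it assumes text frames arrive as str and binary frames as bytes, and it treats tts_done as the end of a turn. The handle_frame helper and the state layout are hypothetical.

```python
import json


def handle_frame(frame, state):
    """Fold one server frame into `state`; return True when the turn is complete."""
    if isinstance(frame, (bytes, bytearray)):
        state["audio"] += bytes(frame)  # binary frames carry TTS audio
        return False
    msg = json.loads(frame)
    kind = msg["type"]
    if kind == "stt" and not msg.get("partial", False):
        state["transcript"] = msg["text"]  # keep only final transcripts
    elif kind == "llm":
        state["reply"] += msg["token"]
    elif kind == "error":
        raise RuntimeError(msg["message"])
    return kind == "tts_done"


state = {"transcript": "", "reply": "", "audio": b""}
frames = [
    '{"type":"stt","text":"hello","partial":false}',
    '{"type":"llm","token":"Hi"}',
    '{"type":"llm","token":" there!"}',
    '{"type":"llm_done"}',
    b"RIFF....",  # placeholder bytes standing in for WAV audio
    '{"type":"tts_done"}',
]
done = [handle_frame(f, state) for f in frames]
```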

API Keys

GET /api/keys

List your API keys.

POST /api/keys

Create a new API key.

Parameter  Type    Required  Description
name       string  required  Key label (e.g. "My App")

curl -X POST https://ai.izitechnologies.com/api/keys -H "Authorization: Bearer aigw_..." \
  -H "Content-Type: application/json" -d '{"name":"Production"}'

# Response: {"id":1,"key":"aigw_abc123...","name":"Production"}
# ⚠️ The key value is shown only once. Copy it immediately.

DELETE /api/keys/{id}

Delete an API key permanently.

Admin (requires admin role)

GET /admin/users

List all users.

POST /admin/users

Create a user. Params: email, password, name, role (customer/admin), rate_limit.

PUT /admin/users/{id}

Update a user. Params: name, role, active, rate_limit.

DELETE /admin/users/{id}

Deactivate a user.

GET /admin/usage

Usage statistics. Query param: period (24h, 7d, 30d).

# Response: [{"user_id":1,"email":"...","endpoint":"/api/chat","count":42}]

GET /admin/apikeys

List all API keys across all users.

POST /admin/apikeys

Generate a key for any user. Params: user_id, name.

DELETE /admin/apikeys/{id}

Delete any API key.

GET /admin/kb

List all knowledge bases across all users.

38 API endpoints · SSE streaming · WebSocket · REST