
Overview

InsForge provides unified AI capabilities through OpenRouter, giving you access to multiple LLM providers with a single API and consistent pricing model.

Technology Stack

Core Components

| Component | Technology | Purpose |
|---|---|---|
| AI Gateway | OpenRouter | Unified access to multiple AI providers |
| Chat Service | Node.js + SSE | Handle chat completions with streaming |
| Image Service | Async processing | Generate images via AI models |
| Configuration | PostgreSQL | Store system prompts per project |
| Usage Tracking | PostgreSQL | Monitor token usage |
| Response Format | JSON/SSE | Standard and streaming responses |

Supported Models

InsForge provides access to models from multiple AI providers through our unified gateway. Use the format provider/model when specifying a model in SDK calls.
| Provider | Models |
|---|---|
| OpenAI | openai/gpt-5.2, openai/gpt-5, openai/gpt-5-mini, openai/gpt-4o, openai/gpt-4o-mini |
| Anthropic | anthropic/claude-opus-4.5, anthropic/claude-sonnet-4.6, anthropic/claude-haiku-4.5, anthropic/claude-opus-4.1, anthropic/claude-sonnet-4.5, anthropic/claude-sonnet-4, anthropic/claude-3.5-haiku |
| Google | google/gemini-2.0-flash-001, google/gemini-2.5-flash-lite, google/gemini-2.5-flash-image-preview, google/gemini-2.5-pro, google/gemini-3-pro-image-preview, google/gemini-3-flash-preview |
| X-AI | x-ai/grok-4, x-ai/grok-4-fast, x-ai/grok-4.1-fast |
| DeepSeek | deepseek/deepseek-v3.2, deepseek/deepseek-v3.2-exp, deepseek/deepseek-chat, deepseek/deepseek-r1 |
| Minimax | minimax/minimax-m2.1 |
| Z-AI | z-ai/glm-4.7, z-ai/glm-4.6 |
| Kimi | moonshotai/kimi-k2.5 |
Model availability may vary. Use the GET /api/ai/models endpoint or get-backend-metadata MCP tool to get the current list of available models for your backend.
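
As a quick client-side illustration of the provider/model format, the model IDs returned by that endpoint can be grouped by provider prefix. The helper below and its sample input are illustrative only; the exact response shape of GET /api/ai/models may differ:

```javascript
// Group flat "provider/model" IDs by their provider prefix.
function groupModelsByProvider(modelIds) {
  const byProvider = {};
  for (const id of modelIds) {
    const [provider] = id.split('/'); // "openai/gpt-4o" → "openai"
    (byProvider[provider] ??= []).push(id);
  }
  return byProvider;
}

const grouped = groupModelsByProvider([
  'openai/gpt-4o',
  'openai/gpt-4o-mini',
  'anthropic/claude-3.5-haiku',
]);
// grouped.openai → ['openai/gpt-4o', 'openai/gpt-4o-mini']
```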

Capabilities

| Capability | Description |
|---|---|
| Chat Completions | Multi-turn conversations with streaming support |
| Tool Calling | Let the AI invoke custom functions (e.g., database lookups, API calls) |
| Web Search | Augment responses with real-time web results and citations |
| PDF/File Parsing | Process PDFs and documents directly in chat (pdf-text, mistral-ocr, native engines) |
| Image Vision | Analyze and describe images in conversations |
| Embeddings | Generate vector embeddings for semantic search |
| Image Generation | Create images from text prompts |
See SDK references for implementation details: TypeScript | Swift | Kotlin | Flutter

OpenRouter Integration

Why OpenRouter?

  • Single API: One integration for multiple providers
  • Unified Billing: Consistent pricing across models
  • Automatic Failover: Fallback to alternative models
  • Rate Limiting: Built-in rate limit handling

Request Flow

Chat Completions

Request Processing

  1. Authentication: Verify JWT token
  2. Configuration: Load project AI settings
  3. System Prompt: Prepend configured prompt
  4. Model Selection: Use specified or default model
  5. OpenRouter Call: Forward to OpenRouter
  6. Response Handling: Stream or batch response
  7. Usage Tracking: Record token usage

Streaming Architecture

// Server-Sent Events (SSE) for streaming
async function* streamChat(messages, options) {
  const stream = await openRouter.chat.completions.create({
    model: options.model,
    messages: messages,
    stream: true,
    temperature: options.temperature,
    max_tokens: options.maxTokens
  });

  let usage = null;
  for await (const chunk of stream) {
    if (chunk.usage) usage = chunk.usage; // final chunk may carry token counts
    const content = chunk.choices[0]?.delta?.content;
    if (content) {
      yield { chunk: content };
    }
  }

  yield { done: true, tokenUsage: usage };
}

Tool Calling

Tool calling (function calling) lets the AI model invoke custom functions you define. The client sends tool definitions, the model decides when to call them, and the client executes the function and sends the result back.
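
A minimal sketch of that round trip, using a hypothetical lookupOrder tool. The schema follows the OpenAI-style function format that OpenRouter accepts; the tool name, its arguments, and the stubbed implementation are illustrative, not part of InsForge:

```javascript
// Tool definition sent with the chat request (hypothetical tool).
const tools = [{
  type: 'function',
  function: {
    name: 'lookupOrder',
    description: 'Fetch an order by its ID',
    parameters: {
      type: 'object',
      properties: { orderId: { type: 'string' } },
      required: ['orderId'],
    },
  },
}];

// Local implementations the client runs when the model requests a call.
const toolImpls = {
  lookupOrder: ({ orderId }) => ({ orderId, status: 'shipped' }),
};

// Execute one tool call from the model's response and build the
// role:"tool" message to send back in the follow-up request.
function runToolCall(toolCall) {
  const impl = toolImpls[toolCall.function.name];
  const args = JSON.parse(toolCall.function.arguments); // arguments arrive as a JSON string
  return {
    role: 'tool',
    tool_call_id: toolCall.id,
    content: JSON.stringify(impl(args)),
  };
}
```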

Response Formats

Non-streaming Response:
{
  "response": "AI generated response text",
  "model": "anthropic/claude-3.5-haiku",
  "usage": {
    "promptTokens": 150,
    "completionTokens": 200,
    "totalTokens": 350
  }
}
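
A hedged sketch of producing this response from a client. The endpoint path and Bearer auth come from the API tables later in this page; the request body field names (model, messages, stream) are assumptions mirroring the streaming example above:

```javascript
// Build the fetch arguments for a non-streaming chat completion.
// Body field names are assumed; check your backend's API reference.
function buildChatRequest(baseUrl, token, model, messages) {
  return {
    url: `${baseUrl}/api/ai/chat/completion`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`, // user JWT
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ model, messages, stream: false }),
    },
  };
}

// const { url, options } = buildChatRequest('https://api.example.com', jwt,
//   'anthropic/claude-3.5-haiku', [{ role: 'user', content: 'Hi' }]);
// const res = await fetch(url, options); // JSON shape as shown above
```
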
Streaming Response (SSE):
data: {"chunk": "The "}
data: {"chunk": "answer "}
data: {"chunk": "is..."}
data: {"done": true, "tokenUsage": {...}}
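
On the client, these events can be parsed line by line. A minimal sketch, assuming each event arrives as a complete data: line:

```javascript
// Parse one SSE "data:" line into an event object; ignore other lines.
function parseSSELine(line) {
  if (!line.startsWith('data: ')) return null;
  return JSON.parse(line.slice('data: '.length));
}

// Accumulate streamed chunks into the full response text.
function collectChunks(lines) {
  let text = '';
  for (const line of lines) {
    const event = parseSSELine(line);
    if (event?.chunk) text += event.chunk;
  }
  return text;
}

collectChunks(['data: {"chunk":"The "}', 'data: {"chunk":"answer"}', 'data: {"done":true}']);
// → 'The answer'
```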

Image Generation

Generation Flow

  1. Prompt Processing: Validate and enhance prompt
  2. Model Selection: Choose appropriate image model
  3. Size Configuration: Set dimensions and quality
  4. OpenRouter Request: Send generation request
  5. URL Generation: Receive image URLs
  6. Storage Integration: Optional save to storage
  7. Response Delivery: Return URLs to client

Image Parameters

| Parameter | Options | Description |
|---|---|---|
| model | Model IDs | AI model to use |
| prompt | String | Text description |
| size | 512x512, 1024x1024, etc. | Image dimensions |
| quality | standard, hd | Image quality |
| numImages | 1-4 | Number of variations |
| style | vivid, natural | Style preference |
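
These parameters can be assembled into a request body as follows. The defaults chosen here (1024x1024, standard, one image, vivid) are assumptions for illustration, not documented server defaults:

```javascript
// Assemble an image generation request body from the parameters above.
function buildImageRequest({
  model,
  prompt,
  size = '1024x1024',       // assumed default
  quality = 'standard',     // assumed default
  numImages = 1,
  style = 'vivid',          // assumed default
}) {
  if (numImages < 1 || numImages > 4) {
    throw new RangeError('numImages must be between 1 and 4');
  }
  return { model, prompt, size, quality, numImages, style };
}
```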

Configuration Management

Database Schema

CREATE TABLE _ai_configs (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  modality VARCHAR(255) NOT NULL,     -- 'text' or 'image'
  provider VARCHAR(255) NOT NULL,      -- 'openrouter'
  model_id VARCHAR(255) UNIQUE NOT NULL,
  system_prompt TEXT,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

System Prompts

  • Configured per project
  • Applied to all chat requests
  • Cannot be overridden by client
  • Support for multiple configurations
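
One way the prepend-and-protect behavior can be sketched server-side. This is an illustration of the rules above, not the actual implementation:

```javascript
// Prepend the project-configured system prompt. Client-sent system
// messages are dropped so the configured prompt cannot be overridden.
function applySystemPrompt(systemPrompt, messages) {
  const userMessages = messages.filter((m) => m.role !== 'system');
  return [{ role: 'system', content: systemPrompt }, ...userMessages];
}
```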

Usage Tracking

Metrics Collected

CREATE TABLE _ai_usage (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  config_id UUID NOT NULL,
  input_tokens INT,
  output_tokens INT,
  image_count INT,
  image_resolution TEXT,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  FOREIGN KEY (config_id) REFERENCES _ai_configs(id) ON DELETE NO ACTION
);
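
Rows with this shape can be rolled up into totals, similar in spirit to what the usage summary endpoint returns; the aggregate field names below are illustrative:

```javascript
// Sum usage rows (shape mirrors the _ai_usage table) into totals.
function summarizeUsage(rows) {
  return rows.reduce(
    (acc, r) => ({
      inputTokens: acc.inputTokens + (r.input_tokens ?? 0),
      outputTokens: acc.outputTokens + (r.output_tokens ?? 0),
      images: acc.images + (r.image_count ?? 0),
    }),
    { inputTokens: 0, outputTokens: 0, images: 0 }
  );
}
```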

Security & Rate Limiting

  • API Key Security: OpenRouter key stored server-side only
  • Request Validation: Input sanitization and size limits
  • Rate Limiting: Per-user and per-project limits
  • Usage Quotas: Configurable token limits
  • Content Filtering: Optional content moderation
  • Audit Logging: Track all AI operations

Error Handling

Error Types

| Error | Code | Description |
|---|---|---|
| Model Not Found | 400 | Invalid model ID |
| Rate Limited | 429 | Too many requests |
| Token Limit | 400 | Exceeds max tokens |
| OpenRouter Error | 502 | Upstream provider issue |
| Quota Exceeded | 402 | Usage limit reached |
| Invalid Input | 400 | Malformed request |
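
Of these, only the rate-limit and upstream errors are transient and worth retrying; a small classifier makes that explicit:

```javascript
// Transient errors from the table above: rate limit and upstream issues.
function isRetryable(status) {
  return status === 429 || status === 502;
}
```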

Retry Strategy

// Retry with exponential backoff on rate-limit errors
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryableRequest(fn, maxRetries = 3) {
  let lastError;
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      if (error.status !== 429) throw error; // only retry rate limits
      lastError = error;
      await sleep(Math.pow(2, i) * 1000); // exponential backoff: 1s, 2s, 4s
    }
  }
  throw lastError; // retries exhausted
}

Performance Optimizations

Streaming Optimizations

  • Server-Sent Events: Real-time response streaming
  • Chunked Transfer: Efficient data streaming
  • Keep-Alive: Persistent connections for SSE
  • Low Latency: Direct OpenRouter integration

Future Optimizations

  • Response Caching: Cache for identical requests (coming soon)
  • Batch Processing: Multiple requests in parallel (coming soon)
  • Embeddings Cache: Store computed embeddings (coming soon)

API Endpoints

Chat Endpoints

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| POST | /api/ai/chat/completion | User | Send chat messages, supports streaming |

Image Endpoints

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| POST | /api/ai/image/generation | User | Generate images from text prompts |

Configuration Endpoints

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /api/ai/models | Admin | List available models |
| GET | /api/ai/configurations | Admin | List AI configurations |
| POST | /api/ai/configurations | Admin | Create AI configuration |
| PATCH | /api/ai/configurations/:id | Admin | Update AI configuration |
| DELETE | /api/ai/configurations/:id | Admin | Delete AI configuration |

Usage Tracking Endpoints

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /api/ai/usage | Admin | Get usage records with pagination |
| GET | /api/ai/usage/summary | Admin | Get usage summary statistics |
| GET | /api/ai/usage/config/:configId | Admin | Get usage by configuration |

Gateway Credentials & BYOK

InsForge resolves the OpenRouter API key using the following priority order:
| Priority | Source | Description |
|---|---|---|
| 1 | BYOK (Bring Your Own Key) | Developer-provided key, configured from the dashboard. Stored encrypted in the database. |
| 2 | Cloud-managed | Automatically fetched from InsForge Cloud (cloud projects only). |
| 3 | Environment variable | OPENROUTER_API_KEY from the server environment (self-hosted fallback). |
The first available source wins. For example, if a BYOK key is configured, it is always used regardless of whether a cloud-managed or environment key is also available.
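
The priority order reduces to a simple first-match lookup. The three source values below are hypothetical stand-ins for the real lookups (encrypted database row, cloud API, process.env):

```javascript
// Resolve the OpenRouter key in priority order: BYOK > cloud > env.
function resolveOpenRouterKey({ byokKey, cloudKey, envKey }) {
  if (byokKey) return { source: 'byok', key: byokKey };
  if (cloudKey) return { source: 'cloud', key: cloudKey };
  if (envKey) return { source: 'env', key: envKey };
  return null; // no credential available
}
```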

Managing BYOK Keys

Admins can set, replace, or remove a BYOK key from the AI page in the dashboard under Gateway Credentials. The key is validated against OpenRouter before being saved. Removing a BYOK key reverts to the next available source (cloud-managed or environment variable).

Usage Tracking

Usage (token counts, image counts, model usage) is tracked regardless of which credential source is active. This gives admins a unified view of AI usage across all key sources. Note that when a BYOK key is active, billing goes directly to the developer’s OpenRouter account.

API Endpoints

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /api/ai/gateway/config | Admin | Get current credential source and masked key status |
| POST | /api/ai/gateway/config | Admin | Set or replace the BYOK OpenRouter key |
| DELETE | /api/ai/gateway/config | Admin | Remove the BYOK key, reverting to the default source |

Environment Variables

| Variable | Description | Required |
|---|---|---|
| OPENROUTER_API_KEY | OpenRouter API key (self-hosted fallback) | Only if no BYOK or cloud-managed key |
Note:
  • In cloud environments, the API key is fetched dynamically from the cloud API unless a BYOK key is configured
  • A BYOK key configured from the dashboard always takes precedence over other sources
  • Model configuration is stored in the database (_ai_configs table)
  • No other AI-related environment variables are used

Best Practices

  • Model Selection: Choose models based on speed vs. quality needs
  • Prompt Engineering: Craft clear, specific prompts for better results
  • Token Management: Monitor token usage
  • Streaming UX: Use streaming for better perceived performance
  • Error Recovery: Implement retry logic for transient failures
  • Context Windows: Manage conversation history within limits
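
Context-window management can be sketched as a history trimmer that drops the oldest non-system turns first. The characters/4 token estimate below is a rough heuristic for illustration; use a real tokenizer in practice:

```javascript
// Trim conversation history to a token budget, dropping oldest
// non-system messages first. ~4 characters per token is a rough estimate.
function trimHistory(messages, maxTokens) {
  const approxTokens = (m) => Math.ceil(m.content.length / 4);
  const total = (msgs) => msgs.reduce((n, m) => n + approxTokens(m), 0);
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  while (rest.length > 1 && total(system) + total(rest) > maxTokens) {
    rest.shift(); // drop the oldest turn
  }
  return [...system, ...rest];
}
```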

Comparison with Direct Integration

| Aspect | InsForge + OpenRouter | Direct Provider APIs |
|---|---|---|
| Integration Effort | Single API | Multiple integrations |
| Billing | Unified through OpenRouter | Separate per provider |
| Model Access | 100+ models | Limited to one provider |
| Failover | Automatic | Manual implementation |
| Rate Limiting | Handled by OpenRouter | Per-provider limits |

SDK References

Choose the AI SDK guide for your platform:

  • TypeScript: Web and Node.js applications with streaming chat and image generation
  • Swift: iOS, macOS, tvOS, and watchOS with async/await streaming support
  • Kotlin: Android applications with Flow-based streaming responses
  • Flutter: Cross-platform mobile apps with Stream-based AI interactions