Overview

InsForge provides unified AI capabilities through OpenRouter, giving you access to multiple LLM providers with a single API and consistent pricing model.

Technology Stack

Core Components

| Component | Technology | Purpose |
|-----------|------------|---------|
| AI Gateway | OpenRouter | Unified access to multiple AI providers |
| Chat Service | Node.js + SSE | Handle chat completions with streaming |
| Image Service | Async processing | Generate images via AI models |
| Configuration | PostgreSQL | Store system prompts per project |
| Usage Tracking | PostgreSQL | Monitor token usage |
| Response Format | JSON/SSE | Standard and streaming responses |

OpenRouter Integration

Why OpenRouter?

  • Single API: One integration for multiple providers
  • Unified Billing: Consistent pricing across models
  • Automatic Failover: Fallback to alternative models
  • Rate Limiting: Built-in rate limit handling

Request Flow

Client request → InsForge backend (authentication, project configuration, system prompt) → OpenRouter → AI provider → response streamed or batched back to the client

Available Models

Chat Models

| Provider | Model | ID | Best For |
|----------|-------|----|----------|
| Anthropic | Claude Sonnet 4 | anthropic/claude-sonnet-4 | Complex reasoning |
| Anthropic | Claude 3.5 Haiku | anthropic/claude-3.5-haiku | Fast responses |
| Anthropic | Claude Opus 4.1 | anthropic/claude-opus-4.1 | Highest quality |
| OpenAI | GPT-5 | openai/gpt-5 | Most advanced |
| OpenAI | GPT-5 Mini | openai/gpt-5-mini | Fast & efficient |
| OpenAI | GPT-4o | openai/gpt-4o | General purpose |
| Google | Gemini 2.5 Pro | google/gemini-2.5-pro | Advanced reasoning |

Image Models

| Provider | Model | ID | Capabilities |
|----------|-------|----|--------------|
| Google | Gemini 2.5 Flash Image | google/gemini-2.5-flash-image-preview | Text-to-image generation |

Chat Completions

Request Processing

  1. Authentication: Verify JWT token
  2. Configuration: Load project AI settings
  3. System Prompt: Prepend configured prompt
  4. Model Selection: Use specified or default model
  5. OpenRouter Call: Forward to OpenRouter
  6. Response Handling: Stream or batch response
  7. Usage Tracking: Record token usage
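
For illustration, a non-streaming client call to the chat endpoint might look like the sketch below. The endpoint path and the response shape come from this document; the base URL, the `jwt` variable, and the exact request body fields (`model`, `messages`, `stream`) are assumptions.

```javascript
// Hypothetical client call to the chat completion endpoint.
// Base URL and request body fields are assumptions for illustration.
const res = await fetch('https://your-insforge-host/api/ai/chat/completion', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${jwt}`, // user JWT (step 1 above)
  },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-haiku', // optional if a default model is configured
    messages: [{ role: 'user', content: 'Summarize this project in one sentence.' }],
    stream: false,
  }),
});

const data = await res.json();
console.log(data.response, data.usage.totalTokens); // shape per "Response Formats" below
```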

Streaming Architecture

```javascript
// Server-Sent Events (SSE) for streaming
async function* streamChat(messages, options) {
  const stream = await openRouter.chat.completions.create({
    model: options.model,
    messages,
    stream: true,
    temperature: options.temperature,
    max_tokens: options.maxTokens,
  });

  let tokenUsage = null;
  for await (const chunk of stream) {
    // Capture token usage if the provider includes it on the final chunk
    if (chunk.usage) tokenUsage = chunk.usage;
    const content = chunk.choices[0]?.delta?.content;
    if (content) {
      yield { chunk: content };
    }
  }

  yield { done: true, tokenUsage };
}
```

Response Formats

Non-streaming Response:

```json
{
  "response": "AI generated response text",
  "model": "anthropic/claude-3.5-haiku",
  "usage": {
    "promptTokens": 150,
    "completionTokens": 200,
    "totalTokens": 350
  }
}
```
Streaming Response (SSE):

```
data: {"chunk": "The "}
data: {"chunk": "answer "}
data: {"chunk": "is..."}
data: {"done": true, "tokenUsage": {...}}
```

Image Generation

Generation Flow

  1. Prompt Processing: Validate and enhance prompt
  2. Model Selection: Choose appropriate image model
  3. Size Configuration: Set dimensions and quality
  4. OpenRouter Request: Send generation request
  5. URL Generation: Receive image URLs
  6. Storage Integration: Optional save to storage
  7. Response Delivery: Return URLs to client

Image Parameters

| Parameter | Options | Description |
|-----------|---------|-------------|
| model | Model IDs | AI model to use |
| prompt | String | Text description |
| size | 512x512, 1024x1024, etc. | Image dimensions |
| quality | standard, hd | Image quality |
| numImages | 1-4 | Number of variations |
| style | vivid, natural | Style preference |
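
A hypothetical request using these parameters. The endpoint path is from the API table below; the response field holding the generated URLs is an assumption.

```javascript
// Hypothetical image generation call; parameter names follow the table above.
const res = await fetch('https://your-insforge-host/api/ai/image/generation', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${jwt}`,
  },
  body: JSON.stringify({
    model: 'google/gemini-2.5-flash-image-preview',
    prompt: 'A watercolor lighthouse at dawn',
    size: '1024x1024',
    quality: 'standard',
    numImages: 2,
  }),
});

const body = await res.json();
console.log(body.images); // assumed field: array of generated image URLs
```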

Configuration Management

Database Schema

```sql
CREATE TABLE _ai_configs (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  modality VARCHAR(255) NOT NULL,      -- 'text' or 'image'
  provider VARCHAR(255) NOT NULL,      -- 'openrouter'
  model_id VARCHAR(255) UNIQUE NOT NULL,
  system_prompt TEXT,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);
```
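
For example, a text configuration with a project-level system prompt might be stored as follows (the prompt text is illustrative):

```sql
-- Illustrative row: a text config pointing at a chat model, with a system prompt
INSERT INTO _ai_configs (modality, provider, model_id, system_prompt)
VALUES (
  'text',
  'openrouter',
  'anthropic/claude-3.5-haiku',
  'You are a concise assistant. Answer in plain language.'
);
```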

System Prompts

  • Configured per project
  • Applied to all chat requests
  • Cannot be overridden by client
  • Support for multiple configurations

Usage Tracking

Metrics Collected

```sql
CREATE TABLE _ai_usage (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  config_id UUID NOT NULL,
  input_tokens INT,
  output_tokens INT,
  image_count INT,
  image_resolution TEXT,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  FOREIGN KEY (config_id) REFERENCES _ai_configs(id) ON DELETE NO ACTION
);
```
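
A sketch of the kind of aggregation the usage summary endpoint might run, here totaling tokens and image counts per model over the last 30 days (the actual query is an assumption):

```sql
-- Per-model usage over the last 30 days (illustrative query)
SELECT c.model_id,
       SUM(u.input_tokens)  AS input_tokens,
       SUM(u.output_tokens) AS output_tokens,
       SUM(u.image_count)   AS images
FROM _ai_usage u
JOIN _ai_configs c ON c.id = u.config_id
WHERE u.created_at >= NOW() - INTERVAL '30 days'
GROUP BY c.model_id
ORDER BY output_tokens DESC NULLS LAST;
```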

Security & Rate Limiting

  • API Key Security: OpenRouter key stored server-side only
  • Request Validation: Input sanitization and size limits
  • Rate Limiting: Per-user and per-project limits
  • Usage Quotas: Configurable token limits
  • Content Filtering: Optional content moderation
  • Audit Logging: Track all AI operations

Error Handling

Error Types

| Error | Code | Description |
|-------|------|-------------|
| Model Not Found | 400 | Invalid model ID |
| Rate Limited | 429 | Too many requests |
| Token Limit | 400 | Exceeds max tokens |
| OpenRouter Error | 502 | Upstream provider issue |
| Quota Exceeded | 402 | Usage limit reached |
| Invalid Input | 400 | Malformed request |

Retry Strategy

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryableRequest(fn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      // Retry only on rate limiting, with exponential backoff: 1s, 2s, 4s...
      if (error.status === 429 && i < maxRetries - 1) {
        await sleep(Math.pow(2, i) * 1000);
        continue;
      }
      // Non-retryable error, or retries exhausted: surface it to the caller
      throw error;
    }
  }
}
```
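
Wrapping an OpenRouter call is then a one-liner, e.g. `const completion = await retryableRequest(() => openRouter.chat.completions.create(params));`.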

Performance Optimizations

Streaming Optimizations

  • Server-Sent Events: Real-time response streaming
  • Chunked Transfer: Efficient data streaming
  • Keep-Alive: Persistent connections for SSE
  • Low Latency: Direct OpenRouter integration

Future Optimizations

  • Response Caching: Cache for identical requests (coming soon)
  • Batch Processing: Multiple requests in parallel (coming soon)
  • Embeddings Cache: Store computed embeddings (coming soon)

API Endpoints

Chat Endpoints

| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| POST | /api/ai/chat/completion | User | Send chat messages, supports streaming |

Image Endpoints

| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| POST | /api/ai/image/generation | User | Generate images from text prompts |

Configuration Endpoints

| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| GET | /api/ai/models | Admin | List available models |
| GET | /api/ai/configurations | Admin | List AI configurations |
| POST | /api/ai/configurations | Admin | Create AI configuration |
| PATCH | /api/ai/configurations/:id | Admin | Update AI configuration |
| DELETE | /api/ai/configurations/:id | Admin | Delete AI configuration |

Usage Tracking Endpoints

| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| GET | /api/ai/usage | Admin | Get usage records with pagination |
| GET | /api/ai/usage/summary | Admin | Get usage summary statistics |
| GET | /api/ai/usage/config/:configId | Admin | Get usage by configuration |

Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| OPENROUTER_API_KEY | OpenRouter API key (local dev only) | Yes (local) |

Note:
  • In cloud environments, the API key is fetched dynamically from the cloud API
  • Model configuration is stored in the database (_ai_configs table)
  • No other AI-related environment variables are used
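
For local development this is a single `.env` entry (the value shown is a placeholder, not a real key):

```
OPENROUTER_API_KEY=your-openrouter-api-key
```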

Best Practices

  • Model Selection: Choose models based on speed vs. quality needs
  • Prompt Engineering: Craft clear, specific prompts for better results
  • Token Management: Monitor token usage
  • Streaming UX: Use streaming for better perceived performance
  • Error Recovery: Implement retry logic for transient failures
  • Context Windows: Manage conversation history within model limits (see the sketch below)
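
A naive history-trimming sketch for staying inside a context window. The 4-characters-per-token ratio is a rough heuristic, not an exact tokenizer, and the budget value is illustrative:

```javascript
// Keep the most recent messages that fit under an approximate token budget.
function trimHistory(messages, maxTokens = 8000) {
  const approxTokens = (m) => Math.ceil(m.content.length / 4); // rough heuristic
  const kept = [];
  let total = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    total += approxTokens(messages[i]);
    if (total > maxTokens) break; // oldest messages are dropped first
    kept.unshift(messages[i]);
  }
  return kept;
}
```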

Comparison with Direct Integration

| Aspect | InsForge + OpenRouter | Direct Provider APIs |
|--------|----------------------|----------------------|
| Integration Effort | Single API | Multiple integrations |
| Billing | Unified through OpenRouter | Separate per provider |
| Model Access | 100+ models | Limited to one provider |
| Failover | Automatic | Manual implementation |
| Rate Limiting | Handled by OpenRouter | Per-provider limits |