
Overview

InsForge provides unified AI capabilities through OpenRouter, giving you access to multiple LLM providers with a single API and consistent pricing model.

Technology Stack

Core Components

| Component | Technology | Purpose |
|---|---|---|
| AI Gateway | OpenRouter | Unified access to multiple AI providers |
| Chat Service | Node.js + SSE | Handle chat completions with streaming |
| Image Service | Async processing | Generate images via AI models |
| Configuration | PostgreSQL | Store system prompts per project |
| Usage Tracking | PostgreSQL | Monitor token usage |
| Response Format | JSON/SSE | Standard and streaming responses |

Supported Models

InsForge provides access to models from multiple AI providers through our unified gateway. Use the format provider/model when specifying a model in SDK calls.
| Provider | Models |
|---|---|
| OpenAI | openai/gpt-5.2, openai/gpt-5, openai/gpt-5-mini, openai/gpt-4o, openai/gpt-4o-mini |
| Anthropic | anthropic/claude-opus-4.5, anthropic/claude-sonnet-4.6, anthropic/claude-haiku-4.5, anthropic/claude-opus-4.1, anthropic/claude-sonnet-4.5, anthropic/claude-sonnet-4, anthropic/claude-3.5-haiku |
| Google | google/gemini-2.0-flash-001, google/gemini-2.5-flash-lite, google/gemini-2.5-flash-image-preview, google/gemini-2.5-pro, google/gemini-3-pro-image-preview, google/gemini-3-flash-preview |
| X-AI | x-ai/grok-4, x-ai/grok-4-fast, x-ai/grok-4.1-fast |
| DeepSeek | deepseek/deepseek-v3.2, deepseek/deepseek-v3.2-exp, deepseek/deepseek-chat, deepseek/deepseek-r1 |
| Minimax | minimax/minimax-m2.1 |
| Z-AI | z-ai/glm-4.7, z-ai/glm-4.6 |
| Kimi | moonshotai/kimi-k2.5 |
Model availability may vary. Use the GET /api/ai/models endpoint or get-backend-metadata MCP tool to get the current list of available models for your backend.
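
As a quick client-side illustration of the provider/model format, the model IDs returned by that endpoint can be grouped by provider prefix. The helper below and its sample input are illustrative only; the exact response shape of GET /api/ai/models may differ:

```javascript
// Group flat "provider/model" IDs by their provider prefix.
function groupModelsByProvider(modelIds) {
  const byProvider = {};
  for (const id of modelIds) {
    const [provider] = id.split('/'); // "openai/gpt-4o" → "openai"
    (byProvider[provider] ??= []).push(id);
  }
  return byProvider;
}

const grouped = groupModelsByProvider([
  'openai/gpt-4o',
  'openai/gpt-4o-mini',
  'anthropic/claude-3.5-haiku',
]);
// grouped.openai → ['openai/gpt-4o', 'openai/gpt-4o-mini']
```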

Capabilities

| Capability | Description |
|---|---|
| Chat Completions | Multi-turn conversations with streaming support |
| Tool Calling | Let the AI invoke custom functions (e.g., database lookups, API calls) |
| Web Search | Augment responses with real-time web results and citations |
| PDF/File Parsing | Process PDFs and documents directly in chat (pdf-text, mistral-ocr, native engines) |
| Image Vision | Analyze and describe images in conversations |
| Embeddings | Generate vector embeddings for semantic search |
| Image Generation | Create images from text prompts |
See SDK references for implementation details: TypeScript | Swift | Kotlin | Flutter

OpenRouter Integration

Why OpenRouter?

  • Single API: One integration for multiple providers
  • Unified Billing: Consistent pricing across models
  • Automatic Failover: Fallback to alternative models
  • Rate Limiting: Built-in rate limit handling

Request Flow

Chat Completions

Request Processing

  1. Authentication: Verify JWT token
  2. Configuration: Load project AI settings
  3. System Prompt: Prepend configured prompt
  4. Model Selection: Use specified or default model
  5. OpenRouter Call: Forward to OpenRouter
  6. Response Handling: Stream or batch response
  7. Usage Tracking: Record token usage

Streaming Architecture

// Server-Sent Events (SSE) for streaming
async function* streamChat(messages, options) {
  const stream = await openRouter.chat.completions.create({
    model: options.model,
    messages: messages,
    stream: true,
    temperature: options.temperature,
    max_tokens: options.maxTokens
  });

  let usage = null;
  for await (const chunk of stream) {
    if (chunk.usage) usage = chunk.usage; // final chunk may carry token counts
    const content = chunk.choices[0]?.delta?.content;
    if (content) {
      yield { chunk: content };
    }
  }

  yield { done: true, tokenUsage: usage };
}

Tool Calling

Tool calling (function calling) lets the AI model invoke custom functions you define. The client sends tool definitions, the model decides when to call them, and the client executes the function and sends the result back.
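
A minimal sketch of that round trip, using a hypothetical lookupOrder tool. The schema follows the OpenAI-style function format that OpenRouter accepts; the tool name, its arguments, and the stubbed implementation are illustrative, not part of InsForge:

```javascript
// Tool definition sent with the chat request (hypothetical tool).
const tools = [{
  type: 'function',
  function: {
    name: 'lookupOrder',
    description: 'Fetch an order by its ID',
    parameters: {
      type: 'object',
      properties: { orderId: { type: 'string' } },
      required: ['orderId'],
    },
  },
}];

// Local implementations the client runs when the model requests a call.
const toolImpls = {
  lookupOrder: ({ orderId }) => ({ orderId, status: 'shipped' }),
};

// Execute one tool call from the model's response and build the
// role:"tool" message to send back in the follow-up request.
function runToolCall(toolCall) {
  const impl = toolImpls[toolCall.function.name];
  const args = JSON.parse(toolCall.function.arguments); // arguments arrive as a JSON string
  return {
    role: 'tool',
    tool_call_id: toolCall.id,
    content: JSON.stringify(impl(args)),
  };
}
```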

Response Formats

Non-streaming Response:
{
  "response": "AI generated response text",
  "model": "anthropic/claude-3.5-haiku",
  "usage": {
    "promptTokens": 150,
    "completionTokens": 200,
    "totalTokens": 350
  }
}
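
A hedged sketch of producing this response from a client. The endpoint path and Bearer auth come from the API tables later in this page; the request body field names (model, messages, stream) are assumptions mirroring the streaming example above:

```javascript
// Build the fetch arguments for a non-streaming chat completion.
// Body field names are assumed; check your backend's API reference.
function buildChatRequest(baseUrl, token, model, messages) {
  return {
    url: `${baseUrl}/api/ai/chat/completion`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`, // user JWT
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ model, messages, stream: false }),
    },
  };
}

// const { url, options } = buildChatRequest('https://api.example.com', jwt,
//   'anthropic/claude-3.5-haiku', [{ role: 'user', content: 'Hi' }]);
// const res = await fetch(url, options); // JSON shape as shown above
```
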
Streaming Response (SSE):
data: {"chunk": "The "}
data: {"chunk": "answer "}
data: {"chunk": "is..."}
data: {"done": true, "tokenUsage": {...}}
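
On the client, these events can be parsed line by line. A minimal sketch, assuming each event arrives as a complete data: line:

```javascript
// Parse one SSE "data:" line into an event object; ignore other lines.
function parseSSELine(line) {
  if (!line.startsWith('data: ')) return null;
  return JSON.parse(line.slice('data: '.length));
}

// Accumulate streamed chunks into the full response text.
function collectChunks(lines) {
  let text = '';
  for (const line of lines) {
    const event = parseSSELine(line);
    if (event?.chunk) text += event.chunk;
  }
  return text;
}

collectChunks(['data: {"chunk":"The "}', 'data: {"chunk":"answer"}', 'data: {"done":true}']);
// → 'The answer'
```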

Image Generation

Generation Flow

  1. Prompt Processing: Validate and enhance prompt
  2. Model Selection: Choose appropriate image model
  3. Size Configuration: Set dimensions and quality
  4. OpenRouter Request: Send generation request
  5. URL Generation: Receive image URLs
  6. Storage Integration: Optional save to storage
  7. Response Delivery: Return URLs to client

Image Parameters

| Parameter | Options | Description |
|---|---|---|
| model | Model IDs | AI model to use |
| prompt | String | Text description |
| size | 512x512, 1024x1024, etc. | Image dimensions |
| quality | standard, hd | Image quality |
| numImages | 1-4 | Number of variations |
| style | vivid, natural | Style preference |
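
These parameters can be assembled into a request body as follows. The defaults chosen here (1024x1024, standard, one image, vivid) are assumptions for illustration, not documented server defaults:

```javascript
// Assemble an image generation request body from the parameters above.
function buildImageRequest({
  model,
  prompt,
  size = '1024x1024',       // assumed default
  quality = 'standard',     // assumed default
  numImages = 1,
  style = 'vivid',          // assumed default
}) {
  if (numImages < 1 || numImages > 4) {
    throw new RangeError('numImages must be between 1 and 4');
  }
  return { model, prompt, size, quality, numImages, style };
}
```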

Configuration Management

Database Schema

CREATE TABLE _ai_configs (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  modality VARCHAR(255) NOT NULL,     -- 'text' or 'image'
  provider VARCHAR(255) NOT NULL,      -- 'openrouter'
  model_id VARCHAR(255) UNIQUE NOT NULL,
  system_prompt TEXT,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

System Prompts

  • Configured per project
  • Applied to all chat requests
  • Cannot be overridden by client
  • Support for multiple configurations
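
One way the prepend-and-protect behavior can be sketched server-side. This is an illustration of the rules above, not the actual implementation:

```javascript
// Prepend the project-configured system prompt. Client-sent system
// messages are dropped so the configured prompt cannot be overridden.
function applySystemPrompt(systemPrompt, messages) {
  const userMessages = messages.filter((m) => m.role !== 'system');
  return [{ role: 'system', content: systemPrompt }, ...userMessages];
}
```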

Usage Tracking

Metrics Collected

CREATE TABLE _ai_usage (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  config_id UUID NOT NULL,
  input_tokens INT,
  output_tokens INT,
  image_count INT,
  image_resolution TEXT,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  FOREIGN KEY (config_id) REFERENCES _ai_configs(id) ON DELETE NO ACTION
);
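
Rows with this shape can be rolled up into totals, similar in spirit to what the usage summary endpoint returns; the aggregate field names below are illustrative:

```javascript
// Sum usage rows (shape mirrors the _ai_usage table) into totals.
function summarizeUsage(rows) {
  return rows.reduce(
    (acc, r) => ({
      inputTokens: acc.inputTokens + (r.input_tokens ?? 0),
      outputTokens: acc.outputTokens + (r.output_tokens ?? 0),
      images: acc.images + (r.image_count ?? 0),
    }),
    { inputTokens: 0, outputTokens: 0, images: 0 }
  );
}
```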

Security & Rate Limiting

  • API Key Security: OpenRouter key stored server-side only
  • Request Validation: Input sanitization and size limits
  • Rate Limiting: Per-user and per-project limits
  • Usage Quotas: Configurable token limits
  • Content Filtering: Optional content moderation
  • Audit Logging: Track all AI operations

Error Handling

Error Types

| Error | Code | Description |
|---|---|---|
| Model Not Found | 400 | Invalid model ID |
| Rate Limited | 429 | Too many requests |
| Token Limit | 400 | Exceeds max tokens |
| OpenRouter Error | 502 | Upstream provider issue |
| Quota Exceeded | 402 | Usage limit reached |
| Invalid Input | 400 | Malformed request |
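
Of these, only the rate-limit and upstream errors are transient and worth retrying; a small classifier makes that explicit:

```javascript
// Transient errors from the table above: rate limit and upstream issues.
function isRetryable(status) {
  return status === 429 || status === 502;
}
```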

Retry Strategy

// Retry with exponential backoff on rate-limit errors
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryableRequest(fn, maxRetries = 3) {
  let lastError;
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      if (error.status !== 429) throw error; // only retry rate limits
      lastError = error;
      await sleep(Math.pow(2, i) * 1000); // exponential backoff: 1s, 2s, 4s
    }
  }
  throw lastError; // retries exhausted
}

Performance Optimizations

Streaming Optimizations

  • Server-Sent Events: Real-time response streaming
  • Chunked Transfer: Efficient data streaming
  • Keep-Alive: Persistent connections for SSE
  • Low Latency: Direct OpenRouter integration

Future Optimizations

  • Response Caching: Cache for identical requests (coming soon)
  • Batch Processing: Multiple requests in parallel (coming soon)
  • Embeddings Cache: Store computed embeddings (coming soon)

API Endpoints

Chat Endpoints

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| POST | /api/ai/chat/completion | User | Send chat messages, supports streaming |

Image Endpoints

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| POST | /api/ai/image/generation | User | Generate images from text prompts |

Configuration Endpoints

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /api/ai/models | Admin | List available models |
| GET | /api/ai/configurations | Admin | List AI configurations |
| POST | /api/ai/configurations | Admin | Create AI configuration |
| PATCH | /api/ai/configurations/:id | Admin | Update AI configuration |
| DELETE | /api/ai/configurations/:id | Admin | Delete AI configuration |

Usage Tracking Endpoints

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /api/ai/usage | Admin | Get usage records with pagination |
| GET | /api/ai/usage/summary | Admin | Get usage summary statistics |
| GET | /api/ai/usage/config/:configId | Admin | Get usage by configuration |

Gateway Credentials & BYOK

InsForge resolves the OpenRouter API key using the following priority order:
| Priority | Source | Description |
|---|---|---|
| 1 | BYOK (Bring Your Own Key) | Developer-provided key, configured from the dashboard. Stored encrypted in the database. |
| 2 | Cloud-managed | Automatically fetched from InsForge Cloud (cloud projects only). |
| 3 | Environment variable | OPENROUTER_API_KEY from the server environment (self-hosted fallback). |
The first available source wins. For example, if a BYOK key is configured, it is always used regardless of whether a cloud-managed or environment key is also available.
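
The priority order reduces to a simple first-match lookup. The three source values below are hypothetical stand-ins for the real lookups (encrypted database row, cloud API, process.env):

```javascript
// Resolve the OpenRouter key in priority order: BYOK > cloud > env.
function resolveOpenRouterKey({ byokKey, cloudKey, envKey }) {
  if (byokKey) return { source: 'byok', key: byokKey };
  if (cloudKey) return { source: 'cloud', key: cloudKey };
  if (envKey) return { source: 'env', key: envKey };
  return null; // no credential available
}
```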

Managing BYOK Keys

Admins can set, replace, or remove a BYOK key from the AI page in the dashboard under Gateway Credentials. The key is validated against OpenRouter before being saved. Removing a BYOK key reverts to the next available source (cloud-managed or environment variable).

Usage Tracking

Usage (token counts, image counts, model usage) is tracked regardless of which credential source is active. This gives admins a unified view of AI usage across all key sources. Note that when a BYOK key is active, billing goes directly to the developer’s OpenRouter account.

API Endpoints

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /api/ai/gateway/config | Admin | Get current credential source and masked key status |
| POST | /api/ai/gateway/config | Admin | Set or replace the BYOK OpenRouter key |
| DELETE | /api/ai/gateway/config | Admin | Remove the BYOK key, reverting to the default source |

Environment Variables

| Variable | Description | Required |
|---|---|---|
| OPENROUTER_API_KEY | OpenRouter API key (self-hosted fallback) | Only if no BYOK or cloud-managed key |
Note:
  • In cloud environments, the API key is fetched dynamically from the cloud API unless a BYOK key is configured
  • A BYOK key configured from the dashboard always takes precedence over other sources
  • Model configuration is stored in the database (_ai_configs table)
  • No other AI-related environment variables are used

Best Practices

  • Model Selection: Choose models based on speed vs. quality needs
  • Prompt Engineering: Craft clear, specific prompts for better results
  • Token Management: Monitor token usage
  • Streaming UX: Use streaming for better perceived performance
  • Error Recovery: Implement retry logic for transient failures
  • Context Windows: Manage conversation history within limits
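
Context-window management can be sketched as a history trimmer that drops the oldest non-system turns first. The characters/4 token estimate below is a rough heuristic for illustration; use a real tokenizer in practice:

```javascript
// Trim conversation history to a token budget, dropping oldest
// non-system messages first. ~4 characters per token is a rough estimate.
function trimHistory(messages, maxTokens) {
  const approxTokens = (m) => Math.ceil(m.content.length / 4);
  const total = (msgs) => msgs.reduce((n, m) => n + approxTokens(m), 0);
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  while (rest.length > 1 && total(system) + total(rest) > maxTokens) {
    rest.shift(); // drop the oldest turn
  }
  return [...system, ...rest];
}
```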

Comparison with Direct Integration

| Aspect | InsForge + OpenRouter | Direct Provider APIs |
|---|---|---|
| Integration Effort | Single API | Multiple integrations |
| Billing | Unified through OpenRouter | Separate per provider |
| Model Access | 100+ models | Limited to one provider |
| Failover | Automatic | Manual implementation |
| Rate Limiting | Handled by OpenRouter | Per-provider limits |

SDK References

Choose the AI SDK guide for your platform:

  • TypeScript: Web and Node.js applications with streaming chat and image generation
  • Swift: iOS, macOS, tvOS, and watchOS with async/await streaming support
  • Kotlin: Android applications with Flow-based streaming responses
  • Flutter: Cross-platform mobile apps with Stream-based AI interactions