Overview
InsForge provides unified AI capabilities through OpenRouter, giving you access to multiple LLM providers with a single API and a consistent pricing model.
Technology Stack
Core Components
Component | Technology | Purpose |
---|---|---|
AI Gateway | OpenRouter | Unified access to multiple AI providers |
Chat Service | Node.js + SSE | Handle chat completions with streaming |
Image Service | Async processing | Generate images via AI models |
Configuration | PostgreSQL | Store system prompts per project |
Usage Tracking | PostgreSQL | Monitor token usage |
Response Format | JSON/SSE | Standard and streaming responses |
OpenRouter Integration
Why OpenRouter?
- Single API: One integration for multiple providers
- Unified Billing: Consistent pricing across models
- Automatic Failover: Fallback to alternative models
- Rate Limiting: Built-in rate limit handling
Request Flow
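A typical request travels: client → InsForge API (authentication, configuration, system prompt) → OpenRouter → upstream provider, with the response returned, optionally as a stream, along the same path.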
Available Models
Chat Models
Provider | Model | ID | Best For |
---|---|---|---|
Anthropic | Claude Sonnet 4 | anthropic/claude-sonnet-4 | Complex reasoning |
Anthropic | Claude 3.5 Haiku | anthropic/claude-3.5-haiku | Fast responses |
Anthropic | Claude Opus 4.1 | anthropic/claude-opus-4.1 | Highest quality |
OpenAI | GPT-5 | openai/gpt-5 | Most advanced |
OpenAI | GPT-5 Mini | openai/gpt-5-mini | Fast & efficient |
OpenAI | GPT-4o | openai/gpt-4o | General purpose |
Google | Gemini 2.5 Pro | google/gemini-2.5-pro | Advanced reasoning |
Image Models
Provider | Model | ID | Capabilities |
---|---|---|---|
Google | Gemini 2.5 Flash Image | google/gemini-2.5-flash-image-preview | Text-to-image generation |
Chat Completions
Request Processing
- Authentication: Verify JWT token
- Configuration: Load project AI settings
- System Prompt: Prepend configured prompt
- Model Selection: Use specified or default model
- OpenRouter Call: Forward to OpenRouter
- Response Handling: Stream or batch response
- Usage Tracking: Record token usage (see the sketch below)
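A minimal sketch of this pipeline in TypeScript. The OpenRouter endpoint and headers are real; `loadProjectConfig` and `recordUsage` are hypothetical stand-ins for InsForge internals, not actual InsForge functions.

```typescript
// Hypothetical stand-ins for InsForge internals
declare function loadProjectConfig(
  projectId: string
): Promise<{ systemPrompt: string; defaultModel: string }>;
declare function recordUsage(projectId: string, usage: unknown): Promise<void>;

interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

async function chatCompletion(
  projectId: string,
  messages: ChatMessage[],
  model?: string
): Promise<unknown> {
  // Configuration: load the project's AI settings
  const config = await loadProjectConfig(projectId);

  // System Prompt: prepend the configured prompt; clients cannot override it
  const fullMessages: ChatMessage[] = [
    { role: 'system', content: config.systemPrompt },
    ...messages,
  ];

  // OpenRouter Call: forward with the server-side API key
  const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      'Content-Type': 'application/json',
    },
    // Model Selection: use the specified model or fall back to the default
    body: JSON.stringify({
      model: model ?? config.defaultModel,
      messages: fullMessages,
    }),
  });
  const data = await res.json();

  // Usage Tracking: record token counts reported by OpenRouter
  await recordUsage(projectId, data.usage);
  return data;
}
```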
Streaming Architecture
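For streaming, the service relays OpenRouter's SSE chunks to the client over a persistent connection. A minimal sketch, assuming an Express-style handler on Node.js 18+ (with the global fetch API) and JSON body-parsing middleware; handler and variable names are illustrative:

```typescript
import type { Request, Response } from 'express';

// Illustrative SSE relay: forwards OpenRouter's streamed chunks to the client.
export async function streamChat(req: Request, res: Response): Promise<void> {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive'); // persistent connection for SSE

  const upstream = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ ...req.body, stream: true }),
  });

  // Relay each chunk as it arrives (chunked transfer, low latency)
  const reader = upstream.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    res.write(decoder.decode(value)); // OpenRouter already emits SSE-framed data
  }
  res.end();
}
```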
Response Formats
Non-streaming Response:
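The exact payload is not reproduced here; as a sketch, an OpenAI-compatible completion (the convention OpenRouter follows) looks like the object below. Field names are assumptions based on that convention, and InsForge's actual response may differ.

```typescript
// Assumed OpenAI-compatible completion payload; values are illustrative.
const exampleResponse = {
  id: 'gen-abc123',
  model: 'anthropic/claude-sonnet-4',
  choices: [
    {
      message: { role: 'assistant', content: 'Hello! How can I help?' },
      finish_reason: 'stop',
    },
  ],
  usage: { prompt_tokens: 12, completion_tokens: 9, total_tokens: 21 },
};
```

Image Generation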
Generation Flow
- Prompt Processing: Validate and enhance prompt
- Model Selection: Choose appropriate image model
- Size Configuration: Set dimensions and quality
- OpenRouter Request: Send generation request
- URL Generation: Receive image URLs
- Storage Integration: Optional save to storage
- Response Delivery: Return URLs to client
Image Parameters
Parameter | Options | Description |
---|---|---|
model | Model IDs | AI model to use |
prompt | String | Text description |
size | `512x512`, `1024x1024`, etc. | Image dimensions |
quality | `standard`, `hd` | Image quality |
numImages | 1-4 | Number of variations |
style | `vivid`, `natural` | Style preference |
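Putting the flow and parameters together, a client-side sketch of an image request. The endpoint, model ID, and parameter names come from this document; the token handling and the response shape (an `images` array of URLs) are assumptions.

```typescript
// Client-side sketch: generate images using the parameters above.
async function generateImages(token: string): Promise<string[]> {
  const res = await fetch('/api/ai/image/generation', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`, // user JWT
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'google/gemini-2.5-flash-image-preview',
      prompt: 'A watercolor fox in a snowy forest',
      size: '1024x1024',
      quality: 'hd',
      numImages: 2,
      style: 'natural',
    }),
  });
  if (!res.ok) throw new Error(`Image generation failed: ${res.status}`);
  // Assumed response shape: URLs of the generated images
  const { images } = (await res.json()) as { images: string[] };
  return images;
}
```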
Configuration Management
Database Schema
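The schema itself is not reproduced here. As a rough sketch, a row in the `_ai_configs` table (referenced under Environment Variables below) plausibly carries fields like these; the column names are assumptions:

```typescript
// Assumed shape of a row in the _ai_configs table; field names are illustrative.
interface AiConfigRow {
  id: string;           // configuration ID (used by /api/ai/usage/config/:configId)
  projectId: string;    // configurations are scoped per project
  systemPrompt: string; // prepended to every chat request
  defaultModel: string; // e.g. 'anthropic/claude-sonnet-4'
  createdAt: Date;
  updatedAt: Date;
}
```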
System Prompts
- Configured per project
- Applied to all chat requests
- Cannot be overridden by client
- Support for multiple configurations
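Configurations are managed through the admin endpoints listed later in this document. An illustrative creation request; the endpoint is documented below, while the body fields are assumptions:

```typescript
// Hypothetical admin token for illustration
declare const adminToken: string;

// Illustrative: create an AI configuration via the admin API.
await fetch('/api/ai/configurations', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${adminToken}`, // admin auth required
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    systemPrompt: 'You are a helpful support assistant for Acme Co.',
    defaultModel: 'anthropic/claude-3.5-haiku',
  }),
});
```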
Usage Tracking
Metrics Collected
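The full metric list is not enumerated here. Based on the token-usage tracking described above, a usage record plausibly includes fields like the following; the names are assumptions:

```typescript
// Assumed shape of a usage record; field names are illustrative.
interface UsageRecord {
  configId: string;         // which AI configuration served the request
  model: string;            // model ID, e.g. 'openai/gpt-4o'
  promptTokens: number;     // input tokens consumed
  completionTokens: number; // output tokens generated
  totalTokens: number;
  createdAt: Date;
}
```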
Security & Rate Limiting
- API Key Security: OpenRouter key stored server-side only
- Request Validation: Input sanitization and size limits
- Rate Limiting: Per-user and per-project limits
- Usage Quotas: Configurable token limits
- Content Filtering: Optional content moderation
- Audit Logging: Track all AI operations
Error Handling
Error Types
Error | Code | Description |
---|---|---|
Model Not Found | 400 | Invalid model ID |
Rate Limited | 429 | Too many requests |
Token Limit | 400 | Exceeds max tokens |
OpenRouter Error | 502 | Upstream provider issue |
Quota Exceeded | 402 | Usage limit reached |
Invalid Input | 400 | Malformed request |
Retry Strategy
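The strategy is not spelled out here; a common approach, shown as a sketch, retries the transient failures from the table above (429 Rate Limited, 502 OpenRouter Error) with exponential backoff. The attempt count and delays are illustrative:

```typescript
// Sketch: retry transient errors with exponential backoff.
async function withRetry(
  request: () => Promise<Response>,
  maxAttempts = 3
): Promise<Response> {
  for (let attempt = 1; ; attempt++) {
    const res = await request();
    const transient = res.status === 429 || res.status === 502;
    if (!transient || attempt >= maxAttempts) return res;
    // Back off: 1s, 2s, 4s, ...
    await new Promise((r) => setTimeout(r, 1000 * 2 ** (attempt - 1)));
  }
}
```

Non-transient errors (400, 402) should not be retried; they indicate a problem with the request or quota that a retry cannot fix.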
Performance Optimizations
Streaming Optimizations
- Server-Sent Events: Real-time response streaming
- Chunked Transfer: Efficient data streaming
- Keep-Alive: Persistent connections for SSE
- Low Latency: Direct OpenRouter integration
Future Optimizations
- Response Caching: Cache for identical requests (coming soon)
- Batch Processing: Multiple requests in parallel (coming soon)
- Embeddings Cache: Store computed embeddings (coming soon)
API Endpoints
Chat Endpoints
Method | Endpoint | Auth | Description |
---|---|---|---|
POST | /api/ai/chat/completion | User | Send chat messages, supports streaming |
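An illustrative client for the streaming case, assuming the endpoint emits SSE-framed chunks and runs under Node.js 18+; the relative path is shown for brevity (prepend your InsForge base URL) and `userToken` is a placeholder:

```typescript
// Hypothetical user token for illustration
declare const userToken: string;

// Illustrative client: consume the streaming chat endpoint.
const res = await fetch('/api/ai/chat/completion', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${userToken}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-haiku',
    messages: [{ role: 'user', content: 'Summarize SSE in one line.' }],
    stream: true,
  }),
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();
for (;;) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(decoder.decode(value)); // render chunks as they arrive
}
```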
Image Endpoints
Method | Endpoint | Auth | Description |
---|---|---|---|
POST | /api/ai/image/generation | User | Generate images from text prompts |
Configuration Endpoints
Method | Endpoint | Auth | Description |
---|---|---|---|
GET | /api/ai/models | Admin | List available models |
GET | /api/ai/configurations | Admin | List AI configurations |
POST | /api/ai/configurations | Admin | Create AI configuration |
PATCH | /api/ai/configurations/:id | Admin | Update AI configuration |
DELETE | /api/ai/configurations/:id | Admin | Delete AI configuration |
Usage Tracking Endpoints
Method | Endpoint | Auth | Description |
---|---|---|---|
GET | /api/ai/usage | Admin | Get usage records with pagination |
GET | /api/ai/usage/summary | Admin | Get usage summary statistics |
GET | /api/ai/usage/config/:configId | Admin | Get usage by configuration |
Environment Variables
Variable | Description | Required |
---|---|---|
OPENROUTER_API_KEY | OpenRouter API key (local dev only) | Yes (local) |
- In cloud environments, the API key is fetched dynamically from the cloud API
- Model configuration is stored in the database (`_ai_configs` table)
- No other AI-related environment variables are used
Best Practices
- Model Selection: Choose models based on speed vs. quality needs
- Prompt Engineering: Craft clear, specific prompts for better results
- Token Management: Monitor token usage to stay within quotas
- Streaming UX: Use streaming for better perceived performance
- Error Recovery: Implement retry logic for transient failures
- Context Windows: Manage conversation history within model limits (see the sketch below)
Comparison with Direct Integration
Aspect | InsForge + OpenRouter | Direct Provider APIs |
---|---|---|
Integration Effort | Single API | Multiple integrations |
Billing | Unified through OpenRouter | Separate per provider |
Model Access | 100+ models | Limited to one provider |
Failover | Automatic | Manual implementation |
Rate Limiting | Handled by OpenRouter | Per-provider limits |