
API Integration Guide

Connect Your Infrastructure to All Major LLM Providers Including OpenRouter

Download Complete Integration Guide

Get step-by-step code examples, SDK configurations, and deployment scripts for all providers

Download PDF - Free

Table of Contents

  1. Introduction to Multi-Provider LLM Architecture
  2. OpenRouter Integration (Unified API)
  3. Direct Provider Integrations
  4. Apple Silicon Optimization Techniques
  5. Production Deployment Checklist

1. Introduction to Multi-Provider LLM Architecture

Modern LLM applications benefit from using multiple providers for:

  • Cost Optimization: Route requests to the most cost-effective model for each task
  • Reliability: Automatic failover when providers experience downtime
  • Specialization: Use the best model for specific tasks (e.g., code, math, reasoning)
  • Compliance: Route sensitive data to on-premise or dedicated infrastructure

5gb.com's Role: Use our dedicated Apple Silicon instances for cost-sensitive, high-volume workloads while maintaining 100% data privacy. Route to cloud providers for specialized models not available locally.
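The routing and failover ideas above can be sketched provider-agnostically. The provider names and per-1k-token costs below are illustrative assumptions for the sketch, not actual 5gb.com or OpenRouter pricing:

```javascript
// Build a failover order for a request: healthy providers first,
// cheapest first, with unhealthy providers kept at the end as a last resort.
const providers = [
  { name: "local-5gb", costPer1kTokens: 0.0, healthy: true },
  { name: "openrouter", costPer1kTokens: 0.5, healthy: true },
  { name: "openai", costPer1kTokens: 1.0, healthy: true },
];

function failoverOrder(providers) {
  const healthy = providers.filter((p) => p.healthy);
  const down = providers.filter((p) => !p.healthy);
  const byCost = (a, b) => a.costPer1kTokens - b.costPer1kTokens;
  return [...healthy.sort(byCost), ...down.sort(byCost)].map((p) => p.name);
}
```

A request loop would then try each name in order until one call succeeds, which covers both the cost-optimization and reliability bullets with one mechanism.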

2. OpenRouter Integration (Recommended)

What is OpenRouter?

OpenRouter provides a unified API interface to 500+ models from 60+ providers. Instead of managing multiple API keys and endpoints, use one API for everything.

Step 1: Get Your OpenRouter API Key

Sign up at openrouter.ai and generate an API key from your dashboard.

Step 2: Basic OpenRouter Integration

// JavaScript/Node.js example
import OpenAI from "openai";

const openrouter = new OpenAI({
  apiKey: process.env.OPENROUTER_API_KEY,
  baseURL: "https://openrouter.ai/api/v1",
});

// Auto-model selection (OpenRouter picks the best model)
async function generateWithAuto() {
  const completion = await openrouter.chat.completions.create({
    model: "openrouter/auto",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Explain quantum computing" },
    ],
  });
  console.log(completion.choices[0].message.content);
}

// Using a specific model
async function generateWithSpecific() {
  const completion = await openrouter.chat.completions.create({
    model: "openai/gpt-4-turbo",
    messages: [{ role: "user", content: "Write a Python function" }],
  });
  return completion;
}

Step 3: Cost Optimization with Model Routing

// Smart routing based on task type
const router = {
  simple_tasks: "meta-llama/llama-3.3-70b-instruct:free",
  reasoning: "anthropic/claude-3-sonnet",
  code: "openai/gpt-4-turbo",
  math: "deepseek/deepseek-reasoner",
};

async function routeRequest(task, message) {
  const model = router[task] || "openrouter/auto";
  return await openrouter.chat.completions.create({
    model,
    messages: [{ role: "user", content: message }],
  });
}

Step 4: Connect 5gb.com to OpenRouter

Pair OpenRouter with your 5gb.com dedicated instances by pointing the same OpenAI-compatible SDK at your instance:

# Python integration with a local 5gb.com instance
import openai

# Configure the OpenAI SDK to use your 5gb.com endpoint
client = openai.OpenAI(
    base_url="https://your-instance.5gb.com/v1",
    api_key="your-5gb-api-key",
)

# All requests now go through your dedicated Apple Silicon instance
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Your prompt"}],
)

5gb.com + OpenRouter Benefits

  • Data Privacy: Sensitive requests route to your dedicated 5gb.com instance
  • Cost Control: Use OpenRouter's free-tier models for testing and 5gb.com for production
  • Unlimited Scale: 5gb.com handles high-volume workloads while OpenRouter provides model diversity
  • No Lock-in: Everything uses standard OpenAI-compatible APIs

3. Direct Provider Integrations

OpenAI Integration

// JavaScript
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const response = await openai.chat.completions.create({
  model: "gpt-4-turbo",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" },
  ],
  temperature: 0.7,
  max_tokens: 1000,
});

Anthropic Claude Integration

// JavaScript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const message = await anthropic.messages.create({
  model: "claude-3-opus-20240229",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello, Claude" }],
});

Google Gemini Integration

// JavaScript
const { GoogleGenerativeAI } = require("@google/generative-ai");

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-pro" });

const result = await model.generateContent("Explain quantum computing");

DeepSeek Integration

# Python
from openai import OpenAI

client = OpenAI(
    api_key="your-deepseek-key",
    base_url="https://api.deepseek.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello!"},
    ],
)

4. Apple Silicon Optimization on 5gb.com

Install MLX for Apple Silicon

# SSH into your 5gb.com instance
ssh user@your-instance.5gb.com

# Install MLX (Apple's machine learning framework) and its LLM tooling
pip install mlx mlx-lm

# Optionally install vLLM (note: vLLM has no Metal backend, so on macOS it
# runs CPU-only; prefer MLX or Ollama for Apple Silicon GPU inference)
pip install vllm

# Or install Ollama for easy model management
curl -fsSL https://ollama.ai/install.sh | sh

Deploy Optimized Model

# Serve an OpenAI-compatible API with vLLM (CPU-only on macOS; for
# Apple Silicon GPU serving, mlx-lm or Ollama are the native options)
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.3-70B-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.8 \
  --enable-chunked-prefill

# Your API is now available at:
# http://your-instance.5gb.com:8000/v1/chat/completions

Client Configuration

// Connect to your optimized 5gb.com instance
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://your-instance.5gb.com:8000/v1",
  apiKey: "not-required-for-local",
});

// This request uses your Apple Silicon-optimized model
const response = await client.chat.completions.create({
  model: "meta-llama/Llama-3.3-70B-Instruct",
  messages: [{ role: "user", content: "Your prompt" }],
});

5. Production Deployment Checklist

Security

  • ✓ Use HTTPS for all API endpoints
  • ✓ Implement API key rotation
  • ✓ Enable request rate limiting
  • ✓ Set up VPC/private networks
  • ✓ Use environment variables for secrets
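The secrets item above is easiest to enforce with a startup check that fails fast when a key is missing, instead of surfacing the problem on the first API call. `requireEnv` is a hypothetical helper for this sketch, not part of any SDK:

```javascript
// Validate that all required secrets are present in the environment at startup.
// Accepts an env object so it can be tested without touching process.env.
function requireEnv(names, env = process.env) {
  const missing = names.filter((n) => !env[n]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  // Return only the requested keys so secrets are passed around explicitly.
  return Object.fromEntries(names.map((n) => [n, env[n]]));
}
```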

Performance

  • ✓ Implement connection pooling
  • ✓ Use streaming for long responses
  • ✓ Cache repeated requests
  • ✓ Implement circuit breakers
  • ✓ Monitor latency and throughput
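The caching item above can be sketched as a thin wrapper around any provider call. `callProvider` here is a stand-in for a real SDK call, and the sketch is synchronous for clarity; a production cache would await the call and add a TTL:

```javascript
// Minimal in-memory cache for repeated requests, keyed by model + prompt.
function makeCachedClient(callProvider) {
  const cache = new Map();
  return (model, prompt) => {
    const key = `${model}\u0000${prompt}`;
    if (!cache.has(key)) {
      cache.set(key, callProvider(model, prompt)); // cache miss: pay for the call once
    }
    return cache.get(key); // cache hit: no API cost
  };
}
```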

Cost Management

  • ✓ Set token limits per request
  • ✓ Implement request budgets
  • ✓ Use auto-model selection for cost optimization
  • ✓ Route high-volume to dedicated 5gb.com instances
  • ✓ Track cost per feature/user
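Token limits and budgets can be enforced before a request is ever sent. The 4-characters-per-token estimate below is a rough heuristic for the sketch, not a real tokenizer:

```javascript
// Rough token estimate: ~4 characters per token (heuristic, not exact).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Cap max_tokens so prompt + completion stays inside a fixed per-request budget,
// and reject prompts that would consume the entire budget on their own.
function buildRequest(prompt, budget) {
  const promptTokens = estimateTokens(prompt);
  if (promptTokens >= budget) {
    throw new Error(`prompt (~${promptTokens} tokens) exceeds budget of ${budget}`);
  }
  return { promptTokens, max_tokens: budget - promptTokens };
}
```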

Reliability

  • ✓ Implement fallback providers
  • ✓ Use health checks and monitoring
  • ✓ Set up alerting for failures
  • ✓ Test provider switching
  • ✓ Maintain provider diversity
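The circuit-breaker item above can be as small as a per-provider failure counter; production breakers also add half-open probes and time-based resets. A minimal sketch:

```javascript
// Minimal circuit breaker: after `threshold` consecutive failures the
// provider is considered open (skipped) until a success resets the count.
class CircuitBreaker {
  constructor(threshold = 3) {
    this.threshold = threshold;
    this.failures = 0;
  }
  get open() {
    return this.failures >= this.threshold;
  }
  recordSuccess() {
    this.failures = 0;
  }
  recordFailure() {
    this.failures += 1;
  }
}
```

A router would keep one breaker per provider and skip any provider whose breaker is open when building the failover order.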

Complete Example: Multi-Provider Setup

// Complete production-ready integration
import OpenAI from "openai";
import Anthropic from "@anthropic-ai/sdk";

class LLMProvider {
  constructor() {
    this.providers = {
      local: new OpenAI({
        baseURL: process.env.LOCAL_5gb_ENDPOINT,
        apiKey: "not-required-for-local",
      }),
      openai: new OpenAI({ apiKey: process.env.OPENAI_KEY }),
      anthropic: new Anthropic({ apiKey: process.env.ANTHROPIC_KEY }),
      router: new OpenAI({
        apiKey: process.env.OPENROUTER_KEY,
        baseURL: "https://openrouter.ai/api/v1",
      }),
    };
  }

  async generate(prompt, options = {}) {
    const {
      sensitivity = "low", // "high" = use local 5gb.com, "low" = any provider
      task = "general", // "code", "reasoning", "general"
      budget = "unlimited", // "low", "medium", "unlimited"; reserved for budget-based routing
    } = options;

    // Route based on sensitivity: sensitive data stays on dedicated 5gb.com
    if (sensitivity === "high") {
      return await this.providers.local.chat.completions.create({
        model: "llama-3.3-70b",
        messages: [{ role: "user", content: prompt }],
      });
    }

    // Route based on task type via OpenRouter
    const modelRouter = {
      code: "openai/gpt-4-turbo",
      reasoning: "anthropic/claude-3-opus",
      general: "openrouter/auto",
    };
    return await this.providers.router.chat.completions.create({
      model: modelRouter[task] || "openrouter/auto",
      messages: [{ role: "user", content: prompt }],
    });
  }
}

Get the Complete Integration Guide

Download the full 60-page guide with 50+ code examples, deployment scripts, and best practices

Download PDF - Free