
API Integration Guide

Connect Your Infrastructure to All Major LLM Providers Including OpenRouter

Download Complete Integration Guide

Get step-by-step code examples, SDK configurations, and deployment scripts for all providers

Download PDF - Free

Table of Contents

  1. Introduction to Multi-Provider LLM Architecture
  2. OpenRouter Integration (Unified API)
  3. Direct Provider Integrations
  4. Apple Silicon Optimization Techniques
  5. Production Deployment Checklist

1. Introduction to Multi-Provider LLM Architecture

Modern LLM applications benefit from using multiple providers for:

  • Cost Optimization: Route requests to the most cost-effective model for each task
  • Reliability: Automatic failover when providers experience downtime
  • Specialization: Use the best model for specific tasks (e.g., code, math, reasoning)
  • Compliance: Route sensitive data to on-premise or dedicated infrastructure

5gb.com's Role: Use our dedicated Apple Silicon instances for cost-sensitive, high-volume workloads while maintaining 100% data privacy. Route to cloud providers for specialized models not available locally.
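The routing and failover ideas above can be sketched provider-agnostically. The provider names and per-1k-token costs below are illustrative assumptions for the sketch, not actual 5gb.com or OpenRouter pricing:

```javascript
// Build a failover order for a request: healthy providers first,
// cheapest first, with unhealthy providers kept at the end as a last resort.
const providers = [
  { name: "local-5gb", costPer1kTokens: 0.0, healthy: true },
  { name: "openrouter", costPer1kTokens: 0.5, healthy: true },
  { name: "openai", costPer1kTokens: 1.0, healthy: true },
];

function failoverOrder(providers) {
  const healthy = providers.filter((p) => p.healthy);
  const down = providers.filter((p) => !p.healthy);
  const byCost = (a, b) => a.costPer1kTokens - b.costPer1kTokens;
  return [...healthy.sort(byCost), ...down.sort(byCost)].map((p) => p.name);
}
```

A request loop would then try each name in order until one call succeeds, which covers both the cost-optimization and reliability bullets with one mechanism.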

2. OpenRouter Integration (Recommended)

What is OpenRouter?

OpenRouter provides a unified API interface to 500+ models from 60+ providers. Instead of managing multiple API keys and endpoints, use one API for everything.

Step 1: Get Your OpenRouter API Key

Sign up at openrouter.ai and generate an API key from your dashboard.

Step 2: Basic OpenRouter Integration

// JavaScript/Node.js example
import OpenAI from "openai";

const openrouter = new OpenAI({
  apiKey: process.env.OPENROUTER_API_KEY,
  baseURL: "https://openrouter.ai/api/v1",
});

// Auto-model selection (OpenRouter picks the best model)
async function generateWithAuto() {
  const completion = await openrouter.chat.completions.create({
    model: "openrouter/auto",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Explain quantum computing" },
    ],
  });
  console.log(completion.choices[0].message.content);
}

// Using a specific model
async function generateWithSpecific() {
  const completion = await openrouter.chat.completions.create({
    model: "openai/gpt-4-turbo",
    messages: [{ role: "user", content: "Write a Python function" }],
  });
  return completion;
}

Step 3: Cost Optimization with Model Routing

// Smart routing based on task type
const router = {
  simple_tasks: "meta-llama/llama-3.3-70b-instruct:free",
  reasoning: "anthropic/claude-3-sonnet",
  code: "openai/gpt-4-turbo",
  math: "deepseek/deepseek-reasoner",
};

async function routeRequest(task, message) {
  const model = router[task] || "openrouter/auto";
  return await openrouter.chat.completions.create({
    model,
    messages: [{ role: "user", content: message }],
  });
}

Step 4: Connect 5gb.com to OpenRouter

Pair OpenRouter with your 5gb.com dedicated instances by pointing the same OpenAI-compatible SDK at your instance:

# Python integration with a local 5gb.com instance
import openai

# Configure the OpenAI SDK to use your 5gb.com endpoint
client = openai.OpenAI(
    base_url="https://your-instance.5gb.com/v1",
    api_key="your-5gb-api-key",
)

# All requests now go through your dedicated Apple Silicon instance
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Your prompt"}],
)

5gb.com + OpenRouter Benefits

  • Data Privacy: Sensitive requests route to your dedicated 5gb.com instance
  • Cost Control: Use OpenRouter's free-tier models for testing and 5gb.com for production
  • Unlimited Scale: 5gb.com handles high-volume workloads while OpenRouter provides model diversity
  • No Lock-in: Everything uses standard OpenAI-compatible APIs

3. Direct Provider Integrations

OpenAI Integration

// JavaScript
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const response = await openai.chat.completions.create({
  model: "gpt-4-turbo",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" },
  ],
  temperature: 0.7,
  max_tokens: 1000,
});

Anthropic Claude Integration

// JavaScript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const message = await anthropic.messages.create({
  model: "claude-3-opus-20240229",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello, Claude" }],
});

Google Gemini Integration

// JavaScript
const { GoogleGenerativeAI } = require("@google/generative-ai");

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-pro" });

const result = await model.generateContent("Explain quantum computing");

DeepSeek Integration

# Python
from openai import OpenAI

client = OpenAI(
    api_key="your-deepseek-key",
    base_url="https://api.deepseek.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello!"},
    ],
)

4. Apple Silicon Optimization on 5gb.com

Install MLX for Apple Silicon

# SSH into your 5gb.com instance
ssh user@your-instance.5gb.com

# Install MLX (Apple's machine learning framework) and its LLM tooling
pip install mlx mlx-lm

# Optionally install vLLM (note: vLLM has no Metal backend, so on macOS it
# runs CPU-only; prefer MLX or Ollama for Apple Silicon GPU inference)
pip install vllm

# Or install Ollama for easy model management
curl -fsSL https://ollama.ai/install.sh | sh

Deploy Optimized Model

# Serve an OpenAI-compatible API with vLLM (CPU-only on macOS; for
# Apple Silicon GPU serving, mlx-lm or Ollama are the native options)
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.3-70B-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.8 \
  --enable-chunked-prefill

# Your API is now available at:
# http://your-instance.5gb.com:8000/v1/chat/completions

Client Configuration

// Connect to your optimized 5gb.com instance
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://your-instance.5gb.com:8000/v1",
  apiKey: "not-required-for-local",
});

// This request uses your Apple Silicon-optimized model
const response = await client.chat.completions.create({
  model: "meta-llama/Llama-3.3-70B-Instruct",
  messages: [{ role: "user", content: "Your prompt" }],
});

5. Production Deployment Checklist

Security

  • ✓ Use HTTPS for all API endpoints
  • ✓ Implement API key rotation
  • ✓ Enable request rate limiting
  • ✓ Set up VPC/private networks
  • ✓ Use environment variables for secrets
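The secrets item above is easiest to enforce with a startup check that fails fast when a key is missing, instead of surfacing the problem on the first API call. `requireEnv` is a hypothetical helper for this sketch, not part of any SDK:

```javascript
// Validate that all required secrets are present in the environment at startup.
// Accepts an env object so it can be tested without touching process.env.
function requireEnv(names, env = process.env) {
  const missing = names.filter((n) => !env[n]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  // Return only the requested keys so secrets are passed around explicitly.
  return Object.fromEntries(names.map((n) => [n, env[n]]));
}
```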

Performance

  • ✓ Implement connection pooling
  • ✓ Use streaming for long responses
  • ✓ Cache repeated requests
  • ✓ Implement circuit breakers
  • ✓ Monitor latency and throughput
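The caching item above can be sketched as a thin wrapper around any provider call. `callProvider` here is a stand-in for a real SDK call, and the sketch is synchronous for clarity; a production cache would await the call and add a TTL:

```javascript
// Minimal in-memory cache for repeated requests, keyed by model + prompt.
function makeCachedClient(callProvider) {
  const cache = new Map();
  return (model, prompt) => {
    const key = `${model}\u0000${prompt}`;
    if (!cache.has(key)) {
      cache.set(key, callProvider(model, prompt)); // cache miss: pay for the call once
    }
    return cache.get(key); // cache hit: no API cost
  };
}
```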

Cost Management

  • ✓ Set token limits per request
  • ✓ Implement request budgets
  • ✓ Use auto-model selection for cost optimization
  • ✓ Route high-volume to dedicated 5gb.com instances
  • ✓ Track cost per feature/user
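Token limits and budgets can be enforced before a request is ever sent. The 4-characters-per-token estimate below is a rough heuristic for the sketch, not a real tokenizer:

```javascript
// Rough token estimate: ~4 characters per token (heuristic, not exact).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Cap max_tokens so prompt + completion stays inside a fixed per-request budget,
// and reject prompts that would consume the entire budget on their own.
function buildRequest(prompt, budget) {
  const promptTokens = estimateTokens(prompt);
  if (promptTokens >= budget) {
    throw new Error(`prompt (~${promptTokens} tokens) exceeds budget of ${budget}`);
  }
  return { promptTokens, max_tokens: budget - promptTokens };
}
```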

Reliability

  • ✓ Implement fallback providers
  • ✓ Use health checks and monitoring
  • ✓ Set up alerting for failures
  • ✓ Test provider switching
  • ✓ Maintain provider diversity
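The circuit-breaker item above can be as small as a per-provider failure counter; production breakers also add half-open probes and time-based resets. A minimal sketch:

```javascript
// Minimal circuit breaker: after `threshold` consecutive failures the
// provider is considered open (skipped) until a success resets the count.
class CircuitBreaker {
  constructor(threshold = 3) {
    this.threshold = threshold;
    this.failures = 0;
  }
  get open() {
    return this.failures >= this.threshold;
  }
  recordSuccess() {
    this.failures = 0;
  }
  recordFailure() {
    this.failures += 1;
  }
}
```

A router would keep one breaker per provider and skip any provider whose breaker is open when building the failover order.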

Complete Example: Multi-Provider Setup

// Complete production-ready integration
import OpenAI from "openai";
import Anthropic from "@anthropic-ai/sdk";

class LLMProvider {
  constructor() {
    this.providers = {
      local: new OpenAI({
        baseURL: process.env.LOCAL_5gb_ENDPOINT,
        apiKey: "not-required-for-local",
      }),
      openai: new OpenAI({ apiKey: process.env.OPENAI_KEY }),
      anthropic: new Anthropic({ apiKey: process.env.ANTHROPIC_KEY }),
      router: new OpenAI({
        apiKey: process.env.OPENROUTER_KEY,
        baseURL: "https://openrouter.ai/api/v1",
      }),
    };
  }

  async generate(prompt, options = {}) {
    const {
      sensitivity = "low", // "high" = use local 5gb.com, "low" = any provider
      task = "general", // "code", "reasoning", "general"
      budget = "unlimited", // "low", "medium", "unlimited"; reserved for budget-based routing
    } = options;

    // Route based on sensitivity: sensitive data stays on dedicated 5gb.com
    if (sensitivity === "high") {
      return await this.providers.local.chat.completions.create({
        model: "llama-3.3-70b",
        messages: [{ role: "user", content: prompt }],
      });
    }

    // Route based on task type via OpenRouter
    const modelRouter = {
      code: "openai/gpt-4-turbo",
      reasoning: "anthropic/claude-3-opus",
      general: "openrouter/auto",
    };
    return await this.providers.router.chat.completions.create({
      model: modelRouter[task] || "openrouter/auto",
      messages: [{ role: "user", content: prompt }],
    });
  }
}

Get the Complete Integration Guide

Download the full 60-page guide with 50+ code examples, deployment scripts, and best practices

Download PDF - Free