Model Hub

Host, share, and deploy AI models. Access community models or upload your own for browser-based inference with edge caching.

Overview

The Tenzro Model Hub provides:

  • Model Hosting: Upload and serve models globally
  • Edge Caching: CDN-distributed model files
  • Browser Inference: WebGPU/WebGL/WASM support
  • Version Control: Track model versions
  • Access Control: Public or private models

Supported Formats

Format         Runtime                              Use Case
ONNX           Cortex Runtime (WebGPU/WebGL/WASM)   General inference, best browser support
GGUF           llama.cpp (WASM)                     Quantized LLM inference
SafeTensors    Transformers.js                      HuggingFace models
TensorFlow.js  TensorFlow.js                        TensorFlow models
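
The table above can be read as a dispatch rule. Here is a minimal sketch of that mapping in code; the lowercase format strings are assumptions (only 'onnx' appears as a filter value elsewhere on this page), not confirmed SDK constants:

// Illustrative format-to-runtime mapping, mirroring the table above.
// The HubFormat string values are assumed, not confirmed SDK constants.
type HubFormat = 'onnx' | 'gguf' | 'safetensors' | 'tfjs';

function runtimeFor(format: HubFormat): string {
  switch (format) {
    case 'onnx':
      return 'Cortex Runtime (WebGPU/WebGL/WASM)';
    case 'gguf':
      return 'llama.cpp (WASM)';
    case 'safetensors':
      return 'Transformers.js';
    case 'tfjs':
      return 'TensorFlow.js';
  }
}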

Using Hub Models

import { Tenzro } from '@tenzro/cloud';

const client = new Tenzro({
  apiKey: process.env.TENZRO_API_KEY,
});

// List available models
const models = await client.hub.listModels({
  category: 'text-generation',
  format: 'onnx',
});

for (const model of models.items) {
  console.log(`${model.name} - ${model.size} - ${model.downloads} downloads`);
}

// Get model details
const model = await client.hub.getModel('tenzro/llama-3.2-1b-instruct-onnx');
console.log('Model:', model.name);
console.log('Format:', model.format);
console.log('Size:', model.size);
console.log('Description:', model.description);

// Download model
const download = await client.hub.downloadModel({
  modelId: 'tenzro/llama-3.2-1b-instruct-onnx',
  onProgress: (progress) => {
    console.log(`Downloaded: ${progress.loaded}/${progress.total} bytes`);
  },
});
console.log('Downloaded to:', download.path);

Browser Inference with Cortex Runtime

import { Tenzro } from '@tenzro/cloud';

const client = new Tenzro({
  apiKey: process.env.TENZRO_API_KEY,
});

// Load ONNX model for browser inference
const model = await client.cortexRuntime.loadModel({
  modelId: 'tenzro/phi-3-mini-onnx',
  provider: 'webgpu', // WebGPU for best performance
  options: {
    cache: true, // Cache model in browser
  },
});

// Run inference
const result = await client.cortexRuntime.run({
  modelId: model.id,
  inputs: {
    prompt: 'Explain quantum computing in simple terms',
    maxTokens: 100,
    temperature: 0.7,
  },
});
console.log('Output:', result.text);

// Unload model when done
await client.cortexRuntime.unloadModel(model.id);
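
Provider availability varies by browser, so a real page may want to feature-detect before loading. A minimal sketch, assuming 'webgl' and 'wasm' are accepted provider values alongside the 'webgpu' shown above (only 'webgpu' is confirmed by the examples on this page):

// Prefer WebGPU, fall back to WebGL, then WASM.
// 'gpu' in navigator is the standard WebGPU feature check.
const provider =
  'gpu' in navigator ? 'webgpu'
  : typeof WebGLRenderingContext !== 'undefined' ? 'webgl'
  : 'wasm';

const model = await client.cortexRuntime.loadModel({
  modelId: 'tenzro/phi-3-mini-onnx',
  provider,
  options: { cache: true },
});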

Uploading Models

// Upload a custom model
const upload = await client.hub.uploadModel({
name: 'my-custom-model',
description: 'Fine-tuned model for customer support',
format: 'onnx',
category: 'text-generation',
files: [
{ path: 'model.onnx', data: modelBuffer },
{ path: 'tokenizer.json', data: tokenizerBuffer },
],
metadata: {
baseModel: 'llama-3.2-1b',
task: 'text-generation',
quantization: 'int8',
},
visibility: 'private', // or 'public'
});
console.log('Model uploaded:', upload.modelId);
console.log('Status:', upload.status);
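
The upload call above assumes modelBuffer and tokenizerBuffer already hold the file contents. In a Node.js script they could be read from disk first; the './artifacts/' paths here are placeholders:

import { readFile } from 'node:fs/promises';

// Read the model artifacts before calling uploadModel.
// The paths are hypothetical; point them at your own files.
const modelBuffer = await readFile('./artifacts/model.onnx');
const tokenizerBuffer = await readFile('./artifacts/tokenizer.json');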

Model Versioning

// Create a new version
await client.hub.createModelVersion({
modelId: 'my-org/my-model',
version: '1.1.0',
files: updatedFiles,
changelog: 'Improved accuracy on edge cases',
});
// List versions
const versions = await client.hub.listModelVersions('my-org/my-model');
for (const version of versions.items) {
console.log(`Version ${version.version}: ${version.changelog}`);
}
// Download specific version
const download = await client.hub.downloadModel({
modelId: 'my-org/my-model',
version: '1.0.0',
});
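
When no version is pinned, the newest one can be resolved explicitly. A sketch that sorts the listModelVersions() output by semantic version, assuming each version field is a plain 'major.minor.patch' string:

// Resolve the highest semantic version, then download it.
const allVersions = await client.hub.listModelVersions('my-org/my-model');

const latest = allVersions.items
  .map((v) => v.version)
  .sort((a, b) => {
    const pa = a.split('.').map(Number);
    const pb = b.split('.').map(Number);
    for (let i = 0; i < 3; i++) {
      if ((pa[i] ?? 0) !== (pb[i] ?? 0)) return (pa[i] ?? 0) - (pb[i] ?? 0);
    }
    return 0;
  })
  .at(-1);

const pinned = await client.hub.downloadModel({
  modelId: 'my-org/my-model',
  version: latest,
});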

Access Control

// Update model visibility
await client.hub.updateModel('my-org/my-model', {
visibility: 'private',
});
// Grant access to specific users
await client.hub.grantModelAccess({
modelId: 'my-org/my-model',
users: ['user-id-1', 'user-id-2'],
organizations: ['partner-org-id'],
});
// Revoke access
await client.hub.revokeModelAccess({
modelId: 'my-org/my-model',
users: ['user-id-1'],
});
// List who has access
const access = await client.hub.getModelAccess('my-org/my-model');
console.log('Users with access:', access.users);
console.log('Organizations with access:', access.organizations);

Chunked Loading

Large models are automatically split into chunks for efficient loading:

// Load large model with chunking
const model = await client.cortexRuntime.loadModel({
modelId: 'tenzro/llama-3.2-3b-onnx',
provider: 'webgpu',
options: {
chunked: true,
chunkSize: 50 * 1024 * 1024, // 50MB chunks
},
onProgress: (progress) => {
console.log(`Chunk ${progress.chunk}/${progress.totalChunks}`);
console.log(`Progress: ${progress.percentage}%`);
},
});
console.log('Model loaded:', model.id);
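
As a rough check on what onProgress will report: with the 50 MB chunks above and the 1.2 GB tenzro/llama-3.2-1b-onnx listed under Popular Models, a load takes about 25 chunks (assuming the file is split evenly at chunkSize boundaries):

// Back-of-the-envelope chunk count; assumes straight division at chunkSize.
const modelBytes = 1.2 * 1024 ** 3;                    // 1.2 GB, ~1.29e9 bytes
const chunkSize = 50 * 1024 * 1024;                    // 50 MB, as above
const totalChunks = Math.ceil(modelBytes / chunkSize); // -> 25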

Model Metadata

Add detailed metadata to your models:

await client.hub.updateModel('my-org/my-model', {
  description: 'Fine-tuned model for customer support',
  metadata: {
    intendedUse: 'Customer service chatbots',
    limitations: 'May not handle highly technical queries',
    trainingData: 'Customer support transcripts (anonymized)',
    baseModel: 'llama-3.2-1b',
    quantization: 'int8',
    evaluation: {
      accuracy: 0.92,
      f1Score: 0.89,
    },
    license: 'Apache-2.0',
  },
  tags: ['customer-support', 'chat', 'fine-tuned'],
});

Popular Models

Model                       Size     Task
tenzro/phi-3-mini-onnx      2.4 GB   Text Generation
tenzro/llama-3.2-1b-onnx    1.2 GB   Text Generation
tenzro/whisper-small-onnx   460 MB   Speech Recognition
tenzro/all-minilm-l6-v2     90 MB    Embeddings
tenzro/vit-base-patch16     350 MB   Image Classification

Pricing

Feature          Free           Pro
Public models    Unlimited      Unlimited
Private models   3              Unlimited
Storage          10 GB          1 TB
Bandwidth        100 GB/month   10 TB/month