Model Hub

Host, share, and deploy AI models. Access community models or upload your own for browser-based inference with edge caching.

Overview

The Tenzro Model Hub provides:

  • Model Hosting: Upload and serve models globally
  • Edge Caching: CDN-distributed model files
  • Browser Inference: WebGPU/WebGL/WASM support
  • Version Control: Track model versions
  • Access Control: Public or private models

Supported Formats

Format         Runtime                              Use Case
ONNX           Cortex Runtime (WebGPU/WebGL/WASM)   General inference, best browser support
GGUF           llama.cpp (WASM)                     Quantized LLM inference
SafeTensors    Transformers.js                      HuggingFace models
TensorFlow.js  TensorFlow.js                        TensorFlow models
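
The table above can be read as a dispatch rule. Here is a minimal sketch of that mapping in code; the lowercase format strings are assumptions (only 'onnx' appears as a filter value elsewhere on this page), not confirmed SDK constants:

// Illustrative format-to-runtime mapping, mirroring the table above.
// The HubFormat string values are assumed, not confirmed SDK constants.
type HubFormat = 'onnx' | 'gguf' | 'safetensors' | 'tfjs';

function runtimeFor(format: HubFormat): string {
  switch (format) {
    case 'onnx':
      return 'Cortex Runtime (WebGPU/WebGL/WASM)';
    case 'gguf':
      return 'llama.cpp (WASM)';
    case 'safetensors':
      return 'Transformers.js';
    case 'tfjs':
      return 'TensorFlow.js';
  }
}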

Using Hub Models

import { Tenzro } from '@tenzro/cloud';

const client = new Tenzro({
  apiKey: process.env.TENZRO_API_KEY,
});

// List available models
const models = await client.hub.listModels({
  category: 'text-generation',
  format: 'onnx',
});

for (const model of models.items) {
  console.log(`${model.name} - ${model.size} - ${model.downloads} downloads`);
}

// Get model details
const model = await client.hub.getModel('tenzro/llama-3.2-1b-instruct-onnx');
console.log('Model:', model.name);
console.log('Format:', model.format);
console.log('Size:', model.size);
console.log('Description:', model.description);

// Download model
const download = await client.hub.downloadModel({
  modelId: 'tenzro/llama-3.2-1b-instruct-onnx',
  onProgress: (progress) => {
    console.log(`Downloaded: ${progress.loaded}/${progress.total} bytes`);
  },
});
console.log('Downloaded to:', download.path);

Browser Inference with Cortex Runtime

import { Tenzro } from '@tenzro/cloud';

const client = new Tenzro({
  apiKey: process.env.TENZRO_API_KEY,
});

// Load ONNX model for browser inference
const model = await client.cortexRuntime.loadModel({
  modelId: 'tenzro/phi-3-mini-onnx',
  provider: 'webgpu', // WebGPU for best performance
  options: {
    cache: true, // Cache model in browser
  },
});

// Run inference
const result = await client.cortexRuntime.run({
  modelId: model.id,
  inputs: {
    prompt: 'Explain quantum computing in simple terms',
    maxTokens: 100,
    temperature: 0.7,
  },
});
console.log('Output:', result.text);

// Unload model when done
await client.cortexRuntime.unloadModel(model.id);
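
Provider availability varies by browser, so a real page may want to feature-detect before loading. A minimal sketch, assuming 'webgl' and 'wasm' are accepted provider values alongside the 'webgpu' shown above (only 'webgpu' is confirmed by the examples on this page):

// Prefer WebGPU, fall back to WebGL, then WASM.
// 'gpu' in navigator is the standard WebGPU feature check.
const provider =
  'gpu' in navigator ? 'webgpu'
  : typeof WebGLRenderingContext !== 'undefined' ? 'webgl'
  : 'wasm';

const model = await client.cortexRuntime.loadModel({
  modelId: 'tenzro/phi-3-mini-onnx',
  provider,
  options: { cache: true },
});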

Uploading Models

// Upload a custom model
const upload = await client.hub.uploadModel({
name: 'my-custom-model',
description: 'Fine-tuned model for customer support',
format: 'onnx',
category: 'text-generation',
files: [
{ path: 'model.onnx', data: modelBuffer },
{ path: 'tokenizer.json', data: tokenizerBuffer },
],
metadata: {
baseModel: 'llama-3.2-1b',
task: 'text-generation',
quantization: 'int8',
},
visibility: 'private', // or 'public'
});
console.log('Model uploaded:', upload.modelId);
console.log('Status:', upload.status);
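
The upload call above assumes modelBuffer and tokenizerBuffer already hold the file contents. In a Node.js script they could be read from disk first; the './artifacts/' paths here are placeholders:

import { readFile } from 'node:fs/promises';

// Read the model artifacts before calling uploadModel.
// The paths are hypothetical; point them at your own files.
const modelBuffer = await readFile('./artifacts/model.onnx');
const tokenizerBuffer = await readFile('./artifacts/tokenizer.json');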

Model Versioning

// Create a new version
await client.hub.createModelVersion({
modelId: 'my-org/my-model',
version: '1.1.0',
files: updatedFiles,
changelog: 'Improved accuracy on edge cases',
});
// List versions
const versions = await client.hub.listModelVersions('my-org/my-model');
for (const version of versions.items) {
console.log(`Version ${version.version}: ${version.changelog}`);
}
// Download specific version
const download = await client.hub.downloadModel({
modelId: 'my-org/my-model',
version: '1.0.0',
});
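
When no version is pinned, the newest one can be resolved explicitly. A sketch that sorts the listModelVersions() output by semantic version, assuming each version field is a plain 'major.minor.patch' string:

// Resolve the highest semantic version, then download it.
const allVersions = await client.hub.listModelVersions('my-org/my-model');

const latest = allVersions.items
  .map((v) => v.version)
  .sort((a, b) => {
    const pa = a.split('.').map(Number);
    const pb = b.split('.').map(Number);
    for (let i = 0; i < 3; i++) {
      if ((pa[i] ?? 0) !== (pb[i] ?? 0)) return (pa[i] ?? 0) - (pb[i] ?? 0);
    }
    return 0;
  })
  .at(-1);

const pinned = await client.hub.downloadModel({
  modelId: 'my-org/my-model',
  version: latest,
});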

Access Control

// Update model visibility
await client.hub.updateModel('my-org/my-model', {
visibility: 'private',
});
// Grant access to specific users
await client.hub.grantModelAccess({
modelId: 'my-org/my-model',
users: ['user-id-1', 'user-id-2'],
organizations: ['partner-org-id'],
});
// Revoke access
await client.hub.revokeModelAccess({
modelId: 'my-org/my-model',
users: ['user-id-1'],
});
// List who has access
const access = await client.hub.getModelAccess('my-org/my-model');
console.log('Users with access:', access.users);
console.log('Organizations with access:', access.organizations);

Chunked Loading

Large models are automatically split into chunks for efficient loading:

// Load large model with chunking
const model = await client.cortexRuntime.loadModel({
modelId: 'tenzro/llama-3.2-3b-onnx',
provider: 'webgpu',
options: {
chunked: true,
chunkSize: 50 * 1024 * 1024, // 50MB chunks
},
onProgress: (progress) => {
console.log(`Chunk ${progress.chunk}/${progress.totalChunks}`);
console.log(`Progress: ${progress.percentage}%`);
},
});
console.log('Model loaded:', model.id);
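
As a rough check on what onProgress will report: with the 50 MB chunks above and the 1.2 GB tenzro/llama-3.2-1b-onnx listed under Popular Models, a load takes about 25 chunks (assuming the file is split evenly at chunkSize boundaries):

// Back-of-the-envelope chunk count; assumes straight division at chunkSize.
const modelBytes = 1.2 * 1024 ** 3;                    // 1.2 GB, ~1.29e9 bytes
const chunkSize = 50 * 1024 * 1024;                    // 50 MB, as above
const totalChunks = Math.ceil(modelBytes / chunkSize); // -> 25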

Model Metadata

Add detailed metadata to your models:

await client.hub.updateModel('my-org/my-model', {
  description: 'Fine-tuned model for customer support',
  metadata: {
    intendedUse: 'Customer service chatbots',
    limitations: 'May not handle highly technical queries',
    trainingData: 'Customer support transcripts (anonymized)',
    baseModel: 'llama-3.2-1b',
    quantization: 'int8',
    evaluation: {
      accuracy: 0.92,
      f1Score: 0.89,
    },
    license: 'Apache-2.0',
  },
  tags: ['customer-support', 'chat', 'fine-tuned'],
});

Popular Models

Model                       Size     Task
tenzro/phi-3-mini-onnx      2.4 GB   Text Generation
tenzro/llama-3.2-1b-onnx    1.2 GB   Text Generation
tenzro/whisper-small-onnx   460 MB   Speech Recognition
tenzro/all-minilm-l6-v2     90 MB    Embeddings
tenzro/vit-base-patch16     350 MB   Image Classification

Pricing

Feature          Free           Pro
Public models    Unlimited      Unlimited
Private models   3              Unlimited
Storage          10 GB          1 TB
Bandwidth        100 GB/month   10 TB/month