# Model Hub
Host, share, and deploy AI models. Access community models or upload your own for browser-based inference with edge caching.
## Overview
The Tenzro Model Hub provides:
- Model Hosting: Upload and serve models globally
- Edge Caching: CDN-distributed model files
- Browser Inference: WebGPU/WebGL/WASM support
- Version Control: Track model versions
- Access Control: Public or private models
## Supported Formats
| Format | Runtime | Use Case |
|---|---|---|
| ONNX | ONNX Runtime Web | General inference |
| GGUF | llama.cpp (WASM) | LLM inference |
| SafeTensors | Transformers.js | HuggingFace models |
| MLC | WebLLM | Optimized LLMs |
## Using Hub Models

```typescript
import { Tenzro } from 'tenzro';

const tenzro = new Tenzro();

// List available models
const models = await tenzro.hub.list({
  category: 'text-generation',
  runtime: 'onnx',
});

// Get model details
const model = await tenzro.hub.get('tenzro/llama-3.2-1b-instruct-onnx');
console.log(model.name, model.size, model.downloads);

// Download model
const downloaded = await tenzro.hub.download('tenzro/llama-3.2-1b-instruct-onnx', {
  progress: (percent) => console.log(`${percent}% downloaded`),
});
```
## Browser Inference

```typescript
import { TenzroEdge } from '@tenzro/edge';

const edge = new TenzroEdge({
  clientId: 'client_xxx',
});

// Load model for browser inference
const model = await edge.hub.load('tenzro/phi-3-mini-onnx', {
  runtime: 'webgpu', // or 'webgl', 'wasm'
  cache: true,       // Cache in IndexedDB
});

// Run inference
const result = await model.generate({
  prompt: 'Explain quantum computing',
  maxTokens: 100,
});

console.log(result.text);
```
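WebGPU gives the best throughput but isn't available in every browser. Here is a minimal feature-detection sketch for choosing the `runtime` value, using only standard browser APIs (the fallback order is an assumption, not something the SDK mandates):

```typescript
// Pick the most capable runtime available, falling back from
// WebGPU to WebGL2 to plain WASM (assumed order, not SDK-defined).
function pickRuntime(): 'webgpu' | 'webgl' | 'wasm' {
  if ('gpu' in navigator) return 'webgpu'; // WebGPU supported
  if (document.createElement('canvas').getContext('webgl2')) return 'webgl';
  return 'wasm'; // universally available baseline
}

const fallbackModel = await edge.hub.load('tenzro/phi-3-mini-onnx', {
  runtime: pickRuntime(),
  cache: true,
});
```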
## Uploading Models

```typescript
// Upload a model
const upload = await tenzro.hub.upload({
  name: 'my-custom-model',
  description: 'Fine-tuned model for customer support',
  format: 'onnx',
  files: [
    { path: 'model.onnx', content: modelBuffer },
    { path: 'tokenizer.json', content: tokenizerBuffer },
  ],
  metadata: {
    baseModel: 'llama-3.2-1b',
    task: 'text-generation',
    quantization: 'int8',
  },
  visibility: 'private', // or 'public'
});

console.log('Model uploaded:', upload.id);
```
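The `modelBuffer` and `tokenizerBuffer` values above are just raw file bytes; the upload call doesn't care where they come from. In Node.js they might be read like this (a sketch; the paths are illustrative):

```typescript
import { readFile } from 'node:fs/promises';

// Read the model artifacts from disk as Buffers
const modelBuffer = await readFile('./artifacts/model.onnx');
const tokenizerBuffer = await readFile('./artifacts/tokenizer.json');
```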
## Model Versioning

```typescript
// Create a new version
await tenzro.hub.createVersion('my-org/my-model', {
  version: '1.1.0',
  files: updatedFiles,
  changelog: 'Improved accuracy on edge cases',
});

// List versions
const versions = await tenzro.hub.listVersions('my-org/my-model');

// Load specific version
const model = await edge.hub.load('my-org/my-model@1.0.0');
```
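The `@` suffix pins a version so browser deployments stay reproducible. A short sketch contrasting the two forms (the unpinned resolution behavior is an assumption):

```typescript
// Pinned: clients always fetch exactly this version
const pinned = await edge.hub.load('my-org/my-model@1.0.0');

// Unpinned: presumably resolves to the latest published version
const latest = await edge.hub.load('my-org/my-model');
```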
## Access Control

```typescript
// Set model visibility
await tenzro.hub.updateVisibility('my-org/my-model', {
  visibility: 'private',
});

// Grant access to specific users/organizations
await tenzro.hub.grantAccess('my-org/my-model', {
  users: ['user-id-1', 'user-id-2'],
  organizations: ['partner-org'],
});

// Revoke access
await tenzro.hub.revokeAccess('my-org/my-model', {
  users: ['user-id-1'],
});
```
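Visibility and grants presumably compose: a private model is readable only by its owner plus anyone listed via `grantAccess`, while public models need no grants at all.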
## Chunked Loading
Large models are automatically split into chunks for efficient loading:
```typescript
const model = await edge.hub.load('tenzro/llama-3.2-3b-onnx', {
  // Load only needed chunks (streaming inference)
  streaming: true,
  // Progress callback
  onProgress: ({ loaded, total, chunk }) => {
    console.log(`Loading chunk ${chunk}: ${loaded}/${total}`);
  },
});
```
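Streaming trades total load time for a faster start: inference can presumably begin once the first chunks arrive, which matters most for multi-gigabyte models on slow connections.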
## Model Cards
Document your models with model cards:
```typescript
await tenzro.hub.updateModelCard('my-org/my-model', {
  description: 'A fine-tuned model for customer support',
  intendedUse: 'Customer service chatbots',
  limitations: 'May not handle technical queries well',
  trainingData: 'Customer support transcripts (anonymized)',
  evaluation: {
    accuracy: 0.92,
    f1Score: 0.89,
  },
  license: 'Apache-2.0',
});
```
## Popular Models
| Model | Size | Task |
|---|---|---|
| tenzro/phi-3-mini-onnx | 2.4 GB | Text Generation |
| tenzro/llama-3.2-1b-onnx | 1.2 GB | Text Generation |
| tenzro/whisper-small-onnx | 460 MB | Speech Recognition |
| tenzro/all-minilm-l6-v2 | 90 MB | Embeddings |
| tenzro/vit-base-patch16 | 350 MB | Image Classification |
## Pricing
| Feature | Free | Pro |
|---|---|---|
| Public models | Unlimited | Unlimited |
| Private models | 3 | Unlimited |
| Storage | 10 GB | 1 TB |
| Bandwidth | 100 GB/month | 10 TB/month |