# Model Hub
Host, share, and deploy AI models. Access community models or upload your own for browser-based inference with edge caching.
## Overview
The Tenzro Model Hub provides:
- Model Hosting: Upload and serve models globally
- Edge Caching: CDN-distributed model files
- Browser Inference: WebGPU/WebGL/WASM support
- Version Control: Track model versions
- Access Control: Public or private models
## Supported Formats
| Format | Runtime | Use Case |
|---|---|---|
| ONNX | Cortex Runtime (WebGPU/WebGL/WASM) | General inference, best browser support |
| GGUF | llama.cpp (WASM) | Quantized LLM inference |
| SafeTensors | Transformers.js | HuggingFace models |
| TensorFlow.js | TensorFlow.js | TensorFlow models |
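Which runtime ends up being used depends on what the browser exposes. As a rough illustration (a sketch, not part of the SDK), a capability check for picking a provider might look like the following; the `'webgl'` and `'wasm'` provider strings are assumptions inferred from the runtimes in the table above, while `'webgpu'` matches the examples later on this page:

```typescript
// Sketch: choose a provider string from browser capabilities.
// Assumes the SDK accepts 'webgl' and 'wasm' alongside the documented 'webgpu'.
function pickProvider(): 'webgpu' | 'webgl' | 'wasm' {
  // WebGPU is exposed as navigator.gpu in supporting browsers
  if (typeof navigator !== 'undefined' && 'gpu' in navigator) {
    return 'webgpu';
  }
  // Fall back to WebGL 2, probed via an offscreen canvas
  if (typeof document !== 'undefined') {
    const canvas = document.createElement('canvas');
    if (canvas.getContext('webgl2')) {
      return 'webgl';
    }
  }
  // WASM runs everywhere as the last resort
  return 'wasm';
}
```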
## Using Hub Models
```typescript
import { Tenzro } from '@tenzro/cloud';

const client = new Tenzro({
  apiKey: process.env.TENZRO_API_KEY,
});

// List available models
const models = await client.hub.listModels({
  category: 'text-generation',
  format: 'onnx',
});

for (const model of models.items) {
  console.log(`${model.name} - ${model.size} - ${model.downloads} downloads`);
}

// Get model details
const model = await client.hub.getModel('tenzro/llama-3.2-1b-instruct-onnx');
console.log('Model:', model.name);
console.log('Format:', model.format);
console.log('Size:', model.size);
console.log('Description:', model.description);

// Download model
const download = await client.hub.downloadModel({
  modelId: 'tenzro/llama-3.2-1b-instruct-onnx',
  onProgress: (progress) => {
    console.log(`Downloaded: ${progress.loaded}/${progress.total} bytes`);
  },
});
console.log('Downloaded to:', download.path);
```
## Browser Inference with Cortex Runtime
```typescript
import { Tenzro } from '@tenzro/cloud';

const client = new Tenzro({
  apiKey: process.env.TENZRO_API_KEY,
});

// Load ONNX model for browser inference
const model = await client.cortexRuntime.loadModel({
  modelId: 'tenzro/phi-3-mini-onnx',
  provider: 'webgpu', // WebGPU for best performance
  options: {
    cache: true, // Cache model in browser
  },
});

// Run inference
const result = await client.cortexRuntime.run({
  modelId: model.id,
  inputs: {
    prompt: 'Explain quantum computing in simple terms',
    maxTokens: 100,
    temperature: 0.7,
  },
});

console.log('Output:', result.text);

// Unload model when done
await client.cortexRuntime.unloadModel(model.id);
```
## Uploading Models
```typescript
// Upload a custom model
// (modelBuffer and tokenizerBuffer hold the file contents, e.g. read from disk)
const upload = await client.hub.uploadModel({
  name: 'my-custom-model',
  description: 'Fine-tuned model for customer support',
  format: 'onnx',
  category: 'text-generation',
  files: [
    { path: 'model.onnx', data: modelBuffer },
    { path: 'tokenizer.json', data: tokenizerBuffer },
  ],
  metadata: {
    baseModel: 'llama-3.2-1b',
    task: 'text-generation',
    quantization: 'int8',
  },
  visibility: 'private', // or 'public'
});

console.log('Model uploaded:', upload.modelId);
console.log('Status:', upload.status);
```
## Model Versioning
```typescript
// Create a new version
await client.hub.createModelVersion({
  modelId: 'my-org/my-model',
  version: '1.1.0',
  files: updatedFiles,
  changelog: 'Improved accuracy on edge cases',
});

// List versions
const versions = await client.hub.listModelVersions('my-org/my-model');
for (const version of versions.items) {
  console.log(`Version ${version.version}: ${version.changelog}`);
}

// Download a specific version
const download = await client.hub.downloadModel({
  modelId: 'my-org/my-model',
  version: '1.0.0',
});
```
## Access Control
```typescript
// Update model visibility
await client.hub.updateModel('my-org/my-model', {
  visibility: 'private',
});

// Grant access to specific users
await client.hub.grantModelAccess({
  modelId: 'my-org/my-model',
  users: ['user-id-1', 'user-id-2'],
  organizations: ['partner-org-id'],
});

// Revoke access
await client.hub.revokeModelAccess({
  modelId: 'my-org/my-model',
  users: ['user-id-1'],
});

// List who has access
const access = await client.hub.getModelAccess('my-org/my-model');
console.log('Users with access:', access.users);
console.log('Organizations with access:', access.organizations);
```
## Chunked Loading
Large models are split into chunks so they can be fetched and loaded incrementally:
```typescript
// Load a large model with chunking
const model = await client.cortexRuntime.loadModel({
  modelId: 'tenzro/llama-3.2-3b-onnx',
  provider: 'webgpu',
  options: {
    chunked: true,
    chunkSize: 50 * 1024 * 1024, // 50 MB chunks
  },
  onProgress: (progress) => {
    console.log(`Chunk ${progress.chunk}/${progress.totalChunks}`);
    console.log(`Progress: ${progress.percentage}%`);
  },
});

console.log('Model loaded:', model.id);
```
## Model Metadata
Add detailed metadata to your models:
```typescript
await client.hub.updateModel('my-org/my-model', {
  description: 'Fine-tuned model for customer support',
  metadata: {
    intendedUse: 'Customer service chatbots',
    limitations: 'May not handle highly technical queries',
    trainingData: 'Customer support transcripts (anonymized)',
    baseModel: 'llama-3.2-1b',
    quantization: 'int8',
    evaluation: {
      accuracy: 0.92,
      f1Score: 0.89,
    },
    license: 'Apache-2.0',
  },
  tags: ['customer-support', 'chat', 'fine-tuned'],
});
```
## Popular Models
| Model | Size | Task |
|---|---|---|
| tenzro/phi-3-mini-onnx | 2.4 GB | Text Generation |
| tenzro/llama-3.2-1b-onnx | 1.2 GB | Text Generation |
| tenzro/whisper-small-onnx | 460 MB | Speech Recognition |
| tenzro/all-minilm-l6-v2 | 90 MB | Embeddings |
| tenzro/vit-base-patch16 | 350 MB | Image Classification |
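Any of these can be loaded with the same calls shown under Browser Inference above. For instance, a sketch that reuses the earlier `client` and the documented `loadModel`/`run` parameters:

```typescript
// Sketch: load a popular hub model and run a short generation,
// mirroring the Browser Inference example above.
const llama = await client.cortexRuntime.loadModel({
  modelId: 'tenzro/llama-3.2-1b-onnx',
  provider: 'webgpu',
  options: { cache: true },
});

const reply = await client.cortexRuntime.run({
  modelId: llama.id,
  inputs: {
    prompt: 'Summarize the benefits of edge-cached models in one sentence.',
    maxTokens: 60,
    temperature: 0.7,
  },
});
console.log(reply.text);
```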
## Pricing
| Feature | Free | Pro |
|---|---|---|
| Public models | Unlimited | Unlimited |
| Private models | 3 | Unlimited |
| Storage | 10 GB | 1 TB |
| Bandwidth | 100 GB/month | 10 TB/month |