AI Inference

Access leading AI models through a unified API. Generate text, embeddings, and more with Google Gemini, OpenAI, Anthropic Claude, and open-source models.

Supported Models

| Provider | Models | Capabilities |
| --- | --- | --- |
| Google | gemini-2.5-flash, gemini-2.5-pro, gemini-3-pro-preview | Text, Vision, Function Calling |
| OpenAI | gpt-4o | Text, Vision, Function Calling |
| Anthropic | claude-sonnet-4 | Text, Vision, Function Calling |
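
Because the interface is unified, switching providers is a change to a single field. A minimal sketch, assuming a `client` configured as in the Quick Start below:

```typescript
// The same call shape reaches different providers by changing only
// the model identifier (models from the table above).
const messages = [{ role: 'user', content: 'Summarize the theory of relativity.' }];

const gemini = await client.ai.chat({ messages, model: 'gemini-2.5-flash' });
const gpt = await client.ai.chat({ messages, model: 'gpt-4o' });
const claude = await client.ai.chat({ messages, model: 'claude-sonnet-4' });
```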

Quick Start

```typescript
import { Tenzro } from '@tenzro/cloud';

const client = new Tenzro({ apiKey: 'your-api-key' });

// Simple chat - automatically uses gemini-2.5-flash
const response = await client.ai.chat('Explain quantum computing in simple terms');
console.log(response.text);
```
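
The string form is shorthand for a single user message; pass a messages array, as in the next section, when you need a system prompt or conversation history.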

Chat with Messages

```typescript
// Chat with full message history
const response = await client.ai.chat({
  messages: [
    { role: 'system', content: 'You are a helpful coding assistant.' },
    { role: 'user', content: 'Write a TypeScript function to validate email addresses' },
  ],
  model: 'gemini-2.5-pro',
  temperature: 0.7,
  maxTokens: 1000,
});

console.log(response.text);
console.log('Usage:', response.usage);
```
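
For multi-turn conversations, append each reply to the history before sending the follow-up. A sketch, assuming `chat()` accepts prior assistant turns, as is standard for chat APIs:

```typescript
// Feed the model's reply back into the message history so the
// follow-up question keeps the conversation context.
const history = [
  { role: 'user', content: 'Write a TypeScript function to validate email addresses' },
];

const first = await client.ai.chat({ messages: history, model: 'gemini-2.5-pro' });
history.push({ role: 'assistant', content: first.text });
history.push({ role: 'user', content: 'Now add unit tests for that function.' });

const second = await client.ai.chat({ messages: history, model: 'gemini-2.5-pro' });
console.log(second.text);
```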

Streaming Responses

```typescript
// Stream responses for real-time output
const stream = await client.ai.chatStream('Write a short story about AI');

for await (const chunk of stream) {
  process.stdout.write(chunk.text);
}
```
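
You can also accumulate chunks so the full text is available once the stream ends. A sketch, assuming `chatStream()` accepts the same options object as `chat()` (that shape is an assumption here):

```typescript
let full = '';
const stream = await client.ai.chatStream({
  messages: [{ role: 'user', content: 'Write a short story about AI' }],
  model: 'gemini-2.5-flash',
});

for await (const chunk of stream) {
  full += chunk.text;          // keep the complete text
  process.stdout.write(chunk.text); // while still printing live
}

console.log('\nCharacters streamed:', full.length);
```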

Direct Inference

```typescript
// Direct model inference with full control
const response = await client.ai.infer({
  model: 'gemini-2.5-pro',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' },
  ],
  temperature: 0.7,
  maxTokens: 500,
});

console.log(response.text);
```

Generate Embeddings

```typescript
// Generate embeddings for semantic search
const embedding = await client.ai.embed({
  text: 'Machine learning is fascinating',
  model: 'gemini-2.5-flash',
  taskType: 'RETRIEVAL_DOCUMENT',
  dimensionality: 768,
});

console.log(embedding.embedding); // [0.123, -0.456, ...]

// For queries (semantic search)
const queryEmbedding = await client.ai.embed({
  text: 'What is machine learning?',
  taskType: 'RETRIEVAL_QUERY',
});
```
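
To turn these embeddings into a semantic search, rank documents by cosine similarity against the query embedding. A minimal sketch using only the `embed` call shown above; the similarity helper is plain arithmetic, not part of the SDK:

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const docs = ['Machine learning is fascinating', 'Paris is the capital of France'];
const docEmbeddings = await Promise.all(
  docs.map((text) => client.ai.embed({ text, taskType: 'RETRIEVAL_DOCUMENT' }))
);
const query = await client.ai.embed({
  text: 'What is machine learning?',
  taskType: 'RETRIEVAL_QUERY',
});

// Rank documents by similarity to the query.
const ranked = docs
  .map((text, i) => ({
    text,
    score: cosineSimilarity(query.embedding, docEmbeddings[i].embedding),
  }))
  .sort((a, b) => b.score - a.score);

console.log(ranked[0].text); // best match
```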

AI Endpoints

```typescript
// Create a custom AI endpoint
const endpoint = await client.ai.createEndpoint({
  projectId: 'project-id',
  endpointName: 'my-chatbot',
  model: 'gemini-2.5-flash',
  systemPrompt: 'You are a helpful customer service assistant.',
  temperature: 0.7,
});

console.log('Endpoint:', endpoint.endpoint_url);

// Use the endpoint for inference
const response = await client.ai.inferWithEndpoint({
  endpointId: endpoint.endpoint_id,
  message: 'I need help with my order',
});

console.log(response.text);
```
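
Endpoints bake the model, system prompt, and sampling settings into a reusable named configuration, so callers only need to send the user's message.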

Configuration Options

| Parameter | Type | Description |
| --- | --- | --- |
| model | string | Model identifier |
| temperature | number | Randomness (0-2, default 1) |
| maxTokens | number | Maximum output tokens |
| topP | number | Nucleus sampling (0-1) |
| frequencyPenalty | number | Reduce repetition (-2 to 2) |
| presencePenalty | number | Encourage new topics (-2 to 2) |
| stop | string[] | Stop sequences |
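
A sketch combining several of these parameters in a direct inference call; support for individual parameters may vary by model and provider:

```typescript
const response = await client.ai.infer({
  model: 'gemini-2.5-flash',
  messages: [{ role: 'user', content: 'List three uses for embeddings.' }],
  temperature: 0.2,      // low randomness for a factual list
  maxTokens: 200,
  topP: 0.9,             // nucleus sampling
  frequencyPenalty: 0.5, // discourage repeated phrasing
  stop: ['\n\n'],        // stop at the first blank line
});
```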

Error Handling

```typescript
try {
  const response = await client.ai.chat('Hello');
  console.log(response.text);
} catch (error) {
  console.error('AI inference error:', error.message);
  // Handle specific error types as needed
}
```
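
For transient failures such as rate limits, a retry with exponential backoff is a common pattern. A sketch; the error shape here (a numeric `status` field) is an assumption, so adapt it to the errors your SDK version actually throws:

```typescript
async function chatWithRetry(prompt: string, retries = 3): Promise<string> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const response = await client.ai.chat(prompt);
      return response.text;
    } catch (error: any) {
      // Assumed error shape: retry on rate limits and server errors only.
      const retryable = error?.status === 429 || error?.status >= 500;
      if (!retryable || attempt === retries) throw error;
      await new Promise((r) => setTimeout(r, 2 ** attempt * 1000));
    }
  }
  throw new Error('unreachable');
}
```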

Model Selection

Choose the right model for your use case:

| Model | Provider | Best For |
| --- | --- | --- |
| gemini-2.5-flash | Google | Fast responses, high throughput, cost-effective |
| gemini-2.5-pro | Google | Complex reasoning, better accuracy |
| gemini-3-pro-preview | Google | Latest capabilities, experimental features |
| gpt-4o | OpenAI | Multimodal tasks, vision + text |
| claude-sonnet-4 | Anthropic | Long context, analysis, coding |
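
One way to apply the table is a small lookup from use case to model. A hypothetical helper; the task categories are illustrative, not an SDK API:

```typescript
type Task = 'fast' | 'reasoning' | 'vision' | 'long-context';

// Illustrative mapping derived from the table above.
const MODEL_FOR_TASK: Record<Task, string> = {
  fast: 'gemini-2.5-flash',
  reasoning: 'gemini-2.5-pro',
  vision: 'gpt-4o',
  'long-context': 'claude-sonnet-4',
};

const response = await client.ai.chat({
  messages: [{ role: 'user', content: 'Summarize this long contract.' }],
  model: MODEL_FOR_TASK['long-context'],
});
```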
