AI Inference

Access leading AI models through a unified API. Generate text, embeddings, and more with Google Gemini, OpenAI, Anthropic Claude, and open-source models.

Supported Models

| Provider | Models | Capabilities |
| --- | --- | --- |
| Google | gemini-2.5-flash, gemini-2.5-pro, gemini-3-pro-preview | Text, Vision, Function Calling |
| OpenAI | gpt-4o | Text, Vision, Function Calling |
| Anthropic | claude-sonnet-4 | Text, Vision, Function Calling |
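
Because the interface is unified, switching providers is a change to a single field. A minimal sketch, assuming a `client` configured as in the Quick Start below:

```typescript
// The same call shape reaches different providers by changing only
// the model identifier (models from the table above).
const messages = [{ role: 'user', content: 'Summarize the theory of relativity.' }];

const gemini = await client.ai.chat({ messages, model: 'gemini-2.5-flash' });
const gpt = await client.ai.chat({ messages, model: 'gpt-4o' });
const claude = await client.ai.chat({ messages, model: 'claude-sonnet-4' });
```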

Quick Start

```typescript
import { Tenzro } from '@tenzro/cloud';

const client = new Tenzro({ apiKey: 'your-api-key' });

// Simple chat - automatically uses gemini-2.5-flash
const response = await client.ai.chat('Explain quantum computing in simple terms');
console.log(response.text);
```
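
The string form is shorthand for a single user message; pass a messages array, as in the next section, when you need a system prompt or conversation history.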

Chat with Messages

```typescript
// Chat with full message history
const response = await client.ai.chat({
  messages: [
    { role: 'system', content: 'You are a helpful coding assistant.' },
    { role: 'user', content: 'Write a TypeScript function to validate email addresses' },
  ],
  model: 'gemini-2.5-pro',
  temperature: 0.7,
  maxTokens: 1000,
});

console.log(response.text);
console.log('Usage:', response.usage);
```
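
For multi-turn conversations, append each reply to the history before sending the follow-up. A sketch, assuming `chat()` accepts prior assistant turns, as is standard for chat APIs:

```typescript
// Feed the model's reply back into the message history so the
// follow-up question keeps the conversation context.
const history = [
  { role: 'user', content: 'Write a TypeScript function to validate email addresses' },
];

const first = await client.ai.chat({ messages: history, model: 'gemini-2.5-pro' });
history.push({ role: 'assistant', content: first.text });
history.push({ role: 'user', content: 'Now add unit tests for that function.' });

const second = await client.ai.chat({ messages: history, model: 'gemini-2.5-pro' });
console.log(second.text);
```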

Streaming Responses

```typescript
// Stream responses for real-time output
const stream = await client.ai.chatStream('Write a short story about AI');

for await (const chunk of stream) {
  process.stdout.write(chunk.text);
}
```
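
You can also accumulate chunks so the full text is available once the stream ends. A sketch, assuming `chatStream()` accepts the same options object as `chat()` (that shape is an assumption here):

```typescript
let full = '';
const stream = await client.ai.chatStream({
  messages: [{ role: 'user', content: 'Write a short story about AI' }],
  model: 'gemini-2.5-flash',
});

for await (const chunk of stream) {
  full += chunk.text;          // keep the complete text
  process.stdout.write(chunk.text); // while still printing live
}

console.log('\nCharacters streamed:', full.length);
```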

Direct Inference

```typescript
// Direct model inference with full control
const response = await client.ai.infer({
  model: 'gemini-2.5-pro',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' },
  ],
  temperature: 0.7,
  maxTokens: 500,
});

console.log(response.text);
```

Generate Embeddings

```typescript
// Generate embeddings for semantic search
const embedding = await client.ai.embed({
  text: 'Machine learning is fascinating',
  model: 'gemini-2.5-flash',
  taskType: 'RETRIEVAL_DOCUMENT',
  dimensionality: 768,
});

console.log(embedding.embedding); // [0.123, -0.456, ...]

// For queries (semantic search)
const queryEmbedding = await client.ai.embed({
  text: 'What is machine learning?',
  taskType: 'RETRIEVAL_QUERY',
});
```
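
To turn these embeddings into a semantic search, rank documents by cosine similarity against the query embedding. A minimal sketch using only the `embed` call shown above; the similarity helper is plain arithmetic, not part of the SDK:

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const docs = ['Machine learning is fascinating', 'Paris is the capital of France'];
const docEmbeddings = await Promise.all(
  docs.map((text) => client.ai.embed({ text, taskType: 'RETRIEVAL_DOCUMENT' }))
);
const query = await client.ai.embed({
  text: 'What is machine learning?',
  taskType: 'RETRIEVAL_QUERY',
});

// Rank documents by similarity to the query.
const ranked = docs
  .map((text, i) => ({
    text,
    score: cosineSimilarity(query.embedding, docEmbeddings[i].embedding),
  }))
  .sort((a, b) => b.score - a.score);

console.log(ranked[0].text); // best match
```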

AI Endpoints

```typescript
// Create a custom AI endpoint
const endpoint = await client.ai.createEndpoint({
  projectId: 'project-id',
  endpointName: 'my-chatbot',
  model: 'gemini-2.5-flash',
  systemPrompt: 'You are a helpful customer service assistant.',
  temperature: 0.7,
});

console.log('Endpoint:', endpoint.endpoint_url);

// Use the endpoint for inference
const response = await client.ai.inferWithEndpoint({
  endpointId: endpoint.endpoint_id,
  message: 'I need help with my order',
});

console.log(response.text);
```
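
Endpoints bake the model, system prompt, and sampling settings into a reusable named configuration, so callers only need to send the user's message.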

Configuration Options

| Parameter | Type | Description |
| --- | --- | --- |
| model | string | Model identifier |
| temperature | number | Randomness (0-2, default 1) |
| maxTokens | number | Maximum output tokens |
| topP | number | Nucleus sampling (0-1) |
| frequencyPenalty | number | Reduce repetition (-2 to 2) |
| presencePenalty | number | Encourage new topics (-2 to 2) |
| stop | string[] | Stop sequences |
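
A sketch combining several of these parameters in a direct inference call; support for individual parameters may vary by model and provider:

```typescript
const response = await client.ai.infer({
  model: 'gemini-2.5-flash',
  messages: [{ role: 'user', content: 'List three uses for embeddings.' }],
  temperature: 0.2,      // low randomness for a factual list
  maxTokens: 200,
  topP: 0.9,             // nucleus sampling
  frequencyPenalty: 0.5, // discourage repeated phrasing
  stop: ['\n\n'],        // stop at the first blank line
});
```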

Error Handling

```typescript
try {
  const response = await client.ai.chat('Hello');
  console.log(response.text);
} catch (error) {
  console.error('AI inference error:', error.message);
  // Handle specific error types as needed
}
```
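
For transient failures such as rate limits, a retry with exponential backoff is a common pattern. A sketch; the error shape here (a numeric `status` field) is an assumption, so adapt it to the errors your SDK version actually throws:

```typescript
async function chatWithRetry(prompt: string, retries = 3): Promise<string> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const response = await client.ai.chat(prompt);
      return response.text;
    } catch (error: any) {
      // Assumed error shape: retry on rate limits and server errors only.
      const retryable = error?.status === 429 || error?.status >= 500;
      if (!retryable || attempt === retries) throw error;
      await new Promise((r) => setTimeout(r, 2 ** attempt * 1000));
    }
  }
  throw new Error('unreachable');
}
```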

Model Selection

Choose the right model for your use case:

| Model | Provider | Best For |
| --- | --- | --- |
| gemini-2.5-flash | Google | Fast responses, high throughput, cost-effective |
| gemini-2.5-pro | Google | Complex reasoning, better accuracy |
| gemini-3-pro-preview | Google | Latest capabilities, experimental features |
| gpt-4o | OpenAI | Multimodal tasks, vision + text |
| claude-sonnet-4 | Anthropic | Long context, analysis, coding |
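
One way to apply the table is a small lookup from use case to model. A hypothetical helper; the task categories are illustrative, not an SDK API:

```typescript
type Task = 'fast' | 'reasoning' | 'vision' | 'long-context';

// Illustrative mapping derived from the table above.
const MODEL_FOR_TASK: Record<Task, string> = {
  fast: 'gemini-2.5-flash',
  reasoning: 'gemini-2.5-pro',
  vision: 'gpt-4o',
  'long-context': 'claude-sonnet-4',
};

const response = await client.ai.chat({
  messages: [{ role: 'user', content: 'Summarize this long contract.' }],
  model: MODEL_FOR_TASK['long-context'],
});
```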
