# AI Inference

Access leading AI models through a unified API. Generate text, embeddings, and more with Google Gemini, OpenAI, Anthropic Claude, and open-source models.

## Supported Models

| Provider | Models | Capabilities |
| --- | --- | --- |
| Google | `gemini-2.5-flash`, `gemini-3-pro-preview`, `gemini-embedding-001` | Text, Vision, Embeddings |
| OpenAI | `gpt-5`, `gpt-5.1`, `gpt-5-mini`, `gpt-5.1-codex` | Text, Vision, Code |
| Anthropic | `claude-sonnet-4-5`, `claude-opus-4-1`, `claude-haiku-4-5` | Text, Vision, Code |

## Quick Start

```typescript
import { Tenzro } from 'tenzro';

const tenzro = new Tenzro();

// Text generation
const response = await tenzro.ai.generate({
  model: 'gemini-2.5-flash',
  prompt: 'Explain quantum computing in simple terms',
  maxTokens: 500,
});

console.log(response.text);
```

## Chat Completions

```typescript
const response = await tenzro.ai.chat({
  model: 'gemini-3-pro-preview',
  messages: [
    { role: 'system', content: 'You are a helpful coding assistant.' },
    { role: 'user', content: 'Write a TypeScript function to validate email addresses' },
  ],
  temperature: 0.7,
  maxTokens: 1000,
});

console.log(response.message.content);
```

## Streaming Responses

```typescript
const stream = await tenzro.ai.stream({
  model: 'gemini-2.5-flash',
  messages: [
    { role: 'user', content: 'Write a short story about AI' },
  ],
});

for await (const chunk of stream) {
  process.stdout.write(chunk.text);
}

// Get the final response with usage stats
const finalResponse = await stream.finalResponse();
console.log('Tokens used:', finalResponse.usage);
```

## Function Calling

```typescript
const response = await tenzro.ai.chat({
  model: 'gemini-3-pro-preview',
  messages: [
    { role: 'user', content: 'What is the weather in San Francisco?' },
  ],
  tools: [
    {
      name: 'get_weather',
      description: 'Get current weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string', description: 'City name' },
          unit: { type: 'string', enum: ['celsius', 'fahrenheit'] },
        },
        required: ['location'],
      },
    },
  ],
  toolChoice: 'auto',
});

if (response.toolCalls) {
  for (const call of response.toolCalls) {
    console.log('Function:', call.name);
    console.log('Arguments:', call.arguments);
  }
}
```
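The example above only prints the model's tool calls; in practice you execute each one locally and feed the result back. Below is a minimal, self-contained sketch of that dispatch step. The `ToolCall` shape and the `handlers` map are assumptions for illustration (adapt them to the actual objects your SDK version returns), and the weather handler returns a canned string rather than calling a real weather API.

```typescript
// Assumed shape of a parsed tool call (illustrative, not the SDK's own type).
type ToolCall = { name: string; arguments: Record<string, unknown> };

// Local implementations keyed by tool name.
const handlers: Record<string, (args: Record<string, unknown>) => string> = {
  // Hypothetical handler: a real one would query a weather service.
  get_weather: (args) =>
    `Sunny, 18°C in ${String(args.location)} (${String(args.unit ?? 'celsius')})`,
};

function runToolCalls(toolCalls: ToolCall[]): string[] {
  return toolCalls.map((call) => {
    const handler = handlers[call.name];
    if (!handler) throw new Error(`Unknown tool: ${call.name}`);
    return handler(call.arguments);
  });
}

// Demo with a mocked tool call, as the model might return it:
const results = runToolCalls([
  { name: 'get_weather', arguments: { location: 'San Francisco' } },
]);
console.log(results[0]); // Sunny, 18°C in San Francisco (celsius)
```

The results would then typically be appended to `messages` and sent back to the model so it can produce a final natural-language answer.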

## Generate Embeddings

```typescript
// Single embedding
const embedding = await tenzro.ai.embed({
  model: 'gemini-embedding-001',
  input: 'Machine learning is fascinating',
});

console.log(embedding.vector); // [0.123, -0.456, ...]
console.log(embedding.dimensions); // e.g. 3072

// Batch embeddings
const embeddings = await tenzro.ai.embedBatch({
  model: 'gemini-embedding-001',
  inputs: [
    'First document',
    'Second document',
    'Third document',
  ],
});

console.log(embeddings.vectors.length); // 3
```
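Embedding vectors are usually compared with cosine similarity, e.g. to rank documents against a query embedding. A self-contained helper (the vectors in the demo are toy values, not real embeddings):

```typescript
// Cosine similarity between two vectors: 1 = same direction, 0 = orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors (real embeddings have hundreds or thousands of dimensions):
console.log(cosineSimilarity([1, 0, 0], [1, 0, 0])); // 1
console.log(cosineSimilarity([1, 0, 0], [0, 1, 0])); // 0
```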

## Vision (Multimodal)

```typescript
// Analyze an image
const response = await tenzro.ai.chat({
  model: 'gemini-3-pro-preview',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What do you see in this image?' },
        { type: 'image_url', image_url: { url: 'https://example.com/image.jpg' } },
      ],
    },
  ],
});

// Or with base64
const base64Response = await tenzro.ai.chat({
  model: 'gpt-5',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this diagram' },
        { type: 'image_url', image_url: { url: `data:image/png;base64,${base64Image}` } },
      ],
    },
  ],
});
```
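The `base64Image` value in the example above has to come from somewhere; in Node.js you can build the data URL from raw file bytes (e.g. from `fs.readFileSync`). A minimal helper, with a hypothetical file name in the usage comment:

```typescript
// Turn raw image bytes into a data URL suitable for the image_url field above.
function toDataUrl(bytes: Buffer, mimeType: string): string {
  return `data:${mimeType};base64,${bytes.toString('base64')}`;
}

// Usage (Node.js):
//   import { readFileSync } from 'node:fs';
//   const url = toDataUrl(readFileSync('diagram.png'), 'image/png');
console.log(toDataUrl(Buffer.from('hi'), 'text/plain')); // data:text/plain;base64,aGk=
```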

## Configuration Options

| Parameter | Type | Description |
| --- | --- | --- |
| `model` | string | Model identifier |
| `temperature` | number | Randomness (0-2, default 1) |
| `maxTokens` | number | Maximum output tokens |
| `topP` | number | Nucleus sampling (0-1) |
| `frequencyPenalty` | number | Reduce repetition (-2 to 2) |
| `presencePenalty` | number | Encourage new topics (-2 to 2) |
| `stop` | string[] | Stop sequences |

## Error Handling

```typescript
import { TenzroError, RateLimitError, ModelError } from 'tenzro';

try {
  const response = await tenzro.ai.generate({
    model: 'gemini-2.5-flash',
    prompt: 'Hello',
  });
} catch (error) {
  if (error instanceof RateLimitError) {
    console.log('Rate limited, retry after:', error.retryAfter);
  } else if (error instanceof ModelError) {
    console.log('Model error:', error.message);
  } else {
    throw error;
  }
}
```
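Rate-limit errors are usually transient, so a common pattern is to retry with exponential backoff. A self-contained sketch: in real code you would retry only on `RateLimitError` and honor `error.retryAfter`, but here the retry predicate and delays are simplified (and the demo uses a mock call) so the example stands alone.

```typescript
// Retry an async call with exponential backoff: baseDelayMs, 2x, 4x, ...
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Wait before the next attempt: 100ms, 200ms, 400ms, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}

// Demo: a mock call that fails twice, then succeeds on the third attempt.
let attempts = 0;
const result = await withRetry(async () => {
  attempts++;
  if (attempts < 3) throw new Error('rate limited');
  return 'ok';
});
console.log(result, attempts); // ok 3
```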

## Usage & Pricing

AI inference is billed per token:

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| `gemini-2.5-flash` | $0.075 | $0.30 |
| `gemini-3-pro-preview` | $1.25 | $5.00 |
| `gpt-5-mini` | $0.15 | $0.60 |
| `gpt-5` | $2.50 | $10.00 |
| `claude-sonnet-4-5` | $3.00 | $15.00 |
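To estimate what a request will cost, multiply each side of the token count by the table rates. A small helper hard-coding the table above (`estimateCost` is a local sketch, not part of the SDK):

```typescript
// USD per 1M tokens, from the pricing table above.
const pricing: Record<string, { input: number; output: number }> = {
  'gemini-2.5-flash': { input: 0.075, output: 0.3 },
  'gemini-3-pro-preview': { input: 1.25, output: 5.0 },
  'gpt-5-mini': { input: 0.15, output: 0.6 },
  'gpt-5': { input: 2.5, output: 10.0 },
  'claude-sonnet-4-5': { input: 3.0, output: 15.0 },
};

function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = pricing[model];
  if (!p) throw new Error(`Unknown model: ${model}`);
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// 10k prompt tokens + 1k completion tokens on gemini-2.5-flash:
console.log(estimateCost('gemini-2.5-flash', 10_000, 1_000)); // ≈ $0.00105
```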

## Related