AI is becoming a core part of modern web applications. From chatbots and support systems to image and voice generators, more apps now rely on AI to deliver useful, personalized, or automated experiences. However, with so many new concepts and tools, it's hard to know where to start.
This guide gives you a high-level overview of how to build and deploy AI-powered applications.
While AI features can vary in size and complexity, most follow a similar Input → Reason and act → Output pattern. Let's walk through what each of these steps involves.
The first step is accepting input from a user or system. This could be:
- Freeform text (e.g. "What's the weather in London?")
- Multi-modal content (e.g. images, files, or audio)
- An event with structured data (e.g. a new PR or a form submission)
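For example, a chat feature might accept freeform text through an API route. Here's a minimal sketch, assuming a Next.js App Router route handler (the `app/api/chat/route.ts` path and the request body shape are illustrative):

```ts
// app/api/chat/route.ts (hypothetical path)
export async function POST(request: Request) {
  // Freeform text input from the user
  const { message } = await request.json();

  // Hand the input off to the reasoning step (covered next)
  return Response.json({ received: message });
}
```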
Once the input is received, your app needs to decide how to respond. This is where reasoning comes in.
To reason, your app uses an AI model, often a Large Language Model (LLM).
At a basic level, the model takes a prompt and answers a question, summarizes text, or generates code. For example, you can use the AI SDK to send a prompt to an OpenAI model and get a response:
```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Basic text generation
const result = await generateText({
  model: openai('gpt-4'),
  prompt: 'Where is London?',
});

console.log(result.text);
// Possible output: "London is in the United Kingdom."
```
But real-world use cases often go beyond simple Q&A. You can guide how the model responds by adding context, calling tools that perform actions, and even using agents to make multi-step decisions. Here's how:
By default, models don't remember past interactions or know anything about your user. You can add context to make their responses more personalized and accurate. Some common ways to add context are:
Memory: Include past messages or stored user preferences.
```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Include earlier turns of the conversation so the model has memory
const result = await generateText({
  model: openai('gpt-4'),
  messages: [
    { role: 'user', content: 'Hi, my name is Alice.' },
    { role: 'assistant', content: 'Nice to meet you, Alice!' },
    { role: 'user', content: 'What is my name?' },
  ],
});
// Output: "Alice"
```
Custom instructions: Inject your own rules and instructions into the input.
```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const customInstructions = `You are a customer service bot for Acme Corp.
Rules:
- Always be polite and professional
- If you can't help, escalate to human support`;

const userQuestion = 'I need help with my order';

const response = await generateText({
  model: openai('gpt-4'),
  messages: [
    { role: 'system', content: customInstructions },
    { role: 'user', content: userQuestion },
  ],
});
```
Retrieval-Augmented Generation (RAG): Retrieve information from a database, and inject it into the input.
```ts
import { embed, generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { queryKnowledgeBase } from '@/data';

// 1. User asks about London
const userQuestion = 'What are the best things to do in London?';

// 2. Create an embedding for the question
const questionEmbedding = await embed({
  model: openai.embedding('text-embedding-3-small'),
  value: userQuestion,
});

// 3. Find relevant content in your London knowledge base
const relevantDocs = await queryKnowledgeBase(questionEmbedding.embedding);
const context = relevantDocs.map((doc) => doc.content).join('\n');

// 4. Generate a response with the retrieved context
const response = await generateText({
  model: openai('gpt-4'),
  messages: [
    {
      role: 'user',
      content: `Context: ${context}\n\nQuestion: ${userQuestion}`,
    },
  ],
});
```
While reasoning, the model can gather more information or perform actions, for example:
- Calling tools: Functions that the model can call to interact with external systems (APIs, databases, etc.)
- Using the Model Context Protocol (MCP): A standard that connects AI models to external systems like file systems, databases, and APIs
These can be combined to create more complex workflows:
```ts
import {
  generateText,
  tool,
  experimental_createMCPClient as createMCPClient,
} from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Create an MCP client for external tools
const mcpClient = await createMCPClient({
  transport: {
    type: 'sse',
    url: 'https://travel-api.com/mcp',
  },
});
const mcpTools = await mcpClient.tools();

const result = await generateText({
  model: openai('gpt-4'),
  prompt: 'Plan a 3-day trip to London for next month',
  tools: {
    calculateBudget: tool({
      description: 'Calculate total trip budget',
      inputSchema: z.object({
        flights: z.number(),
        hotels: z.number(),
        activities: z.number(),
      }),
      execute: async ({ flights, hotels, activities }) => {
        return flights + hotels + activities;
      },
    }),
    ...mcpTools,
  },
});
```
As your app grows, you may want to automate more of the reasoning. That's where agents come in.
Instead of calling the model once, an agent gives the model a goal, asks it what to do next, takes that action, and repeats in a loop until the goal is complete. For example, with the AI SDK, the following code will call the model up to 5 times:
```ts
import { generateText, stepCountIs } from 'ai';
import { openai } from '@ai-sdk/openai';
import { searchFlights, findHotels } from '@/lib/tools';

const result = await generateText({
  model: openai('gpt-4'),
  stopWhen: stepCountIs(5),
  system:
    'You are a travel planning agent. ' +
    'Break down the trip planning into steps. ' +
    'Use available tools to gather information and make decisions.',
  prompt: 'Plan a 3-day trip to London for next month',
  tools: { searchFlights, findHotels },
});
```
After reasoning, you can return the result to the user. Common output formats include:
- Plain text: e.g. "Cod is most commonly used in Fish and Chips."
```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateText({
  model: openai('gpt-4'),
  prompt: 'What types of fish are used in Fish and Chips?',
});

console.log(result.text);
// e.g. "Cod is most commonly used in Fish and Chips."
```
- Generated assets: e.g. images, audio, or files created by the model
```ts
import { experimental_generateImage as generateImage } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateImage({
  model: openai.image('dall-e-3'),
  prompt: 'A sunset over the London skyline',
});

console.log(result.image.base64); // Base64-encoded image data
```
- Functional code: e.g. from simple logic to full-stack apps
```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateText({
  model: openai('gpt-4'),
  prompt: 'Write a JavaScript function that calculates the area of a circle',
});

console.log(result.text);
// function calculateCircleArea(radius) {
//   return Math.PI * radius * radius;
// }
```
At this stage, you need to consider how to handle the output. For example, you can:
- Safely execute AI-generated code
- Stream long-running responses
- Use evals to test the quality of the output
AI-generated code can be unpredictable. Vercel Sandbox provides isolated environments to safely run code.
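For example, you might write the model's output to a file inside a sandbox and execute it there. Here's a rough sketch; the `@vercel/sandbox` calls below (`Sandbox.create`, `writeFiles`, `runCommand`) are assumptions about the SDK's shape, so check the Sandbox docs for exact signatures:

```ts
import { Sandbox } from '@vercel/sandbox';

// Hypothetical sketch: run AI-generated code in an isolated sandbox,
// not in your own server process. Exact API names are assumptions.
const generatedCode = 'console.log(2 + 2);'; // e.g. from generateText

const sandbox = await Sandbox.create({ timeout: 60_000 });
await sandbox.writeFiles([
  { path: 'script.js', content: Buffer.from(generatedCode) },
]);

const run = await sandbox.runCommand({ cmd: 'node', args: ['script.js'] });
console.log(await run.stdout()); // "4"

await sandbox.stop();
```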
It can take time for a model to return a response, especially when tool calls and multi-step workflows are involved. You can use streaming to break the response into chunks and return something to the user sooner, keeping your app responsive:
```ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

const { textStream } = streamText({
  model: openai('gpt-4'),
  prompt: 'When is the best time to visit London?',
});

for await (const textPart of textStream) {
  console.log(textPart);
}
```
Since AI responses are free-form and non-deterministic, it can be hard to test whether the output is what you expect.
Evals are automated tests that check whether a model is producing accurate outputs. You can run evals for:
- Single prompts: Did the model respond correctly?
- Full user journeys: Did the whole conversation flow work?
- Agents: Did each step make sense and complete the task?
- Performance: Response time, cost, and accuracy metrics
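As a starting point, a single-prompt eval can be a small script that sends a prompt with a known answer and asserts on the response. The pass/fail check below is a simplified illustration; dedicated eval frameworks offer richer scoring:

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Single-prompt eval: ask a question with a known answer
const { text } = await generateText({
  model: openai('gpt-4'),
  prompt: 'What is the capital of the United Kingdom? Answer in one word.',
});

// Simplified pass/fail check on the response
if (!/london/i.test(text)) {
  throw new Error(`Eval failed: expected "London", got "${text}"`);
}
console.log('Eval passed');
```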
AI applications require infrastructure that can handle variable workloads, long-running tasks, and integrations with multiple services. Vercel's AI Cloud provides a set of tools to help you scale, secure, and monitor your AI applications, such as:
- Fluid Compute: Serverless compute for AI workloads with optimized concurrency and active CPU pricing.
- Queues: Background job processing for long-running processes and multi-step reasoning.
- AI Gateway: A single interface to 100+ AI models without managing individual API keys or rate limits (see the sketch after this list).
- Sandbox: Secure isolated environments for executing AI-generated code safely.
- Firewall: Protects against DDoS attacks and unauthorized usage of AI endpoints.
- BotID: Bot detection service that identifies and blocks automated traffic.
- Observability: Real-time monitoring and analytics for performance, usage, and errors.
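To give a feel for the AI Gateway, here's a minimal sketch. It assumes gateway access is configured (for example, the app is deployed on Vercel), which lets you reference a model with a `'provider/model'` string instead of wiring up each provider's SDK and API key; the `'openai/gpt-4o'` ID is illustrative, so check the gateway's model list:

```ts
import { streamText } from 'ai';

// The model is referenced by a 'provider/model' string; the gateway
// handles provider credentials, routing, and rate limits for you.
// (Assumes gateway access is configured, e.g. the app runs on Vercel.)
const result = streamText({
  model: 'openai/gpt-4o',
  prompt: 'When is the best time to visit London?',
});

for await (const textPart of result.textStream) {
  console.log(textPart);
}
```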