AI is becoming a core part of modern web applications. From chatbots and support systems to image and voice generators, more apps now rely on AI to deliver useful, personalized, or automated experiences. However, with so many new concepts and tools, it's hard to know where to start.
This guide gives you a high-level overview of how to build and deploy AI-powered applications.
While AI features can vary in size and complexity, most follow a similar Input → Reason and act → Output pattern. Let's walk through what each of these steps involves.
The first step is accepting input from a user or system. This could be:
- Freeform text (e.g. "What's the weather in London?")
- Multi-modal content (e.g. images, files, or audio)
- An event with structured data (e.g. a new PR or a form submission)
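For example, a chat feature might accept freeform text through an API route. Here's a minimal sketch, assuming a Next.js App Router route handler (the `app/api/chat/route.ts` path and the request body shape are illustrative):

```ts
// app/api/chat/route.ts (hypothetical path)
export async function POST(request: Request) {
  // Freeform text input from the user
  const { message } = await request.json();

  // Hand the input off to the reasoning step (covered next)
  return Response.json({ received: message });
}
```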
Once the input is received, your app needs to decide how to respond. This is where reasoning comes in.
To reason, your app uses an AI model, often a Large Language Model (LLM).
At a basic level, the model takes a prompt and answers a question, summarizes text, or generates code. For example, you can use the AI SDK to send a prompt to an OpenAI model and get a response:
```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Basic text generation
const result = await generateText({
  model: openai('gpt-4'),
  prompt: 'Where is London?',
});

console.log(result.text);
// Possible output: "London is in the United Kingdom."
```
But real-world use cases often go beyond simple Q&A. You can guide how the model responds by adding context, calling tools that perform actions, and even using agents to make multi-step decisions. Here's how:
By default, models don't remember past interactions or know anything about your user. You can add context to make their responses more personalized and accurate. Some common ways to add context are:
Memory: Include past messages or stored user preferences.
```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Include earlier turns of the conversation so the model has memory
const result = await generateText({
  model: openai('gpt-4'),
  messages: [
    { role: 'user', content: 'Hi, my name is Alice.' },
    { role: 'assistant', content: 'Nice to meet you, Alice!' },
    { role: 'user', content: 'What is my name?' },
  ],
});
// Output: "Alice"
```
Custom instructions: Inject your own rules and instructions into the input.
```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const customInstructions = `You are a customer service bot for Acme Corp.
Rules:
- Always be polite and professional
- If you can't help, escalate to human support`;

const userQuestion = 'I need help with my order';

const response = await generateText({
  model: openai('gpt-4'),
  messages: [
    { role: 'system', content: customInstructions },
    { role: 'user', content: userQuestion },
  ],
});
```
Retrieval-Augmented Generation (RAG): Retrieve information from a database, and inject it into the input.
```ts
import { embed, generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { queryKnowledgeBase } from '@/data';

// 1. User asks about London
const userQuestion = 'What are the best things to do in London?';

// 2. Create an embedding for the question
const questionEmbedding = await embed({
  model: openai.embedding('text-embedding-3-small'),
  value: userQuestion,
});

// 3. Find relevant content in your London knowledge base
const relevantDocs = await queryKnowledgeBase(questionEmbedding.embedding);
const context = relevantDocs.map((doc) => doc.content).join('\n');

// 4. Generate a response with the retrieved context
const response = await generateText({
  model: openai('gpt-4'),
  messages: [
    {
      role: 'user',
      content: `Context: ${context}\n\nQuestion: ${userQuestion}`,
    },
  ],
});
```
While reasoning, the model can gather more information or perform actions, for example:
- Calling tools: Functions that the model can call to interact with external systems (APIs, databases, etc.)
- Using the Model Context Protocol (MCP): A standard that connects AI models to external systems like file systems, databases, and APIs
These can be combined to create more complex workflows:
```ts
import {
  generateText,
  tool,
  experimental_createMCPClient as createMCPClient,
} from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Create an MCP client for external tools
const mcpClient = await createMCPClient({
  transport: {
    type: 'sse',
    url: 'https://travel-api.com/mcp',
  },
});
const mcpTools = await mcpClient.tools();

const result = await generateText({
  model: openai('gpt-4'),
  prompt: 'Plan a 3-day trip to London for next month',
  tools: {
    calculateBudget: tool({
      description: 'Calculate total trip budget',
      inputSchema: z.object({
        flights: z.number(),
        hotels: z.number(),
        activities: z.number(),
      }),
      execute: async ({ flights, hotels, activities }) => {
        return flights + hotels + activities;
      },
    }),
    ...mcpTools,
  },
});
```
As your app grows, you may want to automate more of the reasoning. That's where agents come in.
Instead of calling the model once, an agent gives the model a goal, asks it what to do next, takes that action, and repeats in a loop until the goal is complete. For example, with the AI SDK, the following code will call the model up to 5 times:
```ts
import { generateText, stepCountIs } from 'ai';
import { openai } from '@ai-sdk/openai';
import { searchFlights, findHotels } from '@/lib/tools';

const result = await generateText({
  model: openai('gpt-4'),
  stopWhen: stepCountIs(5),
  system:
    'You are a travel planning agent. ' +
    'Break down the trip planning into steps. ' +
    'Use available tools to gather information and make decisions.',
  prompt: 'Plan a 3-day trip to London for next month',
  tools: { searchFlights, findHotels },
});
```
After reasoning, you can return the result to the user. Common output formats include:
- Plain text: e.g. "Cod is most commonly used in Fish and Chips."
```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateText({
  model: openai('gpt-4'),
  prompt: 'What types of fish are used in Fish and Chips?',
});

console.log(result.text);
// e.g. "Cod is most commonly used in Fish and Chips."
```
- Generated assets: e.g. images, audio, or files created by the model
```ts
import { experimental_generateImage as generateImage } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateImage({
  model: openai.image('dall-e-3'),
  prompt: 'A sunset over the London skyline',
});

console.log(result.image.base64); // Base64-encoded image data
```
- Functional code: e.g. from simple logic to full-stack apps
```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateText({
  model: openai('gpt-4'),
  prompt: 'Write a JavaScript function that calculates the area of a circle',
});

console.log(result.text);
// function calculateCircleArea(radius) {
//   return Math.PI * radius * radius;
// }
```
At this stage, you need to consider how to handle the output. For example, you can:
- Safely execute AI-generated code
- Stream long-running responses
- Use evals to test the quality of the output
AI-generated code can be unpredictable. Vercel Sandbox provides isolated environments to safely run code.
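For example, you might write the model's output to a file inside a sandbox and execute it there. Here's a rough sketch; the `@vercel/sandbox` calls below (`Sandbox.create`, `writeFiles`, `runCommand`) are assumptions about the SDK's shape, so check the Sandbox docs for exact signatures:

```ts
import { Sandbox } from '@vercel/sandbox';

// Hypothetical sketch: run AI-generated code in an isolated sandbox,
// not in your own server process. Exact API names are assumptions.
const generatedCode = 'console.log(2 + 2);'; // e.g. from generateText

const sandbox = await Sandbox.create({ timeout: 60_000 });
await sandbox.writeFiles([
  { path: 'script.js', content: Buffer.from(generatedCode) },
]);

const run = await sandbox.runCommand({ cmd: 'node', args: ['script.js'] });
console.log(await run.stdout()); // "4"

await sandbox.stop();
```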
It can take time for a model to return a response, especially when tool calls and multi-step workflows are involved. You can use streaming to break the response into chunks and return something to the user sooner, keeping your app responsive:
```ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

const { textStream } = streamText({
  model: openai('gpt-4'),
  prompt: 'When is the best time to visit London?',
});

for await (const textPart of textStream) {
  console.log(textPart);
}
```
Since AI responses are free-form and non-deterministic, it can be hard to test whether the output is what you expect.
Evals are automated tests that check whether a model is producing accurate outputs. You can run evals for:
- Single prompts: Did the model respond correctly?
- Full user journeys: Did the whole conversation flow work?
- Agents: Did each step make sense and complete the task?
- Performance: Response time, cost, and accuracy metrics
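As a starting point, a single-prompt eval can be a small script that sends a prompt with a known answer and asserts on the response. The pass/fail check below is a simplified illustration; dedicated eval frameworks offer richer scoring:

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Single-prompt eval: ask a question with a known answer
const { text } = await generateText({
  model: openai('gpt-4'),
  prompt: 'What is the capital of the United Kingdom? Answer in one word.',
});

// Simplified pass/fail check on the response
if (!/london/i.test(text)) {
  throw new Error(`Eval failed: expected "London", got "${text}"`);
}
console.log('Eval passed');
```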
AI applications require infrastructure that can handle variable workloads, long-running tasks, and integrations with multiple services. Vercel's AI Cloud provides a set of tools to help you scale, secure, and monitor your AI applications, such as:
- Fluid Compute: Serverless compute for AI workloads with optimized concurrency and active CPU pricing.
- Queues: Background job processing for long-running processes and multi-step reasoning.
- AI Gateway: A single interface to 100+ AI models without managing individual API keys or rate limits (see the sketch after this list).
- Sandbox: Secure isolated environments for executing AI-generated code safely.
- Firewall: Protects against DDoS attacks and unauthorized usage of AI endpoints.
- BotID: Bot detection service that identifies and blocks automated traffic.
- Observability: Real-time monitoring and analytics for performance, usage, and errors.
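To give a feel for the AI Gateway, here's a minimal sketch. It assumes gateway access is configured (for example, the app is deployed on Vercel), which lets you reference a model with a `'provider/model'` string instead of wiring up each provider's SDK and API key; the `'openai/gpt-4o'` ID is illustrative, so check the gateway's model list:

```ts
import { streamText } from 'ai';

// The model is referenced by a 'provider/model' string; the gateway
// handles provider credentials, routing, and rate limits for you.
// (Assumes gateway access is configured, e.g. the app runs on Vercel.)
const result = streamText({
  model: 'openai/gpt-4o',
  prompt: 'When is the best time to visit London?',
});

for await (const textPart of result.textStream) {
  console.log(textPart);
}
```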