5 min read
AI bot traffic is accelerating across the web. We built bots.fyi to track it in real time, and the data reveals three stages of AI-driven traffic that work in sequence, creating a discovery flywheel many teams unintentionally disrupt.
Not all bots are harmful. Crawlers power SEO, and we’ve spent years optimizing for them. Blocking AI crawlers without understanding their role is like blocking search engines and then wondering why organic traffic disappears. The real advantage comes from understanding each type of bot and deciding where access creates value.
Think of AI traffic as a pipeline with distinct stages. Your content is not crawled once and forgotten. Each stage builds on the last until it reaches a user. If you block crawlers, your content will not enter the training data. Without training data, it cannot be cited. Without citations, you will not receive referrals.
Did you know bots made up more than 20% of all traffic across Vercel deployments last week? About a quarter of that came from AI crawlers alone. Some of this is malicious and blocked automatically, but much of it drives discovery and growth when handled well.
AI training crawlers such as GPTBot and ClaudeBot visit nearly every public page they can access. That includes documentation, blog posts, product pages, pricing, and changelogs. The goal is to capture a wide and current view of the web so the information can be built into future AI responses. This content becomes part of what these models “know” about your product.
In bots.fyi’s dataset last week, training crawlers made up the largest share of AI bot traffic. They do not just revisit popular pages. They aim to cover the full breadth of a site so they can store as much relevant material as possible.
If you publish detailed deployment docs for your framework, a training crawler will store that information so an AI model can later answer “How do I deploy with X?” using your instructions. Without being indexed, your docs will never make it into those answers.
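If your site runs on Next.js, one way to make that access explicit is a robots file. A minimal sketch, assuming an App Router project; the paths and sitemap URL are placeholders, and each crawler's current user-agent string should be checked against its vendor's docs:

```ts
// app/robots.ts -- Next.js serves this as /robots.txt.
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        // Training crawlers discussed in this post.
        userAgent: ['GPTBot', 'ClaudeBot'],
        allow: ['/docs/', '/blog/', '/pricing'],
        disallow: ['/login', '/checkout', '/admin/'],
      },
      // Everyone else keeps normal access to public pages.
      { userAgent: '*', disallow: ['/admin/'] },
    ],
    sitemap: 'https://example.com/sitemap.xml',
  }
}
```

robots.txt is advisory, so this only steers compliant crawlers; enforcement for everything else belongs at the edge, covered later in this post.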
Grounding crawlers run when a user’s question needs current information that might not be in the training data. When someone asks ChatGPT “What’s new in Next.js 15?”, or Perplexity “Which startups are building in AI infrastructure?”, these systems check both their training data and live websites for updates.
If your content is in the training set, the system can reference it in responses. Without it, the system has nothing to reference, making citations unlikely. Out of sight often means out of mind.
Even a single well-indexed page can generate hundreds or thousands of mentions across different queries.
Example: A blog post announcing a new feature can be fetched by grounding bots within days of publication, letting the AI recommend your product to users searching for related tools much faster.
AI referrals are visitors who click through from AI-generated responses to your site. They often arrive after a highly relevant prompt led to a response that cited your content, so they already know what they are looking for and are ready to act. Many convert at higher rates after seeing a tailored summary or recommendation.
In our network data, AI referrals still trail traditional search referrals in total volume but continue to grow each month. Some sites report higher conversion rates from this group than from organic search visitors.
Example: If an AI platform suggests your product in a list of “the best platforms for serverless deployment” and the user clicks through, they often arrive ready to evaluate or purchase.
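If you want to see this group in your own analytics, one approach is to classify visits by the Referer header. A rough sketch; the hostname list is an assumption and will drift as AI platforms change their referrer domains, so verify it against your own logs:

```ts
// Rough referral classifier for an analytics pipeline. The hostnames
// below are illustrative, not an authoritative list.
const AI_REFERRER_HOSTS = [
  'chatgpt.com',
  'perplexity.ai',
  'claude.ai',
  'gemini.google.com',
]

export function isAiReferral(referer: string | null): boolean {
  if (!referer) return false
  try {
    const host = new URL(referer).hostname
    // Match the host itself or any subdomain of it.
    return AI_REFERRER_HOSTS.some((h) => host === h || host.endsWith(`.${h}`))
  } catch {
    return false // malformed Referer header
  }
}

// Usage: tag each visit, then compare conversion rates per segment.
// const segment = isAiReferral(req.headers.get('referer')) ? 'ai' : 'other'
```

Segmenting this way lets you test the conversion claim above on your own traffic rather than taking it on faith.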
Some websites once blocked Google's crawlers, reasoning that bots wasted bandwidth: "Why should I let Google crawl my site for free?" Those sites missed the search boom. Today, developers are making the same mistake with AI crawlers.
AI-powered search already handles billions of queries. Users now discover content through AI platforms alongside traditional search. Blocking AI crawlers removes your content from the source material AI systems draw on when generating answers, recommendations, and comparisons, cutting off a growing discovery channel.
Compare this to traditional SEO, where you're competing for one of ten blue links. With AI systems, your content can surface across countless user queries.
Different pages serve different purposes, so crawler access should be selective (a middleware sketch follows the list):

- Block AI crawlers on sensitive routes like /login, /checkout, /admin, and user dashboards. These pages don't help you in AI training data and probably shouldn't be crawled anyway.
- Allow crawlers on discovery content: documentation, blog posts, landing pages, product pages, and pricing pages. This is where you benefit from being cited or recommended.
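Here is a minimal sketch of that split as Next.js middleware. The user-agent patterns are illustrative, and because headers can be spoofed, this complements verified bot detection rather than replacing it:

```ts
// middleware.ts -- deny AI crawlers on sensitive routes only.
import { NextResponse } from 'next/server'
import type { NextRequest } from 'next/server'

// Example AI crawler user-agent substrings; not an exhaustive list.
const AI_CRAWLER_UA = /GPTBot|ClaudeBot|PerplexityBot|CCBot/i

export function middleware(request: NextRequest) {
  const ua = request.headers.get('user-agent') ?? ''
  if (AI_CRAWLER_UA.test(ua)) {
    return new NextResponse('Forbidden', { status: 403 })
  }
  return NextResponse.next()
}

// Discovery content stays crawlable: the matcher never runs this
// middleware outside the sensitive routes listed here.
export const config = {
  matcher: ['/login', '/checkout', '/admin/:path*', '/dashboard/:path*'],
}
```

Scoping the matcher, rather than the check itself, keeps the default state open: any route you forget to list remains accessible to crawlers.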
There are legitimate cases for restricting AI access. If your content is your product (news sites, educational platforms, premium research), then unlimited AI access might cannibalize your business model. AI systems could provide such complete answers that users never need to visit your site.
For most developers building products or services, blocking AI crawlers is like refusing to be listed in Google. In these cases, the better approach is to protect sensitive pages while keeping high-value discovery content accessible. Tools like Vercel Firewall, Bot Protection, and BotID can help verify legitimate crawlers, filter out impersonators, and manage suspicious traffic without shutting down AI-driven discovery.
AI crawlers are not inherently good or bad. For some sites, they drive new traffic, citations, and brand visibility. For others, they can compete directly with the business model by giving away the very information that makes the site valuable. The effect depends on your content and how you monetize it.
The web is shifting to serve both human visitors and AI systems. Sites that adapt see more AI referrals, more citations, and greater authority in their domains.
Tools like bots.fyi can show the broader trends, but the real insight comes from measuring how each type of AI traffic affects your own site, along the lines of the sketch below.
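A rough starting point, assuming access logs with user-agent and referrer fields; the patterns are assumptions to adapt, since vendors rename crawlers over time:

```ts
// Sketch: bucket access-log entries into the three stages above.
type Stage = 'training' | 'grounding' | 'referral' | 'other'

interface LogEntry {
  userAgent: string
  referer: string | null
}

function classify(e: LogEntry): Stage {
  // Training crawlers: broad, full-site coverage.
  if (/GPTBot|ClaudeBot|CCBot/i.test(e.userAgent)) return 'training'
  // Grounding fetchers: live lookups triggered by user questions.
  if (/OAI-SearchBot|ChatGPT-User|PerplexityBot/i.test(e.userAgent))
    return 'grounding'
  // Referrals: humans clicking through from AI responses.
  if (e.referer && /chatgpt\.com|perplexity\.ai/i.test(e.referer))
    return 'referral'
  return 'other'
}

export function summarize(entries: LogEntry[]): Record<Stage, number> {
  const counts: Record<Stage, number> = {
    training: 0,
    grounding: 0,
    referral: 0,
    other: 0,
  }
  for (const e of entries) counts[classify(e)] += 1
  return counts
}
```

Bots have always been part of the internet. What is changing is their role in content discovery.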