Bot Consent Protocol
An innovation in the indexing and crawling of copyrighted content.
Bot Consent Protocol (BCP): A Standard for Regulating Automated Access to Authored Digital Content
Whitepaper on the Legal, Technical, and Economic Framework for Managing Bots, Crawlers, and Automated Traffic
Version: 1.0 (Draft) | Status: Proposed Standard | Acronym: BCP | Date: January 8th, 2026
1. Executive Summary
The Bot Consent Protocol (BCP) is a proposed technical and legal standard for regulating automated access to websites.
Its purpose is to establish a transparent, fair, and measurable relationship between website owners and automated systems accessing their infrastructure—including search engine crawlers, analytics tools, scrapers, AI model crawlers, and other forms of non-human traffic.
In the current web ecosystem, automated traffic accounts for a significant share—often the majority—of total traffic, yet most of this traffic is unregulated, unaccountable, and carries no responsibility for the load it imposes.
The traditional mechanism, robots.txt, introduced in 1994, has become ineffective: it is not legally binding, not technically enforceable, and does not protect against excessive or abusive crawling.
BCP introduces a new approach: automated agents must accept clearly defined terms of use before continuing access, similar to how human users must accept cookies, GDPR notices, or terms of service.
Acceptance is automated and based on continued requests after the display of a Bot Consent Page (BCP Page).
This creates a legal and technical audit trail that enables documentation, regulation, and, where appropriate, billing of the costs caused by automated traffic.
The protocol also defines an economic model that sets a price for automated access (e.g., €0.01 per request), as well as mechanisms for identifying, classifying, and sanctioning bots that violate the rules.
BCP is designed to be technically feasible in existing server environments (nginx, Apache, Cloudflare, Node.js, PHP, Python) and legally compatible with existing regulations, including GDPR, ePrivacy, and the Digital Services Act.
The goal of BCP is not to restrict access, but to establish a fair, transparent, and accountable system in which automated visitors are subject to the same fundamental principles as human users: information, consent, and responsibility.
2. Introduction
Over the past two decades, the web has evolved from an environment dominated by human visitors into one where most traffic is generated by automated systems.
Search engine crawlers, analytics bots, SEO tools, scraping systems, and various automated agents now form the backbone of the digital economy.
However, this backbone operates without clear rules, without accountability, and without economic balance.
Website owners—especially publishers, bloggers, media organizations, and independent creators—bear the full cost of the infrastructure that bots consume.
Excessive crawling, ignoring robots.txt, forged user agents, and aggressive scrapers generate costs that are not compensated by anyone.
At the same time, publishers have no effective mechanism to regulate or limit automated access, beyond the outdated robots.txt, which is merely a non-binding recommendation with no legal or technical weight.
In this context, it becomes clear that the web needs a new standard—one that reflects the reality of the modern internet, where automated systems are the norm rather than the exception.
The Bot Consent Protocol (BCP) is a proposal for such a standard: a system that imposes on automated visitors the same fundamental obligations as on human users—being informed, providing consent, and bearing responsibility.
BCP introduces the concept of a Bot Consent Page, where automated systems are informed about terms of use, pricing, and access rules.
Continuing access after the BCP Page is shown constitutes acceptance of these terms, creating a legal and technical trail that enables regulation and cost accounting.
The protocol also includes technical guidelines for identifying bots, classifying traffic, limiting abuse, and implementing sanctions.
This document presents the overall concept, legal justification, technical specification, and economic model of BCP and serves as a basis for discussion, development, and potential standardization.
3. Definitions
For the purposes of the Bot Consent Protocol (BCP), the following definitions apply:
3.1 Bot
An automated system, script, program, or agent that accesses a website without direct human interaction.
This includes search engine crawlers, analytics bots, scraping tools, AI model crawlers, testing agents, and all other forms of automated traffic.
3.2 Crawler
A bot that systematically scans web pages for the purpose of indexing, analysis, or data collection.
Examples include Googlebot, Bingbot, Amazonbot, and other search or analytics crawlers.
3.3 Scraper
A bot that retrieves content for the purpose of reuse, redistribution, model training, or commercial exploitation.
Scrapers often ignore robots.txt, use forged user agents, or generate excessive load.
3.4 Automated Access
Any HTTP request not initiated by a direct human user.
Automated access includes crawling, scraping, API calls, testing requests, and all other forms of non-interactive traffic.
3.5 Excessive Crawling
Any automated access that exceeds normal or expected visit patterns, such as:
high request frequency, repeated access to the same URLs, heavy load in short intervals, or ignoring access rules.
3.6 Unauthorized Access
Any automated access that:
ignores the Bot Consent Page (BCP Page), does not accept the terms of use, uses forged identifiers, or violates defined limits or pricing.
3.7 Bot Consent Page (BCP Page)
A dedicated document presented by the server to automated visitors, containing:
terms of use, pricing, access rules, legal notice, and the definition of implicit consent.
Continuing access after the BCP Page is displayed constitutes acceptance of the terms.
3.8 Implicit Acceptance of Terms
A legal and technical doctrine under which continued automated access after the BCP Page has been presented is deemed acceptance of the terms of use and pricing.
3.9 Shadow Realm
A technical mechanism that isolates or redirects traffic from bots that violate rules into an environment with limited resources, minimal or synthetic content, or degraded responsiveness, without affecting legitimate human users.
3.10 AI Model Bot
Any automated system that accesses content for the purpose of training, improving, or commercially exploiting artificial intelligence models.
This includes large language model (LLM) crawlers, data aggregators, AI training crawlers, and other systems that collect content for commercial AI use.
4. Problem: Why the Current System Fails
4.1 robots.txt is outdated and ineffective
The Robots Exclusion Protocol (REP), introduced in 1994, was designed for a web that was smaller, slower, and predominantly human.
Today, robots.txt is merely a non-binding recommendation that:
has no legal force, is not technically enforceable, is systematically ignored by scrapers, is followed selectively by legitimate bots, does not enable sanctions, does not support proof of violations, and does not provide any economic accounting.
REP has become a formality, not a protection mechanism.
4.2 The cost of automated traffic is borne solely by publishers
Automated traffic generates direct and measurable costs:
bandwidth, CPU and RAM usage, I/O operations, logging, cache invalidation, increased latency, reduced availability, higher hosting costs, and increased need for security mechanisms.
These costs are one-sided: publishers pay them, while bots generate them without any responsibility or compensation.
4.3 Excessive crawling is a systemic problem
In practice, many publishers report:
10,000–20,000 requests per day from a single bot, aggressive SEO tools (Ahrefs, Semrush, Moz), Amazonbot scanning entire sites without clear justification, forged Googlebots, botnets masquerading as legitimate crawlers, and scrapers copying full articles in real time.
Such traffic is unnecessary, disproportionate, opaque, and unregulated.
Publishers lack mechanisms to limit or charge for it.
4.4 Commercialization of crawled content in AI models
The most significant shift in recent years is the explosion of generative AI.
Large language models (LLMs), AI assistants, and commercial AI platforms rely on massive crawling of content created by publishers.
This content is then used to train models, embedded into AI-generated answers, monetized via subscriptions, API access, and enterprise licenses, and used to replace original publishers in search results—reducing traffic to the source sites.
Meanwhile, publishers:
receive no compensation, are not informed, have no control, bear the infrastructure costs, lose revenue, and lose visibility.
This creates a one-sided economy in which publishers finance the infrastructure, while the AI industry commercializes the results.
This is structurally unsustainable.
4.5 Lack of a legal framework
Currently:
there is no standard for regulating bots, no rule on consent, no rule on notification, no rule on pricing, no rule on responsibility, and no mechanism for proving violations.
Publishers operate in a legal vacuum where they lack tools, protection, an economic model, and a standard they can invoke.
4.6 Problem conclusion
The web has become an environment where:
humans must accept terms, bots do not; humans are regulated, bots are not; humans bear responsibility, bots do not.
This is a structural asymmetry that requires a new standard.
5. Proposed Solution: Bot Consent Protocol (BCP)
5.1 Concept
The Bot Consent Protocol (BCP) is a proposed standard that requires automated systems to accept clearly defined terms of use before continuing access to a website.
BCP introduces the Bot Consent Page (BCP Page)—a document presented to automated visitors that outlines:
terms of use, access rules, limitations, pricing, legal notice, and the definition of implicit consent.
Continuing automated access after the BCP Page is displayed constitutes acceptance of these terms.
BCP thus establishes the first formal mechanism that treats automated systems similarly to human users:
information → consent → responsibility.
5.2 Objectives of the protocol
5.2.1 Legal notification
Ensure that automated systems are informed of the terms of use and the economic consequences of access.
5.2.2 Access regulation
Enable website owners to define rules, limits, and conditions for automated traffic.
5.2.3 Traffic economics
Establish a pricing model that compensates for the costs caused by bots.
5.2.4 Transparency
Provide visibility into which bots access the site, how often, and for what purpose.
5.2.5 Evidentiary value
Create a legal and technical trail that enables proof of violations and enforcement of responsibility.
5.2.6 Standardization
Create a unified, repeatable, and extensible framework that can be adopted by publishers, platforms, and technology providers.
5.3 Why BCP is superior to robots.txt
robots.txt is:
a recommendation, non-binding, unverifiable, without sanctions, without an economic model, and without legal weight.
BCP is:
a legal document, technically verifiable, binding through implicit consent, extensible, measurable, economically grounded, and evidentiary.
BCP does not replace robots.txt; it goes beyond it in scope and function.
5.4 Structure of the Bot Consent Page (BCP Page)
5.4.1 Document identification
Protocol name, version, date, and URL.
5.4.2 Terms of use
Clearly defined rules applicable to automated access.
5.4.3 Pricing
A standardized cost structure, for example:
€0.01 per request, €0.05 for excessive crawling, €0.10 for scraping, €250 for ignoring the BCP Page.
5.4.4 Legal basis
Notice of implicit acceptance of terms.
5.4.5 Technical limitations
Request rate limits, allowed endpoints, prohibited patterns.
5.4.6 Contact information
Legal and/or technical contact details.
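A BCP Page can also be published in a machine-readable form. The protocol does not prescribe a schema, so the following Python sketch shows only one possible JSON rendering of the fields listed above; the field names and the domain.com contact details are illustrative assumptions taken from the example in Section 9.1.

# Illustrative machine-readable rendering of a BCP Page (schema is an assumption).
import json

bcp_page = {
    "protocol": "Bot Consent Protocol (BCP)",
    "version": "1.0",
    "date": "2026-01",
    "url": "https://domain.com/bcp",
    "pricing_eur": {
        "standard_request": 0.01,
        "excessive_crawling": 0.05,
        "scraping_request": 0.10,
        "ignoring_bcp_page": 250.0,
        "forged_user_agent": 500.0,
        "ai_model_access": 0.15,
    },
    "limits": {"max_requests_per_second": 1},
    "legal": "Continuing automated access after this document is displayed "
             "constitutes acceptance of these terms.",
    "contact": "legal@domain.com",
}

print(json.dumps(bcp_page, indent=2))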
5.5 Implicit consent mechanism
BCP relies on the established legal doctrine of implicit consent, which states:
“If a user continues to use a service after being notified of the terms, they are deemed to have accepted those terms.”
This doctrine is standard in:
API usage, software licensing, online services, digital contracts, EULAs, cookie banners, and GDPR notices.
BCP formally extends this doctrine to automated access.
Continuing requests after the BCP Page has been displayed constitutes:
acceptance of the terms, acceptance of pricing, acceptance of limitations, and acceptance of responsibility.
5.6 Technical feasibility
BCP is designed to be implementable in existing environments:
nginx (rewrite, map, rate limiting), Apache (mod_rewrite, mod_security), Cloudflare (Workers, Rules), Node.js (middleware), PHP/Python (pre-request logic).
BCP does not require changes on the bot side; it only requires that bots respect the rules, just as human visitors are expected to accept cookie notices and terms of service.
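As an illustration of the "pre-request logic" option mentioned above, the following minimal sketch intercepts bot-like user agents before the application handles the request. Flask, the /bcp path, and the simple substring check are assumptions chosen for brevity; the protocol does not prescribe a framework or a matching rule.

# Minimal pre-request gate (sketch). Flask and the token list are assumptions.
from flask import Flask, redirect, request

app = Flask(__name__)

BOT_TOKENS = ("bot", "crawler", "spider")   # illustrative, not exhaustive

@app.before_request
def bcp_gate():
    # Never intercept the BCP Page itself, otherwise bots could not read it.
    if request.path == "/bcp":
        return None
    ua = (request.headers.get("User-Agent") or "").lower()
    if any(token in ua for token in BOT_TOKENS):
        return redirect("/bcp", code=302)
    return None

@app.route("/bcp")
def bcp_page():
    return "Bot Consent Protocol (BCP) - Terms of Use for Automated Access", 200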
5.7 Sanctions for violations
BCP defines the following sanctions:
throttling (rate limiting), blocking (HTTP 403 or 429), shadow realm (isolated traffic), cost recovery (billing), and legal notices.
Sanctions are proportional and documented.
5.8 Compatibility with existing systems
BCP is compatible with:
robots.txt, sitemap.xml, API rate limiting, CDN rules, security systems, and optionally AAADCS.
BCP does not interfere with existing standards—it adds a missing layer of regulation.
6. Economics: Pricing for Automated Access
6.1 Why pricing is necessary
Automated traffic generates measurable costs that are fully borne by website owners.
At the same time, automated systems—including search engines, SEO tools, scraping platforms, and AI models—often commercialize the content they obtain through crawling.
This creates a one-sided economy in which publishers finance the infrastructure, while bots exploit it without compensation.
BCP introduces the first standardized economic model that enables fair cost sharing.
6.2 Principles of the BCP economic model
6.2.1 Proportionality
Costs are proportional to the load generated by the bot.
6.2.2 Transparency
The pricing model is public, clear, and accessible via the BCP Page.
6.2.3 Predictability
Bots can estimate the cost of their activity in advance.
6.2.4 Fairness
Publishers receive compensation for the infrastructure they provide.
6.2.5 Evidentiary support
All requests are logged and can be used as evidence.
6.3 Proposed pricing model
| Type of Access | Description | Price |
|---|---|---|
| Standard request | Single automated HTTP request | €0.01 |
| Excessive crawling | Requests above an allowed threshold (e.g., > 1 req/s) | €0.05 |
| Scraping request | Requests retrieving content for redistribution or AI training | €0.10 |
| Ignoring BCP Page | Continuing access without accepting terms | €250 flat |
| Forged user agent | Falsely presenting as a legitimate bot | €500 flat |
| AI model access | Access for training or commercial use of AI models | €0.15 per request |
These prices are proposed as an industry baseline that each publisher may adjust based on:
infrastructure size, traffic volume, commercial value of content, and risk profile.
6.4 How costs are calculated
6.4.1 Logging
Each request is logged with:
IP address, user agent, timestamp, URL, status code, and classification (bot / scraper / AI model).
6.4.2 Identification
AAADCS or other systems may assist in traffic classification, but are not mandatory.
6.4.3 Calculation
Costs are calculated based on the number of requests, type of requests, violations, and excessive traffic.
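A minimal sketch of this calculation, using the baseline prices from Section 6.3; the request counts and violation list are hypothetical sample data.

# Cost calculation sketch based on the proposed baseline pricing (Section 6.3).
PRICES_EUR = {
    "standard": 0.01,
    "excessive": 0.05,
    "scraping": 0.10,
    "ai_model": 0.15,
}
FLAT_FEES_EUR = {
    "ignored_bcp_page": 250.0,
    "forged_user_agent": 500.0,
}

def calculate_cost(request_counts, violations):
    """Sum per-request charges and flat fees for one bot operator."""
    per_request = sum(PRICES_EUR[kind] * count
                      for kind, count in request_counts.items())
    flat = sum(FLAT_FEES_EUR[v] for v in violations)
    return round(per_request + flat, 2)

# Hypothetical example: 12,000 standard requests, 3,000 excessive, 500 scraping,
# plus one forged user agent incident.
print(calculate_cost(
    {"standard": 12000, "excessive": 3000, "scraping": 500, "ai_model": 0},
    ["forged_user_agent"],
))  # 120.00 + 150.00 + 50.00 + 500.00 = 820.0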
6.4.4 Notification
The publisher may issue an invoice, send a notice, demand cessation of access, or initiate legal action.
6.5 Why the economic model is critical for the future of the web
Without an economic model:
publishers lose revenue, AI models grow on third-party infrastructure, the scraping industry operates without constraints, search engines lack incentives to optimize crawling, and the web becomes unsustainable for smaller creators.
BCP introduces the first mechanism that:
restores balance, enables fair compensation, encourages responsible bot behavior, protects infrastructure, and supports long-term sustainability.
6.6 Economic conclusion
BCP does not introduce “penalties”; it introduces a fair economy.
Bots that generate costs must also cover them—just as in the API industry, where paid access is standard.
BCP thus lays the foundation for a new, sustainable, and fair web, where automated systems are no longer unpaid consumers of infrastructure.
7. Legal Basis
7.1 Ownership of infrastructure
A website, its server, domain, and associated infrastructure are the private property of their operator.
The owner defines:
access rules, terms of use, limitations, pricing, and may refuse or restrict access.
Automated access is not a human right; it is a conditional privilege granted by the infrastructure owner.
BCP formalizes these principles and applies them explicitly to automated systems.
7.2 Implicit acceptance of terms
As set out in Section 5.5, BCP relies on the established legal doctrine of implicit consent: a user who continues to use a service after being notified of its terms is deemed to have accepted them. This doctrine is standard in API usage, software licensing, online services, digital contracts, EULAs, cookie banners, and GDPR notices, and BCP extends it formally to automated access.
Continuing requests after the BCP Page has been displayed therefore constitutes acceptance of the terms, the pricing, the limitations, and the associated responsibility.
7.3 Notice and evidence
The BCP Page serves as:
a legal notice, proof of notification, proof of acceptance, and proof of violation.
Because the BCP Page is:
public, accessible, archived, logged, and timestamped,
it is possible to prove that:
the bot was notified, continued access, accepted the terms, violated the rules, and caused costs.
This enables:
invoicing, claims for reimbursement, legal proceedings, and access termination.
7.4 International compatibility
7.4.1 GDPR
BCP does not process personal data of human users.
It processes only technical data of automated systems (IP, user agent, requests), which does not constitute personal data in this context, does not profile individuals, and does not involve sensitive data.
7.4.2 ePrivacy
BCP does not use cookies and does not interfere with the confidentiality of communications.
It operates at the server and HTTP protocol level.
7.4.3 Digital Services Act (DSA)
BCP supports the principles of transparency and accountability required by the DSA by:
disclosing terms of use, disclosing access rules, and enabling proof of abuse.
7.4.4 Electronic communications law
BCP does not interfere with human communication rights.
It regulates only automated agents, which are not legal persons.
7.5 Legal nature of bots
Bots are not natural persons, legal entities, or rights-bearing subjects.
They are automated agents for which:
their organization, owner, or operator is responsible.
BCP formalizes this responsibility:
“For every automated access, the entity that operates or created the bot is responsible.”
This means:
Google is responsible for Googlebot, AI companies for their crawlers, scraping companies for their agents, and AI model operators for their training crawlers.
7.6 Legal enforcement
BCP enables:
invoicing, claims for cost recovery, civil actions, access termination, proof of abuse, and legal protection of infrastructure.
BCP does not create new laws—it creates a standard that enables the application of existing laws.
7.7 Legal conclusion
BCP is legally grounded because it:
relies on property rights, uses the doctrine of implicit consent, provides notice, provides evidence, aligns with international regulations, and defines the responsibility of bot operators.
BCP thus represents the first formal legal framework for regulating automated access.
8. Technical Specification of BCP
8.1 Bot identification
BCP does not prescribe a single method for identifying bots; instead, it defines standard criteria that can be implemented by any server or CDN.
A bot may be identified based on:
8.1.1 User agent analysis
Known search crawlers (Googlebot, Bingbot, Amazonbot), known scraping agents, AI model crawlers, suspicious or generic user agents, and forged user agents.
8.1.2 Request rate
More than X requests per second, more than Y requests per minute, or more than Z requests per hour.
8.1.3 Behavioral patterns
Accessing a large number of URLs in a short time, repeated access to the same URLs, accessing structured URL patterns (e.g., /page/1, /page/2, /page/3), or ignoring robots.txt.
8.1.4 Technical parameters
ASN, IP ranges, geolocation, TLS fingerprints, absence of JavaScript interaction.
8.1.5 Heuristic models
BCP allows the use of advanced methods such as:
AAADCS classification, machine learning, behavioral models, and combined metrics.
BCP does not require any specific method—it only requires that identification be documented and repeatable.
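The following Python sketch combines several of the criteria above into a single, documented classification step. The thresholds, user agent tokens, and category names are illustrative assumptions, not normative values of the protocol.

# Illustrative classification combining criteria from Section 8.1 (thresholds are assumptions).
from dataclasses import dataclass

@dataclass
class RequestStats:
    user_agent: str
    requests_per_minute: int
    distinct_urls_last_minute: int
    respects_robots_txt: bool

def classify(stats: RequestStats) -> str:
    ua = stats.user_agent.lower()
    # Known AI crawler tokens (illustrative selection).
    if any(token in ua for token in ("gptbot", "ccbot", "claudebot")):
        return "ai_model_bot"
    if stats.requests_per_minute > 60 or not stats.respects_robots_txt:
        return "excessive_or_scraper"
    if any(token in ua for token in ("bot", "crawler", "spider")):
        return "crawler"
    if stats.distinct_urls_last_minute > 30:
        return "suspected_bot"
    return "likely_human"

print(classify(RequestStats("ExampleCrawler/2.0", 10, 5, True)))  # crawler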
8.2 Bot Consent Page (BCP Page)
The BCP Page is the central element of the protocol.
It is the document presented by the server to automated visitors before allowing further access.
8.2.1 Requirements for the BCP Page
The BCP Page MUST be:
accessible via HTTP/HTTPS, static or dynamic, machine-readable, archived, and clearly marked as a BCP document.
8.2.2 Recommended structure
The BCP Page SHOULD contain:
- Document title — “Bot Consent Protocol (BCP) — Terms of Use for Automated Access”
- Version and date — e.g., “Version 1.0 — January 2026”
- Terms of use — rules, limitations, requirements.
- Pricing — standard prices per request type.
- Legal basis — notice of implicit consent.
- Technical limits — rate limits, allowed endpoints, prohibited patterns.
- Contact — legal and/or technical contact details.
8.2.3 HTTP status codes
The BCP Page MAY be returned with:
HTTP 200 (recommended), HTTP 403 (if access is conditional), or HTTP 429 (if the bot is excessive).
BCP does not mandate a specific status code—it mandates that the document be clearly identifiable as the BCP Page.
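A short sketch of how an implementation might choose among these status codes, reusing the illustrative classification labels from the sketch in Section 8.1; the mapping itself is an assumption, since BCP does not mandate one.

# Possible status-code choice for the BCP Page (mapping is an assumption).
def bcp_status_code(classification: str) -> int:
    if classification == "excessive_or_scraper":
        return 429          # the bot is excessive
    if classification == "ai_model_bot":
        return 403          # access is conditional on accepting payment terms
    return 200              # recommended default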
8.3 Redirection mechanism
When the server identifies bot traffic, it MUST:
intercept the request, redirect to the BCP Page, log the display, and allow further access only after acceptance (implicit or explicit).
8.3.1 Acceptance of terms
Acceptance occurs when the bot:
continues making requests after the BCP Page has been displayed, does not terminate the session, does not change identity, and does not reduce its request rate to zero.
This constitutes implicit consent.
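A minimal sketch of how an implementation might record that the BCP Page was served and treat subsequent requests as implicit acceptance. Keying consent state on the IP address and user agent is an assumption of this example; identities may be tracked in any documented, repeatable way.

# Consent-tracking sketch (identity key is an assumption of this example).
import time

bcp_served_at = {}   # (ip, user_agent) -> timestamp when the BCP Page was shown

def record_bcp_served(ip, user_agent):
    bcp_served_at[(ip, user_agent)] = time.time()

def has_implicitly_consented(ip, user_agent):
    # A bot that keeps sending requests after the BCP Page was served is
    # deemed to have accepted the terms (implicit consent, Section 8.3.1).
    return (ip, user_agent) in bcp_served_at

# Flow: on the first bot request, serve the BCP Page and call
# record_bcp_served(); every later request from the same identity is treated
# as implicit acceptance and logged for billing.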
8.4 Implementation in different environments
BCP is designed to be implementable in:
nginx, Apache, Cloudflare, Node.js / Express, PHP / Python, and other environments.
BCP does not require any specific technology—it only requires that the implementation conforms to the protocol.
8.5 Sanctions
BCP defines standard sanctions for violations:
throttling (rate limiting), blocking (HTTP 403 or 429), shadow realm (isolated environment), cost recovery (billing), and legal notices.
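One possible escalation policy mapping observed behavior to the sanctions above; the thresholds and the order of escalation are assumptions that each operator can adjust.

# Illustrative sanction escalation (thresholds and order are assumptions).
def sanction(requests_per_second, ignored_bcp_page, forged_user_agent):
    if forged_user_agent:
        return "shadow_realm"      # isolate the traffic and apply the flat fee
    if ignored_bcp_page:
        return "block_403"         # refuse access and issue a legal notice
    if requests_per_second > 5:
        return "block_429"         # temporary block
    if requests_per_second > 1:
        return "throttle"          # rate limiting
    return "allow"

print(sanction(3.0, ignored_bcp_page=False, forged_user_agent=False))  # throttle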
8.6 Compatibility
BCP is compatible with:
robots.txt, sitemap.xml, API rate limiting, CDN rules, security systems, and AAADCS (optional).
BCP does not replace existing standards—it adds a regulatory layer that has been missing.
9. Implementation Example (Appendix)
This section provides a reference implementation of the Bot Consent Protocol (BCP) in real-world environments.
The examples are designed to be adaptable to any infrastructure, regardless of technology or scale.
9.1 Example Bot Consent Page (BCP Page)
Bot Consent Protocol (BCP)
Version 1.0 — January 2026
Terms of Use for Automated Access:
1. Continuing automated access after this document is displayed constitutes acceptance of these terms.
2. Automated systems MUST respect rate limits: maximum 1 request per second.
3. Scraping, bulk copying, or redistribution of content is prohibited without written permission.
4. Access for training or commercial use of AI models is only permitted subject to payment.
Pricing:
- Standard request: €0.01
- Excessive crawling: €0.05
- Scraping request: €0.10
- Ignoring BCP Page: €250 flat
- Forged user agent: €500 flat
- AI model access: €0.15 per request
Legal basis:
Continuing access constitutes implicit consent to these terms.
Contact:
legal@domain.com
9.2 Example redirection in nginx
map $http_user_agent $is_bot {
    default       0;
    "~*bot"       1;
    "~*crawler"   1;
    "~*spider"    1;
    # AI crawlers such as GPTBot or CCBot already match "~*bot"; a bare
    # "~*ai" pattern is avoided here because it also matches strings like "Mail".
}

server {
    # Serve the BCP Page itself without redirecting it, to avoid a loop.
    location = /bcp {
        root /var/www/html;            # illustrative path to the BCP document
        try_files /bcp.html =404;
    }

    location / {
        if ($is_bot) {
            return 302 /bcp;
        }
    }
}
9.3 Example redirection in Cloudflare Workers
export default {
  async fetch(request) {
    const url = new URL(request.url);
    const ua = request.headers.get("User-Agent") || "";
    // Skip the BCP Page itself to avoid a redirect loop; a bare "ai" pattern
    // is omitted because it also matches unrelated strings such as "Mail".
    if (url.pathname !== "/bcp" && /bot|crawler|spider/i.test(ua)) {
      return Response.redirect(`${url.origin}/bcp`, 302);
    }
    return fetch(request);
  }
};
9.4 Example implementation in Node.js (Express)
app.use((req, res, next) => {
  const ua = req.headers['user-agent'] || '';
  // Let requests for the BCP Page itself through to avoid a redirect loop.
  if (req.path !== '/bcp' && /bot|crawler|spider/i.test(ua)) {
    return res.redirect('/bcp');
  }
  next();
});
9.5 Example shadow realm implementation
# Alternative to the redirect in 9.2, reusing the $is_bot map from 9.2;
# in practice a stricter "violating bot" condition would be used here.
location = /shadow {
    return 200 "OK";
}

location / {
    if ($is_bot) {
        return 302 /shadow;
    }
}
9.6 Example logging for billing
log_format bcp '$remote_addr - $http_user_agent - $request - $status - $body_bytes_sent - $request_time';
access_log /var/log/nginx/bcp.log bcp;
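The log defined above can be aggregated into per-bot request counts for invoicing. The sketch below assumes the " - " separator never appears inside a user agent string; a production implementation would use a stricter format such as JSON logging.

# Aggregate the bcp log into per-bot request counts (sketch).
from collections import Counter

def summarize(log_path="/var/log/nginx/bcp.log"):
    counts = Counter()
    with open(log_path) as log:
        for line in log:
            parts = line.rstrip("\n").split(" - ")
            if len(parts) < 6:
                continue                     # skip malformed lines
            ip, user_agent = parts[0], parts[1]
            counts[(ip, user_agent)] += 1
    return counts

if __name__ == "__main__":
    for (ip, ua), total in summarize().most_common(10):
        print(f"{total:>8}  {ip}  {ua}")     # top traffic sources for invoicing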
9.7 Example legal notice
“By continuing automated access after the Bot Consent Page (BCP Page) has been displayed, the entity operating the bot is deemed to have accepted the terms of use and pricing.
All requests are logged and may be used as evidence in legal proceedings.”
9.8 Implementation conclusion
BCP is designed so that it can be implemented by:
a small blog, a major media outlet, an e‑commerce platform, a SaaS provider, or an academic institution.
BCP does not require changes on the bot side—it only requires that bots respect the rules, just as human users must respect terms of service.
10. Conclusion
The web has entered an era in which automated systems are no longer an exception but a dominant force.
Bots, crawlers, scraping tools, and AI models generate a large share of traffic, influence the economics of content, shape publisher visibility, and act as key players in the digital ecosystem.
Yet they operate in an environment without rules, without responsibility, and without economic balance.
The Bot Consent Protocol (BCP) is the first attempt to establish a fair, transparent, and technically feasible standard that imposes on automated systems the same fundamental obligations as on human users: information, consent, and responsibility.
BCP:
introduces a formal mechanism for bot consent, enables access regulation, defines an economic model for automated traffic, provides a legal and technical audit trail, protects publisher infrastructure, enables transparency over content usage, and lays the foundation for a fair digital economy.
BCP is not intended to restrict innovation, but to restore balance.
It is not designed to block bots, but to ensure their responsible use.
It is not about punishment, but about compensating the costs that automated traffic imposes.
With this document, a space opens for:
further standardization, industry debate, collaboration between publishers and technology companies, development of tools that support BCP, integration with analytical standards such as AAADCS, and future regulatory frameworks.
BCP is a first step.
A first standard.
The first formal definition of bot responsibility.
If robots.txt was the symbol of the early, informal web, BCP is the symbol of a mature, responsible, and fair internet, where automated systems are no longer invisible, unbounded, and unaccountable.
BCP lays the foundation for a future in which:
publishers are no longer unpaid data suppliers, AI models do not grow on third-party infrastructure without compensation, the scraping industry is no longer a wild west, and automated traffic becomes regulated, measurable, and fair.
The Bot Consent Protocol (BCP) is the first step toward that future.