Standardizing LLM ↔ Business Communication: a Technical Proposal

    The future is agentic. A technical proposal on how to fully realize the potential of agentic online transactions.

    TL;DR

    With recent advancements in LLM technology, a new era of the web is emerging: the agentic era. While AI agents currently rely on human-like tools such as headless browsers, there is huge potential for improving their efficiency and effectiveness. One way to achieve this is to fully embrace MCP Servers as the standard for communication between LLMs and applications. This article proposes a flow by which LLMs could leverage MCP Servers to interact with applications efficiently.

    Intro

    With LLM technology progressing at a rapid pace and its user base growing steadily, the question of how LLMs will interact with existing websites and businesses in a standardized way remains open.

    For reading websites, we already have sophisticated methods such as web crawling with emerging services like Firecrawl or ScrapeGraphAI.


    But what about interacting with websites?


    OpenAI released Operator in January 2025. It is a tool that allows the LLM to execute tasks in a browser-based environment on behalf of the user. While this is progress towards an agentic future, it also comes with problems.


    Browsers are built for human-computer interaction.


    Web browsers are not built for computer-to-computer communication. They are visual so that human beings can easily interact with the website. Humans use tools such as a mouse, a keyboard, and a screen. A computer does not need all this. A computer can communicate with another computer purely by defined standards such as REST APIs.

    Problems With Building On Top Of Headless Browsers

    While giving an LLM exactly the tools that human beings have and trying to mimic human behavior sounds promising, there are many challenges that make this endeavor complicated.

    1. LLMs are bots from the website's perspective

    When was the last time you filled out a captcha to prove that you’re human? Yes, website providers actively try to eliminate bot traffic from their websites in many different ways. Captchas are just one of them.

    2. Forms are not always understood by bots

    While a well-optimized, modern website uses naming conventions and HTML structures that make it easy for bots and screen readers to understand the form elements of a web page, a large share of websites are not optimized in this way. This leads to the bot making mistakes while filling out forms, which in turn leads to user frustration with agentic AI.

    3. Using screenshots and headless browsers is inefficient

    As of now, OpenAI’s Operator interacts with a website by taking screenshots of it, processing the screenshots, and then interacting with website elements such as form fields or buttons by simulating “clicks” on them. Each click takes time, and because the website needs to react, a new screenshot must be taken and processed. This consumes time and a lot of computing power.

    This exactly mimics human behavior. Compared with computers, humans are generally slow and inaccurate, because we do not process data input the way computers do. Computers understand numbers; they understand data. Why are we trying to give computers eyes and fingers?


    If we go back in history and look at why we started using computers such as calculators in the first place, we find that we use them because they can complete certain tasks more efficiently and accurately than us. Why would we limit an LLM to the tools that human beings have, when there are alternatives?

    A Solution

    While some services like Browserbase fully embrace the headless-browser approach, other proposed standards, such as the Model Context Protocol (MCP), have been making waves as a way to standardize LLM-to-business and LLM-to-website communication. Although proposed by Amazon-backed Anthropic, MCP has already been endorsed and adopted by many big players in the AI space, such as Google DeepMind and OpenAI.


    MCP defines a standard for how LLMs can interact with applications. On a technical level, you can wrap your REST API in an MCP Server and have an LLM interact with it. While OpenAI and Anthropic have so far only announced support in their desktop clients, the MCP standard has enormous potential.
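    As a rough illustration of what “wrapping a REST API in an MCP Server” could look like, the sketch below uses the Python MCP SDK’s FastMCP helper to expose a single product-search tool. The store API endpoint, its query parameters, and the tool name are hypothetical, and SDK details may differ between versions.

    ```python
    # server.py -- a minimal sketch of an MCP Server that wraps an existing REST API.
    # The https://example.com/api/products endpoint and its parameters are hypothetical.
    import httpx
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("example-store")

    @mcp.tool()
    def filter_products(color: str, max_price: float) -> list[dict]:
        """Return products matching the given color and maximum price."""
        response = httpx.get(
            "https://example.com/api/products",
            params={"color": color, "max_price": max_price},
        )
        response.raise_for_status()
        return response.json()

    if __name__ == "__main__":
        # Serve the tool over the MCP protocol (stdio transport by default).
        mcp.run()
    ```

    Once connected, an LLM client would see filter_products as a callable tool whose parameter schema is derived from the function’s type hints.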


    By using MCP Servers as the standard of LLM-to-application communication, all the above problems with headless browsers can be solved.

    1. MCP Servers are MADE for bots

    Legitimate bot traffic on MCP Servers, for example from OpenAI, will never require strange workarounds to avoid captchas, because MCP Servers are built specifically for LLM bots as their clients.

    2. Forms will be understood effortlessly

    MCP Servers provide an input schema to the LLM, so the LLM knows exactly which parameters to provide to successfully submit a given form (see the example after this list).

    3. No more screenshots or headless browsers

    MCP Servers are similar to REST APIs in that they define an interface for one application to interact with another. Once an LLM is connected to an MCP Server, it can use the provided tools directly, without ever interacting with the website through a headless browser. Only the data that is really needed is exchanged; there is no need to load unnecessary HTML structures or navigate via screenshots.
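    To make the second and third points above more concrete, here is what a tools/list response from a store’s MCP Server could look like. The tool and parameter names are illustrative; the important part is that the inputSchema tells the LLM exactly which fields a “form submission” requires, without any HTML or screenshots involved.

    ```json
    {
      "tools": [
        {
          "name": "filter_products",
          "description": "Return products that match the given filters.",
          "inputSchema": {
            "type": "object",
            "properties": {
              "category": { "type": "string" },
              "color": { "type": "string" },
              "max_price": { "type": "number" }
            },
            "required": ["category"]
          }
        },
        {
          "name": "prefill_checkout",
          "description": "Create a checkout link pre-filled with the given customer details.",
          "inputSchema": {
            "type": "object",
            "properties": {
              "product_id": { "type": "string" },
              "shipping_name": { "type": "string" }
            },
            "required": ["product_id"]
          }
        }
      ]
    }
    ```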

    A Caveat

    While MCP is emerging as an industry standard, there is another question to answer.


    Say you run a Shopify-based e-commerce store and want to host an MCP Server. How do LLMs find your MCP Server?

    Currently, they don’t, which is why we are writing this post.

    Currently, MCP Servers can only be configured manually by the LLM user; standard LLMs will never use public MCP Servers automatically.
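    For example, in Anthropic’s Claude Desktop client a user currently has to register each MCP Server by hand in a local configuration file, along the lines of the illustrative snippet below. The server name and launch command are hypothetical, and the exact format may differ between clients and versions.

    ```json
    {
      "mcpServers": {
        "example-store": {
          "command": "npx",
          "args": ["-y", "@example-store/mcp-server"]
        }
      }
    }
    ```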

    The Proposal

    We would like to propose a very simple, standardized flow for LLMs to discover the address of an MCP Server that is attached to any given application. This would allow LLMs to automatically check for existing MCP Servers and start communicating with the app directly via a defined set of tools.

    While MCP is a proposed standard for letting LLMs interact with applications, there is another proposed standard that we would like to integrate into this flow: the llms.txt file.

    Similar to the already widely accepted robots.txt file, the llms.txt file can be used to give context to an LLM at inference time. Simply mentioning in the file that there is an MCP Server available at a given URL gives the LLM the ability to interact with the MCP Server instead of using a headless browser.
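    As a sketch of what this could look like for the example.com store used in the example flow below, an llms.txt file might contain an entry like the following. Note that the MCP section is part of this post’s proposal, not of the current llms.txt convention, and the server URL is hypothetical.

    ```
    # Example Store

    > Example Store sells sneakers, apparel, and accessories online.

    ## MCP

    - [MCP Server](https://example.com/mcp): exposes filter_products, add_product_to_cart,
      and prefill_checkout tools for AI agents.
    ```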

    A diagram proposing a simplified, technical flow of interaction between public applications and LLMs

    The diagram above shows, in a simplified way, this flow of interaction between LLM and application via llms.txt and an MCP Server.

    Example Flow

    Let’s assume you are looking for a new pair of sneakers, so you query your LLM of choice: “Find new white sneakers that are trendy right now. Less than $100.”

    The LLM will then search online for different stores that sell sneakers. For a given store, the LLM would check the example.com/robots.txt or example.com/llms.txt file and find the address of the MCP Server. The LLM would connect to the MCP Server and discover the tools available on it, such as filter_products, add_product_to_cart, and prefill_checkout. It could then use these tools to filter for sneakers that are white and cost less than $100. With an optimized MCP Server available, different checkout links could be generated and automatically pre-filled with user information saved in the LLM’s context from previous requests.
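    Glossing over the MCP initialization handshake and transport details, the sequence of requests the LLM would send in this example could look roughly like the JSON-RPC payloads built in the sketch below. The tool names come from the hypothetical sneaker store above; argument names and values are illustrative.

    ```python
    # A schematic sketch of the tool-call sequence from the example flow above.
    # The MCP initialization handshake, transport, and server responses are omitted;
    # tool and argument names belong to the hypothetical sneaker store's MCP Server.
    import itertools
    import json

    _ids = itertools.count(1)

    def rpc(method: str, params: dict) -> str:
        """Build a JSON-RPC 2.0 request as an MCP client would send it."""
        return json.dumps({"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params})

    # 1. Discover which tools the store's MCP Server offers.
    print(rpc("tools/list", {}))

    # 2. Filter for white sneakers under $100.
    print(rpc("tools/call", {
        "name": "filter_products",
        "arguments": {"category": "sneakers", "color": "white", "max_price": 100},
    }))

    # 3. Generate a pre-filled checkout link from user details in the LLM's context.
    print(rpc("tools/call", {
        "name": "prefill_checkout",
        "arguments": {"product_id": "sku-123", "shipping_name": "Jane Doe"},
    }))
    ```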

    The user would then receive an answer such as “Sure, here are two different pairs of white sneakers under $100, along with the pre-filled checkout links.”


    Conclusion

    We hope that this proposal will inspire LLM providers and application owners alike to look for ways to leverage AI technologies to improve the user experience for consumers. By embracing this flow, we will be one step closer to a truly agentic era of the web, where a myriad of online transactions can be executed effortlessly by AI agents.