As AI agents become the primary way users get tasks done on the Internet, they will need to navigate web interfaces without being flagged as malicious bots.
But what exactly are AI agents?
While the exact definition is still debated, at a fundamental level agents are AI systems that can reason and take actions on behalf of users (as in the ReAct framework), in response to a high-level goal the user defines.
What do I mean?
Right now, you can ask ChatGPT for a recipe or to draft an email for a marketing campaign, but you still have to take the actions yourself: order the ingredients from Amazon, or use a tool like Mailchimp to set up a mass email campaign.
With agents, you give a broad goal and the agent goes out onto the web and gets those things done for you.
You can simply say, ‘Give me the recipe for Spaghetti Bolognese and order the ingredients from Amazon’, and the agent will go out onto the web on your behalf and order them from your Amazon account without you having to click another button.
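To make the ‘reason and act’ idea concrete, here is a toy sketch of a ReAct-style loop in Python. Everything in it is hypothetical: `scripted_model` stands in for a real LLM call, and the `search_recipe` and `order_items` tools are fakes that return canned strings instead of browsing the web or touching an Amazon account. It only illustrates the shape of the loop (reason, act, observe, repeat), not how any particular agent product is built.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical tools: a real agent would drive a browser or call web APIs here.
def search_recipe(dish: str) -> str:
    return f"{dish}: minced beef, tomatoes, onion, spaghetti"

def order_items(items: str) -> str:
    return f"ordered [{items}]"

TOOLS: dict[str, Callable[[str], str]] = {
    "search_recipe": search_recipe,
    "order_items": order_items,
}

@dataclass
class Step:
    thought: str           # the model's reasoning for this step
    action: Optional[str]  # tool to call, or None when the goal is done
    arg: str = ""          # input passed to the tool

def scripted_model(goal: str, history: list[tuple[Step, str]]) -> Step:
    """Hypothetical stand-in for an LLM: picks the next step from past observations."""
    if not history:
        return Step("I need the recipe first.", "search_recipe", "Spaghetti Bolognese")
    if len(history) == 1:
        # Pull the ingredient list out of the previous observation.
        ingredients = history[0][1].split(": ", 1)[1]
        return Step("Now order the ingredients.", "order_items", ingredients)
    return Step("The goal is complete.", None)

def run_agent(goal: str, model=scripted_model, max_steps: int = 5) -> list[tuple[Step, str]]:
    history: list[tuple[Step, str]] = []
    for _ in range(max_steps):                      # cap steps so a confused agent cannot loop forever
        step = model(goal, history)                 # Reason: decide what to do next
        if step.action is None:
            break
        observation = TOOLS[step.action](step.arg)  # Act: call the chosen tool
        history.append((step, observation))         # Observe: feed the result back in
    return history

for step, observation in run_agent("Give me the recipe for Spaghetti Bolognese and order the ingredients"):
    print(f"{step.thought} -> {observation}")
```

The `max_steps` cap is a deliberate design choice: a real model can keep picking actions that never reach the goal, which is exactly the ‘stuck in a loop’ failure agents run into on websites built for humans.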
Hundreds of such agents went viral earlier this summer, the most popular being AutoGPT and BabyAGI, but the reality is that the web as it exists today makes it very hard for agents to perform tasks reliably on behalf of users.
If you want to understand this landscape further, Sequoia Capital published a good blog post on autonomous agents earlier this year. Even Bill Gates recently wrote a **blog post** proclaiming that agents will change how users interact with computers.
While the initial agent demos showed outlandish goals like ‘Increase my net worth’ or ‘Make the world a better place’, it’s safe to say we haven’t achieved that level of AGI yet.
Simpler tasks like sending an email or ordering food from Uber Eats, on the other hand, seem to work well with tools like HyperWrite’s Personal Assistant.
Even so, these agents are still quite unreliable and struggle to navigate websites, mainly because of the unstructured and complex nature of the Internet.
With OpenAI recently releasing custom ‘GPTs’, which essentially let anyone spin up an AI agent and have it perform tasks on the web, the number of bots on the Internet is going to increase exponentially.
OpenAI announced a new ‘App Store’ of sorts for custom chatbots called ‘GPTs’.
However, websites are built for humans, which often leaves AI agents stuck in loops and unable to complete tasks. Here is an example of a popular AI agent getting stuck on the travel platform Expedia.
Because the majority of the web is unstructured, existing AI agent companies try to solve this problem in one of the following ways: