Which Platform Builds the Best AI Agents? We Test ChatGPT, Claude, Gemini and More

You can do almost anything with AI agents: search for information in your library of documents, build code, scrape the web, get insight and trenchant analysis of complex data, and much more. You can even create a virtual office with a bunch of agents specialized in different tasks and have them work hand-in-hand like your own staff of specialized digital employees.

So how hard is this to do? If a regular person wanted to build their own AI financial advisor, for instance, which platform would serve them best? No API, no weird coding, no GitHub: we just wanted to see how good the best AI companies are at creating AI agents without the user possessing a high degree of technical skill.

Of course, you get what you pay for. In this case, we also wanted to see if there was a correlation between how easy it was for a layman to set up an agent and the quality of results each delivered.

Our experiment pitted five heavyweights against each other: ChatGPT, Claude, Hugging Face, Mistral AI, and Gemini. Every platform got the same basic instructions to create a financial advisor.

The test focused entirely on out-of-the-box capabilities: whether the agents were capable of handling a common scenario, in this case helping someone balance $25,000 in investments against $30,000 in debt. We also wanted to see how good they were at analyzing a trading chart. We avoided using extra tools that might enhance the agents’ productivity and instead tried to take the simplest approach.

TL;DR: Here’s what we found and how we ranked the models:

Platform rankings

1) OpenAI’s GPT (8.5/10)

Setup Ease: 4/5
Results Quality: 4.5/5

ChatGPT is the most balanced platform, offering sophisticated agent creation with both guided and manual options to meet the needs of total noobs and slightly more experienced users alike.

While the recent interface update buried some features in menus, the platform excels at translating complex user requirements into functional agents. We tested the model by building a financial advisor that demonstrated superior contextual awareness and structured problem-solving capabilities, providing detailed yet coherent strategies for debt management and investment allocation.

2) Google Gemini (7/10)

Setup Ease: 4/5
Results Quality: 3/5

Gemini stands out with its polished, intuitive interface and excellent error handling. While it requires more detailed prompts for optimal results, its literal interpretation of instructions creates consistent, predictable outcomes.

The agent’s consultative approach to financial advice emphasized context gathering before recommendations, mirroring professional practices. However, it can be overly conservative in its zero-shot responses.

3) HuggingChat (6.5/10)

Setup Ease: 2/5
Results Quality: 4.5/5

The open-source platform offers unmatched customization and model selection options. This is great for those looking for granular control over every single aspect, but it’s not really for those looking for simplicity. (Think of it like comparing a Linux system with a macOS one.) Its sophisticated time-horizon framework and practical tool integration demonstrate advanced capabilities.

We built a pure agent without any additional functionality. We used Nvidia’s Nemotron as the base LLM, and it was good enough to match ChatGPT in output quality. Not bad for the open-source camp.

4) Claude (5.5/10)

Setup Ease: 2.5/5
Results Quality: 3/5

Anthropic’s platform excels in specific niches, particularly tasks requiring extensive context processing and code interpretation. Its minimalist interface masks sophisticated capabilities, but the “optional” instructions field can confuse users.

Our agent remained very conservative and vague in its advice, but demonstrated solid risk awareness and strategic thinking. It requires more careful prompting to truly squeeze out its potential, but it would be unfair for a test to adapt a prompt for one platform, negating the premise of assuming similar conditions.

5) Mistral AI (5/10)

Setup Ease: 2.5/5
Results Quality: 2.5/5

The French platform offers unique example-based learning and deep customization options. However, its developer-centric interface and occasional language switching issues create barriers for non-technical users. It also requires switching the agent’s configuration to different models in order to do disparate tasks like analyzing images or dealing with code. This isn’t ideal.

The financial advisor showed promise in interaction design, but struggled with basic mathematical validation and provided the worst output. This isn’t to say the output was bad, but in a zero-shot test, it was the least satisfactory.
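
The overall scores above appear to be the sum of the two sub-scores (Setup Ease plus Results Quality, each out of 5). That is an inference from the published numbers rather than a stated methodology, but it holds for all five platforms. A minimal Python sketch, with hypothetical names, that reproduces the ranking under that assumption:

```python
# Sub-scores as published in the rankings above:
# (setup ease, results quality), each out of 5.
SCORES = {
    "OpenAI's GPT": (4.0, 4.5),
    "Google Gemini": (4.0, 3.0),
    "HuggingChat": (2.0, 4.5),
    "Claude": (2.5, 3.0),
    "Mistral AI": (2.5, 2.5),
}


def overall(setup_ease, results_quality):
    """Assumed formula: overall score is the plain sum of both sub-scores."""
    return setup_ease + results_quality


def rank(scores):
    """Return (platform, overall score) pairs, highest score first."""
    totals = {name: overall(*parts) for name, parts in scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


RANKING = rank(SCORES)

for position, (name, total) in enumerate(RANKING, start=1):
    print(f"{position}) {name}: {total:g}/10")
```

Under this reading, ease of setup and quality of results carry equal weight, which is why HuggingChat can tie ChatGPT on output quality and still land in third place.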

Deeper dive

Considering the previous ranking, there is no one-size-fits-all solution, and all platforms have their own pros and cons. With some dedication and careful prompt customization, the results from any one platform may vary and even beat the pack. Ultimately, all of the LLMs have their own respective prompting styles.

If you want to know more about the rationale behind our ranking, here is a more in-depth look at our experience and the results we got with our agents. We configured all of our agents with the same system prompt, no additional parameters or functionalities, and asked them the same basic question: “I have $25K to invest and am $30K in debt. Build me a financial plan.”

OpenAI

ChatGPT’s interface recently got a facelift that actually made things more complicated. The GPT creation option now hides behind menus, but once found, it offers two paths: a conversational setup where the AI helps build your agent, and a manual configuration for those who know exactly what they want.

OpenAI’s GPT platform is a Swiss Army knife of capabilities: it reads code, searches the web, and handles both image generation and analysis. The AI-guided setup process makes it particularly suitable for beginners, though it might feel restrictive for power users seeking granular control. (For example, if you prompt the model to be more specific or more detailed, it may change the whole system prompt, giving you worse results.)

When it comes to actually using the agent, ChatGPT is very straightforward and the interface is clean and easy to understand.

The agents can natively read documents and understand images, which gives the platform an advantage over the others.

Now, let’s talk about the quality of the agents you can create with basic prompting. Our financial advisor, named MoneyGPT, was quite impressive, giving us a masterclass in structured problem-solving.

Beyond its precise allocations (“$20,000 for high-interest debt” and detailed portfolio splits), the agent demonstrated sophisticated financial reasoning. It provided a five-step roadmap that wasn’t just a list, but a coherent strategy that accounted for both immediate needs and long-term considerations.

The agent’s strength lay in its ability to balance detail with context. While recommending specific investments (40% S&P 500, 30% bonds), it also explained the rationale behind its responses: “Paying off high-interest debt is like getting a guaranteed return on investment.” This contextual awareness extended to long-term planning, suggesting periodic review cycles and adaptive strategies based on changing circumstances.

However, this abundance of information revealed a potential weakness: the risk of overwhelming users with too much detail at once. While technically comprehensive, the rapid-fire delivery of specific allocations, investment strategies, and monitoring plans might prove daunting for financial novices.

You can read its full plan here, and you can use it by clicking on this link. We truly recommend it.

Google

Overall, Google’s Gemini agent creation platform wins the beauty contest with a polished, intuitive interface that makes agent creation feel almost too easy. The system takes instructions literally, which helps avoid confusion, and its clean UI removes the intimidation factor from AI development.

However, it requires a more detailed prompt in order to squeeze good results out of it. It doesn’t take things for granted: a short prompt gives you a low-quality response.

Under the hood, it packs serious muscle: Google-powered web search integration, code analysis, and image processing capabilities that rival ChatGPT’s offerings, which rely mostly on Microsoft’s technology.

Gemini’s UI feels like it was designed by people who actually understand user experience. The interface guides users with clear labels, and everything shows on just one screen.

This polished approach makes it particularly appealing for beginners, though experienced users might find themselves wanting more granular control.

We called our agent MoneyGem and asked for a financial plan. Its consultative approach showcased Google’s distinct problem-solving methodology. Instead of giving a straight-up answer, it led with questions like “What kind of debt is it?” and “What are your interest rates?”, showing an understanding that financial advice isn’t one-size-fits-all.

Its emphasis on gathering context before providing recommendations aligns with professional financial planning practices, though it might frustrate users looking for quick answers.

The zero-shot answer was not useful. The agent basically said it didn’t know the user well enough to provide good financial advice. After we asked it to make assumptions and forced it to provide a plan that could fit most scenarios, the agent generated a very conservative draft of a plan without giving specific suggestions on which investments to consider.

MoneyGem, though, ended its reply with a suggestion to maximize tax-advantaged accounts like a 401(k) or Roth IRA to reduce your tax burden. Nice.

You can click here to read our interaction with MoneyGem, and test the model yourself by clicking this link.

Mistral AI

Mistral’s approach to the agent configuration process is far from simple. The agent creation tool is hidden away in its developer console, with deep customization options that might scare off beginners but delight tinkerers.

Its agent building interface isn’t part of Le Chat (the chatbot interface), but the agent will appear there once it is created.

One thing we really like is the ability to feed the tool examples that shape the agent’s behavior and response style, something no other platform currently offers. Also, here’s a weird bug: While we were creating our agent, the UI suddenly switched to French, presumably because the company is French. Regardless, we couldn’t switch back to English or Spanish.

Once the agent is created, users must invoke it in the normal chatbot interface in order to work with it. They have to exit La Plateforme and go to Le Chat, which isn’t the most intuitive thing to do. However, the UI for using the agent is pretty straightforward and looks like any other AI chatbot.

We built our agent and named it Le Money to honor Mistral’s French roots. Its performance clearly showed Mistral’s generalist approach to problem-solving. Its suggestion to “set aside $10,000 for emergencies, $15,000 for debt repayment, and $10,000 for investments” seemed straightforward, but showed that the agent lacked some basic mathematical validation.

The $35,000 total exceeded available funds by $10,000, which is a basic mistake that some language models make when they prioritize conceptual correctness over numerical accuracy.
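
The arithmetic is easy to verify. As a minimal sketch of the kind of sanity check the agent skipped, the following Python snippet adds up the three quoted allocations and compares them with the $25,000 from our prompt. The function and variable names are hypothetical, and this is not something any of the platforms run.

```python
def check_allocation(available, allocations):
    """Return (total allocated, overage); a positive overage means overspending."""
    total = sum(allocations.values())
    return total, total - available


# Figures quoted from the agent's plan; 25_000 is the amount in our prompt.
PLAN = {
    "emergencies": 10_000,
    "debt repayment": 15_000,
    "investments": 10_000,
}

TOTAL, OVERAGE = check_allocation(25_000, PLAN)

if OVERAGE > 0:
    print(f"Plan allocates ${TOTAL:,}, which is ${OVERAGE:,} more than is available.")
else:
    print(f"Plan allocates ${TOTAL:,}, within the available funds.")
```

A check like this could also be requested in the system prompt ("verify that all allocations sum to no more than the available funds"), though we did not test whether that fixes the problem.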

We must note, however, that the best-performing LLMs have improved a lot and don’t fail at this task, at least not as frequently as Mistral’s.

Other than that, its plan wasn’t really detailed, but it was the only one offering follow-up questions that could make the interaction more fluid and help it better understand the user’s needs.

Le Money’s full plan is available here and the agent is available for testing here.

Anthropic

Claude’s Projects feel less like an agent creation platform and more like a sophisticated task execution system. The interface is minimal, almost too minimal, and doesn’t feel intuitive.

This minimalist interface might leave some users scratching their heads. The platform presents a bare-bones setup with an “optional” instructions field that somehow feels both unimportant and crucial at the same time: If the instructions are labeled as optional, then how will the AI agent know what it’s supposed to do?

Its minimalist interface feels weird, but Anthropic has never been known for its taste in UI choices. The same window used to configure the model is the one you use to prompt it. Its capabilities focus primarily on text and code interpretation, nothing else. Web searches and image processing and generation are fancy things that Anthropic leaves to its competitors.

Our agent, named MoneyClaude, isn’t available for public testing because Anthropic doesn’t allow it. It took a very conservative stance while providing financial advice, with technically accurate but vague responses like “maintain a balanced approach between debt reduction and essential savings.”

It asked for more information, but at least made sure to provide a very generic strategy in its absence without requiring further interaction, which seems more optimal than Google’s approach.

Click here to read its full plan.

Hugging Face

The open-source repository stands alone as the power user’s paradise, and a potential nightmare for beginners. It’s the only platform that lets users pick their preferred language model, offering unprecedented control over the agent’s foundation.

Also, users have dozens of different tools to integrate with their agents, but can only activate three of them simultaneously. This limitation forces careful consideration of which features matter most for each specific use case, but it is something no other platform can offer.

It is the most customizable experience of all the interfaces, however, with a lot of knobs to tweak. The result is a platform that can create more powerful, specialized agents than its competitors, but only in the hands of someone who knows exactly what they’re doing.

Users can try their agents on HuggingChat, hands down the power user’s dream. Once you create the agent, using it is very straightforward. The interface shows a big card with the agent’s name, description and image. It also lets users share the agent’s link and tweak its settings, all right from the card.

Putting our HuggingMoney agent to the test shows that it works with a time-horizon framework, showing a more sophisticated understanding of financial planning psychology. Its breakdown into “Short-Term (0-24 months), Mid-Term (24-60 months), and Long-Term (beyond 60 months)” mirrors professional financial planning practices.

The agent suggested allocating “$0-$5,000 into liquid, low-risk vehicles” while maintaining aggressive debt payments of “$1,000-$1,500 monthly.” This is, at first glance, a sign of nuanced understanding of cash flow management.

Another interesting feature was its integration of practical tools with theoretical advice. Beyond just suggesting the 50/30/20 rule, it recommended specific budgeting apps and emphasized tax optimization, creating a bridge between high-level strategy and day-to-day execution. The main drawback? It includes assumptions about debt interest rates without seeking clarification.

In an effort to provide useful advice, it takes too many things for granted. This urge to provide a reply no matter what is fixable with prompting, but is something to consider.

You can read HuggingMoney’s full plan here. Also, you can try it by clicking on this link.
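
For readers unfamiliar with the 50/30/20 rule mentioned above, it is a common budgeting guideline that splits after-tax income into 50% for needs, 30% for wants, and 20% for savings and debt repayment. A minimal Python sketch follows. The $4,000 monthly income is a made-up figure for illustration only, not a number from the agent’s plan, and the function name is hypothetical.

```python
def split_50_30_20(monthly_income):
    """Split after-tax monthly income using the 50/30/20 guideline."""
    return {
        "needs": monthly_income * 50 / 100,
        "wants": monthly_income * 30 / 100,
        "savings_and_debt": monthly_income * 20 / 100,
    }


# Hypothetical after-tax monthly income, chosen only to illustrate the split.
BUDGET = split_50_30_20(4_000)

for category, amount in BUDGET.items():
    print(f"{category}: ${amount:,.2f}")
```

On that hypothetical income, the 20% bucket comes to $800 a month, which is less than the $1,000-$1,500 monthly debt payments the agent proposed. That is one reason an advisor needs real numbers from the user rather than assumptions.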

Edited by Andrew Hayward