The introduction of ChatGPT has brought large language models (LLMs) into widespread use in both tech and non-tech industries. This popularity is mainly due to two factors:
- LLMs as knowledge storehouses: LLMs are trained on vast amounts of Internet data and are updated at regular intervals (e.g. GPT-3, GPT-3.5, GPT-4, GPT-4o, and others).
- Emergent capabilities: As LLMs grow in scale, they exhibit emergent abilities not found in smaller models.
Does this mean we have already reached human-level intelligence, commonly called Artificial General Intelligence (AGI)? Gartner defines AGI as a form of AI capable of understanding, learning and applying knowledge across a wide range of tasks and domains. The road to AGI is long, with a major obstacle being the auto-regressive nature of LLM training, which predicts the next word based on past sequences. As one of the pioneers of AI research, Yann LeCun points out, this auto-regressive nature means errors can accumulate, causing outputs to drift away from correct answers. Consequently, LLMs have several limitations:
- Limited knowledge: Although trained on vast amounts of data, LLMs lack up-to-date knowledge of the world.
- Limited reasoning: LLMs have limited reasoning ability. As Subbarao Kambhampati points out, LLMs are good knowledge retrievers but not good reasoners.
- No dynamism: LLMs are static and unable to access real-time information.
To overcome these LLM challenges, a more innovative approach is needed. This is where agents become important.
Agents to the rescue
The concept of intelligent agents in AI has developed over more than two decades, with implementations evolving over time. Today, agents are discussed in the context of LLMs. Simply put, an agent is like a Swiss Army knife for LLM challenges: it can help us reason, provide up-to-date information from the Internet (addressing the static nature of LLMs) and can achieve a task autonomously. With an LLM as its backbone, an agent formally consists of tools, memory, reasoning (or planning) and action components.
Components of AI agents
- Tools enable agents to access external information – whether from the Internet, databases, or APIs – allowing them to gather the necessary data.
- Memory can be short-term or long-term. Agents use scratchpad memory to temporarily store results from various sources, while chat history is an example of long-term memory.
- Reasoner allows agents to think methodically by breaking down complex tasks into manageable subtasks for efficient processing.
- Action: Agents take actions based on their environment and reasoning, adapting and solving tasks through feedback. ReAct is one of the most common ways to iteratively execute reasoning and actions.
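These components come together in an iterative loop. Below is a minimal, self-contained sketch of a ReAct-style agent that alternates between reasoning and acting until it can answer. The "LLM" is a scripted stub and `web_search` is a hypothetical tool with a canned result, so the sketch runs without any API; real frameworks replace both with actual model calls and tool integrations.

```python
# Minimal ReAct-style loop: reason -> act -> observe, repeated until done.
# The "LLM" is a scripted stub so the sketch runs without any API access.

def stub_llm(scratchpad: list) -> dict:
    """Stand-in for a real LLM call; picks the next step from the scratchpad."""
    if not any(step["action"] == "web_search" for step in scratchpad):
        return {"thought": "I need current data.", "action": "web_search",
                "input": "ECB interest rate today"}
    return {"thought": "I have what I need.", "action": "finish",
            "input": scratchpad[-1]["observation"]}

def web_search(query: str) -> str:
    """Hypothetical tool: a real one would call a search API."""
    return "ECB main rate: 3.15%"

TOOLS = {"web_search": web_search}

def react_agent(question: str, max_steps: int = 5) -> str:
    scratchpad = []                      # short-term memory for this task
    for _ in range(max_steps):
        step = stub_llm(scratchpad)      # reasoning: decide the next action
        if step["action"] == "finish":
            return step["input"]
        observation = TOOLS[step["action"]](step["input"])  # take the action
        step["observation"] = observation                   # store the result
        scratchpad.append(step)
    return "gave up"

print(react_agent("What is the ECB interest rate?"))
```

The scratchpad here plays the role of short-term memory: each reasoning step can see every previous action and its observation.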
What are agents good at?
Agents excel at complex tasks, especially when operating in a role-playing mode, leveraging the improved performance of LLMs. For example, when writing a blog, one agent can focus on research while another handles the writing – each pursuing a specific sub-goal. This multi-agent approach applies to a variety of real-life problems.
Role-playing helps agents stay focused on specific tasks to achieve larger goals, markedly reducing hallucinations. The role-play is defined through the prompt – which describes the role, instructions and context. Since LLM performance depends on well-crafted prompts, various frameworks formalize this process. One such framework, CrewAI, provides a systematic approach to defining role-playing, as we discuss next.
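Frameworks of this kind assemble the role, instructions and context into the system prompt the LLM receives. Here is a plain-Python sketch of that idea; the field names are loosely modelled on CrewAI's role/goal/backstory attributes, but the assembly function and its wording are illustrative, not the framework's actual internals.

```python
from dataclasses import dataclass

@dataclass
class RoleSpec:
    role: str        # who the agent is
    goal: str        # what it should achieve
    backstory: str   # context that anchors its behaviour

def build_system_prompt(spec: RoleSpec) -> str:
    """Flatten a role definition into a system prompt for the LLM."""
    return (f"You are {spec.role}. {spec.backstory}\n"
            f"Your goal: {spec.goal}\n"
            f"Stay strictly within this role.")

researcher = RoleSpec(
    role="a senior research analyst",
    goal="collect accurate, current facts for a blog post",
    backstory="You verify every claim against its source.",
)
print(build_system_prompt(researcher))
```

Constraining the prompt to a single role in this way is what keeps each agent on its own sub-goal.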
Multi-Agent vs. Single-Agent
Take the example of retrieval-augmented generation (RAG) using a single agent. RAG is an effective way to empower LLMs to handle domain-specific queries by leveraging information from indexed documents. However, single-agent RAG comes with its limitations, such as retrieval performance or document classification. Multi-agent RAG overcomes these limitations by employing specialized agents for document understanding, retrieval and classification.
In a multi-agent scenario, agents collaborate in different ways, similar to distributed computing patterns: sequential, centralized, decentralized or via shared message pools. Frameworks like CrewAI, AutoGen, and LangGraph + LangChain enable solving complex problems with a multi-agent approach. In this article, I’ve used CrewAI as the reference framework to explore autonomous workflow management.
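Of these collaboration patterns, a shared message pool is the simplest to sketch: agents publish findings under a topic and read what others have posted. The toy below illustrates the pattern only; it is not how any particular framework implements it, and the topic and agent names are made up.

```python
from collections import defaultdict

class MessagePool:
    """Shared pool: agents publish under a topic; any agent can read it."""
    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic: str, sender: str, content: str) -> None:
        self.topics[topic].append((sender, content))

    def read(self, topic: str) -> list:
        return list(self.topics[topic])

pool = MessagePool()
# A researcher agent posts a finding; a writer agent consumes it.
pool.publish("blog_draft", "researcher", "Key stat: 40% adoption in 2024")
facts = pool.read("blog_draft")
draft = "Intro. " + " ".join(content for _, content in facts)
pool.publish("blog_draft", "writer", draft)
print(draft)
```

The same pool works for the decentralized pattern too: any agent may both publish and subscribe, with no central coordinator.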
Workflow Management: A Use Case for Multi-Agent Systems
Most industrial processes are about managing workflows, whether it’s loan processing, marketing campaign management or DevOps. Steps, sequential or cyclical, are required to achieve a particular goal. In the traditional approach, each step (say, verifying a loan application) requires a human to manually process each application and perform the cumbersome task of verifying it before proceeding to the next step.
Each step requires input from an expert in that area. In a multi-agent setup using CrewAI, each step is handled by a crew of multiple agents. For example, in verifying a loan application, one agent may verify the customer’s identity through a background check on documents such as a driver’s license, while another agent verifies the customer’s financial details.
This begs the question: can a single crew (with multiple agents arranged sequentially or hierarchically) handle all stages of loan processing? While possible, this complicates the task, requiring extensive scratchpad memory and increasing the risk of goal deviation and hallucination. A more efficient approach is to treat each loan-processing step as a separate crew, representing the entire workflow as a graph of crew nodes (using tools such as LangGraph) executed sequentially or cyclically.
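Treating each step as its own node can be sketched as a small graph executor. In this toy, each "crew" is just a function that takes and returns a shared state dict, and the checks are hard-coded stand-ins; LangGraph provides a real version of this idea with conditional edges and cycles.

```python
# Each "crew" is modelled as a function taking and returning a state dict.
def verify_identity(state: dict) -> dict:
    state["identity_ok"] = state["applicant"] == "alice"   # stand-in check
    return state

def verify_finances(state: dict) -> dict:
    state["finances_ok"] = state["income"] > 30_000        # stand-in check
    return state

def decide(state: dict) -> dict:
    state["approved"] = state["identity_ok"] and state["finances_ok"]
    return state

# The workflow is a sequential graph of crew nodes.
WORKFLOW = [verify_identity, verify_finances, decide]

def run_workflow(state: dict) -> dict:
    for crew in WORKFLOW:      # sequential here; a cyclic graph would loop
        state = crew(state)
    return state

result = run_workflow({"applicant": "alice", "income": 45_000})
print(result["approved"])  # True for this toy input
```

Because each node only reads and writes the shared state, no single crew needs to hold the whole workflow in its scratchpad memory.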
Since LLMs are still in the early stages of intelligence, a complete workflow cannot yet be managed fully autonomously. A human in the loop is required at key stages for end-user validation. For example, after the crew completes the loan-application verification stage, human supervision is necessary to validate the results. Over time, as confidence in AI grows, some workflows may become fully autonomous. Currently, AI-based workflow management plays a supporting role, streamlining tedious tasks and reducing overall processing time.
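A human-in-the-loop checkpoint can be inserted between stages as a gate that pauses until a reviewer approves the stage output. In this sketch the reviewer is a plain callable standing in for a real approval UI, and the state keys are hypothetical.

```python
def human_gate(state: dict, reviewer) -> dict:
    """Block the workflow on the reviewer's verdict for this stage."""
    state["human_approved"] = bool(reviewer(state))
    return state

# In production the reviewer would be a person behind a UI prompt;
# here it is a simple rule so the example runs unattended.
auto_reviewer = lambda state: state.get("identity_ok", False)

state = human_gate({"identity_ok": True}, auto_reviewer)
print(state["human_approved"])  # True
```

Downstream crews then run only when `human_approved` is set, so the autonomous part of the workflow never outruns its supervision.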
Production challenges
Bringing a multi-agent solution to production can present several challenges.
- Scale: As the number of agents increases, collaboration and management become difficult. Different frameworks offer scalable solutions – for example, LlamaIndex uses an event-driven workflow to manage multiple agents at scale.
- Latency: Agent performance often suffers because tasks execute iteratively, requiring multiple LLM calls. Managed LLMs (such as GPT-4o) can be slow because of implicit guardrails and network latency. Self-hosted LLMs (with GPU control) come in handy for solving latency issues.
- Performance and hallucination issues: Due to the probabilistic nature of LLMs, agent performance may vary with each execution. Techniques such as output templating (e.g., JSON format) and providing ample examples in prompts can help reduce response variability. The problem of hallucination can be further reduced by training agents.
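Output templating can be enforced by validating the model's raw reply against the expected JSON shape and retrying on failure. The sketch below uses a stubbed, deliberately flaky model (malformed output on the first attempt, valid JSON on the retry); the function names and the required keys are illustrative.

```python
import json

def flaky_model(prompt: str, attempt: int) -> str:
    """Stub LLM: malformed output first, valid JSON on the retry."""
    if attempt == 0:
        return "Sure! Here is the result: score=7"          # not JSON
    return '{"score": 7, "label": "low risk"}'

def call_with_template(prompt: str, required_keys, max_retries: int = 3):
    for attempt in range(max_retries):
        raw = flaky_model(prompt, attempt)
        try:
            parsed = json.loads(raw)                        # must parse as JSON
            if all(k in parsed for k in required_keys):     # must match template
                return parsed
        except json.JSONDecodeError:
            pass                                            # fall through, retry
    raise ValueError("model never produced valid templated output")

result = call_with_template("Assess this applicant.", ["score", "label"])
print(result)  # {'score': 7, 'label': 'low risk'}
```

A production version would also feed the parse error back into the retry prompt, giving the model a chance to correct its own formatting.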
Final thoughts
As Andrew Ng has pointed out, agents are the future of AI and will continue to evolve alongside LLMs. Multi-agent systems will advance in processing multimodal data (text, images, video, audio) and tackling increasingly complex tasks. While AGI and fully autonomous systems are still on the horizon, multi-agent LLMs will bridge the current gap between LLMs and AGI.
Abhishek Gupta is a Principal Data Scientist at Talentica Software.
Credit: venturebeat.com