As large language models (LLMs) continue to evolve, their parameter counts grow exponentially, with some models reaching trillions of parameters. This exponential growth presents significant challenges for deployment on edge devices and in resource-constrained environments due to extensive memory and computational requirements. Quantization emerges as a crucial technique to reduce model footprint while preserving acceptable performance.

Quantization for large language models
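
As a rough illustration of the idea, the sketch below applies symmetric per-tensor int8 quantization to a small list of weights. It is a minimal example under stated assumptions, not the method used in the article: the function names `quantize_int8` and `dequantize` are hypothetical, and real LLM quantization schemes (per-channel scales, calibration, 4-bit formats) are considerably more involved.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127].

    Hypothetical helper for illustration; returns the integer codes and the
    scale needed to map them back to floats.
    """
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
```

Each weight is stored in one byte instead of four (for float32), at the cost of a rounding error of at most half a quantization step per weight.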

A technical case study detailing the implementation of an AI chatbot agent in a project management platform. Learn how the team leveraged LangChain, LangGraph, and GPT-4 to build a multi-agent system using the supervisor-worker pattern.

Building chatbot agent to streamline project management

In baseline Retrieval-Augmented Generation (RAG), results are sometimes less accurate than expected because the query itself has multiple layers of reasoning, or because the answer requires traversing disparate pieces of information through their shared attributes to provide new synthesized insights. In this post, we will explore a newer approach called GraphRAG, which combines the strengths of knowledge graphs and large language models to improve the accuracy of RAG systems.

GraphRAG - building a knowledge graph for RAG system

Despite their ability to process and produce highly coherent, human-like text, LLMs behave unpredictably, so safety mechanisms and boundaries that control and direct an AI model's behavior are crucial to ensure it operates safely, ethically, and within intended parameters...

Guardrails in llm

In the process of improving Large Language Model (LLM) performance, many techniques have been proposed. The Augmented Language Model (ALM) approach boosted LLM accuracy by enabling the attachment of external sources to enhance the model's knowledge. However, ALMs still had limitations in terms of time consumption and token resources. To address these issues, ReWOO was developed as a more efficient solution.

ReWOO: Reasoning without observation - a deeper look

Working with Large Language Models (LLMs) may seem straightforward at first glance. Initially, we might think that simply prompting the AI model with a task is sufficient for it to assist us. However, when faced with more complex tasks, we need to break down the problem into smaller sub-tasks and guide the AI model to solve them sequentially. This is where the ReAct approach comes into play.

ReAct(reason + act) in LLM

User intent classification is a crucial aspect of conversational AI. It started with machine learning models, but large language models (LLMs) are now being explored for this task. Unlike older methods, which need exhaustively labeled datasets, LLMs can understand what users mean without all that preparation. This memo explores the application of LLMs in intent classification, highlighting their potential to streamline the process and overcome traditional NLU limitations.

Intent classification by LLM

With the rapid growth of LLMs, a new method is being used to evaluate their performance: LLM-as-a-Judge, also known as LLM-evaluators. This approach takes advantage of other advanced language models to assess the quality and effectiveness of responses generated by LLMs.

LLM as a judge

Nowadays, Large Language Models (LLMs) have become integral to various applications. However, with great power comes great responsibility, and the rise of LLMs has introduced new security challenges. One such challenge is prompt injection attacks, a sophisticated technique that can manipulate AI systems to perform unintended actions. In this article, we'll dive deep into the world of prompt injection, understand its implications, and explore strategies to prevent these attacks.

Prevent prompt injection

When building a chatbot agent, it's important to evaluate its performance and user satisfaction. One effective method is user simulation, which involves creating virtual users to interact with the chatbot and assess its responses. This approach allows for a more realistic evaluation of the chatbot's capabilities and user experience.

Evaluate chatbot agent by user simulation

In AI-integrated systems, instead of putting the entire workload on a single agent, we can apply a divide-and-conquer strategy to distribute the workload across multiple agents. This approach can enhance task completion by leveraging the unique skills and capabilities of each agent. It allows for more complex and nuanced problem-solving, as well as increased efficiency and scalability. By coordinating and communicating effectively, agents can work together to achieve common goals, divide labor, and overcome challenges that a single agent might face alone.

Multi-agent collaboration for task completion

Re-ranking is a crucial step in Retrieval-Augmented Generation (RAG) systems that addresses the challenge of retrieving heterogeneous and potentially irrelevant information. By evaluating and re-ordering retrieved documents, re-ranking ensures that only the most relevant and useful information is passed to the generation model. This process significantly improves the coherence, accuracy, and relevance of the generated text, ultimately enhancing the reliability and effectiveness of RAG systems.

Re-ranking in RAG
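
The re-ordering step described above can be sketched as follows. This is a minimal illustration, not the article's implementation: `overlap_score` is a hypothetical toy scorer based on shared terms, standing in for the learned relevance model (for example a cross-encoder) that a real RAG pipeline would typically use, and `rerank` and `top_k` are names assumed for this example.

```python
def overlap_score(query: str, document: str) -> float:
    """Toy relevance score: fraction of query terms that appear in the document.

    Assumption for illustration only; a production re-ranker would use a
    trained relevance model instead of term overlap.
    """
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def rerank(query, documents, top_k=2, score_fn=overlap_score):
    """Re-order retrieved documents by score (highest first) and keep top_k."""
    ranked = sorted(documents, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_k]


retrieved = [
    "bloom filters are probabilistic sets",
    "re-ranking orders retrieved documents",
    "cats sleep a lot",
]
best = rerank("how does re-ranking order documents", retrieved, top_k=1)
```

Only the documents returned by `rerank` would be passed on to the generation model, which is how re-ranking keeps irrelevant retrieved text out of the prompt.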

Despite having taken the world by storm, Large Language Models (LLMs) still have limitations, such as a limited context window and a knowledge cutoff date. Retrieval-Augmented Generation (RAG) steps in to bridge this gap by allowing LLMs to access and utilize external knowledge sources beyond their training data. However, data is not only text-based; it can also be images, audio, tables in documents,...

Multimodal: in rag

A Bloom filter is a probabilistic data structure used for testing whether an element is a member of a set. It is space-efficient compared to other data structures like hash tables, but it may give false positives (indicating that an element is in the set when it is not). It never gives false negatives (indicating that an element is not in the set when it actually is)...

Bloom filter
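
A minimal sketch of the data structure, assuming a fixed-size bit array and several hash positions derived by salting SHA-256 with an index. The class name `BloomFilter`, the method `might_contain`, and the default sizes are hypothetical choices for illustration, not taken from the article.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over a fixed-size bit array (illustrative only)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the example simple; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        """Set every bit position associated with the item."""
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
seen.add("apple")
```

After `seen.add("apple")`, the call `seen.might_contain("apple")` always returns `True`. For an item that was never added, the result is usually `False`, but it can be `True` when other items happen to have set the same bits, which is the false-positive case.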

The contributor 'nnhuyhoang' demonstrates strong technical depth, particularly in AI, LLMs, and system integration. Articles provide clear implementation details, code snippets, and practical system design, showcasing a high level of technical competence (Technician: 8.7). There is evidence of process optimization and operational awareness, especially in articles about evaluation, guardrails, and prompt injection, supporting a solid Operator score (7.0). The author shows some advisory and solution architecture skills, with explanations suitable for knowledge transfer and requirements gathering, though client-facing and strategic advisory aspects are less visible (Consultant: 7.2). Product-building and innovation are present in case studies and technical explorations, but entrepreneurial market orientation is not deeply covered (Builder: 7.5). Managerial indicators such as leadership, team coordination, and decision-making are limited or implicit, resulting in a modest Manager score (5.0). Writing style is clear and focused on technical audiences. Overall, the author is a strong technical contributor with some strengths as a builder and operator, but less evidence of leadership or high-level consulting.