
GraphRAG - Building a knowledge graph for RAG system

In baseline Retrieval Augmented Generation (RAG), the result is sometimes less accurate than expected, either because the query involves multiple layers of reasoning or because the answer requires traversing disparate pieces of information through their shared attributes to produce new synthesized insights. In this post, we will explore an approach called GraphRAG, which combines the strengths of knowledge graphs and large language models to improve the accuracy of RAG systems.

What is a Knowledge Graph?

A knowledge graph is an organized representation of real-world entities and their relationships. It is typically stored in a graph database, which natively stores the relationships between data entities. Entities in a knowledge graph can represent objects, events, situations, or concepts. Knowledge graphs have 2 key characteristics:

Why is a Knowledge Graph used in RAG?

Naive RAG systems built on keyword or similarity search-based retrieval fail on complex queries that require reasoning. Suppose a user asks: “What is the favorite food of Taylor Swift’s cat?” A standard RAG system will search for documents containing keywords like “Taylor Swift”, “cat”, and “favorite food”. It might find separate documents about Taylor Swift’s pets or about cat food, but it cannot connect the dots in a logical sequence. With a knowledge graph, the ideal process is as follows: the system first finds that Taylor Swift has a cat named Benjamin Button, then looks for information about Benjamin Button’s preferences, and finally finds that Benjamin Button’s favorite food is tuna.
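The multi-hop lookup described above can be sketched with a toy graph of (subject, relation, object) triples. This is only an illustration: the relation names (`HAS_PET`, `FAVORITE_FOOD`) and the helper functions are made up for this example, and a real system would store the graph in a graph database rather than in a Python list.

```python
from collections import defaultdict

# Toy knowledge graph as (subject, relation, object) triples.
# The facts mirror the example in the text; the relation names are hypothetical.
TRIPLES = [
    ("Taylor Swift", "HAS_PET", "Benjamin Button"),
    ("Benjamin Button", "IS_A", "cat"),
    ("Benjamin Button", "FAVORITE_FOOD", "tuna"),
]


def build_index(triples):
    """Map each (subject, relation) pair to the list of connected objects."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[(subject, relation)].append(obj)
    return index


def follow(index, start, relations):
    """Follow a chain of relations from a start entity, one hop per relation."""
    frontier = [start]
    for relation in relations:
        frontier = [
            obj
            for node in frontier
            for obj in index.get((node, relation), [])
        ]
    return frontier


index = build_index(TRIPLES)
# Hop 1: Taylor Swift -> her pet. Hop 2: the pet -> its favorite food.
answer = follow(index, "Taylor Swift", ["HAS_PET", "FAVORITE_FOOD"])
print(answer)  # ['tuna']
```

Each hop only needs the result of the previous one, which is the step a keyword search over separate documents cannot perform.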

How does GraphRAG work?

The GraphRAG workflow contains 2 main stages: Index and Query.

Index

Indexing in GraphRAG is a data pipeline and transformation suite designed to extract meaningful, structured data from unstructured text using LLMs. Following the above diagram, the Index stage contains 6 main steps:

Figures: Entity Example, Relationship Example

Query

The Query stage is the process of answering a question using the graph and the community summaries. There are 2 query modes: Local Query and Global Query.

In Local Query mode, following the above diagrams, entities are first extracted from the user query. These entities are then used in a semantic search through the knowledge graph to find relevant information. The results then flow through several filtering and sorting steps to produce the final answer.
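As a rough sketch of this flow, the toy example below extracts entities from the query, gathers them and their direct neighbours from a small in-memory graph, scores each candidate, then filters and sorts. Everything here is an assumption made for illustration: the `GRAPH` structure, the name-matching extractor (a stand-in for LLM entity extraction), and the word-overlap score (a stand-in for embedding-based semantic search).

```python
import re

# Hypothetical toy graph: entity name -> description and neighbouring entities.
GRAPH = {
    "Taylor Swift": {
        "description": "Singer who owns a cat named Benjamin Button.",
        "neighbors": ["Benjamin Button"],
    },
    "Benjamin Button": {
        "description": "Taylor Swift's cat; his favorite food is tuna.",
        "neighbors": ["Taylor Swift"],
    },
}


def tokens(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def extract_entities(query):
    """Stand-in for LLM entity extraction: match known entity names in the query."""
    return [name for name in GRAPH if name.lower() in query.lower()]


def local_query(query, top_k=2, min_score=1):
    seeds = extract_entities(query)
    # Candidates are the extracted entities plus their direct neighbours.
    candidates = set(seeds)
    for seed in seeds:
        candidates.update(GRAPH[seed]["neighbors"])
    # Score by word overlap (stand-in for semantic similarity).
    query_words = tokens(query)
    scored = [
        (len(query_words & tokens(GRAPH[name]["description"])), name)
        for name in candidates
    ]
    # Filter out weak matches, then sort by score (ties broken by name).
    kept = [(score, name) for score, name in scored if score >= min_score]
    kept.sort(key=lambda item: (-item[0], item[1]))
    return [GRAPH[name]["description"] for _, name in kept[:top_k]]


context = local_query("What is the favorite food of Taylor Swift's cat?")
print(context)
```

The returned descriptions would then be passed to the LLM as context for generating the final answer.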

In Global Query mode, the collection of communities is used to generate a response to the user query in a map-reduce manner. In the Map step, community reports are segmented into text chunks of a pre-defined size. Each text chunk is then used to produce an intermediate response containing a list of points, each of which is accompanied by a numerical rating indicating the importance of the point. In the Reduce step, the intermediate responses are filtered, re-ranked, and then aggregated to produce the final answer.
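The map-reduce flow can be sketched as follows. This is a minimal illustration under stated assumptions, not GraphRAG's actual implementation: chunk size is measured in sentences, the rating is a simple word-overlap count standing in for the LLM's importance rating, and the final aggregation is plain concatenation standing in for an LLM call. All function names are hypothetical.

```python
import re


def split_into_chunks(report, sentences_per_chunk=2):
    """Segment a community report into chunks of a fixed number of sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]
    return [
        sentences[i:i + sentences_per_chunk]
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


def map_chunk(query, chunk):
    """Map step stand-in: emit one rated point per sentence in the chunk."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    points = []
    for sentence in chunk:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        points.append({"point": sentence, "rating": len(query_words & words)})
    return points


def reduce_points(intermediate, min_rating=1, top_k=3):
    """Reduce step: filter low-rated points, re-rank, and aggregate the rest."""
    all_points = [p for response in intermediate for p in response]
    kept = [p for p in all_points if p["rating"] >= min_rating]
    kept.sort(key=lambda p: p["rating"], reverse=True)
    return " ".join(p["point"] for p in kept[:top_k])


def global_query(query, community_reports):
    intermediate = [
        map_chunk(query, chunk)
        for report in community_reports
        for chunk in split_into_chunks(report)
    ]
    return reduce_points(intermediate)


reports = [
    "Benjamin Button is a cat. His favorite food is tuna. He lives with Taylor Swift.",
    "Tuna is a saltwater fish. It is sold fresh or canned.",
]
answer = global_query("favorite food of the cat", reports)
print(answer)  # His favorite food is tuna. Benjamin Button is a cat.
```

Because each chunk is mapped independently, the Map step can run in parallel, but it also means one LLM call per chunk in a real system.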

Conclusion

GraphRAG is well suited to complex tasks such as multi-hop reasoning and answering comprehensive questions that require linking disparate pieces of information. However, it makes many LLM calls in both the index and query stages, which makes it expensive, and this cost should be taken into consideration.
