Welcome to Rule-based Retrieval Documentation

Rule-based Retrieval is a Python package for building Retrieval Augmented Generation (RAG) applications with filtering capabilities. It uses OpenAI for embeddings and text generation, and Pinecone as the vector database.

Key Features

Easy-to-use API for creating and managing Pinecone indexes
Uploading and processing documents (currently supports PDF files)
Generating embeddings using OpenAI models
Querying the index with custom filtering rules, including processing rules separately and triggering rules based on keywords
Retrieval Augmented Generation for question answering
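
To make keyword triggering and separate rule processing more concrete, here is a minimal, self-contained sketch of the idea. It does not use the package's API: `SketchRule`, `triggered_rules`, and `plan_queries` are hypothetical names, and the matching logic (case-insensitive substring match, rules without keywords never trigger) is an assumption of this sketch rather than documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class SketchRule:
    """Hypothetical stand-in for a retrieval rule (not the package's Rule class)."""

    name: str
    keywords: list[str] = field(default_factory=list)


def triggered_rules(question: str, rules: list[SketchRule]) -> list[SketchRule]:
    """Keep only rules with at least one keyword appearing in the question.

    Assumption: matching is a case-insensitive substring test.
    """
    text = question.lower()
    return [r for r in rules if any(k.lower() in text for k in r.keywords)]


def plan_queries(rules: list[SketchRule], separately: bool) -> list[list[SketchRule]]:
    """Group rules into query batches.

    separately=True  -> one batch (one index query) per rule
    separately=False -> a single batch containing every rule
    """
    if separately:
        return [[r] for r in rules]
    return [rules] if rules else []
```

Under these assumptions, a question such as "What does the termination clause say?" would trigger only a rule whose keywords include "clause", and processing rules separately would issue one query per triggered rule instead of one combined query.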

Getting Started

Install the package by following the Installation Guide
Set up your OpenAI and Pinecone API keys as environment variables
Create an index and upload your documents using the Client class
Query the index with custom rules to retrieve relevant documents, optionally processing rules separately or triggering rules based on keywords
Use the retrieved documents to generate answers to your questions
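
Before creating a client, it can help to confirm that both API keys are visible to the Python process. The sketch below uses only the standard library. The variable names `OPENAI_API_KEY` and `PINECONE_API_KEY` are assumed here because they are the conventional names; check the Installation Guide for the names the package actually reads. `missing_api_keys` is a hypothetical helper, not part of the package.

```python
import os

# Assumed variable names; confirm them in the Installation Guide.
REQUIRED_KEYS = ("OPENAI_API_KEY", "PINECONE_API_KEY")


def missing_api_keys(env=None):
    """Return the names of required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("API keys found.")
```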

For a detailed walkthrough and code examples, check out the Tutorial.

Architecture Overview

The Rule-based Retrieval package consists of the following main components:

Client: The central class for managing resources and performing RAG-related tasks
Rule: Allows defining custom filtering rules for retrieving documents
PineconeMetadata and PineconeDocument: Classes for representing and storing document metadata and embeddings in Pinecone
embedding, processing, and exceptions modules: Utility functions and custom exceptions
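
As a rough illustration of how a filtering rule could map onto stored document metadata, the sketch below turns a rule into a Pinecone-style metadata filter using the `$eq`, `$in`, `$and`, and `$or` operators. `FilterRule`, its fields, the metadata keys `filename` and `page_number`, and `combine` are hypothetical and chosen for illustration; the package's own `Rule` and `PineconeMetadata` classes may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilterRule:
    """Hypothetical rule; field names are illustrative, not the package's schema."""

    filename: Optional[str] = None
    page_numbers: Optional[list[int]] = None

    def to_filter(self) -> dict:
        """Build a metadata filter dict; all set fields must match ($and)."""
        conditions = []
        if self.filename is not None:
            conditions.append({"filename": {"$eq": self.filename}})
        if self.page_numbers:
            conditions.append({"page_number": {"$in": self.page_numbers}})
        if not conditions:
            return {}
        if len(conditions) == 1:
            return conditions[0]
        return {"$and": conditions}


def combine(rules: list[FilterRule]) -> dict:
    """Merge several rules into one filter; a document may match any rule ($or)."""
    filters = [f for f in (r.to_filter() for r in rules) if f]
    if not filters:
        return {}
    if len(filters) == 1:
        return filters[0]
    return {"$or": filters}
```

In this sketch, a single combined filter corresponds to processing all rules in one query, while calling `to_filter()` on each rule and querying once per filter corresponds to processing rules separately.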