# Chunking Strategy
## Overview
A chunking strategy is the method used to divide documents into smaller, retrievable text segments before generating embeddings in a Retrieval-Augmented Generation (RAG) pipeline. The primary goal of a chunking strategy is to balance context preservation, retrieval accuracy, and system performance when searching knowledge bases.
In RAG systems, large documents are first split into chunks, converted into vector embeddings, and stored in a vector index. At query time, the most relevant chunks are retrieved and passed to the language model as context. Proper chunking directly affects retrieval quality, response accuracy, latency, and cost. (docs.aws.amazon.com)

Scope:

1. Applies to text-based knowledge retrieval systems
2. Used in vector databases, semantic search, and RAG pipelines
3. Independent of specific embedding or LLM providers
4. Applicable across enterprise, technical, and content-driven datasets
## Chunking Strategies
The following strategies are commonly used in RAG systems.
| Strategy | Description | Complexity | Best For |
|---|---|---|---|
| Fixed-Size | Splits text by token or character count | Low | Small or simple documents |
| Recursive | Repeatedly splits text while preserving structure | Low–Medium | Semi-structured text |
| Document-Based | Splits at document or section boundaries | Low | Structured files with clear sections |
| Semantic | Splits by meaning or topic boundaries | Medium | Technical or narrative content |
| LLM-Based | Uses a language model to determine chunk boundaries | High | Complex documents |
| Agentic | AI agent decides chunking strategy dynamically | Very High | Highly nuanced or regulatory text |
| Late Chunking | Embeds full document, then derives chunks | High | Context-dependent tasks |
| Hierarchical | Multi-level chunking (section → paragraph → sentence) | Medium | Large structured documents |
## Core Concepts and Components
| Concept | Description | Responsibility |
|---|---|---|
| Chunk | A segment of text stored in the vector index | Retrieval unit |
| Chunk Size | Maximum token or character length per chunk | Controls context and latency |
| Overlap | Shared tokens between adjacent chunks | Preserves context continuity |
| Embedding | Vector representation of chunk text | Enables similarity search |
| Vector Index | Storage for chunk embeddings | Supports fast retrieval |
| Metadata | Attributes attached to chunks | Enables filtering and traceability |
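The interaction between chunk size and overlap can be shown with a minimal, dependency-free sketch using character-based chunks (illustrative only; production systems typically count tokens, not characters). The window advances by `chunk_size - overlap`, so adjacent chunks share a boundary region:

```python
def chunk_with_overlap(text, chunk_size, overlap):
    """Split text into fixed-size character chunks with overlapping boundaries."""
    step = chunk_size - overlap  # how far the window advances per chunk
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, overlap=3)
# Each chunk begins with the last 3 characters of the previous chunk,
# preserving continuity across chunk boundaries.
```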
## RAG Chunking Execution Flow
1. Load the source document into the ingestion pipeline.
2. Apply the selected chunking strategy to split the document into chunks.
3. Generate an embedding for each chunk.
4. Store the embeddings and metadata in the vector index.
5. Receive a user query.
6. Convert the query into an embedding.
7. Retrieve the most similar chunks from the vector index.
8. Pass the retrieved chunks to the language model as context.
9. Generate the final response using the retrieved context.
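The flow above can be sketched end to end in plain Python. This is a toy illustration: a bag-of-words cosine similarity stands in for a real embedding model, and a Python list stands in for a vector index, so every name here is an assumption, not a production API:

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': word counts. A real pipeline calls an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-4: ingest, chunk, embed, and index
chunks = ["pandas handles tabular data", "numpy handles numeric arrays"]
index = [(c, embed(c)) for c in chunks]

# Steps 5-8: embed the query, retrieve the most similar chunk
query_vec = embed("numeric arrays")
best_chunk, _ = max(index, key=lambda item: cosine(query_vec, item[1]))
# Step 9: best_chunk would be sent to the language model as context
```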
## Strategy Selection Guidelines
**Chunking strategy decision checklist**

Context: Selecting a chunking strategy directly affects retrieval accuracy and system performance.

Actions:

1. Identify the document structure (structured, semi-structured, unstructured).
2. Estimate average document length and variability.
3. Determine whether semantic boundaries are important.
4. Evaluate latency and cost constraints.
5. Select the simplest strategy that meets retrieval quality requirements.
### Decision Rules
1. Use Fixed-Size when speed and simplicity are the priority.
2. Use Recursive when documents have paragraphs or headings.
3. Use Document-Based when files already have clear sections.
4. Use Semantic when topic boundaries matter.
5. Use Hierarchical for large, structured manuals or policies.
6. Use LLM-Based when chunk boundaries require contextual understanding.
7. Use Late Chunking when full-document context is critical.
8. Use Agentic only for highly complex or domain-specific documents.
## Chunking Recommendations by Document Type
| Document Type | Characteristics | Recommended Strategy | Notes |
|---|---|---|---|
| Financial reports | Dense numeric data, sections, tables | Document-Based or Hierarchical | Preserve section boundaries |
| Presentations (slides) | Short, independent slides | Document-Based | Treat each slide as a chunk |
| Legal contracts | Long, structured clauses | Hierarchical or LLM-Based | Maintain clause context |
| Technical documentation | Headings, code blocks | Recursive or Hierarchical | Preserve logical structure |
| Research papers | Topic-driven sections | Semantic | Split by topic shifts |
| Emails and chat logs | Short, independent messages | Fixed-Size or Document-Based | Treat each message as a chunk |
| FAQs and support articles | Short question-answer pairs | Document-Based | One chunk per Q&A |
| Product catalogs | Repetitive, structured entries | Document-Based | One chunk per product |
| Policies and compliance docs | Multi-section structured text | Hierarchical | Preserve section hierarchy |
| Meeting transcripts | Long conversational flow | Semantic or Recursive | Preserve topic transitions |
**Operational recommendation**

Context: Mixed document collections.

Actions:

1. Classify documents by type during ingestion.
2. Apply a different chunking strategy per document category.
3. Store the chosen strategy as metadata for observability.
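The actions above can be sketched as a small ingestion helper. The document types, strategy names, and record shape are illustrative assumptions, not a fixed schema:

```python
def ingest(doc):
    """Route a document to a per-type chunking strategy and record
    the chosen strategy as metadata for later observability."""
    strategy_by_type = {  # illustrative mapping; extend per your corpus
        "presentation": "document_based",
        "legal": "hierarchical",
        "chat": "fixed_size",
    }
    strategy = strategy_by_type.get(doc["type"], "recursive")  # safe default
    return {
        "content": doc["content"],
        "metadata": {"doc_type": doc["type"], "chunking_strategy": strategy},
    }

record = ingest({"type": "legal", "content": "Clause 1. The parties agree..."})
# record["metadata"] now records which strategy was applied
```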
## Limitations and Edge Cases
1. Very small chunks may lose semantic context and reduce retrieval accuracy.
2. Very large chunks may exceed model context limits or increase latency.
3. Overlap improves context continuity but increases storage and cost.
4. Semantic and LLM-based chunking introduce additional processing cost. (docs.aws.amazon.com)
5. Hierarchical chunking may return fewer results if child chunks are replaced by their parent chunks during retrieval. (docs.aws.amazon.com)
6. Multimodal data may require different chunking logic at the embedding level. (docs.aws.amazon.com)
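The storage cost of overlap (point 3) is straightforward to estimate: with a window advance of `chunk_size - overlap`, the stored text grows by roughly a factor of `chunk_size / (chunk_size - overlap)` compared to non-overlapping chunks. A quick sketch of this back-of-the-envelope estimate:

```python
def storage_overhead(chunk_size, overlap):
    """Approximate factor by which overlap inflates total stored tokens,
    relative to chunking the same text with no overlap."""
    return chunk_size / (chunk_size - overlap)

# 500-token chunks with a 50-token overlap store about 11% more tokens
factor = storage_overhead(500, 50)
```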
## Appendix A: Minimal Strategy Examples
### Fixed-Size Chunking

```python
from langchain.text_splitter import CharacterTextSplitter

splitter = CharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
)
chunks = splitter.split_text(text)
```
### Recursive Chunking

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
)
chunks = splitter.split_text(text)
```
### Document-Based Chunking

```python
# section1 and section2 hold the raw text of each pre-split document section
documents = [
    {"content": section1, "metadata": {"section": "intro"}},
    {"content": section2, "metadata": {"section": "methods"}},
]
```
### Semantic Chunking (example using embeddings)

```python
from langchain_experimental.text_splitter import SemanticChunker
from langchain.embeddings import OpenAIEmbeddings  # langchain_openai.OpenAIEmbeddings in newer releases

embeddings = OpenAIEmbeddings()
splitter = SemanticChunker(embeddings)
chunks = splitter.split_text(text)
```
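### Hierarchical Chunking

Hierarchical chunking is recommended in the tables above but has no example here; the following is a minimal two-level sketch (section → paragraph) with parent links stored as metadata. The splitting rules (blank-line section breaks, single-line paragraphs) and the record shape are illustrative assumptions:

```python
def hierarchical_chunks(text):
    """Two-level chunking sketch: sections (parents) and paragraphs (children).
    Child chunks carry a parent id so retrieval can expand to section context."""
    chunks = []
    for s_idx, section in enumerate(text.split("\n\n")):  # assumed section delimiter
        chunks.append({"level": "section", "id": f"s{s_idx}", "text": section})
        for para in section.split("\n"):  # assumed paragraph delimiter
            chunks.append({"level": "paragraph", "parent": f"s{s_idx}", "text": para})
    return chunks

doc = "Intro line one.\nIntro line two.\n\nMethods line."
chunks = hierarchical_chunks(doc)
```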
## Appendix B: Strategy Selection Example
Example rule-based selector:
```python
def choose_chunking_strategy(doc_type):
    mapping = {
        "financial": "document_based",
        "presentation": "document_based",
        "legal": "hierarchical",
        "technical": "recursive",
        "research": "semantic",
        "chat": "fixed_size",
    }
    return mapping.get(doc_type, "recursive")
```
## References

1. https://docs.aws.amazon.com/bedrock/latest/userguide/kb-chunking.html
2. https://developer.nvidia.com/blog/finding-the-best-chunking-strategy-for-accurate-ai-responses/
3. https://www.geeksforgeeks.org/data-science/chunking-strategies/
4. https://community.databricks.com/t5/technical-blog/the-ultimate-guide-to-chunking-strategies-for-rag-applications/ba-p/113089