Chunking Strategy

Overview

A chunking strategy is the method used to divide documents into smaller, retrievable text segments before generating embeddings in a Retrieval-Augmented Generation (RAG) pipeline. The primary goal of a chunking strategy is to balance context preservation, retrieval accuracy, and system performance when searching knowledge bases.

In RAG systems, large documents are first split into chunks, converted into vector embeddings, and stored in a vector index. At query time, the most relevant chunks are retrieved and passed to the language model as context. Proper chunking directly affects retrieval quality, response accuracy, latency, and cost. (docs.aws.amazon.com)

Scope:

(1) Applies to text-based knowledge retrieval systems (2) Used in vector databases, semantic search, and RAG pipelines (3) Independent of specific embedding or LLM providers (4) Applicable across enterprise, technical, and content-driven datasets

Chunking Strategies

The following strategies are commonly used in RAG systems.

| Strategy | Description | Complexity | Best For |
| --- | --- | --- | --- |
| Fixed-Size | Splits text by token or character count | Low | Small or simple documents |
| Recursive | Repeatedly splits text while preserving structure | Low–Medium | Semi-structured text |
| Document-Based | Splits at document or section boundaries | Low | Structured files with clear sections |
| Semantic | Splits by meaning or topic boundaries | Medium | Technical or narrative content |
| LLM-Based | Uses a language model to determine chunk boundaries | High | Complex documents |
| Agentic | AI agent decides chunking strategy dynamically | Very High | Highly nuanced or regulatory text |
| Late Chunking | Embeds full document, then derives chunks | High | Context-dependent tasks |
| Hierarchical | Multi-level chunking (section → paragraph → sentence) | Medium | Large structured documents |

Core Concepts and Components

| Concept | Description | Responsibility |
| --- | --- | --- |
| Chunk | A segment of text stored in the vector index | Retrieval unit |
| Chunk Size | Maximum token or character length per chunk | Controls context and latency |
| Overlap | Shared tokens between adjacent chunks | Preserves context continuity |
| Embedding | Vector representation of chunk text | Enables similarity search |
| Vector Index | Storage for chunk embeddings | Supports fast retrieval |
| Metadata | Attributes attached to chunks | Enables filtering and traceability |
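
To make Chunk Size and Overlap concrete, here is a minimal fixed-size splitter in plain Python. It is a sketch only: it counts characters, whereas production splitters usually count tokens.

```python
def chunk_text(text, chunk_size=20, overlap=5):
    # Each chunk starts (chunk_size - overlap) characters after the previous
    # one, so adjacent chunks share their trailing/leading `overlap` characters.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "".join(str(i % 10) for i in range(50))  # "0123456789012345..."
chunks = chunk_text(sample)
# Chunks start at offsets 0, 15, 30, 45; chunks[1] begins with chunks[0][-5:]
```

Note that a larger overlap improves continuity between neighbouring chunks but stores more duplicated text, which is the trade-off listed under Limitations below.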

RAG Chunking Execution Flow

(1) Load the source document into the ingestion pipeline. (2) Apply the selected chunking strategy to split the document into chunks. (3) Generate embeddings for each chunk. (4) Store embeddings and metadata in the vector index. (5) Receive a user query. (6) Convert the query into an embedding. (7) Retrieve the most similar chunks from the vector index. (8) Send retrieved chunks as context to the language model. (9) Generate the final response using the retrieved context.
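
The nine steps above can be sketched end to end in a few lines. The `embed` function here is a toy deterministic bag-of-words hash, a stand-in for a real embedding model, and the "chunking strategy" is simply one sentence per chunk; both are illustrative assumptions, not production choices.

```python
import math
from collections import Counter

def embed(text, dim=64):
    # Toy embedding: deterministic hashed bag-of-words (stand-in for a model)
    vec = [0.0] * dim
    for tok, n in Counter(text.lower().split()).items():
        vec[sum(ord(c) for c in tok) % dim] += n
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are already unit-normalized, so the dot product is the cosine
    return sum(x * y for x, y in zip(a, b))

# Steps 1-4: load, chunk (one sentence per chunk), embed, index
document = "Cats sleep a lot. Dogs bark loudly. Fish swim in water."
chunks = [s.strip() for s in document.split(".") if s.strip()]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Steps 5-7: embed the query and retrieve the most similar chunk
query = "dogs bark"
best_chunk, _ = max(index, key=lambda item: cosine(embed(query), item[1]))
# Steps 8-9: best_chunk would be sent to the language model as context
```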

Strategy Selection Guidelines

Chunking strategy decision checklist

Context: Selecting a chunking strategy directly affects retrieval accuracy and system performance. Actions: (1) Identify document structure (structured, semi-structured, unstructured). (2) Estimate average document length and variability. (3) Determine whether semantic boundaries are important. (4) Evaluate latency and cost constraints. (5) Select the simplest strategy that meets retrieval quality requirements.

Decision Rules

(1) Use Fixed-Size when speed and simplicity are the priority. (2) Use Recursive when documents have paragraphs or headings. (3) Use Document-Based when files already have clear sections. (4) Use Semantic when topic boundaries matter. (5) Use Hierarchical for large, structured manuals or policies. (6) Use LLM-Based when chunk boundaries require contextual understanding. (7) Use Late Chunking when full-document context is critical. (8) Use Agentic only for highly complex or domain-specific documents.

Chunking Recommendations by Document Type

| Document Type | Characteristics | Recommended Strategy | Notes |
| --- | --- | --- | --- |
| Financial reports | Dense numeric data, sections, tables | Document-Based or Hierarchical | Preserve section boundaries |
| Presentations (slides) | Short, independent slides | Document-Based | Treat each slide as a chunk |
| Legal contracts | Long, structured clauses | Hierarchical or LLM-Based | Maintain clause context |
| Technical documentation | Headings, code blocks | Recursive or Hierarchical | Preserve logical structure |
| Research papers | Topic-driven sections | Semantic | Split by topic shifts |
| Emails and chat logs | Short, independent messages | Fixed-Size or Document-Based | Treat each message as a chunk |
| FAQs and support articles | Short question-answer pairs | Document-Based | One chunk per Q&A |
| Product catalogs | Repetitive, structured entries | Document-Based | One chunk per product |
| Policies and compliance docs | Multi-section structured text | Hierarchical | Preserve section hierarchy |
| Meeting transcripts | Long conversational flow | Semantic or Recursive | Preserve topic transitions |

Operational recommendation

Context: Mixed document collections. Actions: (1) Classify documents by type during ingestion. (2) Apply a different chunking strategy per document category. (3) Store the chosen strategy as metadata for observability.
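
A minimal sketch of that per-category routing. The classifier, category names, and strategy labels below are illustrative assumptions; a real pipeline would use a proper document classifier and actual splitter implementations.

```python
# Illustrative mapping from document category to chunking strategy
STRATEGY_BY_TYPE = {
    "presentation": "document_based",
    "legal": "hierarchical",
    "technical": "recursive",
}

def classify(doc):
    # Stand-in classifier: route on file extension only
    ext = doc["name"].rsplit(".", 1)[-1]
    return {"pptx": "presentation", "docx": "legal"}.get(ext, "technical")

def ingest(doc):
    strategy = STRATEGY_BY_TYPE.get(classify(doc), "recursive")
    # ... apply `strategy` to split doc["content"] into chunks here ...
    # Step 3 of the recommendation: record the strategy as metadata
    return {"metadata": {"chunking_strategy": strategy}}

record = ingest({"name": "slides.pptx", "content": "..."})
# record["metadata"]["chunking_strategy"] == "document_based"
```

Storing the chosen strategy as metadata makes it possible to audit retrieval quality per strategy later.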

Limitations and Edge Cases

(1) Very small chunks may lose semantic context and reduce retrieval accuracy. (2) Very large chunks may exceed model context limits or increase latency. (3) Overlap increases context continuity but also increases storage and cost. (4) Semantic and LLM-based chunking introduce additional processing cost. (docs.aws.amazon.com) (5) Hierarchical chunking may return fewer results if child chunks are replaced by parent chunks during retrieval. (docs.aws.amazon.com) (6) Multimodal data may use different chunking logic at the embedding level. (docs.aws.amazon.com)
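
The storage cost of overlap, limitation (3) above, can be approximated with simple arithmetic: each new chunk advances only (chunk size − overlap) tokens through the document.

```python
def overlap_storage_factor(chunk_size, overlap):
    # Total stored tokens scale by roughly chunk_size / (chunk_size - overlap),
    # since each chunk only advances (chunk_size - overlap) tokens.
    return chunk_size / (chunk_size - overlap)

# e.g. 500-token chunks with 50-token overlap store roughly 11% extra tokens
factor = overlap_storage_factor(500, 50)  # 500 / 450 ≈ 1.11
```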

Appendix A: Minimal Strategy Examples

Fixed-Size Chunking

from langchain.text_splitter import CharacterTextSplitter

# Splits on a single separator ("\n\n" by default), targeting ~500 characters
splitter = CharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
)

chunks = splitter.split_text(text)

Recursive Chunking

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Tries "\n\n", then "\n", then " ", then "" until chunks fit the size limit
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
)

chunks = splitter.split_text(text)

Document-Based Chunking

# Each pre-existing section becomes one chunk, with section metadata attached
documents = [
    {"content": "Introductory overview of the system...", "metadata": {"section": "intro"}},
    {"content": "Description of the methods used...", "metadata": {"section": "methods"}},
]

Semantic Chunking (example using embeddings)

from langchain_experimental.text_splitter import SemanticChunker
from langchain.embeddings import OpenAIEmbeddings

# Requires an OpenAI API key; boundaries are placed where embedding
# similarity between adjacent sentences drops
embeddings = OpenAIEmbeddings()
splitter = SemanticChunker(embeddings)

chunks = splitter.split_text(text)

Appendix B: Strategy Selection Example

Example rule-based selector:

def choose_chunking_strategy(doc_type):
    """Map a document category to a chunking strategy, defaulting to recursive."""
    mapping = {
        "financial": "document_based",
        "presentation": "document_based",
        "legal": "hierarchical",
        "technical": "recursive",
        "research": "semantic",
        "chat": "fixed_size",
    }
    return mapping.get(doc_type, "recursive")

References

(1) https://docs.aws.amazon.com/bedrock/latest/userguide/kb-chunking.html
(2) https://developer.nvidia.com/blog/finding-the-best-chunking-strategy-for-accurate-ai-responses/
(3) https://www.geeksforgeeks.org/data-science/chunking-strategies/
(4) https://community.databricks.com/t5/technical-blog/the-ultimate-guide-to-chunking-strategies-for-rag-applications/ba-p/113089