Skip to main content

Documentation Index

Fetch the complete documentation index at: https://langchain-zh.cn/llms.txt

Use this file to discover all available pages before exploring further.

Oracle AI 数据库支持 AI 工作负载,您可以在其中按含义(语义)而非仅关键词查询数据。它在一个系统中结合了对非结构化内容的语义搜索对业务数据的关系过滤——因此您可以构建检索工作流(如 RAG),而无需引入单独的向量数据库或将数据分散在多个平台上。 本指南演示如何使用 OracleVS(Oracle AI 向量搜索的 LangChain 向量存储集成)来:
  • 将文档和嵌入导入 Oracle
  • 运行相似性搜索
  • 创建 HNSW 和 IVF 索引
  • 应用元数据过滤器以进行高级检索
  • 在 Oracle Database 26ai 中启用混合搜索(关键词 + 语义)
  • 使用 Oracle Text 运行全文搜索

先决条件

安装 langchain-oracledbpython-oracledb 驱动程序将作为依赖项自动安装。
pip install -qU langchain-oracledb

连接到 Oracle 数据库

以下示例代码将展示如何连接到 Oracle 数据库。默认情况下,python-oracledb 以‘Thin’模式运行,直接连接到 Oracle 数据库。此模式不需要 Oracle 客户端库。但是,当 python-oracledb 使用它们时,有一些额外的功能可用。当使用 Oracle 客户端库时,python-oracledb 被称为处于‘Thick’模式。两种模式都具有全面的功能,支持 Python 数据库 API v2.0 规范。请参阅以下 指南,其中介绍了每种模式支持的功能。如果您无法使用 thin-mode,可能需要切换到 thick-mode。
import oracledb

# Please update with your username, password, hostname, port and service_name
username = "<username>"
password = "<password>"
dsn = "<hostname>:<port>/<service_name>"

connection = oracledb.connect(user=username, password=password, dsn=dsn)
print("Connection successful!")

导入所需的依赖项

from langchain_oracledb.vectorstores import oraclevs
from langchain_oracledb.vectorstores.oraclevs import OracleVS
from langchain_community.vectorstores.utils import DistanceStrategy
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings

加载文档

# Define a list of documents (The examples below are 5 random documents from Oracle Concepts Manual )

documents_json_list = [
    {
        "id": "cncpt_15.5.3.2.2_P4",
        "text": "If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.",
        "link": "https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/logical-storage-structures.html#GUID-5387D7B2-C0CA-4C1E-811B-C7EB9B636442",
    },
    {
        "id": "cncpt_15.5.5_P1",
        "text": "A tablespace can be online (accessible) or offline (not accessible) whenever the database is open.\nA tablespace is usually online so that its data is available to users. The SYSTEM tablespace and temporary tablespaces cannot be taken offline.",
        "link": "https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/logical-storage-structures.html#GUID-D02B2220-E6F5-40D9-AFB5-BC69BCEF6CD4",
    },
    {
        "id": "cncpt_22.3.4.3.1_P2",
        "text": "The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table.\nSometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.",
        "link": "https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/concepts-for-database-developers.html#GUID-3C50EAB8-FC39-4BB3-B680-4EACCE49E866",
    },
    {
        "id": "cncpt_22.3.4.3.1_P3",
        "text": "The LOB segment stores data in pieces called chunks. A chunk is a logically contiguous set of data blocks and is the smallest unit of allocation for a LOB. A row in the table stores a pointer called a LOB locator, which points to the LOB index. When the table is queried, the database uses the LOB index to quickly locate the LOB chunks.",
        "link": "https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/concepts-for-database-developers.html#GUID-3C50EAB8-FC39-4BB3-B680-4EACCE49E866",
    },
]
# Create LangChain Documents

documents_langchain = []

for doc in documents_json_list:
    metadata = {"id": doc["id"], "link": doc["link"]}
    doc_langchain = Document(page_content=doc["text"], metadata=metadata)
    documents_langchain.append(doc_langchain)

使用不同的距离度量创建向量存储

首先,我们将创建三个具有不同距离函数的向量存储。由于我们尚未在其中创建索引,它们目前只会创建表。稍后我们将使用这些向量存储来创建 HNSW 索引。要了解有关 Oracle AI 向量搜索支持的索引类型的更多信息,请参阅以下 指南 您可以手动连接到 Oracle 数据库,并将看到三个表: Documents_DOT, Documents_COSINEDocuments_EUCLIDEAN 然后我们将创建另外三个表 Documents_DOT_IVF, Documents_COSINE_IVFDocuments_EUCLIDEAN_IVF,用于 在这些表上创建 IVF 索引而不是 HNSW 索引。
# Ingest documents into Oracle Vector Store using different distance strategies

# When using our API calls, start by initializing your vector store with a subset of your documents
# through from_documents(), then incrementally add more documents using add_texts().
# This approach prevents system overload and ensures efficient document processing.

model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

vector_store_dot = OracleVS.from_documents(
    documents_langchain,
    model,
    client=connection,
    table_name="Documents_DOT",
    distance_strategy=DistanceStrategy.DOT_PRODUCT,
)
vector_store_max = OracleVS.from_documents(
    documents_langchain,
    model,
    client=connection,
    table_name="Documents_COSINE",
    distance_strategy=DistanceStrategy.COSINE,
)
vector_store_euclidean = OracleVS.from_documents(
    documents_langchain,
    model,
    client=connection,
    table_name="Documents_EUCLIDEAN",
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
)

# Ingest documents into Oracle Vector Store using different distance strategies
vector_store_dot_ivf = OracleVS.from_documents(
    documents_langchain,
    model,
    client=connection,
    table_name="Documents_DOT_IVF",
    distance_strategy=DistanceStrategy.DOT_PRODUCT,
)
vector_store_max_ivf = OracleVS.from_documents(
    documents_langchain,
    model,
    client=connection,
    table_name="Documents_COSINE_IVF",
    distance_strategy=DistanceStrategy.COSINE,
)
vector_store_euclidean_ivf = OracleVS.from_documents(
    documents_langchain,
    model,
    client=connection,
    table_name="Documents_EUCLIDEAN_IVF",
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
)

文本的添加和删除操作,以及基本相似性搜索

def manage_texts(vector_stores):
    """
    Adds texts to each vector store, demonstrates error handling for duplicate additions,
    and performs deletion of texts. Showcases similarity searches and index creation for each vector store.

    Args:
    - vector_stores (list): A list of OracleVS instances.
    """
    texts = ["Rohan", "Shailendra"]
    metadata = [
        {"id": "100", "link": "Document Example Test 1"},
        {"id": "101", "link": "Document Example Test 2"},
    ]

    for i, vs in enumerate(vector_stores, start=1):
        # Adding texts
        try:
            vs.add_texts(texts, metadata)
            print(f"\n\n\nAdd texts complete for vector store {i}\n\n\n")
        except Exception as ex:
            print(f"\n\n\nExpected error on duplicate add for vector store {i}\n\n\n")

        # Deleting texts using the value of 'id'
        vs.delete([metadata[0]["id"]])
        print(f"\n\n\nDelete texts complete for vector store {i}\n\n\n")

        # Similarity search
        results = vs.similarity_search("How are LOBS stored in Oracle Database", 2)
        print(f"\n\n\nSimilarity search results for vector store {i}: {results}\n\n\n")


vector_store_list = [
    vector_store_dot,
    vector_store_max,
    vector_store_euclidean,
    vector_store_dot_ivf,
    vector_store_max_ivf,
    vector_store_euclidean_ivf,
]
manage_texts(vector_store_list)

使用特定参数创建索引

def create_search_indices(connection):
    """
    Creates search indices for the vector stores, each with specific parameters tailored to their distance strategy.
    """
    # Index for DOT_PRODUCT strategy
    # Notice we are creating a HNSW index with default parameters
    # This will default to creating a HNSW index with 8 Parallel Workers and use the Default Accuracy used by Oracle AI Vector Search
    oraclevs.create_index(
        connection,
        vector_store_dot,
        params={"idx_name": "hnsw_idx1", "idx_type": "HNSW"},
    )

    # Index for COSINE strategy with specific parameters
    # Notice we are creating a HNSW index with parallel 16 and Target Accuracy Specification as 97 percent
    oraclevs.create_index(
        connection,
        vector_store_max,
        params={
            "idx_name": "hnsw_idx2",
            "idx_type": "HNSW",
            "accuracy": 97,
            "parallel": 16,
        },
    )

    # Index for EUCLIDEAN_DISTANCE strategy with specific parameters
    # Notice we are creating a HNSW index by specifying Power User Parameters which are neighbors = 64 and efConstruction = 100
    oraclevs.create_index(
        connection,
        vector_store_euclidean,
        params={
            "idx_name": "hnsw_idx3",
            "idx_type": "HNSW",
            "neighbors": 64,
            "efConstruction": 100,
        },
    )

    # Index for DOT_PRODUCT strategy with specific parameters
    # Notice we are creating an IVF index with default parameters
    # This will default to creating an IVF index with 8 Parallel Workers and use the Default Accuracy used by Oracle AI Vector Search
    oraclevs.create_index(
        connection,
        vector_store_dot_ivf,
        params={
            "idx_name": "ivf_idx1",
            "idx_type": "IVF",
        },
    )

    # Index for COSINE strategy with specific parameters
    # Notice we are creating an IVF index with parallel 32 and Target Accuracy Specification as 90 percent
    oraclevs.create_index(
        connection,
        vector_store_max_ivf,
        params={
            "idx_name": "ivf_idx2",
            "idx_type": "IVF",
            "accuracy": 90,
            "parallel": 32,
        },
    )

    # Index for EUCLIDEAN_DISTANCE strategy with specific parameters
    # Notice we are creating an IVF index by specifying Power User Parameters which is neighbor_part = 64
    oraclevs.create_index(
        connection,
        vector_store_euclidean_ivf,
        params={"idx_name": "ivf_idx3", "idx_type": "IVF", "neighbor_part": 64},
    )

    print("Index creation complete.")


create_search_indices(connection)

高级搜索

Oracle Database 23ai 支持预过滤、内过滤和后过滤,以增强 AI 向量搜索功能。这些过滤机制允许用户在执行向量相似性搜索之前、期间和之后应用约束,从而提高搜索性能和准确性。 关于 Oracle 23ai 中过滤的关键点:
  1. 预过滤 在执行向量相似性搜索之前,应用传统 SQL 过滤器以减少数据集。 通过限制 AI 算法处理的数据量来帮助提高效率。
  2. 内过滤 利用 AI 向量搜索直接在向量嵌入上执行相似性搜索,使用优化的索引和算法。 基于向量相似性高效地过滤结果,而无需扫描整个数据集。
  3. 后过滤 在向量相似性搜索后应用额外的 SQL 过滤以细化结果。 允许根据业务逻辑或附加元数据条件进一步细化。
为什么这很重要?
  • 性能优化:预过滤显著减少查询执行时间,使大规模数据集上的搜索更高效。
  • 准确性增强:内过滤确保向量搜索具有语义意义,提高搜索结果的质量。

过滤器详细信息

OracleVS 支持一组过滤器,可以使用 filter 参数应用于 metadata 字段。这些过滤器允许您根据各种标准选择和完善数据。 可用的过滤器运算符:
运算符描述
\$exists字段存在。
\$eq字段值等于操作数值(=)。
\$ne字段存在且值不等于操作数值(!=)。
\$gt字段值大于操作数值(>)。
\$lt字段值小于操作数值(<)。
\$gte字段值大于或等于操作数值(>=)。
\$lte字段值小于或等于操作数值(<=)。
\$between字段值在操作数数组中的两个值之间(或等于)。
\$startsWith字段值以操作数值开头。
\$hasSubstring字段值包含操作数作为子串。
\$instr字段值包含操作数作为子串。
\$regex字段值匹配给定的正则表达式模式。
\$like字段值匹配操作数模式(使用类 SQL 语法)。
\$in字段值等于操作数数组中的至少一个值。
\$nin字段存在,但其值不等于操作数数组中的任何值,或者字段不存在。
\$all字段值是包含操作数数组中所有项的数组,或者是匹配单个操作数的标量。
  • 您可以使用逻辑运算符组合这些过滤器:
逻辑运算符描述
\$and逻辑 AND
\$or逻辑 OR
\$nor逻辑 NOR
示例过滤器:
{
  "age": 65,
  "name": {"$regex": "*rk"},
  "$or": [
    {
      "$and": [
        {"name": "Jason"},
        {"drinks": {"$in": ["tea", "soda"]}}
      ]
    },
    {
      "$nor": [
        {"age": {"$lt": 65}},
        {"name": "Jason"}
      ]
    }
  ]
}
其他使用提示:
  • 当对象中的所有过滤器都必须满足时,可以省略 $and。这两个是等效的:
{ "$and": [
    { "name": { "$startsWith": "Fred" } },
    { "salary": { "$gt": 10000, "$lte": 20000 } }
]}
{
  "name": { "$startsWith": "Fred" },
  "salary": { "$gt": 10000, "$lte": 20000 }
}
  • $not 子句可以否定比较运算符:
{ "address.zip": { "$not": { "$eq": "90001" } } }
  • 使用 field: scalar 等同于 field: { "$eq": scalar }
{ "animal": "cat" }
有关更多过滤器示例,请参阅 测试规范
# Conduct advanced searches after creating the indices
def conduct_advanced_searches(vector_stores):
    query = "How are LOBS stored in Oracle Database"
    # Constructing a filter for direct comparison against document metadata
    # This filter aims to include documents whose metadata 'id' is exactly '2'
    db_filter = {
        "$and": [
            {"id": "101"},  # FilterCondition
            {
                "$or": [  # FilterGroup
                    {"status": "approved"},
                    {"link": "Document Example Test 2"},
                    {
                        "$and": [  # Nested FilterGroup
                            {"status": "approved"},
                            {"link": "Document Example Test 2"},
                        ]
                    },
                ]
            },
        ]
    }

    for i, vs in enumerate(vector_stores, start=1):
        print(f"\n--- Vector Store {i} Advanced Searches ---")
        # Similarity search without a filter
        print("\nSimilarity search results without filter:")
        print(vs.similarity_search(query, 2))

        # Similarity search with a filter
        print("\nSimilarity search results with filter:")
        print(vs.similarity_search(query, 2, filter=db_filter))

        # Similarity search with relevance score
        print("\nSimilarity search with relevance score:")
        print(vs.similarity_search_with_score(query, 2))

        # Similarity search with relevance score with filter
        print("\nSimilarity search with relevance score with filter:")
        print(vs.similarity_search_with_score(query, 2, filter=db_filter))

        # Max marginal relevance search
        print("\nMax marginal relevance search results:")
        print(vs.max_marginal_relevance_search(query, 2, fetch_k=20, lambda_mult=0.5))

        # Max marginal relevance search with filter
        print("\nMax marginal relevance search results with filter:")
        print(
            vs.max_marginal_relevance_search(
                query, 2, fetch_k=20, lambda_mult=0.5, filter=db_filter
            )
        )


conduct_advanced_searches(vector_store_list)

混合搜索

Oracle Database 26ai 支持混合搜索,将关键词(全文)和语义(向量)搜索结合到单个检索功能中。langchain-oracledb 集成公开了以下内容:
  • OracleVectorizerPreference:创建由混合索引使用的数据库端向量化器首选项。
  • create_hybrid_index / acreate_hybrid_index:创建 HYBRID VECTOR INDEX。
  • OracleHybridSearchRetriever:在 OracleVS 表上执行关键词、语义或混合检索。

先决条件和模型配置

使用混合搜索时,请使用 OracleEmbeddings 配置您的 OracleVS,以便向量化器首选项与嵌入配置完全匹配。您可以通过 OracleVectorizerPreference 提供额外参数来进一步调整混合向量索引。有关详细信息,请参阅 文档
from langchain_core.documents import Document
from langchain_oracledb.embeddings import OracleEmbeddings
from langchain_oracledb.vectorstores.oraclevs import OracleVS
from langchain_oracledb.retrievers.hybrid_search import (
    OracleVectorizerPreference,
    create_hybrid_index,
    OracleHybridSearchRetriever,
)

# Use OracleEmbeddings (database-resident model shown)
embeddings = OracleEmbeddings(conn=connection, params={"provider": "database", "model": "DB_MODEL"})

# Create/load your vector store
vs = OracleVS(connection, table_name="DOCS", embedding_function=embeddings)

# Create a vectorizer preference
pref = OracleVectorizerPreference.create_preference(
    vector_store=vs, preference_name="PREF_DOCS"
)

# Create a HYBRID VECTOR INDEX
create_hybrid_index(
    connection,
    idx_name="IDX_DOCS_HYB",
    vectorizer_preference=pref
)

# Build a retriever and search
retriever = OracleHybridSearchRetriever(
    vector_store=vs,
    idx_name="IDX_DOCS_HYB",
    search_mode="hybrid",     # "hybrid" | "keyword" | "semantic"
    k=5,
    return_scores=True,       # includes score, text_score, vector_score in metadata
)

docs = retriever.invoke("refund policy for premium plan")
for d in docs:
    print(d.page_content, d.metadata.get("score"), d.metadata.get("text_score"), d.metadata.get("vector_score"))

# Optional cleanup when done with the preference:
# pref.drop_preference()
替代方案 在不显式指定首选项的情况下创建索引:
  • 如果您不想管理命名首选项,请传递 vector_store 代替。该函数将创建临时首选项,构建索引,然后自动删除首选项。
create_hybrid_index(
    connection,
    idx_name="IDX_DOCS_HYB2",
    vector_store=vs,          # mutually exclusive with vectorizer_preference
    params={"parallel": 8},
)
注意事项和提示:
  • search_mode 决定使用哪些信号:
    • “keyword”:仅关键词
    • “semantic”:仅向量
    • “hybrid”(默认):两者结合
  • 通过检索器的 params 参数传递 DBMS_HYBRID_VECTOR 参数。
  • return_scores=True 将总分和组件 text_score 和 vector_score 添加到 Document.metadata。
  • 通过 acreate_hybrid_indexOracleHybridSearchRetriever.ainvoke 支持异步用法。
更多信息:

全文搜索 (Oracle Text)

您可以使用 Oracle Text 直接针对 Oracle 数据库运行高质量的关键词搜索。langchain-oracledb 集成公开了以下内容:
  • create_text_index / acreate_text_index:在列上创建 Oracle Text SEARCH INDEX
  • OracleTextSearchRetriever:运行 CONTAINS 查询并返回 LangChain Document 对象。
索引选项:
  • 如果您有 OracleVS 向量存储,您可以对其内置的”text”列进行索引。
  • 您也可以直接提供 table_name + column_name 来索引任何其他表/列。
from langchain_oracledb.retrievers.text_search import create_text_index, OracleTextSearchRetriever

# Using an OracleVS table (indexes the 'text' column)
retriever_text = None
create_text_index(
    connection,
    idx_name="IDX_DOCS_TEXT",
    vector_store=vs,
)

# Build a retriever. With OracleVS, returned_columns defaults to ["metadata"].
retriever_text = OracleTextSearchRetriever(
    vector_store=vs,
    k=5,
    fuzzy=True,          # applies Oracle Text FUZZY per token when operator_search=False
    return_scores=True,  # adds SCORE(1) as metadata["score"]
)
docs = retriever_text.invoke("refund policy")
for d in docs:
    print(d.page_content, d.metadata.get("score"))

# Alternatively, index an arbitrary table/column:
# create_text_index(connection, idx_name="IDX_MYDOCS_TEXT", table_name="MYDOCS", column_name="CONTENT")
# retriever_text = OracleTextSearchRetriever(client=connection, table_name="MYDOCS", column_name="CONTENT", k=5)
运算符模式和高级查询:
  • 默认行为 (operator_search=False):
    • 输入被视为纯文本,在非单词字符处进行分词,并重写为 ACCUM 表达式。
    • 使用 fuzzy=True 时,每个令牌都包装为 FUZZY("token") 以匹配拼写错误。
  • 运算符模式 (operator_search=True):
    • 原样传递 Oracle Text 表达式 (NEAR, ABOUT, AND, OR, NOT, WITHIN 等)。在此模式下,忽略模糊匹配。
返回的列:
  • 当目标是原始表时,通过 returned_columns 在结果中包含额外列;它们附加到 Document.metadata
  • 使用 OracleVS 时,returned_columns 默认为 [“metadata”]。
# Operator mode example
retriever_text_ops = OracleTextSearchRetriever(
    vector_store=vs,
    operator_search=True,   # pass Oracle Text expression directly
    return_scores=True,
)
docs = retriever_text_ops.invoke('NEAR((policy, refund), 2, TRUE)')
注意事项和提示:
  • 使用 operator_search=True 时,忽略模糊匹配(设计如此)。
  • 通过 acreate_text_indexOracleTextSearchRetriever.ainvoke 支持异步用法。
更多信息:

端到端演示

请参阅我们的完整演示指南 Oracle AI 向量搜索端到端演示指南,借助 Oracle AI 向量搜索构建端到端 RAG 管道。