LangChain

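LangChain is a framework for developing applications powered by large language models (LLMs).

This page describes how to use VexDB as a LangChain vector store. This setup provides persistent, enterprise-grade storage for vector data, supports complex metadata filtering, and relies on VexDB's transactions to keep data consistent.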

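Prerequisites

Before installing LangChain, make sure that the database has been installed as described in Installing VexDB and that a Python 3 environment (Python ≥ 3.9) is available.

Install the LangChain-VexDB plugin package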

pip install langchain_vexdb-{version}-py3-none-any.whl
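
The prerequisite and the installation can be checked from Python itself. The following is a minimal sketch that uses only the standard library; the helper names `python_ok` and `package_installed` are illustrative and not part of VexDB or LangChain, and the module name `langchain_vexdb` is taken from the import statements used later on this page.

```python
import importlib.util
import sys


def python_ok(minimum=(3, 9)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= tuple(minimum)


def package_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Python >= 3.9:", python_ok())
    print("langchain_vexdb importable:", package_installed("langchain_vexdb"))
```

If the second line prints False after the wheel has been installed, the wheel was probably installed into a different interpreter or virtual environment than the one running the check.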

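Notes

  • Dimension consistency: make sure the dimension n of VECTOR(n) matches the output dimension of the embedding model.
  • Index type: choose among HNSW, IVFFlat, and DiskANN according to data volume and query requirements.
    • HNSW suits large-scale data.
    • IVFFlat suits small-scale data.
    • DiskANN suits large-scale data that also requires high-precision queries.

    Index parameters need to be tuned to the actual workload.
  • Distance metric: choose the operator that fits the scenario (<-> is L2 distance, <=> is cosine distance).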
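
The dimension check and the two distance metrics named above can be illustrated in plain Python. This is a sketch only: `check_dimension`, `l2_distance`, and `cosine_distance` are hypothetical helper names, and the code mirrors the mathematical definitions rather than VexDB's internal implementation of the <-> and <=> operators.

```python
import math


def check_dimension(vector, expected_dim):
    """Raise ValueError if the vector length differs from the n in VECTOR(n)."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, expected {expected_dim}"
        )


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """Cosine distance, defined as 1 minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [1.0, 0.0, 0.0]
b = [2.0, 0.0, 0.0]
check_dimension(a, 3)         # passes silently: the length matches
print(l2_distance(a, b))      # → 1.0 (same direction, different length)
print(cosine_distance(a, b))  # → 0.0 (same direction)
```

The two vectors point the same way but differ in magnitude, so L2 distance separates them while cosine distance does not. This is the practical difference to keep in mind when choosing a metric.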

Usage example

  1. Install the dependencies.
    # Core dependencies
    pip install langchain-core psycopg2-binary sqlalchemy numpy
    # Optional embedding model (this example uses Cohere)
    pip install langchain_cohere
    
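  2. Initialize the vector store.
    from langchain_cohere import CohereEmbeddings
    from langchain_vexdb import VexdbVector
    from langchain_core.documents import Document
    
    # Connection string for a running VexDB instance; adjust user, password, host, port, and database.
    # The "postgresql+psycopg" prefix selects SQLAlchemy's psycopg (v3) driver; with only
    # psycopg2-binary installed, "postgresql+psycopg2" may be needed instead.
    connection = "postgresql+psycopg://langchain:langchain@localhost:6024/langchain"
    collection_name = "my_docs"
    # Assumes a Cohere API key is configured in the environment; recent langchain_cohere
    # releases may also require an explicit model argument.
    embeddings = CohereEmbeddings()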
    
    vectorstore = VexdbVector(
        embeddings=embeddings,
        collection_name=collection_name,
        connection=connection,
        use_jsonb=True,
    )
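
Passwords that contain characters such as @ or / have to be percent-encoded before they are placed in a connection URL. The sketch below assembles the same style of connection string from its parts; `build_connection_url` is a hypothetical helper, not part of langchain_vexdb, and the password shown is invented for illustration.

```python
from urllib.parse import quote_plus


def build_connection_url(user, password, host, port, database,
                         driver="postgresql+psycopg"):
    """Assemble a SQLAlchemy-style URL, percent-encoding user and password."""
    return (
        f"{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


connection = build_connection_url(
    "langchain", "p@ss/word", "localhost", 6024, "langchain"
)
print(connection)
# → postgresql+psycopg://langchain:p%40ss%2Fword@localhost:6024/langchain
```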
    
  3. Store documents.
    docs = [
        Document(page_content='there are cats in the pond', metadata={"id": 1, "location": "pond", "topic": "animals"}),
        Document(page_content='ducks are also found in the pond', metadata={"id": 2, "location": "pond", "topic": "animals"}),
        Document(page_content='fresh apples are available at the market', metadata={"id": 3, "location": "market", "topic": "food"}),
        Document(page_content='the market also sells fresh oranges', metadata={"id": 4, "location": "market", "topic": "food"}),
        Document(page_content='the new art exhibit is fascinating', metadata={"id": 5, "location": "museum", "topic": "art"}),
        Document(page_content='a sculpture exhibit is also at the museum', metadata={"id": 6, "location": "museum", "topic": "art"}),
        Document(page_content='a new coffee shop opened on Main Street', metadata={"id": 7, "location": "Main Street", "topic": "food"}),
        Document(page_content='the book club meets at the library', metadata={"id": 8, "location": "library", "topic": "reading"}),
        Document(page_content='the library hosts a weekly story time for kids', metadata={"id": 9, "location": "library", "topic": "reading"}),
        Document(page_content='a cooking class for beginners is offered at the community center', metadata={"id": 10, "location": "community center", "topic": "classes"})
    ]
    
    # Store the documents, using each document's metadata id as its ID
    vectorstore.add_documents(docs, ids=[doc.metadata['id'] for doc in docs])
    
  4. Run a vector search.
    Use the similarity_search method to run a vector search. For example, query for documents related to "kitty".
    vectorstore.similarity_search('kitty', k=10)
    
    # Query output
    [Document(page_content='there are cats in the pond', metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}),
    Document(page_content='the book club meets at the library', metadata={'id': 8, 'topic': 'reading', 'location': 'library'}),
    Document(page_content='the library hosts a weekly story time for kids', metadata={'id': 9, 'topic': 'reading', 'location': 'library'}),
    Document(page_content='the new art exhibit is fascinating', metadata={'id': 5, 'topic': 'art', 'location': 'museum'}),
    Document(page_content='ducks are also found in the pond', metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}),
    Document(page_content='the market also sells fresh oranges', metadata={'id': 4, 'topic': 'food', 'location': 'market'}),
    Document(page_content='a cooking class for beginners is offered at the community center', metadata={'id': 10, 'topic': 'classes', 'location': 'community center'}),
    Document(page_content='fresh apples are available at the market', metadata={'id': 3, 'topic': 'food', 'location': 'market'}),
    Document(page_content='a sculpture exhibit is also at the museum', metadata={'id': 6, 'topic': 'art', 'location': 'museum'}),
    Document(page_content='a new coffee shop opened on Main Street', metadata={'id': 7, 'topic': 'food', 'location': 'Main Street'})]
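
Conceptually, a similarity search embeds the query, measures its distance to every stored vector, and returns the k closest documents. The brute-force sketch below shows that ranking step with hand-picked 2-D vectors. It is an illustration under stated assumptions, not how VexDB executes the query: a real deployment would use the embedding model's vectors and an index such as HNSW, and `top_k` is a hypothetical helper.

```python
import heapq
import math

# Toy 2-D "embeddings" chosen by hand for illustration; real embeddings
# come from the embedding model and have many more dimensions.
stored = {
    "there are cats in the pond": [0.9, 0.1],
    "ducks are also found in the pond": [0.7, 0.3],
    "the market also sells fresh oranges": [0.1, 0.9],
}


def top_k(query_vector, vectors, k):
    """Return the k texts whose vectors are closest to query_vector (L2)."""
    return heapq.nsmallest(
        k, vectors, key=lambda text: math.dist(query_vector, vectors[text])
    )


query = [1.0, 0.0]  # stands in for the embedding of "kitty"
for text in top_k(query, stored, k=2):
    print(text)
```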
    

Filter support

The vector store supports a set of filters that can be applied to the metadata fields of documents.

Filter      Meaning/Category
$eq         Equality (==)
$ne         Inequality (!=)
$lt         Less than (<)
$lte        Less than or equal (<=)
$gt         Greater than (>)
$gte        Greater than or equal (>=)
$in         Special Cased (in)
$nin        Special Cased (not in)
$between    Special Cased (between)
$exists     Special Cased (is null)
$like       Text (like)
$ilike      Text (case-insensitive like)
$and        Logical (and)
$or         Logical (or)
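
To make the operator semantics concrete, the following sketch evaluates a filter dictionary against a single metadata dictionary in memory. It reflects one plausible reading of the table, not the VexDB implementation, which translates filters into database queries. Assumptions: $between is inclusive on both ends, a missing field never matches, a bare value is treated like $eq, and $exists is left out. The names `matches`, `_like`, and `OPERATORS` are hypothetical.

```python
import re


def _like(value, pattern, ignore_case=False):
    """SQL-style LIKE: % matches any run of characters, _ matches one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    flags = (re.IGNORECASE | re.DOTALL) if ignore_case else re.DOTALL
    return re.fullmatch(regex, str(value), flags) is not None


OPERATORS = {
    "$eq": lambda v, arg: v == arg,
    "$ne": lambda v, arg: v != arg,
    "$lt": lambda v, arg: v < arg,
    "$lte": lambda v, arg: v <= arg,
    "$gt": lambda v, arg: v > arg,
    "$gte": lambda v, arg: v >= arg,
    "$in": lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
    "$between": lambda v, arg: arg[0] <= v <= arg[1],  # assumed inclusive
    "$like": lambda v, arg: _like(v, arg),
    "$ilike": lambda v, arg: _like(v, arg, ignore_case=True),
}


def matches(metadata, flt):
    """Return True if the metadata dict satisfies every entry of the filter."""
    for key, condition in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in condition)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in condition)
        elif not isinstance(condition, dict):
            ok = metadata.get(key) == condition  # bare value treated like $eq
        else:
            ok = key in metadata and all(
                OPERATORS[op](metadata[key], arg) for op, arg in condition.items()
            )
        if not ok:
            return False
    return True


meta = {"id": 4, "location": "market", "topic": "food"}
print(matches(meta, {"id": {"$between": [3, 5]}}))  # → True
print(matches(meta, {"$or": [{"topic": {"$eq": "art"}},
                             {"location": {"$ilike": "MARK%"}}]}))  # → True
print(matches(meta, {"id": {"$nin": [4, 5]}}))  # → False
```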
  1. Run a query with a filter:
    vectorstore.similarity_search('kitty', k=10, filter={
        'id': {'$in': [1, 5, 2, 9]}
    })
    
    # Query output
    [Document(page_content='there are cats in the pond', metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}),
    Document(page_content='the library hosts a weekly story time for kids', metadata={'id': 9, 'topic': 'reading', 'location': 'library'}),
    Document(page_content='the new art exhibit is fascinating', metadata={'id': 5, 'topic': 'art', 'location': 'museum'}),
    Document(page_content='ducks are also found in the pond', metadata={'id': 2, 'topic': 'animals', 'location': 'pond'})]
    

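    If you provide a dict with multiple fields but no top-level logical operator ($and or $or), the top level is interpreted as a logical AND filter. The two queries that follow, the first implicit and the second with an explicit $and, return the same results.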
    vectorstore.similarity_search('ducks', k=10, filter={
        'id': {'$in': [1, 5, 2, 9]},
        'location': {'$in': ["pond", "market"]}
    })
    
    # Query output
    [Document(page_content='ducks are also found in the pond', metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}),
    Document(page_content='there are cats in the pond', metadata={'id': 1, 'topic': 'animals', 'location': 'pond'})]
    
    vectorstore.similarity_search('ducks', k=10, filter={
        '$and': [
            {'id': {'$in': [1, 5, 2, 9]}},
            {'location': {'$in': ["pond", "market"]}},
        ]
    })
    
    # Query output
    [Document(page_content='ducks are also found in the pond', metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}),
    Document(page_content='there are cats in the pond', metadata={'id': 1, 'topic': 'animals', 'location': 'pond'})]
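
The implicit-AND rule can be pictured as a rewrite of the filter dictionary into its explicit form. The sketch below performs that rewrite in plain Python; `to_explicit_and` is a hypothetical helper used only to illustrate the equivalence, and the library may handle the implicit form differently internally.

```python
def to_explicit_and(flt):
    """Rewrite an implicit multi-field filter as an explicit $and filter."""
    if len(flt) <= 1 or any(key.startswith("$") for key in flt):
        return flt  # single field, or already uses a logical operator
    return {"$and": [{field: condition} for field, condition in flt.items()]}


implicit = {
    "id": {"$in": [1, 5, 2, 9]},
    "location": {"$in": ["pond", "market"]},
}
explicit = to_explicit_and(implicit)
print(explicit)
# → {'$and': [{'id': {'$in': [1, 5, 2, 9]}}, {'location': {'$in': ['pond', 'market']}}]}
```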
    
    vectorstore.similarity_search('bird', k=10, filter={
        'location': { "$ne": 'pond'}
    })
    
    # Query output
    [Document(page_content='the book club meets at the library', metadata={'id': 8, 'topic': 'reading', 'location': 'library'}),
    Document(page_content='the new art exhibit is fascinating', metadata={'id': 5, 'topic': 'art', 'location': 'museum'}),
    Document(page_content='the library hosts a weekly story time for kids', metadata={'id': 9, 'topic': 'reading', 'location': 'library'}),
    Document(page_content='a sculpture exhibit is also at the museum', metadata={'id': 6, 'topic': 'art', 'location': 'museum'}),
    Document(page_content='the market also sells fresh oranges', metadata={'id': 4, 'topic': 'food', 'location': 'market'}),
    Document(page_content='a cooking class for beginners is offered at the community center', metadata={'id': 10, 'topic': 'classes', 'location': 'community center'}),
    Document(page_content='a new coffee shop opened on Main Street', metadata={'id': 7, 'topic': 'food', 'location': 'Main Street'}),
    Document(page_content='fresh apples are available at the market', metadata={'id': 3, 'topic': 'food', 'location': 'market'})]
    

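Using it for retrieval-augmented generation

For guides on using this vector store for retrieval-augmented generation (RAG), see the following:

Query examples: using advanced filters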

  • Use the $nin filter to find items that are not in a given list:
    results = vectorstore.similarity_search(
        "kitty", k=10, filter={"id": {"$nin": [1, 5, 2]}})
    for doc in results:
        print(f"* {doc.page_content} [{doc.metadata}]")
    
  • Use the $between filter to find items within a given range:
    results = vectorstore.similarity_search(
        "apples", k=10, filter={"id": {"$between": [3, 5]}})
    for doc in results:
        print(f"* {doc.page_content} [{doc.metadata}]")
    
  • Use the $ilike filter for a case-insensitive text match:
    results = vectorstore.similarity_search(
        "book", k=10, filter={"page_content": {"$ilike": "%book%"}}
    )
    for doc in results:
        print(f"* {doc.page_content} [{doc.metadata}]")
    
  • Use the $and filter to combine several conditions:
    results = vectorstore.similarity_search(
        "coffee", k=10, filter={"$and": [
            {"location": {"$eq": "Main Street"}},
            {"topic": {"$eq": "food"}}
        ]}
    )
    
    for doc in results:
        print(f"* {doc.page_content} [{doc.metadata}]")
    
  • Use the $or filter to match any one of several conditions:
    results = vectorstore.similarity_search(
        "reading", k=10, filter={"$or": [
            {"location": {"$eq": "library"}},
            {"location": {"$eq": "community center"}}
        ]}
    )
    
    for doc in results:
        print(f"* {doc.page_content} [{doc.metadata}]")
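
The search-and-print pattern repeated in the examples above could be wrapped in a small helper. The sketch below assumes only the similarity_search(query, k=..., filter=...) call shape shown on this page. `search_and_print`, `StubDoc`, and `StubStore` are hypothetical names, and the stub stands in for the real vector store so the example runs without a database.

```python
from dataclasses import dataclass, field


def search_and_print(store, query, flt, k=10):
    """Run a filtered similarity search and print one line per result."""
    results = store.similarity_search(query, k=k, filter=flt)
    for doc in results:
        print(f"* {doc.page_content} [{doc.metadata}]")
    return results


@dataclass
class StubDoc:
    page_content: str
    metadata: dict = field(default_factory=dict)


class StubStore:
    """Stand-in for the real vector store; performs no ranking."""

    def __init__(self, docs):
        self._docs = docs

    def similarity_search(self, query, k=4, filter=None):
        # Only understands {"id": {"$nin": [...]}}; the query is ignored.
        excluded = (filter or {}).get("id", {}).get("$nin", [])
        return [d for d in self._docs if d.metadata["id"] not in excluded][:k]


store = StubStore([
    StubDoc("there are cats in the pond", {"id": 1}),
    StubDoc("the book club meets at the library", {"id": 8}),
])
found = search_and_print(store, "kitty", {"id": {"$nin": [1, 5, 2]}})
# prints: * the book club meets at the library [{'id': 8}]
```

With the real store, the same helper would be called as search_and_print(vectorstore, "kitty", {"id": {"$nin": [1, 5, 2]}}).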
    
