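Nimble Extract

Nimble's Extract API retrieves rendered content by loading specific URLs in a headless browser. Unlike the Search API, which discovers content, the Extract tool works on URLs you already know. That makes it a good fit for agent workflows that need to fetch and process particular web pages, including paginated pages, filtered views, and content rendered client-side.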

Overview

Integration details

Class	Package	Serializable	JS support	Package latest
`NimbleExtractTool`	`langchain-nimble`	❌	❌

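Tool features

Returns artifact	Native async	Return data	Pricing
❌	✅	title, URL, content (plain_text, markdown, or simplified_html), metadata	Free trial available

Key features:

URL extraction: extract rendered content from 1-20 URLs in parallel
Dynamic rendering: handles JavaScript, lazy loading, and client-side rendering
Multiple formats: plain_text (default), markdown, or simplified_html
Configurable wait time: control page-load behavior to handle slow-loading content
Browser drivers: choose the vx6, vx8, or vx10 driver to match rendering needs
Production-ready: native async support, automatic retries, connection pooling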

Setup

The integration lives in the langchain-nimble package.

pip install -U langchain-nimble

uv add langchain-nimble

Credentials

You need a Nimble API key to use this tool. Sign up at Nimble to get your API key and access the free trial.

import getpass
import os

if not os.environ.get("NIMBLE_API_KEY"):
    os.environ["NIMBLE_API_KEY"] = getpass.getpass("Nimble API key:\n")

Instantiation

Now we can instantiate the tool:

from langchain_nimble import NimbleExtractTool

# Basic usage
tool = NimbleExtractTool()

Use within an agent

We can pair the Nimble Extract tool with an agent to give it URL content extraction capabilities. Here is a complete example using LangGraph:

import os
import getpass

from langchain_nimble import NimbleExtractTool
from langchain.agents import create_agent
from langchain.chat_models import init_chat_model

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key:\n")
if not os.environ.get("NIMBLE_API_KEY"):
    os.environ["NIMBLE_API_KEY"] = getpass.getpass("Nimble API key:\n")

# Initialize Nimble Extract Tool
extract_tool = NimbleExtractTool(
    parsing_type="markdown"
)

# Create agent with the tool
model = init_chat_model(model="gpt-4o", model_provider="openai", temperature=0)
agent = create_agent(model, [extract_tool])

# Ask the agent to extract and analyze content from LangChain documentation
user_input = "Extract and summarize the key concepts from these LangChain docs: https://python.langchain.com/docs/concepts/retrievers/, https://python.langchain.com/docs/concepts/tools/"

for step in agent.stream(
    {"messages": user_input},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

================================ Human Message =================================

Extract and summarize the key concepts from these LangChain docs: https://python.langchain.com/docs/concepts/retrievers/, https://python.langchain.com/docs/concepts/tools/

================================== Ai Message ==================================
Tool Calls:
  nimble_extract (call_abc123)
 Call ID: call_abc123
  Args:
    links: ['https://python.langchain.com/docs/concepts/retrievers/', 'https://python.langchain.com/docs/concepts/tools/']
    parsing_type: markdown

================================= Tool Message =================================
Name: nimble_extract

[{"title": "Retrievers | LangChain", "url": "https://python.langchain.com/docs/concepts/retrievers/", "content": "# Retrievers\n\nA retriever is an interface that returns documents given an unstructured query...\n\n## Key Concepts\n- Document retrieval from various sources\n- Integration with vector stores...", "metadata": {"extracted_at": "2025-12-10T..."}}, {"title": "Tools | LangChain", "url": "https://python.langchain.com/docs/concepts/tools/", "content": "# Tools\n\nTools are interfaces that agents can use to interact with the world...", "metadata": {...}}]

================================== Ai Message ==================================

Based on the extracted LangChain documentation, here are the key concepts:

**Retrievers:**
- Interface for returning documents based on unstructured queries
- Supports various data sources including vector stores
- Core component for RAG (Retrieval Augmented Generation) applications
- Enables semantic search over document collections

**Tools:**
- Interfaces enabling agents to interact with external systems
- Can be used for web search, API calls, calculations, and more
- Agents use tools to extend their capabilities beyond text generation
- Support both synchronous and asynchronous execution
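
To post-process the tool output outside the agent loop, it can help to turn the payload into typed records. The sketch below assumes the tool message content is a JSON list of objects with title, url, content, and metadata keys, as the sample output above suggests. ExtractedPage and parse_extract_output are hypothetical helper names, not part of langchain-nimble.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ExtractedPage:
    """One extracted page (hypothetical container, not a library type)."""
    title: str
    url: str
    content: str
    metadata: dict = field(default_factory=dict)


def parse_extract_output(raw: str) -> list[ExtractedPage]:
    """Parse a JSON list of page records into ExtractedPage objects.

    Assumes each record may carry title, url, content, and metadata keys;
    missing keys fall back to empty values rather than raising.
    """
    records = json.loads(raw)
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of page records")
    return [
        ExtractedPage(
            title=record.get("title", ""),
            url=record.get("url", ""),
            content=record.get("content", ""),
            metadata=record.get("metadata") or {},
        )
        for record in records
    ]


# Hand-written sample in the assumed shape (not real tool output).
sample = json.dumps([
    {
        "title": "Retrievers | LangChain",
        "url": "https://python.langchain.com/docs/concepts/retrievers/",
        "content": "# Retrievers\n\nA retriever is an interface...",
        "metadata": {},
    }
])
for page in parse_extract_output(sample):
    print(page.title, len(page.content))
```

If the real payload turns out to be shaped differently, only parse_extract_output needs to change; code that consumes ExtractedPage stays the same.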

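Advanced configuration

The tool supports extensive configuration for URL extraction:

Parameter	Type	Default	Description
`links`	list[str]	None	URLs to extract (1-20); supplied by the agent at runtime
`parsing_type`	str	"plain_text"	Output format: "plain_text", "markdown", or "simplified_html"
`driver`	str	"vx6"	Browser driver version: "vx6" (fast), "vx8" (balanced), or "vx10" (comprehensive)
`wait`	int	None	Milliseconds to wait for the page to load (0-60000)
`render`	bool	True	Enable JavaScript rendering
`locale`	str	"en"	Page locale preference (e.g. "en-US")
`country`	str	"US"	Country code for localized content (e.g. "US")
`api_key`	str	env var	Nimble API key (defaults to the NIMBLE_API_KEY environment variable)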
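
The documented ranges and allowed values can be checked before a tool is ever constructed, so that a typo fails fast on the client side. The sketch below is one way to do that. ExtractSettings is a hypothetical helper, the defaults are copied from the table above, and passing the resulting dictionary as constructor keyword arguments is an assumption that should be checked against the installed langchain-nimble version.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values taken from the parameter table above.
PARSING_TYPES = {"plain_text", "markdown", "simplified_html"}
DRIVERS = {"vx6", "vx8", "vx10"}
MAX_WAIT_MS = 60000


@dataclass(frozen=True)
class ExtractSettings:
    """Client-side validation of extraction options (hypothetical helper)."""
    parsing_type: str = "plain_text"
    driver: str = "vx6"
    wait: Optional[int] = None
    render: bool = True
    locale: str = "en"
    country: str = "US"

    def __post_init__(self) -> None:
        if self.parsing_type not in PARSING_TYPES:
            raise ValueError(f"unknown parsing_type: {self.parsing_type!r}")
        if self.driver not in DRIVERS:
            raise ValueError(f"unknown driver: {self.driver!r}")
        if self.wait is not None and not 0 <= self.wait <= MAX_WAIT_MS:
            raise ValueError(f"wait must be within 0-{MAX_WAIT_MS} ms")

    def as_kwargs(self) -> dict:
        """Return the options as a dict, omitting wait when it is unset."""
        kwargs = {
            "parsing_type": self.parsing_type,
            "driver": self.driver,
            "render": self.render,
            "locale": self.locale,
            "country": self.country,
        }
        if self.wait is not None:
            kwargs["wait"] = self.wait
        return kwargs


settings = ExtractSettings(parsing_type="markdown", driver="vx8", wait=2000)
print(settings.as_kwargs())
# Assumed usage: NimbleExtractTool(**settings.as_kwargs())
```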

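Best practices

Driver selection

vx6 (default): fast extraction for standard websites
vx8: balanced performance for moderately complex sites
vx10: full rendering for JavaScript-heavy single-page applications (SPAs) and complex dynamic content

When to use wait times

No wait (wait=None): best for most modern sites with a fast initial render
Short wait (wait=1000-2000): for sites with lazy loading or dynamic content
Longer wait (wait=5000+): for slow-loading pages or complex SPAs that need time to render fully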
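
The driver and wait guidance above can be combined into an escalation ladder: start with the cheapest setting and move up only when the page comes back nearly empty. The sketch below illustrates the idea. The ladder steps, the 200-character threshold, and the extract callable (which is expected to wrap whatever actually calls the tool) are all assumptions chosen for illustration.

```python
from typing import Callable, Optional

# (driver, wait in ms) attempts, cheapest first. Values are illustrative.
LADDER: list[tuple[str, Optional[int]]] = [
    ("vx6", None),
    ("vx6", 1500),
    ("vx8", 2000),
    ("vx10", 5000),
]

# Hypothetical signature: extract(url, driver, wait) -> page text.
ExtractFn = Callable[[str, str, Optional[int]], str]


def extract_with_escalation(
    url: str, extract: ExtractFn, min_chars: int = 200
) -> tuple[str, Optional[int], str]:
    """Try each ladder step until the content is long enough.

    Returns (driver, wait, content) for the first step that meets
    min_chars, or for the last step tried if none does.
    """
    driver, wait = LADDER[0]
    content = ""
    for driver, wait in LADDER:
        content = extract(url, driver, wait)
        if len(content.strip()) >= min_chars:
            break
    return driver, wait, content


def fake_extract(url: str, driver: str, wait: Optional[int]) -> str:
    """Stand-in extractor: pretends the page only renders with vx8 or vx10."""
    return "x" * 500 if driver in {"vx8", "vx10"} else ""


driver, wait, content = extract_with_escalation("https://example.com", fake_extract)
print(driver, wait, len(content))
```

Content length is a crude signal. A check for a specific element or phrase expected on the page would be more reliable where one is known.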

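URL management

Batch extraction: pass 1-20 URLs per call to extract them in parallel
Error handling: failed URLs are reported through the agent's error handling
Content validation: the agent should validate extracted content before processing it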
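
When URLs are collected programmatically, the 1-20 limit and the validation step can be handled with two small helpers. The sketch below de-duplicates and chunks a URL list, then separates usable pages from failures. It assumes results arrive as dictionaries with url and content keys and that the returned url matches the requested one exactly, which may not hold after redirects. Both function names are hypothetical.

```python
from typing import Iterable, Iterator

MAX_URLS_PER_CALL = 20  # upper bound on links per call, from the docs above


def batch_urls(
    urls: Iterable[str], size: int = MAX_URLS_PER_CALL
) -> Iterator[list[str]]:
    """Yield de-duplicated URLs in order, in chunks of at most `size`."""
    if not 1 <= size <= MAX_URLS_PER_CALL:
        raise ValueError(f"size must be within 1-{MAX_URLS_PER_CALL}")
    cleaned = (u.strip() for u in urls)
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    unique = list(dict.fromkeys(u for u in cleaned if u))
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


def split_results(
    requested: list[str], results: list[dict], min_chars: int = 1
) -> tuple[list[dict], list[str]]:
    """Return (usable pages, URLs that are missing or have too little content)."""
    by_url = {r.get("url"): r for r in results}
    usable, failed = [], []
    for url in requested:
        page = by_url.get(url)
        if page and len((page.get("content") or "").strip()) >= min_chars:
            usable.append(page)
        else:
            failed.append(url)
    return usable, failed


urls = [f"https://example.com/page/{i}" for i in range(45)]
print([len(batch) for batch in batch_urls(urls)])
```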

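Performance optimization

Choose the right format: use plain_text for speed, markdown for structure, and simplified_html for detailed styling
Tune wait times: add a wait only when necessary to balance speed and reliability
Batch related URLs: extract multiple URLs from the same domain in parallel for efficiency
Use async: call ainvoke() when extracting many URLs concurrently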
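
For more than one batch, the async path can run several calls at once while capping how many are in flight. The sketch below shows one way to do this with asyncio. It relies only on the tool exposing ainvoke() and accepting a links list, as described above. The concurrency limit of 3 is an arbitrary assumption, and StubTool is a stand-in so the example runs without network access or an API key.

```python
import asyncio


async def extract_batches(tool, batches, max_concurrency: int = 3):
    """Run tool.ainvoke once per batch, at most max_concurrency at a time.

    Results come back in batch order. A failed batch yields its exception
    object instead of cancelling the remaining batches.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(batch):
        async with semaphore:
            return await tool.ainvoke({"links": batch})

    return await asyncio.gather(
        *(run(batch) for batch in batches), return_exceptions=True
    )


class StubTool:
    """Stand-in for NimbleExtractTool; returns fake records."""

    async def ainvoke(self, payload):
        await asyncio.sleep(0.01)  # simulate network latency
        return [{"url": url, "content": "stub"} for url in payload["links"]]


if __name__ == "__main__":
    demo_batches = [["https://example.com/a"], ["https://example.com/b"]]
    results = asyncio.run(extract_batches(StubTool(), demo_batches))
    print(len(results))
```

Because return_exceptions=True is set, callers should check each entry with isinstance(entry, Exception) before treating it as a list of pages.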

API reference

For detailed documentation of all NimbleExtractTool features and configuration options, see the Nimble API documentation.
