Extract content from websites and convert to clean markdown for your AI applications. Perfect for LLM training, RAG systems, and content ingestion.
Get started with web scraping in just a few lines of code
```typescript
import { SerpexClient } from 'serpex';

const client = new SerpexClient('your-api-key-here');

async function extractContent() {
  try {
    const result = await client.extract({
      urls: [
        'https://example.com/article1',
        'https://example.com/article2'
      ]
    });
    console.log('Extraction successful:', result);
  } catch (error) {
    console.error('Extraction failed:', error);
  }
}

extractContent();
```

Complete API specification for the web scraping endpoint
`urls` (required): Array of URLs to extract content from (maximum 10 URLs)
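Since the endpoint rejects malformed URLs and caps a batch at 10, it can help to validate client-side before spending a request. A minimal sketch (the helper name and return shape are ours; only the 10-URL limit comes from the spec above):

```typescript
// Client-side pre-flight check for an extraction batch.
// The 10-URL cap mirrors the `urls` parameter spec; the rest is illustrative.
const MAX_URLS = 10;

interface UrlCheck {
  valid: string[];
  invalid: string[];
  ok: boolean;
}

function validateUrls(urls: string[]): UrlCheck {
  const valid: string[] = [];
  const invalid: string[] = [];
  for (const u of urls) {
    try {
      new URL(u); // throws a TypeError on malformed input
      valid.push(u);
    } catch {
      invalid.push(u);
    }
  }
  return {
    valid,
    invalid,
    ok: invalid.length === 0 && urls.length > 0 && urls.length <= MAX_URLS,
  };
}
```

If `ok` is false, fix or drop the offending entries (or split the batch) before calling the API.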
Include your API key in the Authorization header:
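For callers not using the SDK, the request can be built by hand. In this sketch the `Bearer` scheme, the request shape, and the helper name are assumptions, not documented values; only "API key in the Authorization header" comes from the text above:

```typescript
// Raw HTTP request construction without the SDK.
// ASSUMPTIONS: Bearer scheme and JSON body shape; verify against your dashboard.
function buildExtractRequest(apiKey: string, urls: string[]) {
  return {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`, // API key goes in the Authorization header
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ urls }),
  };
}

// Usage (endpoint placeholder, not a documented URL):
// const res = await fetch(EXTRACT_ENDPOINT,
//   buildExtractRequest('your-api-key-here', ['https://example.com/article1']));
```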
Different response scenarios you might encounter
```json
{
  "success": true,
  "results": [
    {
      "url": "https://example.com/article1",
      "success": true,
      "markdown": "# Article Title\n\nThis is the main content of the article converted to markdown format...\n\n## Section Header\n\nMore content here...",
      "status_code": 200
    },
    {
      "url": "https://example.com/article2",
      "success": true,
      "markdown": "# Second Article\n\nContent from the second webpage...\n\n### Subsection\n\nAdditional content...",
      "status_code": 200
    }
  ],
  "metadata": {
    "total_urls": 2,
    "processed_urls": 2,
    "successful_crawls": 2,
    "failed_crawls": 0,
    "credits_used": 6,
    "response_time": 2150,
    "timestamp": "2025-01-22T10:30:20.000Z"
  }
}
```

Detailed breakdown of all response fields
Result fields:

- `url`: The URL that was crawled
- `success`: Whether the crawl was successful
- `markdown`: Clean markdown content (if successful)
- `status_code`: HTTP status code of the response
- `crawled_at`: Timestamp when the URL was crawled
- `extraction_mode`: Method used for content extraction

Metadata fields:

- `total_urls`: Total number of URLs requested
- `processed_urls`: Number of URLs processed
- `successful_crawls`: Number of successful extractions
- `failed_crawls`: Number of failed extractions
- `credits_used`: Credits consumed (3 per URL)
- `response_time`: Total processing time in milliseconds
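The metadata lends itself to a quick billing sanity check: at 3 credits per URL, `credits_used` should track the number of URLs processed (the sample response above shows 6 credits for 2 URLs). A sketch, with the interface merely mirroring the documented fields — whether failed crawls are billed is not stated here, so treat a mismatch as a prompt to check, not an error:

```typescript
// Per the field breakdown: 3 credits per URL.
const CREDITS_PER_URL = 3;

interface ExtractMetadata {
  total_urls: number;
  processed_urls: number;
  successful_crawls: number;
  failed_crawls: number;
  credits_used: number;
  response_time: number;
}

// Expected spend for a batch, assuming every processed URL is billed.
function expectedCredits(meta: ExtractMetadata): number {
  return meta.processed_urls * CREDITS_PER_URL;
}
```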
Common applications for the web scraping API
- Prepare clean, structured content for training language models
- Extract and process content for retrieval-augmented generation
- Analyze and process web content for insights and research
- Build comprehensive knowledge bases from web sources
Common errors and how to handle them
Ensure all URLs are valid and properly formatted. The API will return specific invalid URLs in the error response.
Respect rate limits and implement exponential backoff for retries.
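The backoff advice above can be wrapped in a small generic helper. This is not part of the SDK — the function name, defaults, and delay schedule are our own sketch:

```typescript
// Retry with exponential backoff: delays grow as baseDelayMs * 2^attempt.
// Tune maxRetries/baseDelayMs for your rate limits; not an SDK feature.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxRetries) break; // out of retries
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}

// Usage with the SDK call from the quickstart:
// const result = await withBackoff(() => client.extract({ urls }));
```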
Some URLs may fail while others succeed. Check the `success` field for each result individually.
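Partial failure handling can be sketched as a simple partition over the results array. The interface mirrors the documented result fields; the helper itself is illustrative:

```typescript
// Mirrors the documented per-result fields.
interface ExtractResult {
  url: string;
  success: boolean;
  markdown?: string; // present only on successful crawls
  status_code: number;
}

// Split a batch response into successes (feed downstream) and failures (retry/log).
function partitionResults(results: ExtractResult[]) {
  const succeeded = results.filter((r) => r.success);
  const failed = results.filter((r) => !r.success);
  return { succeeded, failed };
}

// Usage: pass succeeded[].markdown to your ingestion pipeline,
// and re-queue or log failed[].url with its status_code.
```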