🤖 Opensolr Web Crawler

Discover a seamless, AI-powered way to index, enrich, and search your web content—automatically.

For setup details, assistance, or pricing information, contact us at:

📧 support@opensolr.com

What is the Opensolr Web Crawler?

The Opensolr Web Crawler is a robust platform for crawling, indexing, and enriching websites of any size. It automatically extracts key meta-information, applies Natural Language Processing (NLP) and Named Entity Recognition (NER), and injects all content and structure directly into your Solr index.

🚀 Instantly searchable: All crawled content is available right away through a fully responsive, embeddable search UI.
🤖 AI-driven enrichment: Named entities, sentiment, language detection, and more are extracted on the fly.
🕑 Get started in minutes: Launch a powerful, custom search engine on your data without manual setup.
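
Because crawled content ends up in a Solr index, it can in principle also be queried with Solr's standard select handler. The sketch below only assembles such a query URL and makes no network call. The host, the index name, and the assumption that your index is reachable at this path are all hypothetical; check your own index details and any authentication it requires.

```python
from urllib.parse import urlencode


def build_select_url(solr_base, index_name, text, rows=10):
    """Build a URL for Solr's standard /select handler.

    solr_base and index_name are placeholders (assumptions), not real
    Opensolr endpoints. q, rows, and wt are standard Solr parameters.
    """
    params = {
        "q": text,      # the user's search terms
        "rows": rows,   # number of results to return
        "wt": "json",   # ask Solr for a JSON response
    }
    return f"{solr_base.rstrip('/')}/{index_name}/select?{urlencode(params)}"


# Hypothetical host and index name, for illustration only.
url = build_select_url("https://solr.example.com/solr", "my_crawler_index",
                       "web crawler", rows=5)
print(url)
```

Keeping the URL construction in one small function makes it easy to swap in your real host and index name once you have them.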

💻 Test it out:

Full Documentation & Examples

🔎 See It In Action

Or try the Solr API for a live crawl.

⚡ Key Features

Full NLP and NER:
Extract people, locations, organizations, and more using OpenNLP.
Comprehensive Metadata Extraction:
Collects meta tags, page structure, creation dates, and document fields.
AI-Hints:
Opensolr AI-Hints are enabled by default for all crawler indexes, delivering rich context and smart search assistance.
Automatic Content Language Detection:
Indexes and searches in any language, with built-in stopword, synonym, and spellcheck support.
Responsive, Embeddable Search UI:
Integrate Opensolr search into your site and customize the top bar, filters, and behavior.
Scheduled Recrawling & Live Stats:
Recrawls fetch only new and updated content, and live stats cover both crawling and SEO.
Secure & Flexible:
Supports HTTP Auth for protected content, offers robust backup and replication, and can be fully managed by API or UI.
Rich Content Support:
Indexes and analyzes HTML, DOC, DOCX, XLS, PDF, and most image formats, extracting content, metadata, GPS/location data, and sentiment.
Crawl Resume:
Pause and resume crawls anytime; supports cron jobs and incremental indexing.

⚙️ Embedding & Customization

You can embed your Opensolr Web Crawler Search Engine on any website.
Customize your search experience with parameters such as:

&topbar=off – Hide the top search tool
&q=SEARCH_QUERY – Set the initial search
&in=web/media/images – Filter by content type
&og=yes/no – Show/hide OG images per result
&source=WEBSITE – Restrict to a single domain
&fresh=... – Apply result freshness or sentiment bias
&lang=en – Filter by language
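
As a rough illustration of how the parameters above combine, here is a minimal sketch that assembles an embed URL. The base URL is a hypothetical placeholder, and the sketch assumes that `in` takes a single value out of web, media, or images. The `fresh` parameter is left out because its accepted values are not listed here.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the URL of your own search page.
BASE_URL = "https://search.example.com/my-index"


def build_embed_url(base_url, query=None, topbar=True, content_type=None,
                    og_images=None, source=None, lang=None):
    """Append the documented embed parameters to a search-page URL."""
    params = {}
    if not topbar:
        params["topbar"] = "off"            # hide the top search tool
    if query is not None:
        params["q"] = query                 # initial search
    if content_type is not None:
        # Assumption: exactly one of these three values is accepted.
        if content_type not in ("web", "media", "images"):
            raise ValueError("content_type must be web, media, or images")
        params["in"] = content_type
    if og_images is not None:
        params["og"] = "yes" if og_images else "no"
    if source is not None:
        params["source"] = source           # restrict to a single domain
    if lang is not None:
        params["lang"] = lang               # filter by language
    if not params:
        return base_url
    # Use "&" when the base URL already carries a query string.
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)


print(build_embed_url(BASE_URL, query="solr crawler", topbar=False,
                      content_type="web", og_images=True, lang="en"))
```

The resulting URL can then be used wherever you embed the search page, for example as the source of an iframe.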

🚀 What's New

AI-Hints: Enabled by default for every crawler index.
Automatic Language Detection and advanced NER via OpenNLP.
Customizable for any language and analysis pipeline.
Full support for spellcheck, autocomplete, backup, and replication.
Live SEO and crawling stats, plus sentiment analysis.
Automated scheduling and easy management via UI or REST API.

📥 Solr Configuration for Crawling

To enable smooth crawling and full feature support, use our ready-made Solr configs:

Solr 9 Config Zip Archive

To ensure all features work as designed, do not manually modify the schema.xml of crawler indexes.