🤖 Opensolr Web Crawler
Discover a seamless, AI-powered way to index, enrich, and search your web content—automatically.
For setup details, assistance, or pricing information, contact us at:
What is the Opensolr Web Crawler?
The Opensolr Web Crawler is a robust platform for crawling, indexing, and enriching websites of any size. It automatically extracts key meta-information, applies Natural Language Processing (NLP) and Named Entity Recognition (NER), and injects all content and structure directly into your Solr index.
- 🚀 Instantly searchable: All content becomes instantly searchable via a fully responsive, embeddable search UI.
- 🤖 AI-driven enrichment: Named entities, sentiment, language detection, and more are extracted on the fly.
- 🕑 Get started in minutes: Launch a powerful, custom search engine on your data without manual setup.
💻 Test it out:
🔎 See It In Action
- VECTOR SEARCH - AI: NEWS (EN)
- VECTOR SEARCH - AI: OPENSOLR (EN)
- Demo: Stiri (RO)
- Demo: Fresh News (EN)
- Demo: Tech News (EN)
- Demo: ProLabs (EN)
- Demo: Australian News (EN)
- Demo: Times of India (EN)
- Demo: Italy News (IT)
- Demo: Germany News (DE)
- Demo: Nyheter (SV)
- Demo: Alternative News (EN)
- Demo: eCommerce (RO)
- Demo: Baptist News (EN)
- Demo: FCA (EN)
Or try the Solr API for a live crawl.
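Because crawled content ends up in a regular Solr index, it can also be queried through Solr's standard select handler. The following is a minimal sketch that only builds such a request URL. The host `solr.example.com` and the index name `my_crawler_index` are placeholders, not real Opensolr values, so substitute the connection details of your own index.

```python
from urllib.parse import urlencode


def build_select_url(solr_base: str, index: str, query: str, rows: int = 10) -> str:
    """Build a URL for Solr's standard /select handler.

    solr_base and index are placeholders here; use the values
    shown for your own index.
    """
    # q, rows and wt are standard Solr query parameters.
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{solr_base.rstrip('/')}/{index}/select?{urlencode(params)}"


# Hypothetical host and index name, for illustration only.
select_url = build_select_url(
    "https://solr.example.com/solr", "my_crawler_index", "climate change", rows=5
)
print(select_url)
```

The resulting URL can be fetched with any HTTP client. If the index is protected, the request also needs the matching credentials.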
⚡ Key Features
- Full NLP and NER: Extract people, locations, organizations, and more using OpenNLP.
- Comprehensive Metadata Extraction: Collects meta tags, page structure, creation dates, and document fields.
- AI-Hints: Opensolr AI-Hints are enabled by default for all crawler indexes, delivering rich context and smart search assistance.
- Automatic Content Language Detection: Indexes and searches in any language, with built-in stopword, synonym, and spellcheck support.
- Responsive, Embeddable Search UI: Integrate Opensolr search into your site and customize the top bar, filters, and behavior.
- Scheduled Recrawling & Live Stats: Only new and updated content is fetched, with live stats for crawling and SEO.
- Secure & Flexible: Supports HTTP Auth for protected content and robust backup and replication, and can be fully managed via API or UI.
- Rich Content Support: Indexes and analyzes HTML, doc, docx, xls, PDF, and most image formats, extracting content, meta, GPS/location data, and sentiment.
- Crawl Resume: Pause and resume crawls anytime; supports cron jobs and incremental indexing.
⚙️ Embedding & Customization
You can embed your Opensolr Web Crawler Search Engine on any website.
Customize your search experience with parameters such as:
- &topbar=off – Hide the top search tool
- &q=SEARCH_QUERY – Set the initial search
- &in=web/media/images – Filter by content type
- &og=yes/no – Show/hide OG images per result
- &source=WEBSITE – Restrict to a single domain
- &fresh=... – Apply result freshness or sentiment bias
- &lang=en – Filter by language
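As a rough illustration of how these parameters combine, the sketch below appends a few of them to a search page URL. The base URL `https://example.com/search` is a placeholder, not an actual Opensolr address, and the helper name `build_search_url` is made up for this example. Only the parameter names come from the list above.

```python
from typing import Dict
from urllib.parse import urlencode


def build_search_url(base_url: str, params: Dict[str, str]) -> str:
    """Append customization parameters to base_url as a query string."""
    if not params:
        return base_url
    # Use '&' if the base URL already carries a query string.
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{urlencode(params)}"


# Hypothetical base URL; replace it with the URL of your own search page.
embed_url = build_search_url(
    "https://example.com/search",
    {
        "topbar": "off",      # hide the top search tool
        "q": "solar energy",  # initial search
        "in": "web",          # content type filter
        "og": "yes",          # show OG images per result
        "lang": "en",         # language filter
    },
)
print(embed_url)
```

A dictionary is used rather than keyword arguments because `in` is a reserved word in Python. `urlencode` also takes care of escaping spaces and special characters in the search query.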
🚀 What's New
- AI-Hints: Enabled by default for every crawler index.
- Automatic Language Detection and advanced NER via OpenNLP.
- Customizable for any language and analysis pipeline.
- Full support for spellcheck, autocomplete, backup, and replication.
- Live SEO & crawling stats and sentiment analysis.
- Automated scheduling and easy management via UI or REST API.
📥 Solr Configuration for Crawling
To enable smooth crawling and full feature support, use our ready-made Solr configs:
To ensure all features work as designed, do not manually modify the schema.xml of crawler indexes.

