Designing a Federated Enterprise Search Platform for 150+ Websites
Lessons from Drupal, Lucidworks Fusion, Apache Solr, and Applied AI Retrieval
Architectural decisions, trade-offs, implementation strategies, and lessons learned from building federated enterprise search across Drupal, Lucidworks Fusion, Apache Solr, and applied AI retrieval.
Executive Summary
The organization operated a large digital ecosystem composed of hundreds of independent websites.
Different business units.
Different countries.
Different editorial teams.
Different governance policies.
Different search expectations.
Despite those differences, users expected a single consistent search experience.
The initial objective was to build the first phase of a federated enterprise search platform capable of supporting more than 150 Drupal websites while preparing the foundation for future integration with additional enterprise systems.
The architecture intentionally separated content management from search execution.
Drupal remained the system of record.
Lucidworks Fusion became the search platform.
A custom integration layer connected both worlds without tightly coupling either one.
This separation proved fundamental throughout the project.
Why Enterprise Search Is Hard
Many software engineers think search is primarily a technical problem.
It is not.
Search is fundamentally an information architecture problem.
Every organization stores information differently.
Content evolves independently across business units.
Metadata becomes inconsistent.
Different countries impose different governance requirements.
Editorial teams organize information according to local needs.
Search must somehow unify all of those realities without forcing every team to work the same way.
The larger the organization becomes, the less the challenge resembles software development and the more it resembles organizational design.
Search stops being about retrieving documents.
It becomes about governing information.
That distinction influenced every architectural decision made throughout this project.
Understanding the Challenge
From the beginning, several architectural constraints were already clear.
The solution needed to:
- Support an ecosystem of more than 150 Drupal websites.
- Allow every website to configure its own searchable content.
- Support multilingual indexing.
- Support country-specific governance.
- Enable federated search across selected websites.
- Allow independent search experiences for different business units.
- Support future integration with additional enterprise platforms.
- Avoid unnecessary vendor lock-in.
- Remain maintainable by Drupal development teams.
Those requirements immediately eliminated several naive solutions.
Simply connecting Drupal directly to a search vendor would have solved today’s requirements while making tomorrow’s requirements considerably more expensive.
Instead, flexibility became the primary architectural objective.
Architecture Before Technology
One of the biggest lessons I have learned throughout my career is that architecture should be driven by business capabilities rather than products.
Before writing code, several questions needed to be answered.
What content should become searchable?
Who owns that content?
How should metadata be normalized?
Which fields are shared across every website?
Which fields remain site-specific?
How should regional restrictions be respected?
How should future platforms participate in the same search experience?
Only after answering those questions did technology selection become meaningful.
This mindset significantly reduced implementation risk later in the project.
Designing for Growth Instead of Phase One
Although the initial production rollout focused on only a handful of websites, the architecture itself was intentionally designed for a much larger ecosystem.
The long-term vision extended beyond Drupal.
Future phases were expected to federate content originating from additional enterprise platforms, including Adobe Experience Manager (AEM) and other enterprise systems.
Because of that, the integration architecture deliberately avoided assumptions tied to any single CMS.
Instead, every content source would eventually participate through clearly defined indexing and retrieval contracts.
That architectural separation allowed the search platform to evolve independently from individual content management systems.
Why Drupal Search API Became the Foundation
One of the earliest architectural decisions was intentionally avoiding direct coupling between Drupal and Lucidworks.
Instead of generating Lucidworks-specific payloads directly from Drupal entities, Drupal Search API became the canonical indexing layer.
At first glance this decision appeared to introduce additional complexity.
In practice it dramatically simplified the architecture.
Using Search API provided several advantages.
Every website could independently configure searchable content.
Editorial teams could add or remove indexed fields without changing export logic.
Search processors could enrich content before export.
Additional processors could normalize multilingual metadata.
The export layer remained completely independent from Drupal’s entity internals.
Most importantly, the architecture remained portable.
If the organization ever decided to replace its search vendor, Drupal would continue producing normalized search documents.
Only the export adapter would need to change.
That single architectural decision reduced long-term platform risk considerably.
Architecture Decision Record
ADR-001: Choosing Drupal Search API as the Abstraction Layer
Context
The platform needed to support more than 150 Drupal websites, each with different content models, editorial requirements, languages, and searchable metadata. Future phases also needed to support additional enterprise systems beyond Drupal.
Decision
Use Drupal Search API as the abstraction layer responsible for selecting searchable entities, selecting fields, applying processors, enriching metadata, and normalizing content before export.
Alternatives Considered
Generating Lucidworks-specific payloads directly from Drupal entities, which would have coupled every site to one vendor.
Why This Decision Won
The extra implementation effort was outweighed by the long-term flexibility gained through a portable indexing architecture.
Long-Term Impact
Each Drupal site could independently configure searchable content, indexed fields, processors, and metadata without changing the Lucidworks integration.
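The select, enrich, and normalize steps assigned to Search API can be pictured as a chain of small processors. The sketch below is a Python illustration only, not the project's Drupal code (which would be PHP Search API processors). The function names, field names, and the language rule are all hypothetical.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]
Processor = Callable[[Document], Document]


def select_fields(allowed: List[str]) -> Processor:
    """Keep only the fields a site has chosen to index (hypothetical helper)."""
    def run(doc: Document) -> Document:
        return {key: value for key, value in doc.items() if key in allowed}
    return run


def normalize_language(doc: Document) -> Document:
    """Normalize codes such as 'EN_us' to 'en-us' (an assumed convention)."""
    out = dict(doc)
    out["language"] = str(out.get("language", "und")).replace("_", "-").lower()
    return out


def add_site_metadata(site: str, country: str) -> Processor:
    """Enrich every document with the shared metadata used for federation."""
    def run(doc: Document) -> Document:
        return {**doc, "site": site, "country": country}
    return run


def build_document(raw: Document, processors: List[Processor]) -> Document:
    """Run the processors in order and return the normalized document."""
    doc = dict(raw)
    for processor in processors:
        doc = processor(doc)
    return doc


raw = {"title": "Pricing", "language": "EN_us", "internal_notes": "draft"}
pipeline = [
    select_fields(["title", "language"]),
    normalize_language,
    add_site_metadata("site-a", "us"),
]
normalized = build_document(raw, pipeline)
```

Because each site only changes which processors and fields it configures, the export step that follows can stay identical across sites.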
Vendor-Aware Instead of Vendor-Dependent
Enterprise software inevitably depends on vendors.
Architecture should not.
Throughout the project I invested significant time understanding the underlying search stack rather than treating Lucidworks as a black box.
That learning path intentionally started from the lowest layer.
Apache Lucene.
Then Apache Solr.
Finally Lucidworks Fusion.
Understanding the layers underneath the commercial platform changed how architectural decisions were made.
Instead of implementing features based solely on product documentation, decisions could be evaluated according to the actual information retrieval principles behind the platform.
This also reduced dependence on vendor-specific consulting during technical discovery.
Internal knowledge became an architectural asset.
Architecture Decision Record
ADR-006: Keeping the Integration Vendor-Aware, Not Vendor-Dependent
Context
The enterprise search platform depended on Lucidworks Fusion, but the organization needed to avoid making Drupal content modeling, indexing, and governance decisions impossible to reuse with another retrieval platform later.
Decision
Generate normalized JSON documents from Drupal and let the Lucidworks adapter handle platform-specific ingestion details.
Tradeoffs
This required a clearer export contract, but it prevented CMS logic from being shaped around one vendor's API surface.
Long-Term Impact
The architecture preserved a migration path toward future search, retrieval, or AI platforms without rebuilding content selection and enrichment from scratch.
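One way to picture a vendor-aware export contract is a small interface that the CMS side depends on, with vendor details confined to interchangeable adapters. This is a minimal Python sketch under stated assumptions: `SearchExportAdapter`, `JsonArrayAdapter`, and `JsonLinesAdapter` are hypothetical names, and neither payload format is meant to represent the actual Lucidworks Fusion ingestion API.

```python
import json
from typing import Dict, Iterable, List, Protocol

Document = Dict[str, object]


class SearchExportAdapter(Protocol):
    """Hypothetical contract that every vendor adapter would implement."""

    def to_payload(self, docs: Iterable[Document]) -> str:
        ...


class JsonArrayAdapter:
    """Illustrative adapter: emits the documents as one JSON array."""

    def to_payload(self, docs: Iterable[Document]) -> str:
        return json.dumps(list(docs), sort_keys=True)


class JsonLinesAdapter:
    """Illustrative replacement adapter: one JSON object per line."""

    def to_payload(self, docs: Iterable[Document]) -> str:
        return "\n".join(json.dumps(doc, sort_keys=True) for doc in docs)


def export(docs: List[Document], adapter: SearchExportAdapter) -> str:
    """CMS-side code depends only on the contract, never on a vendor class."""
    return adapter.to_payload(docs)


docs = [{"id": "site-a:node:1", "title": "Pricing"}]
array_payload = export(docs, JsonArrayAdapter())
lines_payload = export(docs, JsonLinesAdapter())
```

Swapping the search vendor then means writing a new adapter class, while the code that selects and normalizes content is left untouched.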
A Flexible Indexing Architecture
The indexing architecture intentionally separated several responsibilities.
Drupal remained responsible for content.
Search API remained responsible for selecting and enriching searchable fields.
A custom export layer transformed normalized content into portable JSON documents.
Lucidworks Fusion remained responsible for ingestion, indexing, and retrieval.
This separation allowed each layer to evolve independently.
Editorial changes rarely required search platform changes.
Search platform improvements rarely required CMS modifications.
Each responsibility remained clearly isolated.
That isolation dramatically reduced long-term maintenance complexity.
Architecture Decision Record
ADR-003: Shared Collection vs. Per-Site Collections
Context
A platform designed for 150+ websites could have created separate collections for every website. That would have looked clean locally, but it would have multiplied operational overhead across environments, pipelines, facets, and relevance tuning.
Decision
Use shared collections and rely on normalized metadata such as datasource, site, language, country, and business context to filter and federate results.
Tradeoffs
The shared model required stronger metadata governance, but avoided collection sprawl and made cross-site federation much easier to operate.
Architecture Decision Record
ADR-004: Shared Indexing Pipeline
Context
Each site had different editorial needs, but the indexing lifecycle still needed consistent behavior for enrichment, normalization, and operational troubleshooting.
Decision
Centralize indexing behavior where possible and push site-specific variation into configuration and metadata instead of duplicating pipelines.
Why This Decision Won
Shared pipelines made the system easier to reason about, test, document, and support as the platform scaled.
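With a shared collection, scoping a search to particular sites, languages, or countries comes down to filtering on normalized metadata. The sketch below builds Solr-style filter queries (`fq`) from such metadata. It is an illustration in Python, and the field names `site` and `language` are assumptions about the schema rather than the project's actual field names.

```python
from typing import Dict, List, Sequence


def escape_term(value: str) -> str:
    """Quote a value so spaces and special characters stay inside one term."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_filter_queries(filters: Dict[str, Sequence[str]]) -> List[str]:
    """Return one fq clause per metadata field.

    Values inside a field are ORed together. Solr intersects separate fq
    parameters, so different fields are effectively ANDed.
    """
    clauses: List[str] = []
    for field in sorted(filters):
        values = filters[field]
        if not values:
            continue  # an empty selection means "do not restrict this field"
        joined = " OR ".join(escape_term(value) for value in values)
        clauses.append(f"{field}:({joined})")
    return clauses


fq = build_filter_queries({"site": ["site-a", "site-b"], "language": ["en"]})
```

This is also why the shared model leans so heavily on metadata governance: a missing or inconsistent `site` or `language` value silently drops a document out of every filtered experience.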
Federation as a Configuration Problem
One particularly interesting challenge involved federation.
Users should not necessarily search every website.
Different experiences required different combinations of content.
Instead of hardcoding federation logic, Drupal administrators received a dedicated administration interface where they could configure:
- participating datasources
- query pipelines
- autocomplete pipelines
- labels
- descriptions
- visible facets
- hidden facets
- environment mappings
- search behavior
Federation became configuration instead of software development.
That distinction significantly reduced operational overhead while allowing business teams to evolve search experiences without engineering involvement.
Architecture Decision Record
ADR-005: Configurable Federation
Context
Business units needed different combinations of searchable content. A single hardcoded relationship graph would have forced developers to update search behavior whenever business needs changed.
Decision
Expose federation through Drupal administration interfaces where teams could configure participating datasources, pipelines, labels, facets, and behavior.
Long-Term Impact
Search experiences could evolve through configuration, reducing engineering bottlenecks while keeping governance visible.
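Treating federation as configuration means a search experience is just data that gets resolved into a request at query time. The following Python sketch shows one possible shape for that data. The `SearchExperience` class, its attribute names, and the request keys are hypothetical, and it assumes that "hidden" facets are still requested from the search platform but not rendered.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SearchExperience:
    """Hypothetical shape of one administrator-configured search experience."""

    label: str
    datasources: Tuple[str, ...]
    query_pipeline: str
    autocomplete_pipeline: str
    visible_facets: Tuple[str, ...] = ()
    hidden_facets: Tuple[str, ...] = ()


def resolve_request(experience: SearchExperience, user_query: str) -> Dict[str, object]:
    """Turn stored configuration plus a user query into a vendor-neutral request."""
    return {
        "q": user_query,
        "pipeline": experience.query_pipeline,
        "datasources": list(experience.datasources),
        # Assumption: hidden facets are computed but never shown to the user.
        "facets": list(experience.visible_facets) + list(experience.hidden_facets),
        "render_facets": list(experience.visible_facets),
    }


global_search = SearchExperience(
    label="Global search",
    datasources=("site-a", "site-b"),
    query_pipeline="default",
    autocomplete_pipeline="suggest",
    visible_facets=("language",),
    hidden_facets=("country",),
)
request = resolve_request(global_search, "pricing")
```

Adding a site to an experience then becomes an edit to `datasources` in an admin form rather than a code change.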
Relevance Engineering
Indexing documents is relatively easy.
Returning the right documents is considerably harder.
Search quality depends on relevance.
Throughout the project, search behavior was continuously refined using Lucidworks query pipelines together with Apache Solr concepts such as field weighting, boosting, query fields (qf), phrase boosting (pf), minimum should match (mm), filters, and language-specific metadata.
Rather than treating relevance as a one-time configuration exercise, it became an iterative engineering discipline.
Small improvements in ranking frequently produced larger improvements in perceived search quality than significant infrastructure changes.
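To make those relevance levers concrete, here is a minimal Python sketch that assembles parameters for Solr's Extended DisMax (edismax) query parser. The parameter names (`defType`, `qf`, `pf`, `mm`, `bq`, `fq`) are standard Solr parameters, but the field names, weights, and boost query are placeholders. In this project such settings were managed through Lucidworks query pipelines, so this only illustrates the underlying Solr concepts.

```python
from typing import Dict, List, Union
from urllib.parse import urlencode

Params = Dict[str, Union[str, List[str]]]


def edismax_params(user_query: str, language: str) -> Params:
    """Assemble edismax parameters; all field names and weights are illustrative."""
    return {
        "defType": "edismax",
        "q": user_query,
        # Query fields: a title match counts five times as much as a body match.
        "qf": "title^5 summary^2 body",
        # Phrase boost: reward documents where the whole query appears in the title.
        "pf": "title^10",
        # Minimum should match: up to 2 clauses all required, beyond that 75%.
        "mm": "2<75%",
        # Boost query: nudge one content type upward without filtering the rest.
        "bq": "content_type:landing_page^3",
        # Filter query: restricts results without influencing the score.
        "fq": [f"language:{language}"],
    }


def to_query_string(params: Params) -> str:
    """Encode the parameters, repeating keys for list values such as fq."""
    return urlencode(params, doseq=True)


query_string = to_query_string(edismax_params("annual report", "en"))
```

Iterating on relevance usually means changing one of these values, re-running a fixed set of representative queries, and comparing the top results before and after.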
Performance Through Simplicity
One architectural decision generated considerable discussion.
Should search become another JavaScript application?
Or should Drupal continue rendering the experience?
The final implementation used server-rendered Drupal pages enhanced with HTMX.
The result felt highly interactive while avoiding much of the complexity introduced by a fully decoupled frontend architecture.
Autocomplete.
Faceted search.
Incremental updates.
Partial page rendering.
All remained possible without abandoning Drupal’s strengths.
Sometimes the simplest architecture also becomes the most maintainable.
Architecture Decision Record
ADR-002: Server-Rendered Search with HTMX
Context
The search interface needed fast rendering, maintainability, autocomplete, facets, and partial updates. It did not require the complexity of a full single-page application.
Alternatives Considered
A fully decoupled JavaScript frontend built as a single-page application.
Why This Decision Won
The search experience needed interactivity, not SPA complexity. HTMX preserved Drupal's rendering strengths while keeping the implementation easier to maintain.
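The HTMX pattern relies on the server returning HTML fragments that the browser swaps into the page. The Python sketch below only illustrates the markup involved. In the real platform Drupal renders these fragments, and the endpoint path and element ids used here are hypothetical. The `hx-get`, `hx-trigger`, and `hx-target` attributes are standard HTMX attributes.

```python
from html import escape
from typing import List


def render_search_box(endpoint: str) -> str:
    """Input that asks the server for fresh results shortly after the user types."""
    return (
        '<input type="search" name="q" '
        f'hx-get="{escape(endpoint, quote=True)}" '
        'hx-trigger="keyup changed delay:300ms" '
        'hx-target="#results">'
    )


def render_results(titles: List[str]) -> str:
    """Server-rendered fragment that HTMX swaps into the #results element."""
    items = "".join(f"<li>{escape(title)}</li>" for title in titles)
    return f'<ul id="results-list">{items}</ul>'


search_box = render_search_box("/search/results")
fragment = render_results(["Pricing", "Terms & Conditions"])
```

The server keeps full control of rendering and escaping, while the browser needs no client-side state or templating beyond the HTMX attributes.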
Governance at Scale
Large organizations rarely struggle because of technology.
They struggle because of governance.
As the platform evolved, search administration expanded beyond indexing.
Administrators needed control over:
- search environments
- query profiles
- collections
- federation
- country restrictions
- editorial messaging
- search facets
- autocomplete behavior
- metadata mappings
Providing those capabilities through Drupal administration interfaces significantly reduced operational friction while empowering non-developer stakeholders.
Architecture succeeded because governance became part of the product rather than an afterthought.
Engineering Beyond Search
Although enterprise search represented the primary architectural challenge, the project naturally expanded into several adjacent domains.
OAuth integration.
Authentication.
Editorial workflows.
Code quality improvements.
Automated testing.
CI/CD enhancements.
Developer onboarding.
Technical documentation.
Architecture diagrams.
Proofs of concept.
Cross-team technical discovery.
Enterprise software rarely exists in isolation.
Successful architecture requires understanding how every subsystem influences the others.
Architecture Decision Record
ADR-007: Cached Authentication Tokens
Context
The integration layer needed secure communication with enterprise services without adding avoidable latency or unnecessary authentication load to every indexing or search-related operation.
Decision
Use cached JWT tokens with explicit expiration handling so authenticated requests could remain secure while avoiding repeated token negotiation.
Tradeoffs
The approach required careful token lifecycle handling, but improved reliability and performance for repeated integration calls.
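Token caching with explicit expiration handling can be sketched as a small wrapper that reuses a token until shortly before it expires. This is a minimal Python illustration, not the project's implementation: the `CachedToken` class, the `fetch` callable, and the 30-second safety margin are assumptions, and the actual token request is left out.

```python
import time
from typing import Callable, Optional, Tuple

# A fetcher returns (token, lifetime_in_seconds); the real request is out of scope.
TokenFetcher = Callable[[], Tuple[str, float]]


class CachedToken:
    """Reuse a token until it is within `skew` seconds of expiring."""

    def __init__(
        self,
        fetch: TokenFetcher,
        skew: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._skew = skew
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, negotiating a new one only when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._skew:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

Refreshing slightly early (the `skew` margin) avoids sending a token that expires while the request is in flight, which is one of the lifecycle details the tradeoff above refers to.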
Lessons Learned
Several lessons stand out after completing this work.
Technology choices matter less than architectural boundaries.
Metadata quality determines search quality.
Governance becomes increasingly important as ecosystems grow.
Abstraction layers provide long-term flexibility.
Documentation is part of architecture.
Search relevance is never “finished.”
Reverse engineering often teaches more than documentation.
Simple solutions frequently outperform fashionable architectures.
Perhaps the most important lesson is that enterprise search is not fundamentally about search engines.
It is about understanding information.
Enterprise Search and Applied AI
The recent popularity of Retrieval-Augmented Generation (RAG) has renewed interest in information retrieval.
Many discussions present RAG as something entirely new.
In reality, modern AI retrieval systems inherit decades of knowledge from enterprise search.
Before an LLM can generate useful answers, information must first be:
- discovered
- normalized
- enriched
- indexed
- ranked
- filtered
- retrieved
Enterprise search has been solving those problems for many years.
Large language models simply introduced a new consumer of high-quality retrieval systems.
That realization fundamentally changed how I view AI.
Modern AI systems are only as good as the retrieval architecture beneath them.
Architect’s Notes
Looking back, I would summarize this project through four architectural principles.
Architecture should optimize for change rather than today’s implementation.
Vendor abstractions are investments, not overhead.
Governance scales organizations better than custom code.
Enterprise search and applied AI are converging around the same information retrieval foundations.
Those principles continue influencing how I approach software architecture today.
Architecture Decisions Summary
| ADR | Decision | Status |
|---|---|---|
| ADR-001 | Search API abstraction | Accepted |
| ADR-002 | Server-rendered search with HTMX | Accepted |
| ADR-003 | Shared collection model | Accepted |
| ADR-004 | Shared indexing pipeline | Accepted |
| ADR-005 | Configurable federation | Accepted |
| ADR-006 | Vendor-aware integration architecture | Accepted |
| ADR-007 | Cached JWT authentication tokens | Accepted |
These decisions are the architectural backbone of the platform. They document not only what was built, but why the system was shaped this way and what tradeoffs were accepted.
Final Thoughts
This project reinforced something I have observed throughout my career.
Search is rarely about search.
It is about understanding information.
It is about building systems that allow organizations to evolve without constantly rebuilding their technical foundations.
The technologies will continue changing.
Today’s search platform may become tomorrow’s AI retrieval layer.
The architectural principles, however, remain remarkably consistent.
Design for change.
Build clear boundaries.
Keep systems understandable.
Everything else becomes easier.
If you’re working on enterprise search, federated retrieval, applied AI, or large-scale Drupal architectures, I’d be happy to connect and exchange ideas. You can reach me through LinkedIn or explore more Engineering Case Studies at eduardotelaya.com.