Designing a Federated Enterprise Search Platform for 150+ Websites

Lessons from Drupal, Lucidworks Fusion, Apache Solr, and Applied AI Retrieval

Architectural decisions, tradeoffs, implementation strategies, and lessons learned from building federated enterprise search across Drupal, Lucidworks Fusion, Apache Solr, and applied AI retrieval.

Figure: Federated enterprise search architecture for 150+ websites. A portable indexing contract connects Drupal governance, normalized JSON exports, Fusion pipelines, Solr retrieval, and the search experience.

Executive Summary

The organization operated a large digital ecosystem composed of hundreds of independent websites.

Different business units.

Different countries.

Different editorial teams.

Different governance policies.

Different search expectations.

Despite those differences, users expected a single consistent search experience.

The initial objective was to build the first phase of a federated enterprise search platform capable of supporting more than 150 Drupal websites while preparing the foundation for future integration with additional enterprise systems.

The architecture intentionally separated content management from search execution.

Drupal remained the system of record.

Lucidworks Fusion became the search platform.

A custom integration layer connected both worlds without tightly coupling either one.

This separation proved fundamental throughout the project.


Why Enterprise Search Is Hard

Many software engineers think search is primarily a technical problem.

It is not.

Search is fundamentally an information architecture problem.

Every organization stores information differently.

Content evolves independently across business units.

Metadata becomes inconsistent.

Different countries impose different governance requirements.

Editorial teams organize information according to local needs.

Search must somehow unify all of those realities without forcing every team to work the same way.

The larger the organization becomes, the less the challenge resembles software development and the more it resembles organizational design.

Search stops being about retrieving documents.

It becomes about governing information.

That distinction influenced every architectural decision made throughout this project.


Understanding the Challenge

From the beginning, several architectural constraints were already clear.

The solution needed to:

  • Support an ecosystem of more than 150 Drupal websites.
  • Allow every website to configure its own searchable content.
  • Support multilingual indexing.
  • Support country-specific governance.
  • Enable federated search across selected websites.
  • Allow independent search experiences for different business units.
  • Support future integration with additional enterprise platforms.
  • Avoid unnecessary vendor lock-in.
  • Remain maintainable by Drupal development teams.

Those requirements immediately eliminated several naive solutions.

Simply connecting Drupal directly to a search vendor would have solved today’s requirements while making tomorrow’s requirements considerably more expensive.

Instead, flexibility became the primary architectural objective.


Architecture Before Technology

One of the biggest lessons I have learned throughout my career is that architecture should be driven by business capabilities rather than products.

Before writing code, several questions needed to be answered.

What content should become searchable?

Who owns that content?

How should metadata be normalized?

Which fields are shared across every website?

Which fields remain site-specific?

How should regional restrictions be respected?

How should future platforms participate in the same search experience?

Only after answering those questions did technology selection become meaningful.

This mindset significantly reduced implementation risk later in the project.


Designing for Growth Instead of Phase One

Although the initial production rollout focused on only a handful of websites, the architecture itself was intentionally designed for a much larger ecosystem.

The long-term vision extended beyond Drupal.

Future phases were expected to federate content originating from additional enterprise platforms, including Adobe Experience Manager (AEM) and other enterprise systems.

Because of that, the integration architecture deliberately avoided assumptions tied to any single CMS.

Instead, every content source would eventually participate through clearly defined indexing and retrieval contracts.

That architectural separation allowed the search platform to evolve independently from individual content management systems.


Why Drupal Search API Became the Foundation

One of the earliest architectural decisions was to avoid direct coupling between Drupal and Lucidworks.

Instead of generating Lucidworks-specific payloads directly from Drupal entities, Drupal Search API became the canonical indexing layer.

At first glance this decision appeared to introduce additional complexity.

In practice it dramatically simplified the architecture.

Using Search API provided several advantages.

Every website could independently configure searchable content.

Editorial teams could add or remove indexed fields without changing export logic.

Search processors could enrich content before export.

Additional processors could normalize multilingual metadata.

The export layer remained completely independent from Drupal’s entity internals.

Most importantly, the architecture remained portable.

If the organization ever decided to replace its search vendor, Drupal would continue producing normalized search documents.

Only the export adapter would need to change.

That single architectural decision reduced long-term platform risk considerably.
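
To make the idea of a normalized search document concrete, here is a minimal sketch in Python, used purely for illustration and not taken from the project's code. The field names (`site`, `language`, `country`, `extra`) and the ID format are assumptions; the only point is that shared fields form a stable contract while site-specific fields stay in their own namespace.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class SearchDocument:
    """Portable search document. All field names here are hypothetical."""

    id: str
    site: str
    language: str
    country: str
    title: str
    body: str
    url: str
    # Site-specific fields live in their own namespace so the shared
    # contract stays stable when a site adds or removes indexed fields.
    extra: dict[str, Any] = field(default_factory=dict)


def to_portable_json(doc: SearchDocument) -> str:
    """Serialize a document into vendor-neutral JSON."""
    return json.dumps(asdict(doc), sort_keys=True, ensure_ascii=False)


doc = SearchDocument(
    id="site-a:node:42",
    site="site-a",
    language="en",
    country="US",
    title="Pump maintenance guide",
    body="How to maintain the pump.",
    url="https://example.com/node/42",
    extra={"product_line": "pumps"},
)
payload = to_portable_json(doc)
```

Nothing in this document refers to a search vendor, which is what would let a different export adapter consume it unchanged.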

Architecture Decision Record

ADR-001 — Choosing Drupal Search API as the Abstraction Layer

Decision: Use Drupal Search API as the canonical indexing layer.
Status: Accepted
Phase: Phase 1
Impact: High

Context

The platform needed to support more than 150 Drupal websites, each with different content models, editorial requirements, languages, and searchable metadata. Future phases also needed to support additional enterprise systems beyond Drupal.

Decision

Use Drupal Search API as the abstraction layer responsible for selecting searchable entities, selecting fields, applying processors, enriching metadata, and normalizing content before export.

Alternatives Considered

Direct Drupal to Lucidworks Integration

Pros: Faster initial implementation.

Cons: Higher vendor lock-in, more coupling, harder testing, and more expensive future platform changes.

Search API to JSON Export to Lucidworks

Pros: CMS abstraction, easier testing, reusable architecture, and possible vendor replacement.

Cons: Additional abstraction layer and slightly more implementation effort.

Why This Decision Won

The extra implementation effort was outweighed by the long-term flexibility gained through a portable indexing architecture.

Long-Term Impact

Each Drupal site could independently configure searchable content, indexed fields, processors, and metadata without changing the Lucidworks integration.


Vendor-Aware Instead of Vendor-Dependent

Enterprise software inevitably depends on vendors.

Architecture should not.

Throughout the project I invested significant time in understanding the underlying search stack rather than treating Lucidworks as a black box.

That learning path intentionally started from the lowest layer.

Apache Lucene.

Then Apache Solr.

Finally Lucidworks Fusion.

Understanding the layers underneath the commercial platform changed how architectural decisions were made.

Instead of implementing features based solely on product documentation, decisions could be evaluated according to the actual information retrieval principles behind the platform.

This also reduced dependence on vendor-specific consulting during technical discovery.

Internal knowledge became an architectural asset.

Architecture Decision Record

ADR-006 — Keeping the Integration Vendor-Aware, Not Vendor-Dependent

Decision: Export portable JSON documents instead of Lucidworks-specific payloads.
Status: Accepted
Impact: High

Context

The enterprise search platform depended on Lucidworks Fusion, but Drupal content modeling, indexing, and governance decisions had to remain reusable with another retrieval platform later.

Decision

Generate normalized JSON documents from Drupal and let the Lucidworks adapter handle platform-specific ingestion details.

Tradeoffs

This required a clearer export contract, but it prevented CMS logic from being shaped around one vendor's API surface.

Long-Term Impact

The architecture preserved a migration path toward future search, retrieval, or AI platforms without rebuilding content selection and enrichment from scratch.
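
The adapter boundary described in this ADR can be sketched as follows, in Python for illustration only. The `ExportAdapter` protocol, the `SuffixedFieldAdapter` class, and the field mapping are all hypothetical; the `_t` and `_s` suffixes imitate a common Solr dynamic-field naming convention and are not the project's actual schema.

```python
from typing import Any, Protocol


class ExportAdapter(Protocol):
    """Anything that turns a portable document into a platform payload."""

    def to_payload(self, doc: dict[str, Any]) -> dict[str, Any]: ...


class SuffixedFieldAdapter:
    """Hypothetical adapter that renames portable fields for one platform."""

    # Assumed mapping from portable field names to platform field names.
    FIELD_MAP = {
        "title": "title_t",
        "body": "body_t",
        "site": "site_s",
        "language": "language_s",
    }

    def to_payload(self, doc: dict[str, Any]) -> dict[str, Any]:
        payload: dict[str, Any] = {"id": doc["id"]}
        for source, target in self.FIELD_MAP.items():
            if source in doc:
                payload[target] = doc[source]
        return payload


def export(docs: list[dict[str, Any]], adapter: ExportAdapter) -> list[dict[str, Any]]:
    """Platform-specific details stay inside the adapter, not the CMS."""
    return [adapter.to_payload(doc) for doc in docs]
```

Under these assumptions, replacing the retrieval platform means writing another class with a `to_payload` method; content selection and enrichment upstream stay as they are.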


A Flexible Indexing Architecture

The indexing architecture intentionally separated several responsibilities.

Figure: Multisite indexing and federation model, with many site-specific datasources, one shared indexing pipeline, a shared collection, and configurable query pipelines.

Drupal remained responsible for content.

Search API remained responsible for selecting and enriching searchable fields.

A custom export layer transformed normalized content into portable JSON documents.

Lucidworks Fusion remained responsible for ingestion, indexing, and retrieval.

This separation allowed each layer to evolve independently.

Editorial changes rarely required search platform changes.

Search platform improvements rarely required CMS modifications.

Each responsibility remained clearly isolated.

That isolation dramatically reduced long-term maintenance complexity.

Architecture Decision Record

ADR-003 — Shared Collection vs. Per-Site Collections

Decision: Use a shared collection model with metadata-driven separation.
Status: Accepted
Impact: High

Context

A platform designed for 150+ websites could have created separate collections for every website. That would have looked clean locally, but it would have multiplied operational overhead across environments, pipelines, facets, and relevance tuning.

Decision

Use shared collections and rely on normalized metadata such as datasource, site, language, country, and business context to filter and federate results.

Tradeoffs

The shared model required stronger metadata governance, but avoided collection sprawl and made cross-site federation much easier to operate.
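
Metadata-driven separation in a shared collection usually comes down to filter queries. The sketch below, in Python for illustration, builds Solr `fq` clauses from normalized metadata. `fq` is a standard Solr request parameter; the field names `site`, `language`, and `country` are assumptions, not the project's schema.

```python
def build_filter_queries(sites, language=None, country=None):
    """Return Solr `fq` clauses that narrow a shared collection."""

    def quote(value):
        # Escape backslashes and double quotes so the value is a safe phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    filters = []
    if sites:
        joined = " OR ".join(quote(site) for site in sites)
        filters.append(f"site:({joined})")
    if language:
        filters.append(f"language:{quote(language)}")
    if country:
        filters.append(f"country:{quote(country)}")
    return filters
```

Each federated experience then differs only in the list of sites and restrictions it passes in, which is why weak metadata governance would undermine the whole model.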

Architecture Decision Record

ADR-004 — Shared Indexing Pipeline

Decision: Prefer shared indexing pipelines over one-off pipelines per website.
Status: Accepted
Impact: Medium to High

Context

Each site had different editorial needs, but the indexing lifecycle still needed consistent behavior for enrichment, normalization, and operational troubleshooting.

Decision

Centralize indexing behavior where possible and push site-specific variation into configuration and metadata instead of duplicating pipelines.

Why This Decision Won

Shared pipelines made the system easier to reason about, test, document, and support as the platform scaled.
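
As a rough illustration of pushing site-specific variation into configuration, the following Python sketch runs every document through the same ordered stages and passes site differences in as data. The stage names and configuration keys (`default_language`, `site`, `country`) are hypothetical and do not describe Fusion's actual pipeline stages.

```python
from typing import Any, Callable

Document = dict[str, Any]
Stage = Callable[[Document, dict[str, Any]], Document]


def normalize_language(doc: Document, site_config: dict[str, Any]) -> Document:
    # Fall back to the site's default language, then lowercase the code.
    language = doc.get("language") or site_config.get("default_language", "en")
    return {**doc, "language": language.lower()}


def tag_site(doc: Document, site_config: dict[str, Any]) -> Document:
    # Stamp the governance metadata that filtering and federation rely on.
    return {**doc, "site": site_config["site"], "country": site_config["country"]}


# One shared, ordered list of stages for every site.
SHARED_STAGES: list[Stage] = [normalize_language, tag_site]


def run_pipeline(doc: Document, site_config: dict[str, Any], stages=SHARED_STAGES) -> Document:
    """Apply the shared stages; only `site_config` varies between sites."""
    for stage in stages:
        doc = stage(doc, site_config)
    return doc
```

Because the stages are shared, a troubleshooting session for one site exercises the same code path as every other site.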


Federation as a Configuration Problem

One particularly interesting challenge involved federation.

Users should not necessarily search every website.

Different experiences required different combinations of content.

Instead of hardcoding federation logic, Drupal administrators received a dedicated administration interface where they could configure:

  • participating datasources
  • query pipelines
  • autocomplete pipelines
  • labels
  • descriptions
  • visible facets
  • hidden facets
  • environment mappings
  • search behavior

Federation became configuration instead of software development.

That distinction significantly reduced operational overhead while allowing business teams to evolve search experiences without engineering involvement.

Architecture Decision Record

ADR-005 — Configurable Federation

Decision: Make federation configurable instead of hardcoding site relationships.
Status: Accepted
Impact: High

Context

Business units needed different combinations of searchable content. A single hardcoded relationship graph would have forced developers to update search behavior whenever business needs changed.

Decision

Expose federation through Drupal administration interfaces where teams could configure participating datasources, pipelines, labels, facets, and behavior.

Long-Term Impact

Search experiences could evolve through configuration, reducing engineering bottlenecks while keeping governance visible.
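
One way to picture federation as configuration is a plain data structure that the search layer resolves at request time. This Python sketch is illustrative only: the `FederationProfile` fields mirror the options listed in this section, while the profile name, datasource IDs, and pipeline names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FederationProfile:
    """Hypothetical configuration record for one search experience."""

    label: str
    datasources: tuple[str, ...]
    query_pipeline: str
    autocomplete_pipeline: str
    visible_facets: tuple[str, ...] = ()
    hidden_facets: tuple[str, ...] = ()


# In the real platform this would come from administrator-managed
# configuration; it is hardcoded here only to keep the sketch runnable.
PROFILES = {
    "corporate": FederationProfile(
        label="Corporate search",
        datasources=("site-a", "site-b"),
        query_pipeline="corporate-query",
        autocomplete_pipeline="corporate-autocomplete",
        visible_facets=("language", "content_type"),
        hidden_facets=("country",),
    ),
}


def resolve_profile(name, profiles=PROFILES):
    """Look up a profile, failing loudly on unknown names."""
    try:
        return profiles[name]
    except KeyError:
        raise ValueError(f"Unknown federation profile: {name!r}") from None
```

Adding a site to an experience becomes an edit to `datasources` rather than a code change, which is the shift this ADR describes.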


Relevance Engineering

Indexing documents is relatively easy.

Returning the right documents is considerably harder.

Search quality depends on relevance.

Throughout the project, search behavior was continuously refined using Lucidworks query pipelines together with Apache Solr concepts such as field weighting, boosting, query fields, phrase boosting, minimum match, filters, and language-specific metadata.

Rather than treating relevance as a one-time configuration exercise, it became an iterative engineering discipline.

Small improvements in ranking frequently produced larger improvements in perceived search quality than significant infrastructure changes.
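
The Solr concepts named above map onto request parameters of the Extended DisMax query parser. The Python sketch below assembles such a parameter set. `defType`, `qf`, `pf`, `mm`, and `fq` are real Solr parameters, but the field names and boost values are placeholders rather than the tuning used in this project, and in Fusion such settings would typically be managed in a query pipeline rather than in client code.

```python
from urllib.parse import urlencode


def build_edismax_params(query, language, rows=10):
    """Assemble illustrative Solr eDisMax parameters for one query."""
    return {
        "q": query,
        "defType": "edismax",
        # Query fields with per-field weights (hypothetical values).
        "qf": "title^5 summary^2 body",
        # Phrase boost: reward documents whose title contains the full phrase.
        "pf": "title^10",
        # Minimum match: all terms for 1-2 term queries, 75% beyond that.
        "mm": "2<75%",
        # Filters restrict the result set without affecting scores.
        "fq": [f'language:"{language}"'],
        "rows": rows,
    }


def to_query_string(params):
    """Encode parameters, repeating keys for list values such as `fq`."""
    return urlencode(params, doseq=True)
```

Treating these values as data makes each relevance change small, reviewable, and easy to revert, which suits an iterative tuning process.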


Performance Through Simplicity

One architectural decision generated considerable discussion.

Should search become another JavaScript application?

Or should Drupal continue rendering the experience?

The final implementation used server-rendered Drupal pages enhanced with HTMX.

The result felt highly interactive while avoiding much of the complexity introduced by a fully decoupled frontend architecture.

Autocomplete.

Faceted search.

Incremental updates.

Partial page rendering.

All remained possible without abandoning Drupal’s strengths.

Sometimes the simplest architecture also becomes the most maintainable.

Figure: Search query flow from Drupal and HTMX through cached authentication to the Fusion query pipeline. Drupal renders the experience, HTMX updates the page, cached tokens authenticate outbound requests, and Fusion handles retrieval.

Architecture Decision Record

ADR-002 — Server-Rendered Search with HTMX

Decision: Use HTMX-enhanced Drupal rendering instead of building a React search application.
Status: Accepted
Impact: Medium

Context

The search interface needed fast rendering, maintainability, autocomplete, facets, and partial updates. It did not require the complexity of a full single-page application.

Alternatives Considered

React

Pros: Rich ecosystem and familiar SPA patterns.

Cons: Additional frontend stack, hydration complexity, build tooling, and higher maintenance.

HTMX with Drupal/Twig

Pros: Progressive enhancement, server rendering, Drupal-native implementation, and minimal custom JavaScript.

Cons: Less suited for highly interactive SPA-style products.

Why This Decision Won

The search experience needed interactivity, not SPA complexity. HTMX preserved Drupal's rendering strengths while keeping the implementation easier to maintain.
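
The partial-rendering pattern behind this decision is simple enough to sketch. HTMX sends an `HX-Request: true` header with its requests, so the server can return only the results fragment when the header is present and the full page otherwise. The Python below is illustrative only, with hypothetical function names and markup; the project rendered through Drupal, and a real implementation would read headers case-insensitively.

```python
from html import escape


def render_results_fragment(results):
    """Render only the result list, the part HTMX swaps into the page."""
    items = "".join(
        '<li><a href="{}">{}</a></li>'.format(
            escape(result["url"], quote=True), escape(result["title"])
        )
        for result in results
    )
    return f'<ul id="search-results">{items}</ul>'


def render_search(headers, results):
    """Return a fragment for HTMX requests and a full page otherwise."""
    fragment = render_results_fragment(results)
    if headers.get("HX-Request") == "true":
        return fragment
    # Non-HTMX requests get the whole page, so search works without JavaScript.
    return f"<html><body><main>{fragment}</main></body></html>"
```

The same rendering function serves both paths, which is where the progressive enhancement comes from.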


Governance at Scale

Large organizations rarely struggle because of technology.

They struggle because of governance.

As the platform evolved, search administration expanded beyond indexing.

Administrators needed control over:

  • search environments
  • query profiles
  • collections
  • federation
  • country restrictions
  • editorial messaging
  • search facets
  • autocomplete behavior
  • metadata mappings

Providing those capabilities through Drupal administration interfaces significantly reduced operational friction while empowering non-developer stakeholders.

The architecture succeeded because governance became part of the product rather than an afterthought.


Although enterprise search represented the primary architectural challenge, the project naturally expanded into several adjacent domains.

OAuth integration.

Authentication.

Editorial workflows.

Code quality improvements.

Automated testing.

CI/CD enhancements.

Developer onboarding.

Technical documentation.

Architecture diagrams.

Proofs of concept.

Cross-team technical discovery.

Enterprise software rarely exists in isolation.

Successful architecture requires understanding how every subsystem influences the others.

Architecture Decision Record

ADR-007 — Cached Authentication Tokens

Decision: Cache JWT access tokens instead of authenticating every integration request.
Status: Accepted
Impact: Medium

Context

The integration layer needed secure communication with enterprise services without adding avoidable latency or unnecessary authentication load to every indexing or search-related operation.

Decision

Use cached JWT tokens with explicit expiration handling so authenticated requests could remain secure while avoiding repeated token negotiation.

Tradeoffs

The approach required careful token lifecycle handling, but improved reliability and performance for repeated integration calls.
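
Explicit expiration handling can be sketched as a small cache that refreshes the token shortly before it expires. This Python version is an illustration under stated assumptions: `fetch_token` is a hypothetical callable returning a `(token, ttl_seconds)` pair, and the 30-second safety margin is an arbitrary choice, not a value from the project.

```python
import time


class TokenCache:
    """Cache an access token and refresh it just before it expires."""

    def __init__(self, fetch_token, clock=time.monotonic, safety_margin=30.0):
        self._fetch_token = fetch_token  # returns (token, ttl_seconds)
        self._clock = clock  # injectable so tests can control time
        self._safety_margin = safety_margin
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        # Refresh early so a token never expires while a request is in flight.
        if self._token is None or now >= self._expires_at - self._safety_margin:
            token, ttl_seconds = self._fetch_token()
            self._token = token
            self._expires_at = now + ttl_seconds
        return self._token
```

Callers ask the cache for a token on every request and never negotiate one themselves, so the lifecycle logic lives in exactly one place.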


Lessons Learned

Several lessons stand out after completing this work.

Technology choices matter less than architectural boundaries.

Metadata quality determines search quality.

Governance becomes increasingly important as ecosystems grow.

Abstraction layers provide long-term flexibility.

Documentation is part of architecture.

Search relevance is never “finished.”

Reverse engineering often teaches more than documentation.

Simple solutions frequently outperform fashionable architectures.

Perhaps the most important lesson is that enterprise search is not fundamentally about search engines.

It is about understanding information.


Enterprise Search and Applied AI

The recent popularity of Retrieval-Augmented Generation (RAG) has renewed interest in information retrieval.

Many discussions present RAG as something entirely new.

In reality, modern AI retrieval systems inherit decades of knowledge from enterprise search.

Before an LLM can generate useful answers, information must first be:

  • discovered
  • normalized
  • enriched
  • indexed
  • ranked
  • filtered
  • retrieved

Enterprise search has been solving those problems for many years.

Large language models simply introduced a new consumer of high-quality retrieval systems.

That realization fundamentally changed how I view AI.

Modern AI systems are only as good as the retrieval architecture beneath them.
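
The retrieval steps listed earlier in this section can be sketched end to end in a few lines: filter by metadata, rank, keep the top results, and assemble the context handed to a language model. This Python sketch is deliberately a toy. Term overlap stands in for a real ranking function, and the document fields (`id`, `language`, `body`) are assumptions.

```python
def build_context(query, documents, language, top_k=3):
    """Filter, rank, and assemble retrieved text for an LLM prompt."""
    terms = set(query.lower().split())

    def score(doc):
        # Toy relevance: count query terms that appear in the body.
        return len(terms & set(doc["body"].lower().split()))

    # Filter first, so restricted or off-language content is never ranked.
    candidates = [doc for doc in documents if doc["language"] == language]
    ranked = sorted(candidates, key=score, reverse=True)
    selected = [doc for doc in ranked[:top_k] if score(doc) > 0]
    # Keep document IDs in the context so answers can cite their sources.
    return "\n\n".join(f"[{doc['id']}] {doc['body']}" for doc in selected)
```

Every line before the final `join` is classic enterprise search; only the consumer of the result is new.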


Architect’s Notes

Looking back, I would summarize this project through four architectural principles.

Architecture should optimize for change rather than today’s implementation.

Vendor abstractions are investments, not overhead.

Governance scales organizations better than custom code.

Enterprise Search and Applied AI are converging around the same information retrieval foundations.

Those principles continue to influence how I approach software architecture today.


Architecture Decisions Summary

ADR     | Decision                              | Status
ADR-001 | Search API abstraction                | Accepted
ADR-002 | Server-rendered search with HTMX      | Accepted
ADR-003 | Shared collection model               | Accepted
ADR-004 | Shared indexing pipeline              | Accepted
ADR-005 | Configurable federation               | Accepted
ADR-006 | Vendor-aware integration architecture | Accepted
ADR-007 | Cached JWT authentication tokens      | Accepted

These decisions are the architectural backbone of the platform. They document not only what was built, but why the system was shaped this way and what tradeoffs were accepted.


Final Thoughts

This project reinforced something I have observed throughout my career.

Search is rarely about search.

It is about understanding information.

It is about building systems that allow organizations to evolve without constantly rebuilding their technical foundations.

The technologies will continue changing.

Today’s search platform may become tomorrow’s AI retrieval layer.

The architectural principles, however, remain remarkably consistent.

Design for change.

Build clear boundaries.

Keep systems understandable.

Everything else becomes easier.


If you’re working on enterprise search, federated retrieval, applied AI, or large-scale Drupal architectures, I’d be happy to connect and exchange ideas. You can reach me through LinkedIn or explore more Engineering Case Studies at eduardotelaya.com.


© 2024. All rights reserved.