Solr Master Cheat Sheet

Solr Master Cheat Sheet is a comprehensive guide to advanced Solr query capabilities, tailored for developers and search engineers working with Lucidworks Fusion or standalone Apache Solr. Starting from basic query structures, it explores expert-level concepts like edismax, field boosting, phrase matching (pf, pf2, pf3), tie-break scoring, user field restrictions (uf), and faceting strategies. With real-world examples and scoring breakdowns, this post is ideal for anyone looking to fine-tune search relevance and performance across enterprise datasets.

The examples below use the following sample documents:

[
  {
    "id": "pr-001",
    "title_t": "Lucidworks Launches New AI Platform",
    "subtitle_t": "Empowering enterprises with AI search",
    "date_dt": "2025-05-01T00:00:00Z",
    "author_s": "Lucidworks",
    "body_t": "Today, Lucidworks unveiled...",
    "_version_": 1834020721566679040
  },
  {
    "id": "pr-002",
    "title_t": "Lucidworks Expands to Latin America",
    "subtitle_t": "Opening new offices in LATAM",
    "date_dt": "2025-04-15T00:00:00Z",
    "author_s": "Press Office",
    "body_t": "With this expansion...",
    "_version_": 1834020729817923584
  },
  {
    "id": "pr-003",
    "title_t": "Search Trends in 2025",
    "subtitle_t": "Insights from global leaders",
    "date_dt": "2025-03-28T00:00:00Z",
    "author_s": "Jane Doe",
    "body_t": "Search is evolving quickly...",
    "_version_": 1834020737099235328
  },
  {
    "id": "pr-004",
    "title_t": "AI-Powered Personalization in E-Commerce",
    "subtitle_t": "How AI transforms online shopping experiences",
    "date_dt": "2025-06-10T00:00:00Z",
    "author_s": "Data Innovation Team",
    "body_t": "E-commerce platforms are leveraging AI to offer hyper-personalized shopping experiences.",
    "_version_": 1834645263624437760
  },
  {
    "id": "pr-005",
    "title_t": "Lucidworks Integrates with Google Cloud",
    "subtitle_t": "Bringing scalable search to enterprise cloud",
    "date_dt": "2025-07-01T00:00:00Z",
    "author_s": "Tech News",
    "body_t": "Lucidworks expands its partnership with Google Cloud to offer more scalable search solutions.",
    "_version_": 1834645301092155392
  },
  {
    "id": "pr-006",
    "title_t": "Voice Search: The Next Frontier",
    "subtitle_t": "Adapting search engines for voice queries",
    "date_dt": "2025-07-15T00:00:00Z",
    "author_s": "AI Research Group",
    "body_t": "With the rise of smart assistants, voice search optimization becomes critical for search engines.",
    "_version_": 1834645312450330624
  },
  {
    "id": "pr-007",
    "title_t": "AI Ethics in Enterprise Search",
    "subtitle_t": "Balancing innovation with responsibility",
    "date_dt": "2025-08-01T00:00:00Z",
    "author_s": "Lucidworks Research",
    "body_t": "Organizations are implementing ethical frameworks to ensure responsible use of AI in search.",
    "_version_": 1834645344584990720
  },
  {
    "id": "pr-008",
    "title_t": "Federated Search: Connecting Disparate Systems",
    "subtitle_t": "Unified access to multiple content sources",
    "date_dt": "2025-08-15T00:00:00Z",
    "author_s": "Search Engineering Team",
    "body_t": "Federated search enables organizations to retrieve data across multiple repositories seamlessly.",
    "_version_": 1834645355487035392
  }
]

1. Basic Query Structure

| Parameter | Purpose | Example |
|---|---|---|
| q | Main query (search terms) | q=lucidworks |
| q.op | Default operator between terms | q.op=OR or q.op=AND |
| fq | Filter query (restricts results without affecting score) | fq=author_s:"Press Office" |
| df | Default field (avoids writing field:term everywhere) | df=body_t |

2. Field Types Behavior

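| Field type | Tokenized? | Behavior |
|---|---|---|
| _t (text) | ✅ Yes | Split into words and lowercased |
| _s (string) | ❌ No | Whole value is one term; exact, case-sensitive match only |
| _i, _dt, _l, etc. | ❌ No | Numeric/date types (range queries, sorting) |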

3. Tokenization Quick Rule

  • _t → “Lucidworks Launches AI” → becomes: [lucidworks, launches, ai]
  • _s → “Press Office” → stays “Press Office”
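A minimal Python sketch of this rule, assuming a simplified analyzer (split on non-word characters, then lowercase). The real _t analysis depends on the field type in your schema, so punctuation edge cases can differ; the helper names are hypothetical.

```python
import re


def analyze_text(value: str) -> list[str]:
    """Rough stand-in for a _t field: split into words, then lowercase each one."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def analyze_string(value: str) -> list[str]:
    """A _s field indexes the whole value as one unmodified term."""
    return [value]


print(analyze_text("Lucidworks Launches AI"))  # ['lucidworks', 'launches', 'ai']
print(analyze_string("Press Office"))          # ['Press Office']
```

This is why author_s:Press does not match "Press Office": the string field never contains a standalone Press term.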

✅ Always quote values for _s fields if they contain spaces:

author_s:"Press Office"

4. Wildcard Behavior

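| Pattern | Meaning |
|---|---|
| term* | Starts with term |
| *term | Ends with term |
| *term* | Contains term (very expensive) |

⚠ Leading wildcards (*term) are slow: Solr cannot jump to candidates in the sorted term index and has to scan many terms. Use them carefully.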

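✅ Why author_s:Press* AND author_s:*Office works:

  • author_s is a string field, so the whole value "Press Office" is a single term, and both patterns are matched against it:
    • Press* → the value starts with Press
    • *Office → the value ends with Office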
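To make the whole-value behavior concrete, here is a small Python sketch that uses the standard library's fnmatchcase as a stand-in for Solr wildcard matching on a string field. It is an approximation: * and ? behave like Solr's wildcards here, but fnmatch also understands [...] classes, which Solr does not. The helper name is hypothetical.

```python
from fnmatch import fnmatchcase


def string_field_matches(value: str, pattern: str) -> bool:
    """A _s field holds one term, so the pattern must match the entire value.

    fnmatchcase is case-sensitive, like matching on a string field.
    """
    return fnmatchcase(value, pattern)


author = "Press Office"
print(string_field_matches(author, "Press*"))   # True: value starts with "Press"
print(string_field_matches(author, "*Office"))  # True: value ends with "Office"
print(string_field_matches(author, "Office*"))  # False: value does not start with "Office"
print(string_field_matches(author, "press*"))   # False: case-sensitive
```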

5. Sorting, Pagination & Display

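| Parameter | Purpose | Example |
|---|---|---|
| sort | Control result ordering | sort=date_dt desc |
| start | Offset of the first result returned (0-based) | start=10 |
| rows | Results per page | rows=10 |
| fl | Fields to return | fl=id,title_t,author_s |
| indent | Pretty-print the response | indent=true |
| wt | Response format | wt=json |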
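Because start is a 0-based offset rather than a page number, client code usually converts pages into offsets. A minimal Python sketch, assuming 1-based page numbers in the UI; num_found corresponds to the numFound value Solr returns, and the helper names are hypothetical.

```python
import math


def start_for_page(page: int, rows: int) -> int:
    """Convert a 1-based page number into Solr's 0-based start offset."""
    if page < 1 or rows < 1:
        raise ValueError("page and rows must be >= 1")
    return (page - 1) * rows


def page_count(num_found: int, rows: int) -> int:
    """How many pages are needed to show num_found results, rows per page."""
    return math.ceil(num_found / rows)


# With the 8 sample documents and rows=3, pages 1..3 use start=0, 3 and 6.
print([start_for_page(p, 3) for p in range(1, page_count(8, 3) + 1)])  # [0, 3, 6]
```

So start=10 with rows=10 is page 2.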

6. Filters (fq) vs Main Query (q)

 qfq
Affects score?✅ Yes❌ No
Multiple allowed?❌ (single q)✅ Multiple fq
Cacheable?❌ No✅ Faster

7. edismax Mode (The Pro Mode)

Enables flexible queries, field boosting, and finer control over scoring.

defType=edismax
q=Lucidworks Press
qf=title_t^3 subtitle_t^2 body_t
bq=author_s:"Press Office"^5
bf=recip(ms(NOW,date_dt),3.16e-11,1,1)
| Parameter | Purpose |
|---|---|
| qf | Query fields and per-field boosts |
| bq | Boost documents that match an extra query |
| bf | Boost by function (recency, popularity) |
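When these parameters are sent from code rather than typed into the admin UI, every value has to be URL-encoded (spaces, quotes, ^, parentheses). A minimal Python sketch using only the standard library; it builds just the query string, which you would append to your collection's select endpoint (host and collection are not part of this sketch).

```python
from urllib.parse import parse_qs, urlencode

# The edismax parameters from the example above.
params = {
    "defType": "edismax",
    "q": "Lucidworks Press",
    "qf": "title_t^3 subtitle_t^2 body_t",
    "bq": 'author_s:"Press Office"^5',
    "bf": "recip(ms(NOW,date_dt),3.16e-11,1,1)",
}

# urlencode escapes spaces as "+" and characters such as ^ and " as %XX.
query_string = urlencode(params)
print(query_string)

# Round-trip check: decoding yields exactly the original values.
decoded = {key: values[0] for key, values in parse_qs(query_string).items()}
assert decoded == params
```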

8. Faceting (For Filters / Categories / Aggregations)

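| Parameter | Purpose | Example |
|---|---|---|
| facet=true | Enable faceting | facet=true |
| facet.field | Field to facet on | facet.field=author_s |
| facet.prefix | Only facet values starting with a prefix (case-sensitive) | facet.prefix=n |
| facet.contains | Only facet values containing a substring | facet.contains=News |
| facet.sort | Facet sort order (count or index) | facet.sort=count |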

9. Highlighting

Highlight matching terms inside result fields.

hl=true
hl.fl=title_t,body_t
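With wt=json, highlighted snippets come back in a separate "highlighting" section keyed by document id, then by field, rather than inside the documents themselves. The Python sketch below merges them back; the response is a hypothetical, trimmed fragment in that shape (Solr's default highlight tags are <em> and </em>), and with_highlights is a hypothetical helper.

```python
# Hypothetical, trimmed response fragment for q=lucidworks&hl=true.
response = {
    "response": {
        "docs": [
            {"id": "pr-001", "title_t": "Lucidworks Launches New AI Platform",
             "author_s": "Lucidworks"},
        ]
    },
    "highlighting": {
        "pr-001": {"title_t": ["<em>Lucidworks</em> Launches New AI Platform"]}
    },
}


def with_highlights(resp: dict, field: str) -> list[tuple[str, str]]:
    """Pair each doc id with its first snippet for `field`, else the stored value."""
    highlights = resp.get("highlighting", {})
    pairs = []
    for doc in resp["response"]["docs"]:
        snippets = highlights.get(doc["id"], {}).get(field)
        pairs.append((doc["id"], snippets[0] if snippets else doc.get(field, "")))
    return pairs


print(with_highlights(response, "title_t"))
```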

10. defType = Default Query Parser

Tells Solr how to interpret what you write in q.

| Option | What is it for? |
|---|---|
| lucene | The standard, strictest parser. Full Lucene syntax (field:value, AND/OR/NOT, ranges); terms without a field go to df. Complete control, but verbose and intolerant of syntax errors. |
| dismax | Designed for loose, Google-style input: searches the qf fields and supports bq and bf, but accepts only a small subset of syntax (quotes, +, -). |
| edismax | Combines the best of both worlds: accepts loose text and full Lucene syntax, supports qf, bq, bf, boost, pf2/pf3 and uf, and tolerates malformed input. |

11. q.alt (Alternative Query)

When q (your main query) is empty or not included in the request, q.alt (supported by dismax and edismax) defines a fallback query that the engine uses to generate results.

q=
q.alt=*:*

If the user does not write anything (the q is empty), then it will do a *:* (bring all documents).

12. qf (Query Fields)

Tells Solr which fields to search when you use edismax, and how much weight each field gets.

q=Lucidworks
defType=edismax
qf=title_t^4 body_t^2
q.op=OR

What does that mean?

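  • Search for the term Lucidworks in:
    • title_t with weight 4.
    • body_t with weight 2.
  • The term is searched across both fields as a disjunction, so a document qualifies even if it appears in only one of them. (q.op=OR only governs how multiple query terms combine; with a single term it has no effect.)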

📊 Let’s review the 3 documents:

| id | title_t | body_t | Matches title? | Matches body? |
|---|---|---|---|---|
| pr-001 | "Lucidworks Launches New AI Platform" | "Today, Lucidworks unveiled..." | ✅ | ✅ |
| pr-002 | "Lucidworks Expands to Latin America" | "With this expansion..." | ✅ | ❌ |
| pr-005 | "Lucidworks Integrates with Google Cloud" | "Lucidworks expands its partnership..." | ✅ | ✅ |

  • pr-001 and pr-005 match in both title_t and body_t.
  • pr-002 only matches in title_t.

So why does pr-002 appear in second place? Let’s review the score of each document.

Document pr-001

1.7172029 = max of:
 1.7172029 (title_t)
 1.6052904 (body_t)

This means:

  • Lucene scores each field separately.
  • Then it takes the max(), because edismax combines the per-field scores with a disjunction-max query, which by default keeps only the highest score.

body_t also scores (the term appears there too), but title_t scores higher, so only 1.7172029 is kept.

Document pr-002

1.7172029 = max of:
 1.7172029 (title_t)
  • It only appears in title_t, so there is no score for body_t.
  • Final score: 1.7172029

Document pr-005

1.7172029 = max of:
 1.7172029 (title_t)
 0.9921291 (body_t)

Why does pr-002 appear second, even though pr-005 also has a match in body_t?

Because Solr (by default) uses max() to combine the per-field scores in the edismax disjunction.

  • Although pr-005 has an extra match in body_t, that match adds nothing, because title_t already has the highest score.
  • Solr (by default) does not sum the scores across fields; it takes the maximum of the per-field scores, unless you set a tie-breaker (the tie parameter).
  • As a result, all three documents get exactly the same score (1.7172029). Solr breaks score ties by internal document order, which typically follows indexing order, so pr-002 (indexed before pr-005) is listed first.

The parameter that controls this:

tie=0.0

tie controls how much the non-maximum field scores contribute to the disjunction. It was not set in this request, so the default of 0.0 applies, which is why Solr keeps only the maximum score.

How you would see a difference:

If you did:

q=Lucidworks
defType=edismax
qf=title_t^4 body_t^2
tie=0.1

Then Solr would calculate:

score = max(score) + tie * sum(other_scores)

With this, body_t would start contributing to the score even when it is not the highest-scoring field. As a result, pr-005 could move up in the ranking.

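| Document | title_t | body_t | Max score | Sum of other scores | Final score (tie=0.1) |
|---|---|---|---|---|---|
| pr-001 | 1.7172029 | 1.6052904 | 1.7172029 | 1.6052904 | 1.8777319 |
| pr-005 | 1.7172029 | 0.9921291 | 1.7172029 | 0.9921291 | 1.8164158 |
| pr-002 | 1.7172029 | - | 1.7172029 | - | 1.7172029 |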
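The arithmetic in this table can be reproduced with a few lines of Python. This is a sketch of the disjunction-max formula only, fed with the per-field scores from the explain output above; it assumes the documents were indexed in id order (their _version_ values are increasing), which is what Solr falls back on to order equal scores.

```python
def dismax_score(field_scores: list[float], tie: float) -> float:
    """max(field scores) + tie * sum(all other field scores)."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


# Per-field scores (title_t, body_t) from the explain output, listed in index order.
docs = [
    ("pr-001", [1.7172029, 1.6052904]),
    ("pr-002", [1.7172029]),
    ("pr-005", [1.7172029, 0.9921291]),
]

for tie in (0.0, 0.1):
    scored = [(doc_id, dismax_score(scores, tie)) for doc_id, scores in docs]
    # sorted() is stable, so equal scores keep index order, like Solr's tie-break.
    ranking = sorted(scored, key=lambda pair: pair[1], reverse=True)
    print(tie, [(doc_id, round(score, 7)) for doc_id, score in ranking])
```

With tie=0.0 the order is pr-001, pr-002, pr-005 (all equal scores); with tie=0.1 pr-005 moves ahead of pr-002.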

As you can see:

  • Now pr-005 moves above pr-002.
  • Because even though both have the same score in title_t, the body_t score in pr-005 helped increase its overall score thanks to the tie.

13. mm = Minimum Match

Requires that at least X of the query terms (a fixed number or a percentage) match for a document to be returned.

Formats you can use in mm:

| mm | Meaning |
|---|---|
| 100% | All terms must match |
| 75% | At least 75% of the terms must match (rounded down) |
| 3 | At least 3 terms must match |
| 2<75% | Conditional: all terms if there are 2 or fewer, otherwise 75% |
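To see how these formats turn into a concrete number of required terms, here is a simplified Python sketch. It assumes Solr's documented rules (percentages round down, negative values mean "all but", conditionals read as n<spec and are evaluated left to right); min_should_match is a hypothetical helper, not Solr's code, and rare rounding edge cases may differ from Solr's own arithmetic.

```python
import math
import re


def min_should_match(total: int, spec: str) -> int:
    """Number of optional query terms that must match for a simplified mm spec.

    Supports "3", "-1", "75%", "-25%", and conditionals such as "2<75%"
    or "2<-25% 9<-3".
    """
    spec = re.sub(r"\s*<\s*", "<", spec.strip())
    if "<" in spec:
        result = total  # at or below the first bound, every term is required
        for part in spec.split():
            upper, rest = part.split("<", 1)
            if total <= int(upper):
                return result
            result = min_should_match(total, rest)
        return result
    if spec.endswith("%"):
        calc = total * int(spec[:-1]) / 100
        # Positive: round down. Negative: "all but" that share, rounded down.
        result = total + math.trunc(calc) if calc < 0 else math.floor(calc)
    else:
        calc = int(spec)
        result = total + calc if calc < 0 else calc
    # Never require more terms than exist, nor fewer than zero.
    return max(0, min(result, total))


print(min_should_match(3, "75%"))    # 2 (2.25 rounded down)
print(min_should_match(2, "2<75%"))  # 2 (2 terms or fewer: all required)
print(min_should_match(4, "2<75%"))  # 3 (more than 2 terms: 75% of 4)
```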

14. pf → Phrase Fields

It is used to prioritize documents where search terms appear together and in order (as an exact phrase) in certain fields.

✅ Basic example:

defType=edismax
q=lucidworks AI
qf=title_t^2 body_t^1
pf=title_t^5
  • qf: searches for lucidworks and AI in title_t (weight 2) and body_t (weight 1), regardless of order or position within the fields.
  • pf: gives an extra boost if the exact phrase "lucidworks AI" appears in the title_t field.

What is it for?

  • It improves ranking precision.
  • Example: if someone searches for "improving search", subtitle_t is listed in pf, and a document has that exact phrase in subtitle_t, that document will rank higher.

Example:

pf=query_t~3^20
  • pf: Phrase Fields → activates boosting for phrase matches.
  • query_t~3: searches for approximate phrase matches within query_t, allowing up to 3 words of distance (slop).
  • ^20: multiplies the score of the phrase match by 20; that boosted score is then added to the document's score.

What is “slop”? It is the number of positions the words in the phrase can be apart and still be considered a valid match.

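For example, if the query is "search AI" with a slop of 3:

  • "search for AI" → ✅ (needs slop 1)
  • "search really good AI" → ✅ (needs slop 2)
  • "search something unrelated here AI" → ✅ (needs slop 3, exactly the limit)
  • "search something totally unrelated here AI" → ❌ (needs slop 4, more than 3)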
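For an in-order two-word phrase, the slop a text needs is simply the number of words sitting between the two terms. A simplified Python sketch, assuming whitespace tokenization and lowercasing; it does not model reversed word order, which Solr's sloppy phrase matching also allows at extra cost. required_slop is a hypothetical helper.

```python
from typing import Optional


def required_slop(text: str, first: str, second: str) -> Optional[int]:
    """Slop needed for the in-order phrase "first second" to match text.

    Returns None when no in-order occurrence exists.
    """
    tokens = text.lower().split()
    first, second = first.lower(), second.lower()
    best = None
    for i, token in enumerate(tokens):
        if token != first:
            continue
        for j in range(i + 1, len(tokens)):
            if tokens[j] == second:
                gap = j - i - 1  # words between the two terms
                best = gap if best is None else min(best, gap)
                break  # the nearest later match is the best one for this i
    return best


for text in ["search for AI", "search really good AI",
             "search something unrelated here AI",
             "search something totally unrelated here AI"]:
    slop = required_slop(text, "search", "AI")
    print(f"{text!r}: needs slop {slop}, matches ~3: {slop is not None and slop <= 3}")
```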

15. tie (Tie Breaker)

What does tie do?

The tie parameter says:

“Don’t completely ignore the second score. Add a small portion of it.”

The formula with tie is:

score = max(title_score, body_score) + tie × sum(of the other scores)

🎯 What is “sum of the others”?

“Others” are the scores of every matching field except the highest one (the max).

Those scores are summed, and the total is multiplied by the tie value. With tie=0.0 only the max counts; with tie=1.0 every field score is added in full.

16. bq = Boost Query

An additional optional query: documents that match it get its score added to their own.

Example:

bq=author_s:"Press Office"^5

If author_s is exactly Press Office, the bq clause matches, and its score, multiplied by 5, is added to the document's score.

Does the formula get updated?

The total scoring works like this:

finalScore = score(q) + score(bq) + score(bf) + (other boosts)
  • score(q) comes from your normal query (qf).
  • score(bq) is the Boost Query score (added if it matches).
  • score(bf) is the Boost Function score (added if applicable).

The engine sums all these parts to compute the final ranking.

Simplifying:

Assume:

  • q produced 1.71 points.
  • bq (because author_s:"Press Office" matched) adds 4.07.

Total score:

finalScore = 1.71 + 4.07 = 5.78

This is exactly the sum reported in the explain output (debugQuery=true) 👍

17. bf = Boost Functions

  • It’s used to apply additional mathematical functions to the score.
  • It works with numerical values from documents (dates, sizes, popularity, etc.).

Example:

bf=recip(ms(NOW,date_dt),3.16e-11,1,1)

What does this do?

  • ms(NOW,date_dt) → calculates the difference between the current time and the date_dt field in milliseconds.
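  • recip(x,m,a,b) → computes a / (m*x + b), so the score shrinks as x grows. With m=3.16e-11 (roughly 1 divided by the number of milliseconds in a year) and a=b=1, a document dated now gets 1.0 and a one-year-old document gets about 0.5.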

This way:

  • Documents with more recent dates receive a higher score.
  • Older documents receive a lower score.
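The decay curve is easy to check by evaluating the same formula in Python. This sketch re-implements recip(x,m,a,b) = a / (m*x + b) and feeds it document ages in milliseconds, standing in for ms(NOW,date_dt); a 365.25-day year is an assumption used only for the labels.

```python
def recip(x: float, m: float, a: float, b: float) -> float:
    """Same formula as Solr's recip(x,m,a,b): a / (m*x + b)."""
    return a / (m * x + b)


MS_PER_DAY = 24 * 60 * 60 * 1000
MS_PER_YEAR = 365.25 * MS_PER_DAY  # about 3.156e10 ms, so 3.16e-11 ≈ 1/MS_PER_YEAR

for label, age_ms in [("today", 0), ("30 days", 30 * MS_PER_DAY),
                      ("1 year", MS_PER_YEAR), ("2 years", 2 * MS_PER_YEAR)]:
    # age_ms plays the role of ms(NOW,date_dt)
    print(label, round(recip(age_ms, 3.16e-11, 1, 1), 3))
```

This prints roughly 1.0, 0.924, 0.501 and 0.334: the boost halves after about a year and keeps shrinking, but never reaches zero.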

Complete example

defType=edismax
q=lucidworks
qf=title_t^4 body_t^2
bf=recip(ms(NOW,date_dt),3.16e-11,1,1)
debugQuery=true

This will search for documents with “lucidworks”, but will give higher scores to newer documents.

18. uf (User Fields)

Specifies which fields the user is allowed to reference explicitly (field:value) in q.

For example:

uf=title_t body_t subtitle_t

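👉 Then, if the user submits:

q=title_t:Lucidworks
✅ Allowed.

But if they submit:

q=author_s:Lucidworks
❌ Not allowed: author_s is not treated as a field. Instead of raising an error, edismax handles the clause as ordinary text searched across the qf fields.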

🧪 Full example:

defType=edismax
q=Lucidworks
qf=title_t^4 body_t^2 subtitle_t^3
uf=title_t body_t subtitle_t
  • qf controls which fields are searched and their weights.
  • uf controls which fields the user may name explicitly in the query.

Any field not listed in uf cannot be queried with field:value syntax; such clauses are treated as plain text.
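A deliberately simplified Python model of that decision, for intuition only: it looks at a single field:value clause and checks the field against a uf set. The real edismax parser also handles quotes, operators, uf wildcards (for example uf=* -title_t) and field aliases, none of which is modeled; classify_clause is a hypothetical helper.

```python
def classify_clause(clause: str, uf: set[str]) -> str:
    """Hypothetical model: is `clause` a field query or plain text under uf?"""
    field, sep, value = clause.partition(":")
    if sep and field in uf:
        return f"field query on {field}: {value}"
    # No colon, or a field the user may not name: searched as text across qf.
    return f"plain text searched across qf: {clause}"


uf = {"title_t", "body_t", "subtitle_t"}
print(classify_clause("title_t:Lucidworks", uf))   # field query on title_t
print(classify_clause("author_s:Lucidworks", uf))  # plain text across qf
```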

19. pf2 → Phrase Fields for 2-term phrases

Specifies fields that get a boost when any two adjacent query terms appear together as a phrase in that field.

Example

defType=edismax
q=Lucidworks Integrates
qf=title_t^4 body_t^2
pf2=title_t^10
debugQuery=true

The query is processed like this:

  • Word 1 Lucidworks
  • Word 2 Integrates

Now pf2 comes into play:

pf2=title_t^10

👉 We’re telling Solr:

If it finds the exact phrase "Lucidworks Integrates" (two consecutive words) in title_t, boost that phrase match by 10 and add it to the document's score.

In the sample data:

The document pr-005 has:

"title_t": "Lucidworks Integrates with Google Cloud"

Even though there are additional words, it does contain the exact phrase "Lucidworks Integrates" right at the beginning.

✅ Therefore, this document receives the pf2 boost.

20. pf3 → Phrase Fields for 3-term phrases

Specifies fields that get a boost when any three adjacent query terms appear together as a phrase in that field.

Example

defType=edismax
q=Lucidworks Integrates with
qf=title_t^4 body_t^2
pf3=title_t^15
debugQuery=true

It will search for the exact phrase:

“Lucidworks Integrates with”

  • Document pr-005 contains this full phrase.
  • Therefore, it would receive another strong boost.
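pf2 and pf3 work on every run of two or three adjacent query terms (word shingles), not only on the whole query. The Python sketch below lists those runs and checks each one as an exact phrase against the pr-005 title; the exact-phrase check is simplified to lowercased whitespace tokens with slop 0, and both helpers are hypothetical.

```python
def shingles(terms: list[str], size: int) -> list[str]:
    """All runs of `size` adjacent query terms."""
    return [" ".join(terms[i:i + size]) for i in range(len(terms) - size + 1)]


def contains_phrase(field_value: str, phrase: str) -> bool:
    """Exact (slop 0) phrase check on lowercased whitespace tokens."""
    tokens, words = field_value.lower().split(), phrase.lower().split()
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


title = "Lucidworks Integrates with Google Cloud"  # pr-005 title_t
terms = "Lucidworks Integrates with".split()

for size in (2, 3):
    for phrase in shingles(terms, size):
        print(f"pf{size}: {phrase!r} in title_t -> {contains_phrase(title, phrase)}")
```

For this query, pf2 checks "Lucidworks Integrates" and "Integrates with", and pf3 checks "Lucidworks Integrates with"; all three are found in the pr-005 title.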

21. boost

The boost parameter (edismax) is a score multiplier: each document's score is multiplied by the value of the given function.

Example

q=Lucidworks 
defType=edismax
qf=title_t^4 body_t^2
boost=map(query({!field f=author_s v='Tech News'}), 0, 0, 1, 10)

This means:

  • If author_s is "Tech News" ⇒ the score is multiplied by 10.
  • If not ⇒ the score is multiplied by 1 (unchanged).

Now let’s break it down:

1. map(…) → A range-based conditional function in Solr

Syntax:

map(x, min, max, target, default)

What it does:

  • If x is between min and max (inclusive), it returns target.
  • Otherwise, it returns default.

That’s why here:

map(query(...), 0, 0, 1, 10)

returns 1 when query(...) is 0 (no match) and 10 when it is greater than 0 (match).

Note: map is a general range mapping; setting min=max=0 turns it into a simple "matched / not matched" switch.

2. query(…) → Executes a subquery

It’s basically a mini-query inside the boost function.

  • If the subquery matches the document, query() returns the document's score for that subquery (a value greater than 0).
  • If it does not match, it returns 0 (or the optional second argument, as in query(subquery, default)).

3. {!field f=author_s v='Tech News'} → The field query parser, invoked with Local Params syntax

The field parser is equivalent to:

author_s:"Tech News"

But inside the query() function, the safe way is to use the local params syntax, which avoids quoting and escaping problems.

  • f is the field.
  • v is the value.
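Putting the three pieces together, this Python sketch mirrors the map() rule and applies it as a multiplier. The subquery scores passed in are made-up numbers standing in for what query({!field f=author_s v='Tech News'}) would return; solr_map and boosted_score are hypothetical helpers, not Solr APIs.

```python
def solr_map(x: float, min_v: float, max_v: float, target: float, default: float) -> float:
    """Same rule as map(x,min,max,target,default): target if min <= x <= max, else default."""
    return target if min_v <= x <= max_v else default


def boosted_score(base_score: float, author_query_score: float) -> float:
    """Apply boost=map(query(...),0,0,1,10) to a base relevance score.

    author_query_score stands in for query(...): the subquery score when
    author_s matches "Tech News", or 0 when it does not.
    """
    multiplier = solr_map(author_query_score, 0, 0, 1, 10)
    return base_score * multiplier


print(boosted_score(2.0, 0.0))  # 2.0  -> no match, multiplied by 1
print(boosted_score(2.0, 3.5))  # 20.0 -> match, multiplied by 10
```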

22. Facets = Result Grouping

Facets are result groupings. They do not affect ranking; they only help to:

  • Display filters to the user.
  • Summarize how many documents exist per category, author, date, etc.

1. Basic: facet=true + facet.field

Suppose we want to group by author_s (the authors of the sample documents):

q=*:* 
facet=true
facet.field=author_s

Response:

"facet_fields": {
  "author_s": [
    "AI Research Group", 1,
    "Data Innovation Team", 1,
    "Jane Doe", 1,
    "Lucidworks", 1,
    "Lucidworks Research", 1,
    "Press Office", 1,
    "Search Engineering Team", 1,
    "Tech News", 1
  ]
}

2. Using facet.query → Manual count of a specific value

If we only want to know how many documents have author_s:"Tech News":

facet.query=author_s:"Tech News"

Response:

"facet_queries": {
  "author_s:\"Tech News\"": 1
}

3. Using facet.prefix → Filter by prefix

Suppose we want to list authors starting with "L":

facet.field=author_s
facet.prefix=L

Response:

"facet_fields": {
  "author_s": [
    "Lucidworks", 1,
    "Lucidworks Research", 1
  ]
}

4. Using facet.contains → Filter by substring

We search for authors containing "News":

facet.field=author_s
facet.contains=News

Response:

"facet_fields": {
  "author_s": [
    "Tech News", 1
  ]
}

5. Using facet.sort → Sorting facet values

By default, facets are sorted by count desc (most frequent first).

If we want alphabetical sorting:

facet.field=author_s
facet.sort=index
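The facet examples above can be reproduced locally with a rough Python imitation of facet.field plus facet.prefix, facet.contains and facet.sort, run over the author_s values of the eight sample documents. It returns Solr's default flat list format [value, count, value, count, ...]; it is only a sketch (facet is a hypothetical helper), and it ignores facet.limit, facet.mincount and facet.contains.ignoreCase.

```python
from collections import Counter

# author_s values of the eight sample documents
authors = ["Lucidworks", "Press Office", "Jane Doe", "Data Innovation Team",
           "Tech News", "AI Research Group", "Lucidworks Research",
           "Search Engineering Team"]


def facet(values, prefix=None, contains=None, sort="count"):
    """Rough local imitation of facet.field with prefix/contains/sort options."""
    counts = Counter(v for v in values
                     if (prefix is None or v.startswith(prefix))
                     and (contains is None or contains in v))
    if sort == "count":
        # Highest count first; equal counts fall back to value order.
        items = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    else:  # "index": lexicographic by value
        items = sorted(counts.items())
    return [x for pair in items for x in pair]


print(facet(authors, prefix="L"))         # ['Lucidworks', 1, 'Lucidworks Research', 1]
print(facet(authors, contains="News"))    # ['Tech News', 1]
print(facet(authors, sort="index")[:4])   # ['AI Research Group', 1, 'Data Innovation Team', 1]
```

Because every author appears exactly once in the sample data, count and index sorting happen to produce the same order here.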

© 2024. All rights reserved.