Solr Master Cheat Sheet

Solr Master Cheat Sheet is a comprehensive guide to advanced Solr query capabilities, tailored for developers and search engineers working with Lucidworks Fusion or standalone Apache Solr. Starting from basic query structures, it explores expert-level concepts like edismax, field boosting, phrase matching (pf, pf2, pf3), tie-break scoring, user field restrictions (uf), and faceting strategies. With real-world examples and scoring breakdowns, this post is ideal for anyone looking to fine-tune search relevance and performance across enterprise datasets.
The examples below use the following documents:
[
{
"id": "pr-001",
"title_t": "Lucidworks Launches New AI Platform",
"subtitle_t": "Empowering enterprises with AI search",
"date_dt": "2025-05-01T00:00:00Z",
"author_s": "Lucidworks",
"body_t": "Today, Lucidworks unveiled...",
"_version_": 1834020721566679040
},
{
"id": "pr-002",
"title_t": "Lucidworks Expands to Latin America",
"subtitle_t": "Opening new offices in LATAM",
"date_dt": "2025-04-15T00:00:00Z",
"author_s": "Press Office",
"body_t": "With this expansion...",
"_version_": 1834020729817923584
},
{
"id": "pr-003",
"title_t": "Search Trends in 2025",
"subtitle_t": "Insights from global leaders",
"date_dt": "2025-03-28T00:00:00Z",
"author_s": "Jane Doe",
"body_t": "Search is evolving quickly...",
"_version_": 1834020737099235328
},
{
"id": "pr-004",
"title_t": "AI-Powered Personalization in E-Commerce",
"subtitle_t": "How AI transforms online shopping experiences",
"date_dt": "2025-06-10T00:00:00Z",
"author_s": "Data Innovation Team",
"body_t": "E-commerce platforms are leveraging AI to offer hyper-personalized shopping experiences.",
"_version_": 1834645263624437760
},
{
"id": "pr-005",
"title_t": "Lucidworks Integrates with Google Cloud",
"subtitle_t": "Bringing scalable search to enterprise cloud",
"date_dt": "2025-07-01T00:00:00Z",
"author_s": "Tech News",
"body_t": "Lucidworks expands its partnership with Google Cloud to offer more scalable search solutions.",
"_version_": 1834645301092155392
},
{
"id": "pr-006",
"title_t": "Voice Search: The Next Frontier",
"subtitle_t": "Adapting search engines for voice queries",
"date_dt": "2025-07-15T00:00:00Z",
"author_s": "AI Research Group",
"body_t": "With the rise of smart assistants, voice search optimization becomes critical for search engines.",
"_version_": 1834645312450330624
},
{
"id": "pr-007",
"title_t": "AI Ethics in Enterprise Search",
"subtitle_t": "Balancing innovation with responsibility",
"date_dt": "2025-08-01T00:00:00Z",
"author_s": "Lucidworks Research",
"body_t": "Organizations are implementing ethical frameworks to ensure responsible use of AI in search.",
"_version_": 1834645344584990720
},
{
"id": "pr-008",
"title_t": "Federated Search: Connecting Disparate Systems",
"subtitle_t": "Unified access to multiple content sources",
"date_dt": "2025-08-15T00:00:00Z",
"author_s": "Search Engineering Team",
"body_t": "Federated search enables organizations to retrieve data across multiple repositories seamlessly.",
"_version_": 1834645355487035392
}
]
1. Basic Query Structure
Parameter | Purpose | Example |
---|---|---|
q | Main query (search terms) | q=lucidworks |
q.op | Default operator between terms | q.op=OR or q.op=AND |
fq | Filter query (filters without affecting score) | fq=author_s:"Press Office" |
df | Default field (avoid writing field:term everywhere) | df=body_t |
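The parameters in the table above are sent as ordinary URL query parameters. The following is a minimal sketch of assembling such a request URL in Python; the host, port, and collection name (`press_releases`) are placeholders invented for illustration, not values from this post.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port, and collection name are placeholders.
SOLR_SELECT_URL = "http://localhost:8983/solr/press_releases/select"


def build_select_url(params):
    """Return a /select URL with the given Solr parameters URL-encoded."""
    # A list of pairs (instead of a dict) lets a parameter such as fq repeat.
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_select_url([
    ("q", "lucidworks"),
    ("q.op", "AND"),
    ("df", "body_t"),
    ("fq", 'author_s:"Press Office"'),
])
print(url)
```

Encoding the values matters mostly for fq here, because the quoted author name contains a colon, quotes, and a space.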
2. Field Types Behavior
Field Type | Tokenized? | Behavior |
---|---|---|
_t (text) | ✅ Yes | Tokenized: splits into words, lowercase |
_s (string) | ❌ No | Exact match only |
_i, _dt, _l, etc. | ❌ No | Numeric/date types |
3. Tokenization Quick Rule
- _t → “Lucidworks Launches AI” becomes [lucidworks, launches, ai]
- _s → “Press Office” stays “Press Office”
✅ Always quote _s fields if they contain spaces:
author_s:"Press Office"
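To make the difference concrete, here is a rough Python approximation of the two behaviors. It is only a sketch: the real analysis chain of a _t field depends on the schema, and the regular expression below is an assumption standing in for the tokenizer and lowercase filter.

```python
import re


def analyze_text(value):
    """Rough stand-in for a _t field: split into word tokens and lowercase."""
    return re.findall(r"\w+", value.lower())


def analyze_string(value):
    """Stand-in for a _s field: the whole value is kept as a single term."""
    return [value]


print(analyze_text("Lucidworks Launches AI"))     # ['lucidworks', 'launches', 'ai']
print(analyze_string("Press Office"))             # ['Press Office']
print("press" in analyze_text("Press Office"))    # True
print("press" in analyze_string("Press Office"))  # False
```

The last two lines show why a single word finds a document through a _t field but not through a _s field.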
4. Wildcard Behavior
Pattern | Meaning |
---|---|
term* | Starts with term |
*term | Ends with term |
*term* | Contains term (very expensive) |
⚠ Leading wildcards (*term) are slow. Use carefully.
✅ Why author_s:Press* AND author_s:*Office works:
- Both clauses are evaluated against the full string "Press Office":
  - It starts with Press
  - It ends with Office
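As an analogy only, glob-style matching in Python behaves the same way against a single whole value. This sketch uses `fnmatchcase` from the standard library to mimic how a wildcard is checked against the one term stored for a _s field; it is not how Solr implements wildcards internally.

```python
from fnmatch import fnmatchcase

author = "Press Office"  # a _s field keeps the whole value as one term

print(fnmatchcase(author, "Press*"))   # True: the value starts with "Press"
print(fnmatchcase(author, "*Office"))  # True: the value ends with "Office"
print(fnmatchcase(author, "Office*"))  # False: the value does not start with "Office"
```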
5. Sorting, Pagination & Display
Parameter | Purpose | Example |
---|---|---|
sort | Control ordering | sort=date_dt desc |
start | Pagination start | start=10 |
rows | Results per page | rows=10 |
fl | Fields to return | fl=id,title_t,author_s |
indent | Pretty JSON output | indent=true |
wt | Output format | wt=json |
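Since start is a zero-based offset rather than a page number, a small helper is a common way to translate between the two. The function name `page_params` is hypothetical and only illustrates the arithmetic.

```python
def page_params(page, rows=10):
    """Translate a 1-based page number into Solr's start/rows parameters."""
    if page < 1 or rows < 1:
        raise ValueError("page and rows must be positive")
    return {"start": (page - 1) * rows, "rows": rows}


print(page_params(1))  # {'start': 0, 'rows': 10}
print(page_params(2))  # {'start': 10, 'rows': 10}
```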
6. Filters (fq) vs Main Query (q)
Aspect | q | fq |
---|---|---|
Affects score? | ✅ Yes | ❌ No |
Multiple allowed? | ❌ (single q) | ✅ Multiple fq |
Cached independently for reuse? | ❌ No | ✅ Yes (filter cache), so repeated filters are faster |
7. edismax Mode (The Pro Mode)
Enable flexible queries, field boosting, and better scoring.
defType=edismax
q=Lucidworks Press
qf=title_t^3 subtitle_t^2 body_t
bq=author_s:"Press Office"^5
bf=recip(ms(NOW,date_dt),3.16e-11,1,1)
Parameter | Purpose |
---|---|
qf | Query fields & boosting |
bq | Boost certain docs |
bf | Boost by function (recency, popularity) |
8. Faceting (For Filters / Categories / Aggregations)
Parameter | Purpose | Example |
---|---|---|
facet=true | Enable faceting | facet=true |
facet.field | Field to facet | facet.field=author_s |
facet.prefix | Filter facets by prefix | facet.prefix=n |
facet.contains | Filter facets containing string | facet.contains=News |
facet.sort | Facet sorting | facet.sort=count |
9. Highlighting
Highlight matching terms inside result fields.
hl=true
hl.fl=title_t,body_t
10. defType = Default Query Parser
Tells Solr how to interpret what you write in q.
Option | What is it for? |
---|---|
lucene | The strictest and most exact. Terms without a field prefix only search the default field (df), so you usually write field:value. Complete control, but verbose. |
dismax | More flexible for Google-type searches, allowing you to write loose text, but less control over boosts. |
edismax | It combines the best of both worlds: flexible, supports boosts (qf, bq, bf), handles punctuation, tolerates syntax errors, etc. |
11. q.alt (Alternative Query)
When q (your main query) is empty or not included in the request, q.alt defines a default value that the engine will use to generate results.
q=
q.alt=*:*
If the user does not type anything (q is empty), Solr runs *:* (returns all documents).
12. qf (Query Fields)
Tells Solr which fields to search when you use edismax
q=Lucidworks
defType=edismax
qf=title_t^4 body_t^2
q.op=OR
What does that mean?
- Search for the term Lucidworks in:
- title_t with weight 4.
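- body_t with weight 2.
- With a single term, the document qualifies even if the term appears in only one of the fields; q.op=OR matters once the query has several terms, because then not every term has to match.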
📊 Let’s review the 3 documents:
id | title_t | body_t | Matches title? | Matches body? |
---|---|---|---|---|
pr-001 | "Lucidworks Launches New AI Platform" | "Today, Lucidworks unveiled..." | ✅ | ✅ |
pr-002 | "Lucidworks Expands to Latin America" | "With this expansion..." | ✅ | ❌ |
pr-005 | "Lucidworks Integrates with Google Cloud" | "Lucidworks expands its partnership..." | ✅ | ✅ |
- pr-001 and pr-005 match both title_t and body_t.
- pr-002 only matches in title_t.
So why does pr-002 appear in second place? Let’s review the score of each document.
Document pr-001
1.7172029 = max of:
1.7172029 (title_t)
1.6052904 (body_t)
This means:
- Lucene scores each field separately.
- Then it takes the max(), because the disjunction uses max as the score aggregator (the default in edismax).
Although body_t has some score (because the term also appears there), title_t has a higher score, so only that 1.7172029 is kept.
Document pr-002
1.7172029 = max of:
1.7172029 (title_t)
- It only appears in title_t, so there’s no scoring for body_t.
- Final score: 1.7172029
Document pr-005
1.7172029 = max of:
1.7172029 (title_t)
0.9921291 (body_t)
Why does pr-002 appear second, even though pr-005 also has a match in body_t?
Because Solr (by default) uses max() as the scoring aggregator for the edismax disjunction.
- Although pr-005 has an extra match in body_t, that extra match does not add to the score if title_t already has the highest score.
- Solr (by default) does not sum the scores across fields; it takes the maximum of the per-field scores, unless you explicitly configure a tie-breaker (tie parameter).
All three documents therefore end up with the same score (1.7172029), so nothing in the score itself ranks pr-005 above pr-002.
The parameter that controls this:
tie=0.0
tie controls how much the scores from the non-winning fields contribute to the disjunction score. Since you didn’t set it, the default is 0.0, which is why Solr only takes the maximum score.
How you would see a difference:
If you did:
q=Lucidworks
defType=edismax
qf=title_t^4 body_t^2
tie=0.1
Then Solr would calculate:
score = max(score) + tie * sum(other_scores)
With this, body_t would start contributing to the score even if its score is lower. As a result, pr-005 could move up in ranking.
Document | title_t | body_t | Max score | Sum of other scores | Final score |
---|---|---|---|---|---|
pr-001 | 1.7172029 | 1.6052904 | 1.7172029 | 1.6052904 | 1.8777319 |
pr-005 | 1.7172029 | 0.9921291 | 1.7172029 | 0.9921291 | 1.8164158 |
pr-002 | 1.7172029 | - | 1.7172029 | - | 1.7172029 |
As you can see:
- Now pr-005 moves above pr-002.
- Even though both have the same score in title_t, the body_t score in pr-005 helped increase its overall score thanks to the tie.
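The arithmetic behind these numbers can be reproduced with a few lines of Python. This is a sketch of the formula quoted in this section, not Solr's actual implementation; the function name `dismax_score` is hypothetical, and the per-field scores are the ones from the explain output above.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores as max + tie * (sum of the remaining scores)."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


# Per-field scores (title_t, body_t) taken from the explain output above.
docs = {
    "pr-001": [1.7172029, 1.6052904],
    "pr-005": [1.7172029, 0.9921291],
    "pr-002": [1.7172029],
}

for doc_id, scores in docs.items():
    print(doc_id, round(dismax_score(scores), 7), round(dismax_score(scores, tie=0.1), 7))
```

With tie=0.0 all three documents print the same score; with tie=0.1 the body_t score separates them.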
13. mm = Minimum Match
Of all the query terms, it requires that at least X number of words (or percentage) match in the document for it to be considered relevant.
Formats you can use in mm:
mm | Meaning |
---|---|
100% | All terms must match |
75% | At least 75% terms must match |
3 | At least 3 terms must match |
2<75% | Conditional: all if ≤2 terms, else 75% |
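The table can be read as a small calculation: given the number of query terms, how many must match? The sketch below covers only the four formats listed above (a positive integer, a positive percentage, and a single conditional); negative values and multiple conditions are deliberately left out, and the function name `required_matches` is hypothetical. It assumes percentages are rounded down and the result is kept between 1 and the number of terms.

```python
def required_matches(mm, term_count):
    """Number of query terms that must match for a simple mm spec."""
    spec = mm.strip()
    if "<" in spec:
        # Conditional form "N<SPEC": with N terms or fewer, all are required.
        threshold, spec = spec.split("<", 1)
        if term_count <= int(threshold):
            return term_count
    if spec.endswith("%"):
        # Assumption: the percentage result is rounded down.
        needed = term_count * int(spec[:-1]) // 100
    else:
        needed = int(spec)
    # Never require more terms than the query has, and at least one.
    return max(1, min(term_count, needed))


for spec in ("100%", "75%", "3", "2<75%"):
    print(spec, [required_matches(spec, n) for n in (1, 2, 3, 4)])
```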
14. pf → Phrase Fields
It is used to prioritize documents where search terms appear together and in order (as an exact phrase) in certain fields.
✅ Basic example:
q=lucidworks AI
qf=title^2 body^1
pf=title^5
- qf: searches for lucidworks and AI in title (weight 2) and body (weight 1), regardless of order or position within the fields.
- pf: gives an extra boost if the exact phrase "lucidworks AI" appears in the title field.
What is it for?
- It improves ranking precision.
- Example: if someone searches for "improving search" and a document has that exact phrase in subtitle, that document will rank higher.
Example:
pf=query_t~3^20
- pf: Phrase Fields → activates boosting for phrase matches.
- query_t~3: searches for approximate phrase matches within query_t, allowing up to 3 positions of distance (slop).
- ^20: multiplies the score of the phrase match by 20 when the phrase is found.
What is “slop”? It is the number of positions the words in the phrase can be apart and still be considered a valid match.
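For example, if the query is "search AI" with a slop of 3:
- "search for AI" → ✅ (needs slop 1)
- "search really good AI" → ✅ (needs slop 2)
- "search something unrelated here AI" → ✅ (needs slop 3, the limit)
- "search something completely unrelated here AI" → ❌ (needs slop 4)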
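For a two-word phrase whose words appear in the same order as in the query, the slop needed equals the number of words sitting between them. The sketch below computes that number under those simplifying assumptions (whitespace tokens, in-order matches only, no reordering); the function name `required_slop` is hypothetical.

```python
def required_slop(phrase, text):
    """Smallest slop for a two-term phrase matched in order, or None if absent."""
    first, second = phrase.lower().split()
    tokens = text.lower().split()
    gaps = [
        j - i - 1  # number of words sitting between the two terms
        for i, left in enumerate(tokens) if left == first
        for j, right in enumerate(tokens) if right == second and j > i
    ]
    return min(gaps) if gaps else None


print(required_slop("search AI", "search AI"))                    # 0
print(required_slop("search AI", "search for AI"))                # 1
print(required_slop("search AI", "search engines that use AI"))   # 3
print(required_slop("search AI", "AI search"))                    # None
```

A document matches a phrase query with ~3 when the value returned here is 3 or less.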
15. tie (Tie Breaker)
What does tie do?
The tie parameter says:
“Don’t completely ignore the second score. Add a small portion of it.”
The formula with tie is:
score = max(title_score, body_score) + tie × sum(of the other scores)
🎯 What is “sum of the others”?
“Others” refers to the scores from the fields that did not win (i.e., not the max).
All of those scores are summed together and then multiplied by the tie value.
16. bq = Boost Query
It is an additional query that increases the relevance (score) of matching documents.
Example:
bq=author_s:"Press Office"^5
If the author_s field is exactly Press Office, the score of that match, weighted by 5, is added to the document's score.
Does the formula get updated?
The total scoring works like this:
finalScore = score(q) + score(bq) + score(bf) + (other boosts)
- score(q) comes from your normal query (qf).
- score(bq) is the Boost Query score (added if it matches).
- score(bf) is the Boost Function score (added if applicable).
The engine sums all these parts to compute the final ranking.
Simplifying:
Assume:
- q produced 1.71 points.
- bq (because author_s:"Press Office" matched) adds 4.07.
Total score:
finalScore = 1.71 + 4.07 = 5.78
Exactly as you saw in your explain output 👍
17. bf = Boost Functions
- It’s used to apply additional mathematical functions to the score.
- It works with numerical values from documents (dates, sizes, popularity, etc.).
Example:
bf=recip(ms(NOW,date_dt),3.16e-11,1,1)
What does this do?
- ms(NOW,date_dt) → calculates the difference between the current time and the date_dt field in milliseconds.
- recip() → applies a reciprocal function (higher score for more recent dates).
This way:
- Documents with more recent dates receive a higher score.
- Older documents receive a lower score.
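To get a feel for the values this produces, the function can be evaluated by hand. Solr's recip(x, m, a, b) computes a / (m * x + b); the sketch below applies it with the constants from the example. The helper name `recency_boost` and the 365-day year are assumptions made for illustration.

```python
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000  # assumes a 365-day year


def recip(x, m, a, b):
    """Same shape as Solr's recip(x, m, a, b): a / (m * x + b)."""
    return a / (m * x + b)


def recency_boost(age_ms):
    """Value of recip(ms(NOW,date_dt),3.16e-11,1,1) for a document age in ms."""
    return recip(age_ms, 3.16e-11, 1, 1)


for years in (0, 1, 5):
    print(years, "year(s) old:", round(recency_boost(years * MS_PER_YEAR), 3))
```

The constant 3.16e-11 is roughly one divided by the number of milliseconds in a year, so a document published today gets 1.0 and a one-year-old document gets about half of that.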
Complete example:
q=lucidworks
qf=title_t^4 body_t^2
bf=recip(ms(NOW,date_dt),3.16e-11,1,1)
debugQuery=true
This will search for documents with “lucidworks”, but will give higher scores to newer documents.
18. uf (User Fields)
Which fields the user is authorized to search.
For example:
uf=title_t body_t subtitle_t
👉 Then, if the user submits:
q=title_t:Lucidworks
✅ Allowed.
But if they submit:
q=author_s:Lucidworks
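❌ Not allowed: edismax does not search author_s. Instead of raising an error, it treats author_s:Lucidworks as plain text and looks for it in the qf fields.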
🧪 Full example:
q=Lucidworks
qf=title_t^4 body_t^2 subtitle_t^3
uf=title_t body_t subtitle_t
- qf controls the field weights.
- uf controls which fields are accessible to the user.
Any field not listed in uf cannot be targeted explicitly in the query.
19. pf2 → Phrase Fields for 2-term phrases
It specifies the fields that receive a boost when two consecutive query terms appear together as a phrase.
Example
q=Lucidworks Integrates
qf=title_t^4 body_t^2
pf2=title_t^10
debugQuery=true
The query is processed like this:
- Word 1: Lucidworks
- Word 2: Integrates
Now pf2 comes into play:
pf2=title_t^10
👉 We’re telling Solr:
If it finds the exact phrase "Lucidworks Integrates" (two consecutive words) in title_t, apply a boost of 10.
In this case:
The document pr-005 has:
"title_t": "Lucidworks Integrates with Google Cloud"
Even though there are additional words, it does contain the exact phrase "Lucidworks Integrates" right at the beginning.
✅ Therefore, this document receives the pf2 boost.
20. pf3 → Phrase Fields for 3-term phrases
It specifies the fields that receive a boost when three consecutive query terms appear together as a phrase.
Example
q=Lucidworks Integrates with
qf=title_t^4 body_t^2
pf3=title_t^15
debugQuery=true
It will search for the exact phrase:
“Lucidworks Integrates with”
- Document pr-005 contains this full phrase.
- Therefore, it would receive another strong boost.
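Both pf2 and pf3 work on word shingles: every run of two (or three) consecutive query terms becomes a candidate phrase. The following sketch lists those phrases for a query, assuming simple whitespace splitting; the function name `word_shingles` is hypothetical.

```python
def word_shingles(query, size):
    """Return every run of `size` consecutive words in the query."""
    terms = query.split()
    return [" ".join(terms[i:i + size]) for i in range(len(terms) - size + 1)]


query = "Lucidworks Integrates with"
print(word_shingles(query, 2))  # ['Lucidworks Integrates', 'Integrates with']
print(word_shingles(query, 3))  # ['Lucidworks Integrates with']
```

A query with fewer words than the shingle size produces no phrases, so pf3 has nothing to boost on a two-word query.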
21. boost
The boost is simply a score multiplier.
Example
q=Lucidworks
defType=edismax
qf=title_t^4 body_t^2
boost=map(query({!field f=author_s v='Tech News'}), 0, 0, 1, 10)
This means:
- If author_s:"Tech News" matches ⇒ the score is multiplied by 10.
- If not ⇒ the score is multiplied by 1 (unchanged).
Now let’s break it down:
1. map(…) → A range-mapping function in Solr, used here as a conditional
Syntax:
map(x, min, max, target, default)
What it does:
- If x falls between min and max (inclusive), it returns target.
- Otherwise, it returns default (or x itself when no default is given).
That’s why here:
map(query(...), 0, 0, 1, 10)
- No match: query(...) is 0, which falls inside the range [0, 0], so map returns 1.
- Match: query(...) is greater than 0, which falls outside the range, so map returns 10.
2. query(…) → Executes a subquery
It’s basically a mini-query inside the boost function.
- If the document matches the subquery, query() returns the subquery’s score for that document.
- If it does not match, it returns the default value, 0.
3. {!field f=author_s v='Tech News'} → This is the local params syntax
The field parser works like:
author_s:"Tech News"
But inside the query() function, the safe way is to use the local param syntax.
- f is the field.
- v is the value.
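The way the pieces combine can be traced with a small emulation. Solr's map(x, min, max, target, default) returns target when x lies within [min, max] and default otherwise (or x itself if no default is given), and the boost parameter multiplies the main score by the result. The sketch below mirrors that behavior in Python; `solr_map` and `boosted_score` are hypothetical names, and the sample scores are made up.

```python
def solr_map(x, min_value, max_value, target, default=None):
    """Emulate map(x, min, max, target, default)."""
    if min_value <= x <= max_value:
        return target
    return x if default is None else default


def boosted_score(base_score, subquery_score):
    """Emulate boost=map(query(...), 0, 0, 1, 10) applied to a base score."""
    # subquery_score stands for query(...): 0 when the document does not match.
    return base_score * solr_map(subquery_score, 0, 0, 1, 10)


print(boosted_score(1.5, 0.0))  # 1.5  (no match: multiplied by 1)
print(boosted_score(1.5, 2.3))  # 15.0 (match: multiplied by 10)
```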
22. Facets = allows grouping
Facets are result groupings. They do not affect ranking, they only help to:
- Display filters to the user.
- Summarize how many documents exist per category, author, date, etc.
1. Basic: facet=true + facet.field
Suppose we want to group by author_s (the authors of your documents):
q=*:*
facet=true
facet.field=author_s
Response:
"facet_fields": {
"author_s": [
"AI Research Group", 1,
"Data Innovation Team", 1,
"Jane Doe", 1,
"Lucidworks", 1,
"Lucidworks Research", 1,
"Press Office", 1,
"Search Engineering Team", 1,
"Tech News", 1
]
}
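Note that the facet values come back as one flat list that alternates value and count. A client usually pairs them up before display; the sketch below shows one way to do that in Python, using a shortened copy of the response above. The function name `facet_counts` is hypothetical.

```python
def facet_counts(flat):
    """Turn [value, count, value, count, ...] into a {value: count} dict."""
    return dict(zip(flat[0::2], flat[1::2]))


# Shortened copy of the facet_fields response shown above.
response = {
    "facet_fields": {
        "author_s": ["AI Research Group", 1, "Data Innovation Team", 1, "Tech News", 1]
    }
}

counts = facet_counts(response["facet_fields"]["author_s"])
print(counts)
```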
2. Using facet.query → Manual count of a specific value
If we only want to know how many documents have author_s:"Tech News":
facet.query=author_s:"Tech News"
Response:
"facet_queries": {
"author_s:\"Tech News\"": 1
}
3. Using facet.prefix → Filter by prefix
Suppose we want to list authors starting with "L":
facet.field=author_s
facet.prefix=L
Response:
"facet_fields": {
"author_s": [
"Lucidworks", 1,
"Lucidworks Research", 1
]
}
4. Using facet.contains → Filter by substring
We search for authors containing "News":
facet.field=author_s
facet.contains=News
Response:
"facet_fields": {
"author_s": [
"Tech News", 1
]
}
5. Using facet.sort → Sorting facet values
By default, facets are sorted by count desc (most frequent first).
If we want alphabetical sorting:
facet.field=author_s
facet.sort=index