
I have read all the answers to the same question and am still no clearer on which one I should use for my use case, or why. Both return the same result. I understand that "FilterQuery would be cached making the overall query time faster", as someone correctly answered.

I also understand that "filtering also allows tagging of facets, so you can tag facets to include all facets that are returned for your query", as someone else also correctly answered.
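
For readers unfamiliar with facet tagging, here is a minimal sketch of what such a request can look like. It assumes Solr's `{!tag=...}` and `{!ex=...}` local-parameter syntax; the tag names `dt` and `cl` and the two helper functions are hypothetical, chosen only for illustration, and the field names come from the document list in this question.

```python
from urllib.parse import urlencode


def tagged_filter(tag, field, value):
    # Build an fq value that carries a tag, e.g. {!tag=cl}client:Joe
    return "{!tag=%s}%s:%s" % (tag, field, value)


def facet_excluding(tag, field):
    # Build a facet.field value that ignores the filter carrying this tag,
    # so the facet still lists the values that the filter would have hidden.
    return "{!ex=%s}%s" % (tag, field)


params = [
    ("q", "*:*"),
    ("fq", tagged_filter("dt", "date", "20130214")),
    ("fq", tagged_filter("cl", "client", "Joe")),
    ("facet", "true"),
    ("facet.field", facet_excluding("cl", "client")),  # counts for all clients
    ("facet.field", "report"),                          # counts after all filters
]

# URL-encoded query string, ready to append to a select handler URL
print(urlencode(params))
```

Tags attach to `fq` clauses, which is one practical reason a condition that drives a facet tends to live in `fq` rather than `q`.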

What I don't understand, reading this, is why anyone would use Q at all, since FQ seems to be so much better according to all the answers and books I've seen.

Except, I'm sure there's probably a reason that both exist.

What I would like is to figure out what's best for my use case - the documentation is sorely lacking in useful examples.

  • My documents have: date, client, report, and some other fields
  • 1 business date = 3.5 million documents.
  • The total count of documents is 250 million and counting (60 dates * 8000 clients * 1000s of reports )
  • I facet on date, client, report, and I do use tagging of facets.
  • The UI overall looks like any e-commerce site, example: Amazon, with facets on the left.
  • Scoring is not used.

Business rule #1: date must always be present in every query.

Business rule #2: 99% of queries are going to use the LATEST date, but RANDOM client and random report.

A Fact: We determined that it’s faceting that is slow, not searching.

QUESTIONS:

Given this search criteria, and these ways to write a query:

A) q=date:20130214 AND client:Joe & facet.field=date & facet.field=client...

B) q=date:20130214 & fq=client:Joe & facet.field=date & facet.field=client...

C) q=client:Joe & fq=date:20130214 & facet.field=date & facet.field=client...

D) q=*:* & fq=date:20130214 & fq=client:Joe & facet.field=date & facet.field=client...

  • Which of the above do you think would be best, and why? Remember, most queries are going to run against 20130214.
  • Is FQ filtering done first and then the Q condition applied, or the other way around?

Today, D) is used in all cases, but I suspect this is wrong and is causing OOMs in Solr (version 3.6).

Thank you for your help!

Dmitry z

2 Answers


The q query is the main query of the request.
It is the one that allows you to actually search over multiple fields.
The q query decides what score each document gets, and hence takes part in the relevancy calculation.

q=*:* will just return all the documents with the same score.

fq is the filter query, used to filter the documents; it is not related to search.
So if you have any fixed value that you want to filter on, you should use filters to limit your results.
fq does not affect the scoring of the results.
While filtering, Solr uses the filter cache to improve performance for subsequent filter queries.

So ideally, you should check what the requirement demands. If you want to search, you should always use q, and if you want to filter/limit results you should use fq.
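
As a rough illustration of that split, here is a minimal sketch that builds a request with the search term in q and the fixed restriction in fq. The helper `build_select_params` is hypothetical, and the field values are taken from the question; it only assembles the query string and does not contact Solr.

```python
from urllib.parse import urlencode


def build_select_params(search=None, filters=(), facet_fields=(), rows=10):
    """Return a URL-encoded Solr query string (hypothetical helper)."""
    # q carries the actual search; fall back to match-all when there is none
    params = [("q", search if search else "*:*")]
    # each fixed restriction becomes its own fq, so each can be cached separately
    params += [("fq", f) for f in filters]
    if facet_fields:
        params.append(("facet", "true"))
        params += [("facet.field", f) for f in facet_fields]
    params.append(("rows", str(rows)))
    return urlencode(params)


# Search for a client, restricted to one business date
query_string = build_select_params(
    search="client:Joe",
    filters=["date:20130214"],
    facet_fields=["date", "client"],
)
print(query_string)
```

Passing the parameters as a list of pairs rather than a dict is deliberate: it lets fq and facet.field appear more than once in the same request.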

Facets are just an add-on to the results and do not affect your results.

Marko
Jayendra
  • Thanks for posting all these helpful answers on SO!! – Rajat Gupta Oct 04 '13 at 06:05
  • Why a downvote? Please add a comment so I can improve my answer if it's no longer valid! – Jayendra Nov 27 '15 at 11:23
  • @Jayendra A fundamental question: it is always said that fq is used to filter the documents and helps subsequent searches via the cache. Suppose I have 100M docs spread uniformly across 100 categories. If I use a query (q) and fq=cat:5, will Solr search only the docs with cat:5 in the first place, or search all 100M and then filter out cat:5? If the former, then the search itself is also faster (not just subsequent searches hitting the filterCache). – Ethan May 30 '16 at 15:43

To answer your questions:

  • Based on your business rule, I would suggest that you put the date in the fq value, since you are always limiting (filtering) results by a date value and it sounds like the date filters could be reused by Solr. The q can then contain the search for random client and report values as necessary.

  • When a user first comes to the UI, since you are only showing facets, I would suggest you use q=<id field>:*, where <id field> is your document id in the index, and also set rows=0. Use the date restriction in the fq value again. Specifying rows=0 will produce a facet-only query; see the question "Solr - Getting facet counts without returning results".
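
The facet-only request described in the second point could be assembled roughly as follows. This is a sketch under assumptions: the id field is assumed to be called `id`, the helper `facet_only_query` is hypothetical, and the facet fields are the ones listed in the question.

```python
from urllib.parse import urlencode

ID_FIELD = "id"  # assumption: the unique key field of the index


def facet_only_query(date, facet_fields, id_field=ID_FIELD):
    """Build a query string that returns facet counts but no documents."""
    params = [
        ("q", id_field + ":*"),   # match every document that has an id
        ("fq", "date:" + date),   # date restriction goes in the filter query
        ("rows", "0"),            # no documents in the response, facets only
        ("facet", "true"),
    ]
    params += [("facet.field", f) for f in facet_fields]
    return urlencode(params)


facet_query = facet_only_query("20130214", ["date", "client", "report"])
print(facet_query)
```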

Community
Paige Cook
  • Thank you for your reply. If I say fq=20130214 & q=client:Paige, does it mean that Solr will first have to scan 250 million documents for client:Paige and then filter them to leave only the ones for date 20130214? If so, would it not be more efficient to first find everything for the date (3.5 million docs) and then filter based on client? And an even bigger question: how can I tell, since there seem to be no tools that allow trying out both scenarios? – Dmitry z Feb 14 '13 at 05:26
  • Yes, the filter query will be performed after the main search has been executed. But based on your document numbers, you might want to switch them around and use `q=20130214&fq=client:Paige`, since you know you will only need to search the 3.5 million documents with that date and then filter those to return only the correct client. There is no hard and fast rule for determining the correct approach; it depends on the needs and the scenario. As for tools for testing queries, I would highly recommend SolrMeter - http://code.google.com/p/solrmeter/ – Paige Cook Feb 15 '13 at 12:54
  • Thanks Paige. You answered differently in two different replies (based on the business rule and based on the numbers), so I clarified my original question. Could you please look again? Also, SolrMeter does not show how Solr goes about processing the query: which caches it looks at, in what order, what it finds, etc. If you are familiar with Sybase (or any RDBMS), I'm looking for an equivalent of 'set showplan on'. Thank you, -Dmitry. – Dmitry z Feb 22 '13 at 17:03