16

I have a question about queries in Solr. When I perform a query with multiple search terms that are all logically linked by OR (e.g. q=content:(foo OR bar OR foobar)), then Solr returns a list of documents that each match any of these terms. But what Solr does not return is which documents were hit by which term(s). So in the example above, what I want to know is which documents in my result list contain the term foo, etc. Given this information I would be able to create a term-document matrix.

So my question is: how can I tell Solr to give me that missing piece of information? I'm sure it is somewhere, otherwise the search as a whole would not work. But what am I missing? Thanks for your help.

PS: As a workaround I'm performing a separate Solr query for each search term. But as you can imagine, it's a disaster in terms of performance, as the number of search terms can exceed 50 :(

tbmsu

3 Answers

16

Kind of depends on your requirements, but as far as I know there is no specific support for this in Solr. You can however hack it together in a few other ways. Not sure what you can expect for performance for these, tho..

Use Highlighting

If you use highlighting you can parse the returned highlighted snippets for the start/end tags of the highlighted text. This will be the term that matched something in your query.
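A minimal sketch of that parsing step, assuming the JSON response format and that the highlight tags are `<em>` and `</em>`. The helper name `matched_terms` and the `sample` data are made up for illustration:

```python
import re

# Assumption: snippets are wrapped in <em>...</em>; adjust if your tags differ.
EM = re.compile(r"<em>(.*?)</em>", re.DOTALL)

def matched_terms(highlighting):
    """Map each document id to the set of lowercased highlighted strings."""
    result = {}
    for doc_id, fields in highlighting.items():
        terms = set()
        for snippets in fields.values():
            for snippet in snippets:
                terms.update(m.lower() for m in EM.findall(snippet))
        result[doc_id] = terms
    return result

# Made-up data shaped like the "highlighting" section of a JSON response.
sample = {
    "doc1": {"content": ["a <em>foo</em> and a <em>bar</em>"]},
    "doc2": {"content": ["only <em>Foobar</em> here"]},
}
print(matched_terms(sample))
```

Note that the highlighted strings are the surface forms from the stored text, so with stemming or similar analysis they may not be identical to the query terms.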

Use debugQuery Information

You can parse the information returned by a query with debugQuery=true to determine that a term was associated with a result by looking at termWeight (iirc). This might be a filtered version of your original term (if you have stemming etc. active for the field).

Use Field Collapsing

By using group.query you can build lists of documents that match each term, instead of issuing several requests. You can also build queries that feature several of the terms OR-ed together if you need lists for "contains either". Might not be effective for a large number of fields.
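A rough sketch of the group.query approach, assuming the JSON response format, where each group is keyed by its query string and carries a `doclist`. The function names, the `id` field, and the sample response are illustrative assumptions, not taken from a real index:

```python
from urllib.parse import urlencode, parse_qs

def group_query_string(field, terms, limit=10):
    """Build a query string with one group.query per search term."""
    params = [
        ("q", f"{field}:({' OR '.join(terms)})"),
        ("group", "true"),
        ("group.limit", str(limit)),  # documents returned per group
        ("wt", "json"),
    ]
    params += [("group.query", f"{field}:{term}") for term in terms]
    return urlencode(params)

def docs_by_group(response, id_field="id"):
    """Map each group.query string to the ids of the documents it returned."""
    grouped = response.get("grouped", {})
    return {
        query: [doc[id_field] for doc in group["doclist"]["docs"]]
        for query, group in grouped.items()
    }

qs = group_query_string("content", ["foo", "bar"])
print(parse_qs(qs)["group.query"])

# Made-up response fragment for illustration only.
sample_response = {
    "grouped": {
        "content:foo": {"matches": 2, "doclist": {"numFound": 1, "docs": [{"id": "1"}]}},
        "content:bar": {"matches": 2, "doclist": {"numFound": 1, "docs": [{"id": "2"}]}},
    }
}
print(docs_by_group(sample_response))
```

Each group only returns up to `group.limit` documents, so the limit has to be at least as large as the number of hits you want to classify.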

Parse the returned document yourself

Get the document, then extract the terms by yourself. Will require a bit of fuzzy matching, since you'll have to deal with text processing on the Solr side as well.
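As a deliberately naive sketch of this option, the following lowercases the text and splits it on word characters. That is only an assumption about the analysis chain; it will not reproduce stemming, synonyms, or other filters configured on the Solr side. `terms_in_text` is a hypothetical helper name:

```python
import re

def terms_in_text(text, terms):
    """Return the subset of terms that appear as whole tokens in the text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return {term for term in terms if term.lower() in tokens}

print(terms_in_text("Foo met a foobar.", ["foo", "bar", "foobar"]))
```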

Use Function Queries

You can get metavalues for each document with each term from a FunctionQuery that looks up the number of occurrences of a term in that document. Will require quite a few function queries for a large number of terms, but might be fast.
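One way to sketch this is with Solr's `termfreq(field,term)` function, requesting one aliased pseudo field per term in `fl`. The aliases `tf_0`, `tf_1`, ... and the helper name are made up here, and the sketch assumes the terms contain no single quotes:

```python
def termfreq_fl(field, terms, extra_fields=("id",)):
    """Build an fl value with one termfreq() pseudo field per term."""
    parts = list(extra_fields)
    for i, term in enumerate(terms):
        # Index-based aliases avoid problems with terms that are not valid field names.
        parts.append(f"tf_{i}:termfreq({field},'{term}')")
    return ",".join(parts)

print(termfreq_fl("content", ["foo", "bar"]))
```

`termfreq` looks up the indexed term, so with stemming or lowercasing active on the field you would have to pass the analyzed form of each term.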

None of these options is perfect, but one of them might work for the problem at hand.

MatsLindh
  • Thanks a lot for your quick reply and the interesting suggestions. I now use the Function Queries and it seems that performance is not an issue :) For those who are interested: I'm using the `exists` function and add a pseudo field for every search term like so: `fl=exists(query({!v='content:(foo)'})),exists(query({!v='content:(bar)'}))`. From the response I parse the search term with a regex. – tbmsu Jul 31 '14 at 12:57
  • @tbmsu Would you mind posting that as an answer? I think it helps round out the post. Also note that you can alias pseudo fields to avoid the regex parsing, e.g. `fl=foo:exists(query({!v='content:(foo)'}))` – Paul Bellora Oct 01 '14 at 20:12
  • Is there any way to list all indexed terms of a field for a given document? – Shih-En Chou Jun 20 '15 at 02:26
  • @Shih-EnChou Comments are not the place to ask new questions - create a question for that. To see the raw tokens for a document, use the LukeRequestHandler to get it in Solr or the Luke tool to inspect the index files outside of Solr. – MatsLindh Jun 21 '15 at 10:46
10

My comment as an answer:

I use the Function Queries and it seems that performance is not an issue :) For those who are interested: I'm using the exists function and add a pseudo field for every search term like so: fl=exists(query({!v='content:(foo)'})),exists(query({!v='content:(bar)'})). From the response I parse the search term with a regex.

As Paul stated above, you can alias pseudo fields to avoid the regex parsing, e.g. fl=foo:exists(query({!v='content:(foo)'}))
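A small sketch of how the aliased pseudo fields can be turned into a term-document matrix. It assumes each term is usable as a field alias, that the unique key field is called `id`, and that the response is requested as JSON; the helper names and the sample `docs` list are invented for illustration. A `q` parameter is included because `fl` only selects which fields are returned:

```python
def exists_fl(field, terms, id_field="id"):
    """Build an fl value with one aliased exists(query(...)) per term."""
    parts = [id_field]
    for term in terms:
        parts.append(f"{term}:exists(query({{!v='{field}:({term})'}}))")
    return ",".join(parts)

def term_document_matrix(docs, terms, id_field="id"):
    """Map each document id to a list of booleans, one per term."""
    return {doc[id_field]: [bool(doc.get(t, False)) for t in terms] for doc in docs}

terms = ["foo", "bar"]
params = {
    "q": f"content:({' OR '.join(terms)})",  # the actual query
    "fl": exists_fl("content", terms),       # which (pseudo) fields to return
    "wt": "json",
}
print(params["fl"])

# Made-up documents shaped like response["response"]["docs"].
docs = [
    {"id": "1", "foo": True, "bar": False},
    {"id": "2", "foo": True, "bar": True},
]
print(term_document_matrix(docs, terms))
```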

tbmsu
  • This is useful to me, thanks. Can you please let me know how I add more than one condition to this local param? I tried `fl=foo:exists(query({!v='content:(foo) and content2:(foo2)'}))`. I tried some variations of it as well, but it does not seem to work. Any idea? – Ganesh Oct 28 '14 at 16:50
  • Sorry, I guess I posted a bit too early: `AND` has to be in capitals for this to work. Otherwise it is treated as a plain string, I guess. – Ganesh Oct 28 '14 at 17:18
0

In my case (Solr 6.6), the request fl=foo:exists(query({!v='content:(foo)'})) did not work: it always returned 0 documents, even though foo was in my document. I had to change the request to ?q=*:*&fl=foo:exists(query({!v='content:(foo)'})), and then it started working for me.

Root
  • The reason for that is that you _didn't include a query_. `fl` is not a query - it's just an instruction to Solr telling it what fields it should return. You'll have to include a query as you discovered, where `q=*:*` would return all documents in the collection. – MatsLindh Oct 05 '18 at 07:42