Mongo Java Driver searching by text index

Question

In the MongoDB Java driver I am using $text with $search to find things in a document. However, this does not always yield the right results: sometimes it returns items that don't contain the searched-for string. Is there a reason for this?

ANSWER

You can wrap the search term in escaped double quotes; MongoDB then treats it as a single phrase instead of splitting it at the delimiters. Example:

{"$text" : {"$search" : "\"<insert term here>\""}}
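From Java, the same idea could be sketched roughly as follows. The class `PhraseSearch` and the helper `toPhrase` are hypothetical names introduced here for illustration; rejecting terms that already contain a double quote is a conservative assumption, not something the original answer covers. The driver call is only shown in a comment so that the snippet runs without the driver on the classpath.

```java
public class PhraseSearch {
    // Hypothetical helper: wraps a term in double quotes so that $text
    // treats it as one phrase instead of splitting it into OR-ed tokens.
    static String toPhrase(String term) {
        // Assumption: terms with embedded double quotes are simply rejected here.
        if (term.indexOf('"') >= 0) {
            throw new IllegalArgumentException("term must not contain double quotes");
        }
        return "\"" + term + "\"";
    }

    public static void main(String[] args) {
        String phrase = toPhrase("Big/dumb.potatosald");
        System.out.println(phrase);
        // With the MongoDB Java driver available, the phrase could be passed on as:
        //   collection.find(com.mongodb.client.model.Filters.text(phrase));
    }
}
```

The quotes have to be part of the search string itself, which is why they appear escaped in both the JSON form above and the Java string literal.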

score 1 · Answer 1 · answered Dec 04 '17 at 22:13


As a quick preliminary note: You should generally give some more details and examples around your specific problem. However, this one can probably be answered without that. ;)

You are dealing with a text (!) index here. That means that there is quite some work done on the MongoDB server side when it creates the index; work that goes well beyond the sheer string.split() sort of thing that one would probably expect.

The most important thing to understand is that the index will not necessarily hold all values that your text contains (e.g. common words like "and" or "the" may simply be omitted), and neither will it necessarily contain the exact words that you can read in your source data, but rather word stems. A very good explanation of some of the stuff that's going on here can be found at: https://blog.codecentric.de/en/2013/01/text-search-mongodb-stemming/

There's more inside the index engine that deals with languages and special characters. But that's all reasonably well documented in the MongoDB manual.

Lastly, when you search for a string with spaces inside a field that's covered by a text index, the following part of the documentation is also relevant to understand:

$text will tokenize the search string using whitespace and most punctuation as delimiters, and perform a logical OR of all such tokens in the search string.

For example, you could use the following query to find all stores containing any terms from the list “coffee”, “shop”, and “java”:

db.stores.find( { $text: { $search: "java coffee shop" } } )
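To see why a search for something like "Big/dumb.potatosald" can return documents that don't contain that string, here is a rough Java sketch of the split-and-OR behaviour described above. It is only an approximation under stated assumptions: the class `TokenizeSketch` and its methods are hypothetical, the splitting covers whitespace and ASCII punctuation only, and MongoDB's real tokenizer additionally handles stop words, stemming and language rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenizeSketch {
    // Hypothetical helper: splits on whitespace and ASCII punctuation and
    // lower-cases the result. This is NOT MongoDB's actual tokenizer.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[\\s\\p{Punct}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // OR semantics: a field matches as soon as it shares one token with the search.
    static boolean matchesAny(String fieldValue, String search) {
        List<String> fieldTokens = tokenize(fieldValue);
        for (String token : tokenize(search)) {
            if (fieldTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The search string falls apart into three separate tokens.
        System.out.println(tokenize("Big/dumb.potatosald"));
        // A field that merely contains "big" already counts as a match.
        System.out.println(matchesAny("a big house", "Big/dumb.potatosald"));
    }
}
```

Under these assumptions the search string is reduced to the tokens big, dumb and potatosald, so any document containing just one of them would be returned.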

The bottom line is: There's quite a bunch of sources of potential confusion when it comes to text indexes so you want to make sure you've read up on them before you get going.

answered Dec 04 '17 at 22:13

dnickless


Sorry! Didn't really know how to word it. When I created my index I used a wildcard. Is it possible to prevent it from using punctuation and whitespace as delimiters? – C McCoy Dec 05 '17 at 11:54
I don't think so. What are you trying to achieve specifically? – dnickless Dec 05 '17 at 12:13
Searching through all fields for a text string. – C McCoy Dec 05 '17 at 12:38
Would you be able to give us some sample data and your index definition as well as a bunch of sample queries that you would want to support? – dnickless Dec 05 '17 at 13:46
Data looks like this: `{"_id": "Stuff", "fieldOne": 300, "fieldTwo": "Big/dumb.potatosald", "field3": "rawSauce", "field4": "noKetchup"}`. For making the index I did `db.collection.createIndex({"$**": "text"})`. I would like to be able to run a query like `db.collection.find({$text: {$search: "Big/dumb.potatosald"}})` and only return documents with Big/dumb.potatosald in a field. – C McCoy Dec 05 '17 at 15:36
Have you got nested structures ("sub-documents") or only flat fields on the top level that you need to cover with this approach? Also, which MongoDB version are you on? – dnickless Dec 05 '17 at 16:37
No nested structures, and version 3.2.9 – C McCoy Dec 05 '17 at 16:53
If you can upgrade to v3.4.4 and if you don't have an awful lot of data you could potentially resort to a solution that's based on $objectToArray: https://docs.mongodb.com/v3.4/reference/operator/aggregation/objectToArray/ – dnickless Dec 05 '17 at 16:58
No news here: https://stackoverflow.com/questions/6790819/searching-for-value-of-any-field-in-mongodb-without-explicitly-naming-it – dnickless Dec 05 '17 at 17:01
Sadly some individual collections are 5 million documents and growing by 1.5 million a month, so I'm not sure $objectToArray would be a viable long-term solution. If only there was some way to not treat punctuation as a delimiter... – C McCoy Dec 06 '17 at 11:26
To add some resolution to this: I figured out you can just wrap the term in escaped quotes and mongo won't use the delimiters. I did this with the following: `{"$text" : {"$search" : "\"<insert term here>\""}}` Thanks for the help @dnickless! – C McCoy Dec 08 '17 at 12:51
Thanks for sharing that! – dnickless Dec 08 '17 at 13:14
