2

My forum uses the Google App Engine Search API, so naturally I would like to be able to find partial and misspelled words. But the API does not do that. Does anyone know of workarounds or better alternatives to this API?

For partial matching, I can imagine blowing up each word in a forum comment into a set of substrings. But that seems rather expensive. Just think: if a comment has 60 words (say 500 characters total), then saving that single forum post would add up to a huge text field in the Document:

Document.Builder builder = Document.newBuilder();
builder.addField(Field.newBuilder().setName("comment").setText(comment));
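To make the cost concrete, here is a minimal sketch of the substring expansion described above. The class name `PartialTokens`, the helper names, and the `minLength` cutoff are all hypothetical and not part of the Search API; the code only builds the token set and does not touch the index.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of the App Engine Search API.
class PartialTokens {

    // Returns every distinct substring of `word` that has at least `minLength` characters.
    static Set<String> substrings(String word, int minLength) {
        Set<String> out = new LinkedHashSet<>();
        for (int start = 0; start < word.length(); start++) {
            for (int end = start + minLength; end <= word.length(); end++) {
                out.add(word.substring(start, end));
            }
        }
        return out;
    }

    // Lowercases the comment, splits it on whitespace, and expands each word.
    static Set<String> expand(String comment, int minLength) {
        Set<String> out = new LinkedHashSet<>();
        for (String word : comment.toLowerCase().split("\\s+")) {
            out.addAll(substrings(word, minLength));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> tokens = expand("partial match", 3);
        // The joined string is what would end up in an extra text field.
        System.out.println(tokens.size() + " tokens: " + String.join(" ", tokens));
    }
}
```

A word of n characters has up to n(n+1)/2 substrings when `minLength` is 1, so a single 8-letter word can contribute 36 tokens. Raising `minLength`, or keeping only prefixes, would shrink the field at the price of fewer matchable fragments.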

This is rather ridiculous, especially since a regex matcher would be a lot more economical. Which raises the question: why does the query not use regex matching so that partial words can be found? As for misspellings, there are a number of algorithms for handling them, so why isn't the App Engine Search API offering one?
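For reference, one of the simplest such algorithms is Levenshtein edit distance, which counts the single-character insertions, deletions, and substitutions needed to turn one word into another. The sketch below is only an illustration under that assumption; the class name `EditDistance` is hypothetical, and nothing here is provided by the Search API.

```java
// Hypothetical helper, not part of the App Engine Search API.
class EditDistance {

    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of `a` to each prefix of `b`.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of `a`
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("seach", "search")); // prints 1
    }
}
```

One way this might be used is to compare each query term against a vocabulary the application maintains itself and substitute the closest known word before sending the query, though that vocabulary would have to be stored and kept up to date outside the index.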

Now of course I am posting this here hoping that I am wrong and that someone will show me that the Search API does indeed provide all of this functionality. I have been looking through many tutorials online and have not found much so far. So the question again: does the App Engine Search API allow for partial text matching and misspelled text matching? If not, how might I hack it?

Katedral Pillon

2 Answers

2

The Search API does not support partial text matching right now. There are, however, plenty of projects like this one that provide the means to mount Lucene/Compass on top of GAE, and these are capable of doing exactly what you are looking for.

jirungaray
  • Before accepting this answer and telling everyone that it is **the answer**, I have been trying to get the code to compile with my project. Unfortunately I haven't had any luck. So I have created a new thread: http://stackoverflow.com/questions/29305483/how-do-i-include-lucene-for-app-engine-in-my-existing-project. I am still trying to figure it out and once I do I will mark this question as accepted. Thanks. – Katedral Pillon Mar 27 '15 at 16:26
1

You can use the stemming feature to query for word variations:

https://cloud.google.com/appengine/docs/java/search/query_strings#Java_Stemming

To search for common variations of a word, like plural forms and verb endings, use the ~ stem operator (the tilde character). This is a prefix operator which must precede a value with no intervening space. The value ~cat will match "cat" or "cats," and likewise ~dog matches "dog" or "dogs." The stemming algorithm is not fool-proof. The value ~care will match "care" and "caring," but not "cares" or "cared." Stemming is only used when searching text and HTML fields.

Nicholas Franceschina