Partial Text Matching GAE

Question

I am developing a web application for managing customers. I have a Customer entity made up of the usual fields, such as first_name, last_name, age, etc.

I have a page where these customers are shown in a table. On the same page I have a search field, and I'd like to filter the customers and update the table via Ajax while the user is typing in the search field. Here is how it should work:

Figure 1: The main page showing all of the customers:

Figure 2: As soon as the user types the letter "b", the table is updated with the results:

Given that partial text matching is not supported in GAE, I worked around it, building on what is shown here: TL;DR: I have created a Customers index that contains a search document for every customer (doc_id=customer_key). Each search document contains Atom Fields for every customer field I want to be able to search on (e.g. first_name, last_name). Each field is built like this: if the last_name is Berlusconi, the field is made up of the Atom Fields "b" "be" "ber" "berl" "berlu" "berlus" "berlusc" "berlusco" "berluscon" "berlusconi". This way, exact matching on whole tokens behaves like partial (prefix) matching: if I search for "Be", the Berlusconi customer is returned.
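
To illustrate why this works, here is a minimal, self-contained sketch that mimics the idea with a plain in-memory dictionary instead of the Search API. The names `prefixes`, `build_index`, and `lookup` and the sample customers are made up for the illustration; this is not the code used in the application.

```python
from collections import defaultdict


def prefixes(word):
    """Return every non-empty leading substring of word, shortest first."""
    return [word[:i] for i in range(1, len(word) + 1)]


def build_index(customers):
    """Map each prefix token to the set of customer ids whose name has it.

    `customers` is a hypothetical dict of {customer_id: last_name}.
    """
    index = defaultdict(set)
    for customer_id, last_name in customers.items():
        for token in prefixes(last_name.lower()):
            index[token].add(customer_id)
    return index


def lookup(index, query):
    """Exact-token lookup; on prefix tokens it behaves like a prefix search."""
    return sorted(index.get(query.lower(), set()))


customers = {1: "Berlusconi", 2: "Bianchi", 3: "Rossi"}
index = build_index(customers)
print(lookup(index, "b"))   # [1, 2]
print(lookup(index, "Be"))  # [1]
```

The trade-off is that each name of length n stores n tokens, so the index grows with the total length of the searchable fields.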

The search is made through Ajax calls: whenever the user types in the search field (the call is delayed slightly to see whether the user keeps typing, to avoid sending a burst of requests), an Ajax call is made with the query string, and a JSON object is returned.

Now, things worked well while debugging, but I was testing with only a few people in the datastore. As soon as I added many people, search became very slow.

This is how I create search documents. This is called every time a new customer is put to the datastore.

def put_search_document(cls, key):
    """
    Called by _post_put_hook in BaseModel
    """
    model = key.get()
    _fields = []
    if model:
        _fields.append(search.AtomField(name="empty", value=""),)  # to retrieve customers when no query string
        _fields.append(search.TextField(name="sort1", value=model.last_name.lower()))
        _fields.append(search.TextField(name="sort2", value=model.first_name.lower()))

        _fields.append(search.TextField(name="full_name", value=Customer.tokenize1(
            model.first_name.lower()+" "+model.last_name.lower()
            )),)

        _fields.append(search.TextField(name="full_name_rev", value=Customer.tokenize1(
            model.last_name.lower()+" "+model.first_name.lower()
            )),)

        # _fields.append(search.TextField(name="telephone", value=Customer.tokenize1(
        #     model.telephone.lower()
        #     )),)
        # _fields.append(search.TextField(name="email", value=Customer.tokenize1(
        #     model.email.lower()
        #     )),)

        document = search.Document(  # create new document with doc_id=key.urlsafe()
            doc_id=key.urlsafe(),
            fields=_fields)
        index = search.Index(name=cls._get_kind()+"Index")  # not in try-except: defer will catch and retry.
        index.put(document)

@staticmethod
def tokenize1(string):
    s = ""
    for i in range(len(string)):
        if i > 0:
            s = s + " " + string[0:i+1]
        else:
            s = string[0:i+1]
    return s

This is the search code:

@staticmethod
def search(ndb_model, query_phrase):
    # TODO: search returns a limited number of results(20 by default)
    # (See Search Results at https://cloud.google.com/appengine/docs/python/search/#Python_Overview)
    sort1 = search.SortExpression(expression='sort1', direction=search.SortExpression.ASCENDING,
                                  default_value="")
    sort2 = search.SortExpression(expression='sort2', direction=search.SortExpression.ASCENDING,
                                  default_value="")
    sort_opt = search.SortOptions(expressions=[sort1, sort2])
    results = search.Index(name=ndb_model._get_kind() + "Index").search(
        search.Query(
            query_string=query_phrase,
            options=search.QueryOptions(
                sort_options=sort_opt
            )
        )
    )

    print "----------------"
    res_list = []
    for r in results:
        obj = ndb.Key(urlsafe=r.doc_id).get()
        print obj.first_name + " "+obj.last_name
        res_list.append(obj)
    return res_list
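
The loop above issues one datastore get per search result. A batched variant, along the lines of the ndb.get_multi suggestion in the comments, might look roughly like the sketch below. To keep it runnable outside App Engine, the key constructor and the batch getter are passed in as parameters (`make_key`, `get_multi`, and `fetch_models` are hypothetical names, and the fake store is for illustration only); on App Engine one would presumably pass `lambda s: ndb.Key(urlsafe=s)` and `ndb.get_multi`.

```python
def fetch_models(doc_ids, make_key, get_multi):
    """Resolve search-result doc ids to entities with one batched call.

    make_key and get_multi are injected so the sketch runs without the
    App Engine SDK; they stand in for ndb.Key(urlsafe=...) and ndb.get_multi.
    """
    keys = [make_key(doc_id) for doc_id in doc_ids]
    entities = get_multi(keys)  # one batched call instead of len(keys) gets
    # ndb.get_multi returns None for keys with no entity; drop those.
    return [entity for entity in entities if entity is not None]


# Stand-ins for the datastore, for illustration only.
fake_store = {"k1": "Silvio Berlusconi", "k2": "Mario Rossi"}
calls = []


def fake_get_multi(keys):
    calls.append(list(keys))
    return [fake_store.get(k) for k in keys]


people = fetch_models(["k1", "missing", "k2"], lambda s: s, fake_get_multi)
print(people)      # ['Silvio Berlusconi', 'Mario Rossi']
print(len(calls))  # 1
```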

Has anyone else had the same experience? If so, how did you solve it?

Thank you guys very much, Marco Galassi

EDIT: names, emails, and phone numbers are obviously totally invented.
Edit2: I have now moved to TextField, which looks a little bit faster, but the problem still persists.

hard to tell without code. i use search api very similarly to search people in a directory (with partial matches like yours) and its fast. however I search on pressing a button and not so often like you — Zig Mandel, Jul 08 '15 at 13:06
I do it the same way. You can improve the performance of fetching objects from datastore based on your search results by batch operation [ndb.get_multi](https://cloud.google.com/appengine/docs/python/ndb/functions): ndb.get_multi(ndb.Key(urlsafe=r.doc_id) for r in results) — Tomasz Żyźniewski, Jul 09 '15 at 07:51
tokenize snippet: return u' '.join(name[:i] for i in xrange(len(name) + 1)) — Tomasz Żyźniewski, Jul 09 '15 at 07:54
The ndb.get_multi made it work a little bit better. Could you please explain what is your second comment, "tokenize snippet"? — smellyarmpits, Jul 09 '15 at 14:15
That includes a whitespace at the beginning that I don't want :\ — smellyarmpits, Jul 10 '15 at 13:27
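
For reference, a variant of the one-line snippet from the comments that avoids the leading whitespace could start the range at 1 instead of 0, so the empty prefix is never emitted. This is only a sketch: `tokenize` is a hypothetical name, and `range` is used so that it runs on both Python 2 and Python 3 (`xrange` would be the Python 2 idiom).

```python
def tokenize(name):
    """Join all non-empty prefixes of name with single spaces."""
    # Starting at 1 skips name[:0] == u'', which caused the leading space.
    return u' '.join(name[:i] for i in range(1, len(name) + 1))


print(tokenize(u"berl"))  # b be ber berl
```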
