
I'm trying to use v4beta1 of Google Cloud Talent Solution (GCTS), specifically search_jobs().

The docs: https://cloud.google.com/talent-solution/job-search/docs/reference/rest/v4beta1/projects.jobs/search

The docs reference a pageToken parameter, but in \google\cloud\talent_v4beta1\gapic\job_service_client.py there is no such parameter in the function definition:

def search_jobs(
    self,
    parent,
    request_metadata,
    search_mode=None,
    job_query=None,
    enable_broadening=None,
    require_precise_result_size=None,
    histogram_queries=None,
    job_view=None,
    offset=None,
    page_size=None,
    order_by=None,
    diversification_level=None,
    custom_ranking_info=None,
    disable_keyword_match=None,
    retry=google.api_core.gapic_v1.method.DEFAULT,
    timeout=google.api_core.gapic_v1.method.DEFAULT,
    metadata=None,
):

page_token is mentioned in the docstring comments, e.g. in the description of the offset parameter.

How do I specify the page token for job searches?

I've specified require_precise_result_size=False but the return value doesn't contain a SearchJobsResponse.estimated_total_size. Is this a clue that search_jobs() isn't being set to the desired "mode"?

Carl

1 Answer


I believe the pageToken is abstracted away for you by the Python client library. If you go down to the end of the search_jobs method in the source, you will see that it builds an iterator that is aware of the pageToken and nextPageToken fields:

    iterator = google.api_core.page_iterator.GRPCIterator(
        client=None,
        method=functools.partial(
            self._inner_api_calls["search_jobs"],
            retry=retry,
            timeout=timeout,
            metadata=metadata,
        ),
        request=request,
        items_field="matching_jobs",
        request_token_field="page_token",
        response_token_field="next_page_token",
    )
    return iterator

So all you should need to do is the following - copied from the docs at https://googleapis.github.io/google-cloud-python/latest/talent/gapic/v4beta1/api.html:

from google.cloud import talent_v4beta1

client = talent_v4beta1.JobServiceClient()
parent = client.tenant_path('[PROJECT]', '[TENANT]')

# TODO: Initialize `request_metadata`:
request_metadata = {}

# Iterate over all results
for element in client.search_jobs(parent, request_metadata):
    # process element
    pass


# Alternatively:
# Iterate over results one page at a time
for page in client.search_jobs(parent, request_metadata).pages:
    for element in page:
        # process element
        pass

The default page size is apparently 10; you can modify it with the page_size parameter (see the sketch after the links below). Page iterator documentation can be found here:

Doco: https://googleapis.github.io/google-cloud-python/latest/core/page_iterator.html

Source: https://googleapis.github.io/google-cloud-python/latest/_modules/google/api_core/page_iterator.html#GRPCIterator
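
A minimal sketch of passing page_size (reusing the client, parent and request_metadata from the snippet above; 50 is an arbitrary choice):

results_iterator = client.search_jobs(parent, request_metadata, page_size=50)
# Each underlying request now fetches 50 results instead of 10;
# the iterator itself behaves exactly the same.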

Probably the simplest way to deal with this is to consume all results with

all_results = list(results_iterator)
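
Alternatively, if you only need the first handful of results, itertools.islice stops the iterator early, so the remaining pages are never requested (a sketch; it assumes a fresh results_iterator, since list() above would already have consumed it):

import itertools

# Take just the first 10 results; no further requests are made.
first_ten = list(itertools.islice(results_iterator, 10))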

If you have massive amounts of data and don't want to pull everything in one go, I would do the following. The .pages property just returns a generator that you can work with as usual:

results_iterator = client.search_jobs(parent, request_metadata)
pages = results_iterator.pages

current_page = next(pages)         # performs a request and returns the first page
# do work with the page
current_item = next(current_page)  # next item within the current page

current_page = next(pages)         # performs another request for the next page
# etc...

You will need to catch the StopIteration exception for when you run out of items or pages:

https://anandology.com/python-practice-book/iterators.html
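
A minimal sketch of that, assuming the same client, parent and request_metadata as above:

pages = client.search_jobs(parent, request_metadata).pages

try:
    current_page = next(pages)         # issues another request if a page exists
    current_item = next(current_page)  # next item within that page
except StopIteration:
    # raised when there are no pages (or items) left
    pass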

This is why:

def _page_iter(self, increment):
    """Generator of pages of API responses.

    Args:
        increment (bool): Flag indicating if the total number of results
            should be incremented on each page. This is useful since a page
            iterator will want to increment by results per page while an
            items iterator will want to increment per item.

    Yields:
        Page: each page of items from the API.
    """
    page = self._next_page()
    while page is not None:
        self.page_number += 1
        if increment:
            self.num_results += page.num_items
        yield page
        page = self._next_page()

See how it calls _next_page again after the yield? That checks whether more pages exist and performs another request for you if they do:

def _next_page(self):
    """Get the next page in the iterator.

    Returns:
        Page: The next page in the iterator or :data:`None` if
            there are no pages left.
    """
    if not self._has_next_page():
        return None

    if self.next_page_token is not None:
        setattr(self._request, self._request_token_field, self.next_page_token)

    response = self._method(self._request)

    self.next_page_token = getattr(response, self._response_token_field)
    items = getattr(response, self._items_field)
    page = Page(self, items, self.item_to_value)

    return page

If you want a sessionless option, you can use offset plus page_size and pass the current offset back to the user on each AJAX request:

offset (int) –

Optional. An integer that specifies the current offset (that is, starting result location, amongst the jobs deemed by the API as relevant) in search results. This field is only considered if page_token is unset.

For example, 0 means to return results starting from the first matching job, and 10 means to return from the 11th job. This can be used for pagination, (for example, pageSize = 10 and offset = 10 means to return from the second page).
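
A hypothetical sketch of that sessionless pattern (fetch_page is my own name, not part of the library; offset and page_size are the keyword arguments from the signature above):

def fetch_page(client, parent, request_metadata, offset, page_size=10):
    # Fetch exactly one page of results starting at `offset`. The caller
    # stores only the integer offset between requests, so nothing has to
    # survive between App Engine instances.
    iterator = client.search_jobs(
        parent,
        request_metadata,
        offset=offset,
        page_size=page_size,
    )
    try:
        page = next(iterator.pages)  # one request, one page
    except StopIteration:
        return [], offset            # no results at this offset
    return list(page), offset + page_size  # hand the new offset back to the user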

Researcher
  • thanks. I saw this example code. Having retrieved the first, say, 10 results, how do I request the next 10 when the user taps "next results" or scrolls down the list to see more? Do I not need a page token to pass to the second call to search_jobs()? – Carl Jul 09 '19 at 13:30
  • Updated the answer, hope it helps! The library should take care of all of this for you so you don't have to call search_jobs again and pass tokens around. If you were directly using the REST API you would need to pass the nextPageToken received in the response in as the pageToken of your next request. – Researcher Jul 09 '19 at 14:38
  • thank you for your input but I don't think this is the approach Google's developers intend. If a search matches 3000 jobs and I only need to show my user 10 results, it will be inefficient to use the approach you highlight from Google's docs. I need to retrieve 10 results and only retrieve more if the user scrolls down the list of results to see results 11..20. I've seen that resultsIterator.next_page_token is populated after I access .pages, but where do I put that value in a future call? – Carl Jul 09 '19 at 15:27
  • Yes the 3rd block is very inefficient with large result sets. The 4th block of code only goes through page by page, item by item. The idea is you don't manually make a future call but simply store the iterator. The iterator holds your "session information" and when calling next() will actually do another request, handling all the token stuff for you. Trust me on that or go through the page iterator documentation - specifically the source code in the second link above. – Researcher Jul 09 '19 at 15:50
  • I can see how the Iterator has made life easier for Google developers. For API users it appears to not save any work :) My previous pattern was to pass next_page_token to my app/user and if they ever requested more results they'd include next_page_token with their request. I'm calling Talent Solution from another Google product - App Engine. After my call, the App Engine instance could exit leaving no instance data live. part #1 – Carl Jul 09 '19 at 16:01
  • Do I need to store this Iterator in Datastore - in some format - in order to restore it should the user make a further request? I can generate a unique identifier for the caller to use so I know which Iterator to retrieve, and also run a cron job to delete old iterators from Datastore. This is all sounding rather wrong. part #2 – Carl Jul 09 '19 at 16:02
  • Oh I see, you want a sessionless setup? You may need to store it in Datastore or go back to manual; let me think on this. – Researcher Jul 09 '19 at 16:03
  • I’ll contact Google Support and see if Talent is dropping App Engine support or if they just didn’t know about it. – Carl Jul 09 '19 at 16:05
  • I would suggest using the offset field instead of the token as it is supported by the Python API. You can give the current offset to the user and they can pass it back to you. What do you think? Going to McDonalds, be back in a bit :) – Researcher Jul 09 '19 at 17:07
  • I think it’s worth trying and hoping the api is smart enough not to have to retrieve all the records pre-offset. The older api had this side-effect and Google did not recommend the approach. This is clearly a rewrite but who knows how deep the rewrite goes. – Carl Jul 09 '19 at 17:09
  • I would definitely NOT write an API that way, but maybe they did haha. At this point I would contact Google and ask about the offset field. It seems reasonable to assume that it works; otherwise they would have deleted it in a fresh API rewrite? More so given they have taken away your ability to manually handle the token. Absolute worst case you will have to use requests and parse JSON. Yucko. – Researcher Jul 09 '19 at 17:11
  • I can page through results using offset. Alas, SearchJobsResponse isn't populated with total_size, estimated_total_size or any value for total results. I've set require_precise_result_size to True and False without change. I've signed up for Google Support and will hear back within 5 days that the Support has started. Perhaps v4beta1 is more v4alpha1 and I should use a production version? Support will tell. – Carl Jul 10 '19 at 14:56
  • That sucks, we work so hard! I think totalSize is just the count of results in the response. Hopefully estimatedTotalSize is what you are looking for and they can fix their API. Never dealt with Google support before, what are response times like? – Researcher Jul 10 '19 at 16:13
  • Just had a look through the code and it's definitely in the API; they have protobuffers set up for it. Are you just seeing 0 come back in the response? Out of curiosity, have you tried manual calls to the REST API and checked the results? That would confirm the problem is on their end rather than in the Python API. – Researcher Jul 10 '19 at 16:19
  • I’ve tried running my code locally so I can set breakpoints and inspect the response in detail – Carl Jul 10 '19 at 16:24
  • I've gone back to using Google's v3 API for job searching. It's straightforward to install alongside v4beta1 – Carl Jul 11 '19 at 11:16
  • Glad you've found a solution at least. – Researcher Jul 11 '19 at 18:12