I perform a Bing API search for webpages with the query "cameras".

The first "page" of results (offset=0, count=50) returns 49 actual results. It also returns a totalEstimatedMatches of 114000000 -- 114 million. Neat, that's a lot of results.

The second "page" of results (offset=49, count=50) performs similarly...

...until I reach page 7 (offset=314, count=50). Suddenly totalEstimatedMatches is 544.

And the actual count of results returned per-page trails off precipitously from there. In fact, over 43 "pages" of results, I get 413 actual results, of which only 311 have unique URLs.

This appears to happen for any query after a small number of pages.

Is this expected behavior? There's no hint from the API documentation that exhaustive pagination should lead to this behavior... but there you have it.

Here's a screenshot:

[Screenshot: strange behavior]

Dave Peck

2 Answers

Each time the API is called, the search API obtains a group of possible matches starting at the offset in the result set, and then filters those results based on various parameters (e.g., spam, duplicates, SafeSearch setting), leaving a final result set. If the final result set after filtering and optimization contains more results than the count parameter, then a number of results equal to count is returned. If the count parameter is larger than the final result set, then the whole final result set is returned, which will be fewer results than count. If the search API is called again, passing in the offset parameter to get the next set of results, then the filtering process happens again on the next set of results, which means that call may also return fewer results than count.

You should not expect the full count parameter number of results to always be returned for each API call. If further search results beyond the number returned are required, then the query should be called again, passing in the offset parameter with a value equal to the number of results returned in the previous API call. This also means that when making subsequent API calls, the offset parameter should never be a hard-coded value and should always be calculated based on the results of previous queries.

totalEstimatedMatches can also add to confusion around the Bing Search API results.  The word ‘estimated’ is important because the number is an estimation based on an initial quick result set, prior to the filtering described above.  Additionally, the totalEstimatedMatches value can change as you iterate through the result set by making subsequent API calls with increasing offset values.  The totalEstimatedMatches should only be used as a rough guide indicating the magnitude of the possible result set, and it should not be used to determine the number of results that will ultimately be returned.  To query all of the possible results you should continue making API calls, passing in offset with a value of the sum of the results returned in previous calls, until that sum is greater than totalEstimatedMatches of the most recent API call.
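The paging rule described above can be sketched as follows. This is a minimal illustration, not an official client: `collect_results` and `fetch_page` are hypothetical names, and `fetch_page(offset, count)` is assumed to return a dict shaped like a web search response, with the results under `webPages.value` and the estimate under `webPages.totalEstimatedMatches`. Filtering by unique URL is an addition prompted by the duplicate URLs reported in the question.

```python
from typing import Callable, Dict, List

# Hypothetical callable: fetch_page(offset, count) performs one API call and
# returns a dict assumed to look like
# {"webPages": {"totalEstimatedMatches": int, "value": [{"url": str}, ...]}}.
FetchPage = Callable[[int, int], Dict]


def collect_results(fetch_page: FetchPage, count: int = 50,
                    max_calls: int = 100) -> List[Dict]:
    """Page through results, advancing offset by the number actually returned."""
    offset = 0
    seen_urls = set()
    unique_results = []
    for _ in range(max_calls):  # safety cap on the number of API calls
        web_pages = fetch_page(offset, count).get("webPages", {})
        page = web_pages.get("value", [])
        if not page:
            break  # an empty page means nothing more is available
        for item in page:
            url = item.get("url")
            if url not in seen_urls:  # later pages may repeat earlier URLs
                seen_urls.add(url)
                unique_results.append(item)
        # Advance by what was returned, never by the requested count.
        offset += len(page)
        # The estimate can change between calls, so re-read it each time and
        # stop once the running total exceeds the most recent value.
        estimate = web_pages.get("totalEstimatedMatches")
        if estimate is not None and offset > estimate:
            break
    return unique_results
```

Passing the fetch function in as a parameter keeps the offset arithmetic separate from the HTTP details, so the loop can be exercised with canned responses before it is pointed at the real service.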

Note that you can see this same behavior by going to bing.com directly and using a query such as https://www.bing.com/search?q=bill+gates&count=50. Notice that you will get around 34 results with a totalEstimatedMatches of ~567,000 (valid as of June 2017; future searches may differ), and if you click the 'next page' arrow you will see that the next query executed starts at the offset of the 34 results returned by the first query (i.e., https://www.bing.com/search?q=bill+gates&count=50&first=34). If you click ‘next’ several more times, you may see the totalEstimatedMatches also change from page to page.
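For experimenting with the bing.com URLs above, here is a small sketch that builds them with Python's standard library. The function name `bing_search_url` is hypothetical; the `q`, `count`, and `first` query parameters are taken from the example URLs in this answer.

```python
from typing import Optional
from urllib.parse import urlencode


def bing_search_url(query: str, count: int, first: Optional[int] = None) -> str:
    """Build a bing.com search URL like the ones quoted above."""
    params = {"q": query, "count": count}
    if first is not None:
        # 'first' carries the running number of results already seen.
        params["first"] = first
    # urlencode encodes spaces as '+', matching the example URLs.
    return "https://www.bing.com/search?" + urlencode(params)


print(bing_search_url("bill gates", 50))
print(bing_search_url("bill gates", 50, first=34))
```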

Matt Small
  • Thanks for the quick response, Matt. You are describing the behavior I'm seeing. But please look carefully at my description: I **am** adding the returned result count to `offset` each time. (See the screenshot for the actual `offset` values I'm using with each request.) At a certain point, the number of returned results drops precipitously. The original estimate of 114 million seems unobtainable in practice -- even orders of magnitude less than that seems unobtainable in practice. **Does this match the actual expected behavior of the API, or is the API broken?** – Dave Peck Oct 13 '17 at 18:39
  • Yes, this is expected behavior. – Matt Small Oct 13 '17 at 18:43
  • Thanks @MattSmall -- just to extra clarify: **is it possible to use the API to exhaustively get 114M results for a single query**? (Or, anything resembling that scale of results for a single query?) – Dave Peck Oct 13 '17 at 18:45
  • Here's the important part of the information above: totalEstimatedMatches can also add to confusion around the Bing Search API results. The word ‘estimated’ is important because the number is an estimation based on an initial quick result set, prior to the filtering described above. It is theoretically possible, should there be that many actual results after filtering. In practice I don't know whether it is or isn't. – Matt Small Oct 13 '17 at 18:48
  • Thanks Matt, but I feel that... side-steps my actual question about scale. I'm searching for the term "cameras". We both know there are more than 311 unique webpages on the web that pertain to cameras. An estimate of 114M sounds... plausible. Even if there are only 1M in practice... it seems the Bing API will only ever return 300-ish unique results (or thereabouts -- give or take an order of magnitude here) to me? – Dave Peck Oct 13 '17 at 18:50
  • What's your goal in understanding this? It might help me get you a better answer. – Matt Small Oct 13 '17 at 19:02
  • My goal is to understand the expected and actual behavior of the API. I would like to know whether it is possible to exhaustively obtain results at the scale of millions for a single query. Is it? – Dave Peck Oct 13 '17 at 19:04
  • I don't know. I can tell you how to get all of the results that are available. I'm not privy to the algorithm that calculates "totalestimatedmatches". – Matt Small Oct 13 '17 at 19:24
  • Okay, perhaps I'll ask a different way. When *you* use the Bing Search API to get all of the results available for the search query `cameras`, how many unique URLs do you ultimately retrieve? (I realize this number might vary from attempt to attempt, but I'm just curious about the typical number you get.) – Dave Peck Oct 13 '17 at 20:05
1

This seems to be expected behavior. The Web Search API is not a crawler API, so it only delivers results that the algorithms deem relevant for a human. Simply put, most humans won't skim through more than a few pages of results; furthermore, they expect to find relevant results on the first page.

If you could retrieve the results in the millions, you could simply copy their search index and Bing would be out of business.

Search indices seem to be instruments of political and economic power. As far as I know, there are only four relevant search indices worldwide: from Google, from Microsoft (Bing), from Russia, and from China. Those who control the search, control the Spice... ;-)

tkr