I'd like to have a list of just the current titles for all questions on one of the smaller Stack Exchange sites (fewer than 10,000 questions). I tried the interactive utility at https://api.stackexchange.com/docs/questions, which reports the result as JSON at the bottom and shows the requesting URL at the top. For example:

https://api.stackexchange.com/2.2/questions?order=desc&sort=activity&tagged=apples&site=cooking

returns this JSON in my browser:

{"items":[{"tags":["apples","crumble"],"owner":{ ...
...
...],"has_more":true,"quota_max":300,"quota_remaining":252}

What is quota? It was 10,000 on one search on one site, but suddenly it's only 300 here.

I won't be doing this very often. What I'd like is the quickest way to edit that (or a similar) URL so I can get a list of all of the titles on a small site. I don't understand how to use paging, and I don't need any of the other fields. I don't mind getting them, but I'm thinking that if I exclude them I can get more results per request.

If I need to script it, python (2.7) is my preferred (only) language.

uhoh

1 Answer

quota_max is the number of requests your application is allowed per day. 300 is the default for an unregistered application. This used to be mentioned directly on the page describing throttles, but seems to have been removed. Here is historical information describing the default.

To increase this to 10,000, you need to register an application and then authenticate by passing an access token in your script.
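
Because the quota counts requests rather than questions, what matters is how many pages a script asks for. As a rough sketch of what calling the API directly could look like, the block below builds a /questions URL and loops over pages until `has_more` is false. `build_questions_url`, `collect_titles` and `FAKE_PAGES` are hypothetical names, `YOUR_KEY` is a placeholder, and the canned pages stand in for real HTTP responses so the example runs offline. It is written for Python 3; on Python 2.7 `urlencode` is imported from `urllib` instead.

```python
from urllib.parse import urlencode  # Python 2.7: from urllib import urlencode

API_ROOT = "https://api.stackexchange.com/2.2"


def build_questions_url(site, page=1, pagesize=100, key=None, access_token=None):
    """Return a /questions URL; key and access_token are added only if given."""
    params = [
        ("page", page),
        ("pagesize", pagesize),
        ("order", "desc"),
        ("sort", "activity"),
        ("site", site),
    ]
    if key:
        params.append(("key", key))
    if access_token:
        params.append(("access_token", access_token))
    return API_ROOT + "/questions?" + urlencode(params)


def collect_titles(fetch_page, max_pages=100):
    """Request pages 1, 2, ... until has_more is false; one call per page."""
    titles = []
    for page in range(1, max_pages + 1):
        response = fetch_page(page)
        titles.extend(item["title"] for item in response["items"])
        if not response.get("has_more"):
            break
    return titles


# Canned pages standing in for real HTTP responses (the titles are made up).
FAKE_PAGES = {
    1: {"items": [{"title": "First question"}, {"title": "Second question"}],
        "has_more": True},
    2: {"items": [{"title": "Third question"}], "has_more": False},
}

print(build_questions_url("cooking", page=2, key="YOUR_KEY"))
print(collect_titles(lambda page: FAKE_PAGES[page]))
```

A real `fetch_page` would download `build_questions_url(site, page=page)` and decode the JSON. The API sends compressed responses, so a plain `urllib` download would likely need a decompression step, which is one reason to let a library handle the requests.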


To get all titles on a site, you can use a Python library to help:


Assuming you have registered your application and authenticated we can proceed.

First, install StackAPI (documentation):

pip install stackapi

This code will then grab up to the 10,000 most recent questions (max_pages * page_size, here 100 * 100) for the site hardwarerecs. Each page costs you one API hit, so the more items per page, the fewer API calls you need.

from stackapi import StackAPI

SITE = StackAPI('hardwarerecs')
SITE.page_size = 100
SITE.max_pages = 100

# Filter to only get question title and link
filter = '!BHMIbze0EQ*ved8LyoO6rNjkuLgHPR'

questions = SITE.fetch('questions', filter=filter)

The questions variable holds a dictionary that looks very similar to the API output, except that the library did all the paging for you. Your data is in questions['data'] and, in this case, contains a list of dictionaries that look like this:

[
...
{u'link': u'http://hardwarerecs.stackexchange.com/questions/29/sound-board-to-replace-a-gl2200-in-a-house-of-worship-foh-setting',
 u'title': u'Sound board to replace a GL2200 in a house-of-worship FOH setting?'},
{u'link': u'http://hardwarerecs.stackexchange.com/questions/31/passive-gps-tracker-logger',
 u'title': u'Passive GPS tracker/logger'}
...
]

This result set is limited to only the title and the link because of the filter we applied. You can find the appropriate filter by adjusting what fields you want in the web UI and copying the filter field.
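
With the result in that shape, pulling out just the titles is a short list comprehension. The sketch below uses a hand-written stand-in for `questions` (the two entries shown above) so it runs without network access. `extract_titles` is a hypothetical helper. The `'items'` fallback is an assumption in case your installed StackAPI version uses that key instead of `'data'`, and `unescape` is applied on the assumption that titles may arrive with HTML entities in them. Python 3 is assumed for the `html` import.

```python
from html import unescape  # Python 3.4+

# Hand-written stand-in for the dictionary returned by SITE.fetch(...).
questions = {
    "data": [
        {"link": "http://hardwarerecs.stackexchange.com/questions/29/"
                 "sound-board-to-replace-a-gl2200-in-a-house-of-worship-foh-setting",
         "title": "Sound board to replace a GL2200 in a house-of-worship FOH setting?"},
        {"link": "http://hardwarerecs.stackexchange.com/questions/31/"
                 "passive-gps-tracker-logger",
         "title": "Passive GPS tracker/logger"},
    ]
}


def extract_titles(result):
    """Return the list of question titles from a fetch result."""
    # 'data' follows the answer above; 'items' is a fallback assumption.
    records = result.get("data") or result.get("items") or []
    # Decode entities such as &amp; in case titles are HTML-escaped.
    return [unescape(record["title"]) for record in records]


titles = extract_titles(questions)
for title in titles:
    print(title)
```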

The hardwarerecs parameter that is passed when creating the SITE object is the first part of the site's domain name. Alternatively, you can find it by looking at the api_site_parameter for your site in the output of the /sites endpoint.
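
As an illustration of that lookup, the sketch below searches a parsed /sites response for a site URL and returns its `api_site_parameter`. `find_api_site_parameter` is a hypothetical helper, and `SAMPLE_SITES` is a hand-written excerpt with only three fields per site, not a real API response.

```python
def find_api_site_parameter(sites_response, site_url_fragment):
    """Return api_site_parameter of the first site whose URL contains the fragment."""
    for site in sites_response.get("items", []):
        if site_url_fragment in site.get("site_url", ""):
            return site["api_site_parameter"]
    return None  # no matching site in this response


# Hand-written excerpt in the shape of a /sites response (assumed fields only).
SAMPLE_SITES = {
    "items": [
        {"name": "Seasoned Advice",
         "site_url": "https://cooking.stackexchange.com",
         "api_site_parameter": "cooking"},
        {"name": "Hardware Recommendations",
         "site_url": "https://hardwarerecs.stackexchange.com",
         "api_site_parameter": "hardwarerecs"},
    ]
}

print(find_api_site_parameter(SAMPLE_SITES, "hardwarerecs.stackexchange.com"))
```

The returned string is what would be passed to `StackAPI(...)` in place of `'hardwarerecs'`.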

Andy
  • OK thank you! - you've understood exactly what I needed to know, and given me an answer I can use! I've looked at those links and it doesn't jump out at me right away why `filter` is set to a cryptic-looking string or how I would have arrived at that, after I install I can read further using `help()` or `.__doc__` ( `filter = '!BHMIbze0EQ*ved8LyoO6rNjkuLgHPR'` ) – uhoh Jun 15 '16 at 01:09
  • Can you explain how this cryptic-looking string works: `filter = '!BHMIbze0EQ*ved8LyoO6rNjkuLgHPR'`, and why it is so long and cryptic? Thanks! – uhoh Jul 13 '16 at 09:14
  • Here is a [long technical discussion](https://kevinmontrose.com/2012/01/11/stack-exchange-api-v2-0-implementing-filters/). If you just want the appropriate filter for your API calls though, you can get this by navigating to the API endpoint you are using in the documentation ([/questions](http://api.stackexchange.com/docs/questions) for example) and adjusting the "filter" in the "Try It" section to include the fields you want. Click save, and you'll notice that "default" changes to a cryptic string like above. Copy that string to your application. – Andy Jul 13 '16 at 12:09
  • OK I'll take a closer look, thank you! So this cryptic, unreadable thing is not your idea - it's a Stack Exchange thing? Is it generated by some kind of algorithm that compresses structured information into the tiniest space possible using the subset of ASCII that is legal in the middle of a URL? I ask because I have [another question about decoding unreadable ASCII](http://stackoverflow.com/q/37640137/3904031) in URLs that hasn't had much attention. – uhoh Jul 13 '16 at 13:16
  • Correct. The filter string is generated by Stack Exchange. – Andy Jul 13 '16 at 13:21