
I would like to query the Wikimedia API to find all images that contain a keyword and keep only the images that are in the public domain, i.e. without an additional CC-SA license.

Currently I'm using the following query to extract the images:

http://en.wikipedia.org/w/api.php?action=query&list=search&format=json&srsearch=roses&srnamespace=6&srinfo=totalhits%7Csuggestion&srprop=size%7Cwordcount%7Ctimestamp%7Cscore%7Csnippet%7Ctitlesnippet%7Credirecttitle%7Credirectsnippet%7Csectiontitle%7Csectionsnippet%7Chasrelated&srredirects=&srlimit=10&generator=images&titles=Wikipedia%3APublic_domain&gimlimit=10

But this currently returns all the images regardless of their licensing. Maybe I need to modify the namespace, but I don't know where to look.

Thanks

  • If you found public domain images useful, consider contributing back to the commons, e.g. by improving the machine-readable metadata available for everyone. https://meta.wikimedia.org/wiki/File_metadata_cleanup_drive – Nemo Nov 07 '15 at 10:26



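Um, your current API query does two very distinct things:

  • it gets the first 10 images used on the page Wikipedia:Public domain (`generator=images&titles=Wikipedia:Public_domain&gimlimit=10`). That is the `pages` result; you could specify additional properties to fetch for that result set.
  • it searches namespace 6 (files) for the word roses (`list=search&srsearch=roses&srnamespace=6`).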

Unfortunately, you can't restrict the search module to certain categories; you can only limit it by namespace. So you would need to get the categories of all search results and filter them yourself, keeping only images in Category:Public Domain (or any of its subcategories). The API query would look like this:

api.php?action=query&prop=imageinfo|categories&generator=search&gsrsearch=roses&gsrnamespace=6&format=json
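A minimal Python sketch of that client-side filtering, assuming the default `format=json` response shape (result pages keyed by page ID under `query.pages`). It only builds the request parameters and filters an already-parsed response, so no network access is involved. The `ALLOWED` set and the sample response are hypothetical; in practice the set would hold Category:Public Domain plus its subcategories. `cllimit=max` is an addition not in the query above, included because `prop=categories` returns only a small number of categories per request by default (`cllimit` defaults to 10).

```python
from typing import Any, Dict, List, Set


def build_search_params(term: str) -> Dict[str, str]:
    """Parameters for the generator=search query above (namespace 6 = File:)."""
    return {
        "action": "query",
        "format": "json",
        "generator": "search",
        "gsrsearch": term,
        "gsrnamespace": "6",
        "prop": "imageinfo|categories",
        "cllimit": "max",  # assumption: ask for as many categories as allowed per request
    }


def filter_by_category(response: Dict[str, Any], allowed: Set[str]) -> List[str]:
    """Return the titles of result pages that are in at least one allowed category."""
    pages = response.get("query", {}).get("pages", {})
    matches = []
    for page in pages.values():
        # Pages without a "categories" key are treated as uncategorized.
        cats = {c["title"] for c in page.get("categories", [])}
        if cats & allowed:
            matches.append(page["title"])
    return sorted(matches)


# Hypothetical, hand-written response used for illustration only.
sample = {"query": {"pages": {
    "1": {"title": "File:Rose A.jpg", "categories": [{"ns": 14, "title": "Category:PD-self"}]},
    "2": {"title": "File:Rose B.jpg", "categories": [{"ns": 14, "title": "Category:CC-BY-SA-3.0"}]},
    "3": {"title": "File:Rose C.jpg"},
}}}
ALLOWED = {"Category:Public Domain", "Category:PD-self"}  # hypothetical allowed set

print(filter_by_category(sample, ALLOWED))  # ['File:Rose A.jpg']
```

Keeping the HTTP call out of these functions makes the filter easy to test against saved responses.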

Don't forget to continue the query: if you want 10 images that match your category criteria, you might need to query (many) more than that.
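A sketch of that continuation loop in Python, assuming the newer `continue`-style continuation (each response carries a `continue` object whose values are copied into the next request) rather than the older `query-continue` block. The `fetch` callable is a hypothetical stand-in for the HTTP call; with the third-party `requests` library it could be something like `lambda p: requests.get(API_URL, params=p).json()`, where `API_URL` is whichever `api.php` endpoint you query. Because `prop=categories` can itself continue (`clcontinue`), one page's categories may be spread over several responses, so matches are de-duplicated.

```python
from typing import Any, Callable, Dict, Iterator, List, Set

Fetch = Callable[[Dict[str, str]], Dict[str, Any]]


def iterate_query(fetch: Fetch, params: Dict[str, str]) -> Iterator[Dict[str, Any]]:
    """Yield successive API responses, following the `continue` block until it disappears."""
    request = dict(params)
    request["continue"] = ""  # opt into continue-style continuation (the default on newer MediaWiki)
    while True:
        response = fetch(request)
        yield response
        if "continue" not in response:
            break
        # Copy every continuation value (e.g. gsroffset, clcontinue) into the next request.
        request = {**params, **{k: str(v) for k, v in response["continue"].items()}}


def first_matches(fetch: Fetch, params: Dict[str, str], allowed: Set[str], wanted: int) -> List[str]:
    """Keep requesting batches until `wanted` files in an allowed category are found."""
    found: List[str] = []
    for response in iterate_query(fetch, params):
        for page in response.get("query", {}).get("pages", {}).values():
            cats = {c["title"] for c in page.get("categories", [])}
            if cats & allowed and page["title"] not in found:
                found.append(page["title"])
                if len(found) == wanted:
                    return found
    return found


# Hypothetical two-batch fake API, used instead of real HTTP calls.
def fake_fetch(req: Dict[str, str]) -> Dict[str, Any]:
    if "gsroffset" not in req:
        return {"continue": {"gsroffset": 2, "continue": "gsroffset||"},
                "query": {"pages": {
                    "1": {"title": "File:A.jpg", "categories": [{"title": "Category:CC-BY-SA-4.0"}]},
                    "2": {"title": "File:B.jpg", "categories": [{"title": "Category:PD-self"}]}}}}
    return {"query": {"pages": {
        "3": {"title": "File:C.jpg", "categories": [{"title": "Category:PD-self"}]}}}}


print(first_matches(fake_fetch, {"action": "query"}, {"Category:PD-self"}, 2))
# ['File:B.jpg', 'File:C.jpg']
```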

Bergi
  • I think adding `clcategories` might make sense here to limit the categories only to Category:Public Domain and its subcategories. Doing this would most likely mean less `query-continue`. – svick Oct 10 '12 at 13:25
  • Nah, `clcategories` unfortunately does not work because it doesn't include subcategories, and you can't pass all 1200 subcats in there. – Bergi Oct 10 '12 at 14:41
  • *Unfortunately, you can't restrict the search module to search only in some categories* - sure you can, just add `incategory:` to the search query. – Tgr Oct 14 '12 at 02:58
  • OK, you could do that (although it's not really a [URL] parameter), but it still doesn't include subcategories. You'd have to do something like *`search_string` +* `incategory:"Public Domain" OR incategory:CC-zero OR …`, and that likely goes above the search string length limit. – Bergi Oct 14 '12 at 12:58
  • If I append +incategory:"Public Domain" or anything else to the `srsearch` parameter, it returns the following error: `{ servedby: "srv292", error: { code: "srsearch-text-disabled", info: "text search is disabled" } }` – CiprianIonescu Oct 16 '12 at 07:01
  • Hm, [worksforme](http://en.wikipedia.org/w/api.php?action=query&prop=imageinfo|categories&generator=search&gsrsearch=roses%20incategory:%22Public%20Domain%22&gsrnamespace=6&gsrwhat=text&format=jsonfm&servedby). The error message is unambiguous, but I don't think the Wikipedia servers are improperly configured. – Bergi Oct 16 '12 at 08:45
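Since neither `clcategories` nor `incategory:` covers subcategories, the category tree has to be expanded on the client. A minimal Python sketch, assuming the `list=categorymembers` module with `cmtype=subcat`, walked breadth-first with a visited set because category graphs can contain cycles. `fetch` is again a hypothetical stand-in for the HTTP call, and `max_depth` is a hypothetical guard, since the public-domain tree is large (around 1200 subcategories, per the comment above). The resulting set of titles can serve as the allowed-category set when filtering search results.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Set, Tuple

Fetch = Callable[[Dict[str, Any]], Dict[str, Any]]


def subcategory_closure(fetch: Fetch, root: str, max_depth: int = 3) -> Set[str]:
    """Collect `root` and its subcategories down to `max_depth` levels (breadth-first)."""
    seen = {root}
    queue: Deque[Tuple[str, int]] = deque([(root, 0)])
    while queue:
        title, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand categories beyond the depth guard
        params: Dict[str, Any] = {
            "action": "query", "format": "json", "list": "categorymembers",
            "cmtitle": title, "cmtype": "subcat", "cmlimit": "max", "continue": "",
        }
        while True:
            response = fetch(params)
            for member in response.get("query", {}).get("categorymembers", []):
                sub = member["title"]
                if sub not in seen:  # skip categories already visited (cycles, shared parents)
                    seen.add(sub)
                    queue.append((sub, depth + 1))
            if "continue" not in response:
                break
            params = {**params, **response["continue"]}  # fetch the next batch of members
    return seen


# Hypothetical category tree with a deliberate cycle, used instead of real HTTP calls.
TREE = {
    "Category:Public domain": ["Category:PD-self", "Category:CC-zero"],
    "Category:PD-self": ["Category:Public domain"],
}


def fake_fetch(req: Dict[str, Any]) -> Dict[str, Any]:
    members = [{"ns": 14, "title": t} for t in TREE.get(req["cmtitle"], [])]
    return {"query": {"categorymembers": members}}


print(sorted(subcategory_closure(fake_fetch, "Category:Public domain")))
# ['Category:CC-zero', 'Category:PD-self', 'Category:Public domain']
```

Turning that set into an OR-ed `incategory:` search string is possible in principle, but as noted above it would likely exceed the search string length limit, so client-side filtering is the safer use.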