4

I know there's DBPedia for Wikipedia, but does something like that exist for Wiktionary? I'd like to get something like https://en.wiktionary.org/wiki/Category:en:Occupations into JSON or similar format.

Jonathan

2 Answers

1

If you want to get all the entries in a category, you can just use the MediaWiki API. Try the following query:

https://en.wiktionary.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:en:Occupations&cmprop=title
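
To get the whole category as JSON, the same query can be run from a script. Below is a minimal sketch in Python using only the standard library. It assumes `format=json`, `cmlimit=500`, and the API's `continue` mechanism for paging; none of these appear in the URL above. The function names and the User-Agent string are placeholders made up for this example.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wiktionary.org/w/api.php"


def build_query(category, cont=None):
    """Build the categorymembers query URL, optionally with continuation params."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmprop": "title",
        "cmlimit": "500",   # assumption: ask for the largest page size for normal users
        "format": "json",   # assumption: request JSON output explicitly
    }
    if cont:
        # The API returns a "continue" object; its keys are sent back verbatim.
        params.update(cont)
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_titles(response):
    """Pull the page titles out of one decoded API response."""
    members = response.get("query", {}).get("categorymembers", [])
    return [member["title"] for member in members]


def fetch_category_members(category):
    """Follow continuation until every member title has been collected."""
    titles, cont = [], None
    while True:
        request = urllib.request.Request(
            build_query(category, cont),
            # Placeholder User-Agent; replace with your own tool name and contact.
            headers={"User-Agent": "category-example/0.1 (placeholder contact)"},
        )
        with urllib.request.urlopen(request) as resp:
            data = json.load(resp)
        titles.extend(extract_titles(data))
        cont = data.get("continue")
        if not cont:
            return titles
```

Calling `fetch_category_members("Category:en:Occupations")` should return a plain list of titles, which `json.dumps` can then write out. Subcategories come back in the same list with a `Category:` prefix.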

However, things get harder if you want the word data itself in JSON, XML, or another structured format. Wiktionary page content is written to be read by humans, so the MediaWiki API has no way to return a word's definition, pronunciation, or synonyms as separate fields. There are, though, some APIs, such as Wordnik and Lingua Robot, that extract the data from Wiktionary and serve it as JSON.

Roman Kishchenko
1

Another way to go would be to load the Wiktionary category SQL dump from the Wikimedia data dumps, e.g. enwiktionary-20190901-category.sql.gz, into MySQL.

Then use https://en.wiktionary.org/api/rest_v1/ to retrieve (and parse!) the HTML for the info you need.
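
As a rough illustration of the "retrieve and parse" step, here is a sketch in Python with the standard library only. It assumes the REST API's `page/html/{title}` route, and it assumes that definitions are the top-level `<li>` items of `<ol>` lists, which is only an approximation of real entry markup. The class and function names and the User-Agent string are made up for this example.

```python
from html.parser import HTMLParser
import urllib.parse
import urllib.request

# Assumption: the page HTML route under the REST base URL mentioned above.
REST_HTML = "https://en.wiktionary.org/api/rest_v1/page/html/"
LIST_TAGS = ("ol", "ul", "dl", "li")


class OrderedListItems(HTMLParser):
    """Collect the text of <li> items that sit directly inside a top-level <ol>."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open list-related tags
        self.items = []      # finished item texts
        self.current = None  # text chunks of the item being read

    def handle_starttag(self, tag, attrs):
        if tag in LIST_TAGS:
            if tag == "li" and self.stack == ["ol"]:
                self.current = []
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in LIST_TAGS and tag in self.stack:
            # Pop back to the matching open tag (tolerates unclosed inner tags).
            while self.stack.pop() != tag:
                pass
            if tag == "li" and self.stack == ["ol"] and self.current is not None:
                text = " ".join("".join(self.current).split())
                if text:
                    self.items.append(text)
                self.current = None

    def handle_data(self, data):
        # Keep text of the top-level item only; skip nested quotation/example lists.
        if self.current is not None and self.stack == ["ol", "li"]:
            self.current.append(data)


def extract_ordered_items(html):
    parser = OrderedListItems()
    parser.feed(html)
    return parser.items


def fetch_page_html(title):
    url = REST_HTML + urllib.parse.quote(title, safe="")
    request = urllib.request.Request(
        url,
        # Placeholder User-Agent; replace with your own tool name and contact.
        headers={"User-Agent": "wiktionary-example/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(request) as resp:
        return resp.read().decode("utf-8")
```

`extract_ordered_items(fetch_page_html("actor"))` would then give a list of candidate definition strings. Real entries mix several languages and parts of speech on one page, so a serious parser would also need to track section headings.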

Good luck!

amirouche
  • 1
    If they're looking for the specific names of the articles/subcategories, `categorylinks` and `page` would be what they're looking for. See: https://stackoverflow.com/q/30387731/6276743, https://stackoverflow.com/q/21782410/6276743 –  Jul 26 '20 at 22:13
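
The join the comment describes between `categorylinks` and `page` might look like the sketch below. To stay self-contained it uses Python's `sqlite3` with an in-memory database as a stand-in for MySQL. The two tables are cut down to the columns the join needs, following the 2019-era MediaWiki schema as far as I know it, and the rows are toy data invented for illustration, not taken from a dump.

```python
import sqlite3

# In-memory stand-in for the MySQL database the dumps would be loaded into.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified tables: only the columns needed for the join.
    CREATE TABLE page (
        page_id        INTEGER PRIMARY KEY,
        page_namespace INTEGER,   -- 0 = entry, 14 = category
        page_title     TEXT
    );
    CREATE TABLE categorylinks (
        cl_from INTEGER,          -- page_id of the member page
        cl_to   TEXT,             -- category title, without the "Category:" prefix
        cl_type TEXT              -- 'page', 'subcat' or 'file'
    );
    -- Toy rows, invented for this example.
    INSERT INTO page VALUES
        (1, 0, 'actor'), (2, 0, 'baker'), (3, 14, 'en:Artists'), (4, 0, 'apple');
    INSERT INTO categorylinks VALUES
        (1, 'en:Occupations', 'page'),
        (2, 'en:Occupations', 'page'),
        (3, 'en:Occupations', 'subcat'),
        (4, 'en:Fruits', 'page');
""")


def category_members(connection, category):
    """Return (title, type) for every page or subcategory linked to a category."""
    rows = connection.execute(
        """
        SELECT p.page_title, cl.cl_type
        FROM categorylinks AS cl
        JOIN page AS p ON p.page_id = cl.cl_from
        WHERE cl.cl_to = ?
        ORDER BY cl.cl_type, p.page_title
        """,
        (category,),
    )
    return rows.fetchall()
```

With the toy rows above, `category_members(conn, "en:Occupations")` returns the two entries followed by the subcategory. Against a real import, the same `SELECT` should carry over to MySQL with the driver's own placeholder syntax.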