I am aware of the following solutions:

  1. The Wikipedia infobox can be returned as HTML or in the "wikitext" model. In both cases, however, I'd need to parse the data afterwards, which is highly error-prone.
  2. Use DBpedia or Wikidata. Unfortunately, these services do not have all the data that I would like to use.

Is there a way to get the infobox information as JSON or in some other structured format? Alternatively, are there any Ruby gems that parse the "wikitext" model data and convert it into a structured format? If not, where can I find the documentation on infobox formatting so that I can do it myself?

svick
T. M.
    Possible duplicate of [How to extract information from a Wikipedia infobox?](http://stackoverflow.com/questions/33862336/how-to-extract-information-from-a-wikipedia-infobox) – Tgr Apr 13 '17 at 09:16

1 Answer

Wikipedia doesn't provide infobox data in any structured form. The only options are to parse the wikitext yourself or to use a service that does it for you, such as DBpedia.

Each template should have its own documentation, which you can find on the wiki page named Template:<name of the template>. For instance, the docs for "Infobox officeholder" are at https://en.wikipedia.org/wiki/Template:Infobox_officeholder. You can find the name of the infobox template by viewing the page's wikitext source and looking for the string directly following {{ (for example, {{Infobox officeholder marks the beginning of an "Infobox officeholder" usage).
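
If you do end up parsing it yourself, the `{{` lookup described above can be sketched with nothing but the Python standard library. This is only a rough illustration, not a full wikitext parser: the function names `find_infobox` and `parse_infobox` and the sample text are made up for this example, and it assumes braces are balanced and ignores things like `<nowiki>`, HTML comments and `{{{triple braces}}}`.

```python
import re


def find_infobox(wikitext):
    """Return the raw text of the first {{Infobox ...}} template, or None."""
    start = re.search(r"\{\{\s*Infobox", wikitext, re.IGNORECASE)
    if start is None:
        return None
    depth = 0
    i = start.start()
    # Walk forward, counting {{ and }} until the opening pair is closed.
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return wikitext[start.start():i]
        else:
            i += 1
    return None  # unbalanced braces


def parse_infobox(infobox):
    """Split an infobox into (template name, {parameter: raw value})."""
    body = infobox[2:-2]  # drop the outer {{ and }}
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            # Only a top-level pipe separates parameters.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    params = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:  # skip positional (unnamed) parameters
            params[key.strip()] = value.strip()
    return parts[0].strip(), params


# Made-up sample wikitext, for illustration only.
sample = """{{Short description|Example}}
{{Infobox officeholder
| name = Jane Doe
| office = [[Mayor]] of Example City
| term_start = {{start date|2010|1|4}}
}}
'''Jane Doe''' is a fictional person."""

name, params = parse_infobox(find_infobox(sample))
print(name)
print(params)
```

The values are still raw wikitext (links and nested templates are left untouched), which is why the template documentation matters: it tells you what each parameter is supposed to contain.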

https://github.com/earwig/mwparserfromhell is an excellent parser for Python. Sadly, I'm not aware of any Ruby gems for this task.

Martin Urbanec