Is there a .NET library for parsing pages I've retrieved through the MediaWiki API? A standard MediaWiki parser that just gives me the titles and the content as plain data would be fine, but I would rather have one specifically suited to Wiktionary, one that could tell me what type of word it is and give me all of the definitions.

I would prefer not to write my own parser for this. Any suggestions?

  • Which output format are you consuming from the API? There are currently 9 from which to choose... – Cᴏʀʏ Dec 05 '11 at 23:40
  • @Alex there are tonnes of examples, start here: http://www.mediawiki.org/wiki/API:Parsing_wikitext – Jeremy Thompson Dec 06 '11 at 04:39
  • I'm not aware of any API or client library that would provide Wiktionary data in a structured format (as opposed to HTML or raw wikitext). Then again, I haven't really looked much, either. – Ilmari Karonen Feb 07 '12 at 20:51
  • I spoke too soon -- just after posting the comment above, I found [this answer](http://stackoverflow.com/a/4778122) which mentions [JWKTL](http://www.ukp.tu-darmstadt.de/software/jwktl/). It's in Java, though, not C#. – Ilmari Karonen Feb 07 '12 at 20:54
  • Possible duplicate of [Has anyone parsed Wiktionary?](http://stackoverflow.com/questions/3364279/has-anyone-parsed-wiktionary) – Nemo Feb 13 '16 at 19:04

2 Answers

The dbnary project provides parsed information from Wiktionary in RDF form.

If you want something processed even further, I provide SQLite and TEI files generated from the dbnary data as part of my WikDict project at download.wikdict.com.

This does not really answer the question about .NET libraries, but I'm sure you'll easily find libraries to read XML (TEI), SQLite, or RDF.
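Since the layout of the downloaded SQLite files is not described here, a reasonable first step is to inspect the schema. The following is a minimal sketch in Python (chosen only for brevity; any SQLite binding, including the .NET ones, can run the same two queries). The function name `list_tables` is hypothetical, and nothing here assumes particular table names in the WikDict files.

```python
import sqlite3


def list_tables(db_path):
    """Return {table name: [column names]} for the SQLite file at db_path."""
    # Note: connecting to a path that does not exist creates an empty database.
    connection = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        tables = [
            row[0]
            for row in connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # Quote the identifier so unusual table names do not break the statement.
            quoted = table.replace('"', '""')
            columns = connection.execute(f'PRAGMA table_info("{quoted}")').fetchall()
            # Each PRAGMA table_info row describes one column; index 1 is its name.
            schema[table] = [column[1] for column in columns]
        return schema
    finally:
        connection.close()
```

Once the table and column names are known, ordinary `SELECT` statements can pull out the entries of interest.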

Karl Bartel

If you get the output in JSON, there are many options you could use, both built in to .NET and external to the framework itself.

If you get the output in XML, again, there are powerful XML manipulation classes within the .NET framework itself and outside of the framework.

You're going to have to be more specific -- provide the format and some example output.
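As an illustration of the JSON route, here is a minimal sketch in Python (used only for brevity; the same steps carry over to any .NET JSON library). It assumes the request was made with `action=query&prop=revisions&rvprop=content&format=json` and that the response uses the legacy layout, where pages are keyed by page ID under `query.pages` and the revision text sits under the `"*"` key. That layout is an assumption, and the function name `extract_wikitext` and the sample response are invented for the example.

```python
import json


def extract_wikitext(response_text):
    """Return {page title: wikitext} from a query/revisions JSON response."""
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", {})
    result = {}
    for page in pages.values():
        revisions = page.get("revisions")
        if not revisions:
            continue  # missing page, or no content was returned for it
        # Assumed legacy layout: the revision text is stored under the "*" key.
        result[page["title"]] = revisions[0].get("*", "")
    return result


# Invented sample response, trimmed to the fields the function reads.
sample = (
    '{"query": {"pages": {"1": {"title": "test", '
    '"revisions": [{"*": "==English==\\n# A trial."}]}}}}'
)
print(extract_wikitext(sample))
```

This only recovers the raw wikitext; turning that into structured data is a separate step.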

Cᴏʀʏ
  • I use this: http://en.wiktionary.org/w/api.php?action=query&prop=revisions&rvprop=content&titles= It comes out as wiki code, the same code that you would type into MediaWiki to make the page. – Dec 05 '11 at 23:49
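
If no ready-made parser turns up, a rough first pass over that wiki code can be sketched as follows. This is Python for brevity and a sketch under assumptions, not a full Wiktionary parser: it assumes the usual entry layout in which level-2 headings (`==English==`) name the language, deeper headings (`===Noun===`) name the section, and top-level definitions are lines starting with a single `#`. The function name `outline_definitions` is hypothetical, and inline markup such as `[[links]]` and `{{templates}}` is left untouched.

```python
import re

# A heading is the same run of 2-6 "=" characters on both sides of a title.
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def outline_definitions(wikitext):
    """Return a list of (language, section, definition) tuples."""
    language = section = None
    entries = []
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2)
            if level == 2:
                language, section = title, None  # new language block
            else:
                section = title  # e.g. Noun, Verb, Etymology
            continue
        # Keep top-level "#" lines only; "#:" and "#*" lines are treated as
        # examples or quotations, and "##" lines as sub-senses, and skipped.
        if line.startswith("#") and not line.startswith(("#:", "#*", "##")):
            if language and section:
                entries.append((language, section, line[1:].strip()))
    return entries
```

Filtering the result on section names such as "Noun" or "Verb" gives the type of word; stripping the remaining markup from each definition still needs separate handling.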