
The file is 100 MB, so here is a portion of it: https://drive.google.com/file/d/0B1GVNHhYNzBINWl4TVFOejhtbEE/view?usp=sharing

It didn't come with a file extension; I added the `.xml` extension myself.

What file type is this, and how can I parse it? I tried untangle with Python and ran into errors.

It is an XML document (presuming it is not invalid) without an XML declaration. Searching for "mediawiki" (and perhaps "xmlns") would yield results without needing to ask here. (For future reference, HTML has no `ns:name` attributes, and it has a set of 'expected' tags.) – user2864740 Jun 10 '15 at 02:16
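A minimal sketch of the check the comment hints at: read only the start of the file and print the root element's tag, which ElementTree reports together with its namespace. The function name `root_tag`, the sample documents, and the filename `dump.xml` are hypothetical; the namespace URI in the sample is only an example, since the actual export version of the file is not known.

```python
import io
import xml.etree.ElementTree as ET

def root_tag(source):
    """Return the root element's tag, including any '{namespace}' prefix.

    Stops at the first 'start' event, so only the beginning of the file is read.
    """
    for _event, elem in ET.iterparse(source, events=("start",)):
        return elem.tag
    return None

# Hypothetical sample resembling the start of a MediaWiki export.
sample = b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" version="0.10"><siteinfo/></mediawiki>'
print(root_tag(io.BytesIO(sample)))
# {http://www.mediawiki.org/xml/export-0.10/}mediawiki
```

On the real file you would pass an open binary file, e.g. `with open("dump.xml", "rb") as f: print(root_tag(f))`. A `mediawiki` root in a `mediawiki.org` namespace is what identifies the format.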

1 Answer


The file you reference is an XML export of a MediaWiki.

See also the XSD for the MediaWiki export format.

You can parse it with a standard XML parser, which is available in most languages, including Python (for example, `xml.etree.ElementTree` in the standard library).

– kjhughes
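One way to apply that advice in Python is a streaming parse with `xml.etree.ElementTree.iterparse`. The sketch below processes each `<page>` element as soon as it is complete and then clears it, so memory use stays roughly bounded even for a file around 100 MB. It strips namespace prefixes so it does not depend on a particular export version. The helper names `local` and `iter_page_titles` and the sample document are hypothetical, and extracting only titles is just an illustration; the same loop can read `<ns>`, `<revision>`, or `<text>` instead.

```python
import io
import xml.etree.ElementTree as ET

def local(tag):
    """Strip a '{namespace}' prefix so the code works with any export version."""
    return tag.rsplit("}", 1)[-1]

def iter_page_titles(source):
    """Yield the <title> text of each <page> in a MediaWiki XML export.

    Streams with iterparse instead of loading the whole tree into memory.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) == "page":
            # Read the title before clearing the element.
            title = next((c.text for c in elem if local(c.tag) == "title"), None)
            yield title
            elem.clear()  # drop the finished page's children to free memory

# Hypothetical two-page sample in the MediaWiki export layout.
sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Main Page</title><ns>0</ns><revision><text>Hello</text></revision></page>
  <page><title>Help:Contents</title><ns>12</ns><revision><text>Hi</text></revision></page>
</mediawiki>"""

print(list(iter_page_titles(io.BytesIO(sample))))
# ['Main Page', 'Help:Contents']
```

On the real file, pass an open binary file (or a path) instead of the `BytesIO` sample. Streaming trades the convenience of a full in-memory tree, as libraries like untangle build, for predictable memory use.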