2

I need access to very old Wikipedia dumps (backups of Wikipedia) in French. I managed to find a 2010 backup on archive.org, and now I'm looking for one from 2006 or even earlier. I know that the latest dumps contain all the data from previous ones, but I need to set up on my computer a version of Wikipedia as it was in, for example, 2006, 2010, or 2012. I guess that is impossible to do with the latest dumps.

Thank you very much for your help.

Léo Joubert
  • If you're looking for a specific article, check the article's revision history (the "View history" tab at the top right). It lets you view the article as it was at a specific point in time. – Mr. Llama Mar 16 '15 at 16:46

3 Answers

3

The Wikimedia Foundation provides access to some old dumps on their website. Do note that some of them use a different schema from present-day Wikipedia, so you might need to modify your tools when working with them.

More archives are also available on Archive.org.

Hydra
2

There appear to be static HTML dumps from November 2006, available here: http://dumps.wikimedia.org/other/static_html_dumps/

Also, if you get the full dump (with edit history), you could filter it to remove all revisions later than a certain date -- then you should be able to view it as of that date (aside from material later deleted, and so not in the dump).
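As a rough illustration of that filtering idea, here is a minimal Python sketch that works directly on the XML full-history dump rather than on SQL tables. It assumes the usual MediaWiki export layout (`<page>` containing a `<title>` and several `<revision>` elements, each with a `<timestamp>` and `<text>`); the function name `snapshot_at` and the sample data are made up for the example, and I have not run it against a real 2006-era dump, whose schema may differ.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def snapshot_at(source, cutoff):
    """Yield (title, timestamp, text) for each page's newest revision
    made at or before `cutoff`.

    `source` is a path or binary file object for a full-history XML dump.
    `cutoff` is a UTC timestamp string such as '2006-12-31T23:59:59Z'.
    (Hypothetical helper, not part of any Wikimedia tool.)
    """
    title, best = None, None
    # iterparse streams the file, so the whole dump is never held in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            fields = {local(child.tag): child.text for child in elem}
            ts = fields.get("timestamp")
            # Timestamps in the same UTC format compare correctly as strings.
            if ts and ts <= cutoff and (best is None or ts > best[0]):
                best = (ts, fields.get("text") or "")
            elem.clear()  # drop the revision text once inspected
        elif name == "page":
            # Pages created after the cutoff have no match and are skipped.
            if best is not None:
                yield title, best[0], best[1]
            title, best = None, None
            elem.clear()


# Tiny made-up sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Paris</title>
    <revision><timestamp>2005-03-01T10:00:00Z</timestamp><text>old</text></revision>
    <revision><timestamp>2006-06-01T10:00:00Z</timestamp><text>mid</text></revision>
    <revision><timestamp>2011-01-01T10:00:00Z</timestamp><text>new</text></revision>
  </page>
  <page>
    <title>Later</title>
    <revision><timestamp>2012-01-01T00:00:00Z</timestamp><text>x</text></revision>
  </page>
</mediawiki>"""

for page_title, ts, text in snapshot_at(io.BytesIO(SAMPLE), "2006-12-31T23:59:59Z"):
    print(page_title, ts, text)
```

For a real dump you would pass a file object instead of the sample, for example one opened with `bz2.open(path, "rb")` if the dump is bz2-compressed. As noted above, pages that were later deleted will still be missing from the result.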

  • Excellent, thank you! Can you tell me in a few words how I can remove all revisions after a given date? – Léo Joubert Mar 16 '15 at 17:09
  • Not exactly -- you'll need to look over the structure of the tables (probably the revision table) and figure out appropriate SQL. Please do post what you develop, so others can benefit. (Also, an upvote would be nice. ;-) ) – Jesse W at Z - Given up on SE Mar 17 '15 at 18:03
  • Actually, I want to build a search engine over a Wikipedia static dump. Would this dump be helpful for that? – Sudip Das Sep 10 '16 at 20:22
  • Yes, the dumps are what you'd want to use as the data for your search engine, indeed. You probably won't need the full version history, either, depending on what exactly you are trying to do. – Jesse W at Z - Given up on SE Sep 12 '16 at 16:32
1

Unfortunately, Wikimedia does not keep all historical dumps (with the few exceptions noted by others).

Given your use case, I highly recommend using the JWPL Wikipedia Revision Toolkit: https://dkpro.github.io/dkpro-jwpl/WikipediaRevisionToolkit/

Specifically, you'll likely appreciate the "Time Machine" package, which allows you to reconstruct the state of Wikipedia at some past date. https://dkpro.github.io/dkpro-jwpl/TimeMachine/

Though I haven't used that feature specifically, I've had great success using the Revision Toolkit for other purposes. The JWPL package contains other very useful tools as well.

shiri