I'm trying to get all of the revisions of a page within a specified timeframe. I have a Python script that retrieves the last 100 revisions, but I don't see anything that lets me restrict the results to a timeframe. I do see the following parameters:

rvstart: Timestamp to start listing from. (enum)
rvend:   Timestamp to end listing at. (enum)

However, I can't get these to work. They work if I pass a timestamp that exactly matches an existing revision's timestamp, but not when I pass arbitrary timestamps as the bounds of a range. Does anyone have any thoughts?

Here is my script, if you're interested:

import json

from wikitools import wiki, api

site = wiki.Wiki("https://en.wikipedia.org/w/api.php")
names = ["Sherrod Brown","Maria Cantwell"]
allMembers = []
for name in names:
    params = {'action':'query',
        'titles': name,
        'prop':'revisions',
        'rvprop':'ids|flags|timestamp|userid|user|size|comment|tags',
        'rvlimit':'100'
    }
    req = api.APIRequest(site, params)
    res = req.query(querycontinue=False)
    allMembers.append(res)

with open('/Applications/MAMP/htdocs/python/wikipedia-1.4.0/data/wiki-leg.json', 'w') as outfile:
    json.dump(allMembers, outfile, indent=2)
thefreeline
  • http://stackoverflow.com/questions/7136343/wikipedia-api-how-to-get-the-number-of-revisions-of-a-page?rq=1 – Ajay Apr 06 '15 at 18:58
  • @Ajay - I saw this, but this doesn't answer the question about setting a range min/max timeframe for which to search for revisions. – thefreeline Apr 06 '15 at 19:01
  • This question already has an answer here: https://stackoverflow.com/questions/12906739/api-to-get-wikipedia-revision-id-by-date – John Strood Jul 26 '18 at 11:10
  • Possible duplicate of [API to get Wikipedia revision id by date](https://stackoverflow.com/questions/12906739/api-to-get-wikipedia-revision-id-by-date) – John Strood Jul 26 '18 at 11:11

1 Answer

OK, I think I've figured it out. The two parameters highlighted in the question:

rvstart: Timestamp to start listing from. (enum)
rvend:   Timestamp to end listing at. (enum)

have to be used in conjunction with:

rvdir: Direction to list in. (enum)
    older: List newest revisions first (default) NOTE: rvstart/rvstartid has to be higher than rvend/rvendid
    newer: List oldest revisions first NOTE: rvstart/rvstartid has to be lower than rvend/rvendid
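
The ordering rule in those two notes can be checked before sending a request. The following is only a sketch: the helper name `range_is_valid` and the `TS_FORMAT` constant are made up for illustration, and the strict comparison simply follows the "higher than" / "lower than" wording quoted above.

```python
from datetime import datetime

# Timestamp layout used in this post, e.g. 2009-01-01T12:00:00Z
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def range_is_valid(rvstart, rvend, rvdir="older"):
    """Return True if rvstart/rvend are ordered as the rvdir notes require.

    Hypothetical helper, not part of wikitools or the MediaWiki API.
    """
    start = datetime.strptime(rvstart, TS_FORMAT)
    end = datetime.strptime(rvend, TS_FORMAT)
    if rvdir == "newer":
        # Oldest first: rvstart has to be lower (earlier) than rvend
        return start < end
    if rvdir == "older":
        # Newest first (default): rvstart has to be higher (later) than rvend
        return start > end
    raise ValueError("rvdir must be 'older' or 'newer'")


print(range_is_valid("2009-01-01T12:00:00Z", "2014-12-31T23:59:00Z", "newer"))
```

With the default `rvdir` of `older`, the same pair of timestamps fails the check, which would explain why an earlier `rvstart` and a later `rvend` did not behave as a range on their own.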

So, updating the params to:

params = {'action':'query',
        'titles': name,
        'prop':'revisions',
        'rvprop':'ids|flags|timestamp|userid|user|size|comment|tags',
        'rvlimit':'100',
        'rvstart':'2009-01-01T12:00:00Z',
        'rvend':'2014-12-31T23:59:00Z',
        'rvdir':'newer'
    }

appears to achieve the intended purpose.
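
For anyone not using wikitools, the same parameters can be turned into a plain request URL with the standard library. This is a rough sketch rather than a tested client: the function name `revisions_url` is made up, `format=json` and the shortened `rvprop` list are my own additions, and it only builds the URL without sending anything.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def revisions_url(title, rvstart, rvend, rvdir="newer", rvlimit=100):
    """Build a revisions query URL for one page and one time range.

    Hypothetical helper; the parameter names mirror the params dict above.
    """
    params = {
        "action": "query",
        "format": "json",  # assumption: ask for JSON output
        "titles": title,
        "prop": "revisions",
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": str(rvlimit),
        "rvstart": rvstart,
        "rvend": rvend,
        "rvdir": rvdir,
    }
    # urlencode percent-encodes ':' and '|' and turns spaces into '+'
    return API_URL + "?" + urlencode(params)


url = revisions_url("Sherrod Brown", "2009-01-01T12:00:00Z", "2014-12-31T23:59:00Z")
print(url)
```

Since `rvlimit` caps each response, a range holding more revisions than the limit would still need follow-up requests to collect the rest.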

thefreeline